Jacobs Technion-Cornell Institute: VoterFraud2020

Jacobs Technion-Cornell Institute: VoterFraud2020. “We are making publicly available VoterFraud2020, a multi-modal Twitter dataset with 7.6M tweets and 25.6M retweets from 2.6M users that includes key phrases and hashtags related to voter fraud claims between October 23rd and December 16th. The dataset also includes the full set of links and YouTube videos shared in these tweets, with data about their spread in different Twitter sub-communities.”

Datamation: The Huge Data Problems That Prevented A Faster Pandemic Response

Datamation: The Huge Data Problems That Prevented A Faster Pandemic Response. “Early in the year, when it first became clear we faced a pandemic, technology companies worldwide stepped up and pledged resources that should have been able to find the best ways to balance economic impact and safety. But while the response was faster than the 1918 Flu Virus response, it wasn’t that much faster, and tons of mistakes should have been avoidable given we have massive modeling capability. At the heart of the problem wasn’t the lack of data; it was the inability to get to that data and analyze it in a timely basis. Let’s talk about what went wrong and what companies and governments should be doing to speed up the response, so the next outbreak isn’t as catastrophic.”

Pacific Northwest National Laboratory: New Machine Learning Tool Tracks Urban Traffic Congestion

Pacific Northwest National Laboratory: New Machine Learning Tool Tracks Urban Traffic Congestion. “Currently, publicly available traffic information at the street level is sparse and incomplete. Traffic engineers generally have relied on isolated traffic counts, collision statistics and speed data to determine roadway conditions. The new tool uses traffic datasets collected from UBER drivers and other publicly available traffic sensor data to map street-level traffic flow over time. It creates a big picture of city traffic using machine learning tools and the computing resources available at a national laboratory.”

BBC: Norway funds satellite map of world’s tropical forests

BBC: Norway funds satellite map of world’s tropical forests. “A unique satellite dataset on the world’s tropical forests is now available for all to see and use. It’s a high-resolution image map covering 64 countries that will be updated monthly. Anyone who wants to understand how trees are being managed will be able to download the necessary information for analysis – for free.”

InsideSources: InsideSources Presents New Searchable COVID Database For Citizens, Journalists

InsideSources: InsideSources Presents New Searchable COVID Database For Citizens, Journalists. “InsideSources presents the ‘COVID-19 Accountability Library,’ a free, searchable database of hundreds of thousands of unique data points on the COVID-19 pandemic. These statements, quotes and comments come from prominent American and international figures. And they are all easily searched in this new online library.”

National Library of New Zealand: Papers Past data has been set free

National Library of New Zealand: Papers Past data has been set free. “Papers Past is the National Library’s fully text searchable website containing over 150 newspapers from New Zealand and the Pacific, as well as magazines, journals and government reports. As a result of the data being released, people can now access the data from 78 New Zealand newspapers from the Albertland Gazette to the Victoria Times, all published before 1900. The data itself consists of the METS/ALTO XML files for each issue. The XML files sit in the back of Papers Past and are what allows you to locate keywords within articles.”
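
For readers wondering how ALTO files make keyword location possible: ALTO stores each recognized word as a String element whose CONTENT attribute holds the text and whose HPOS and VPOS attributes hold its position on the page. The following is a minimal Python sketch of that idea, not the National Library's own tooling. The sample XML is a hand-written, hypothetical fragment (not taken from Papers Past), and the helper names local_name and find_keyword are made up for illustration. Real files may use a different ALTO namespace version, so the sketch matches on the tag's local name only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written ALTO-style fragment for illustration only.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Shipping" HPOS="100" VPOS="200" WIDTH="80" HEIGHT="12"/>
            <String CONTENT="news" HPOS="190" VPOS="200" WIDTH="40" HEIGHT="12"/>
          </TextLine>
          <TextLine>
            <String CONTENT="News" HPOS="100" VPOS="220" WIDTH="42" HEIGHT="12"/>
            <String CONTENT="from" HPOS="150" VPOS="220" WIDTH="38" HEIGHT="12"/>
            <String CONTENT="Auckland." HPOS="195" VPOS="220" WIDTH="90" HEIGHT="12"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Strip any '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def find_keyword(alto_xml, keyword):
    """Return (word, hpos, vpos) for every String whose text matches keyword.

    Matching is case-insensitive and ignores surrounding punctuation.
    """
    root = ET.fromstring(alto_xml)
    needle = keyword.lower()
    hits = []
    for elem in root.iter():
        if local_name(elem.tag) != "String":
            continue
        word = elem.get("CONTENT", "")
        if word.strip(".,;:!?\"'()").lower() == needle:
            hpos = int(float(elem.get("HPOS", "0")))
            vpos = int(float(elem.get("VPOS", "0")))
            hits.append((word, hpos, vpos))
    return hits


if __name__ == "__main__":
    for word, hpos, vpos in find_keyword(SAMPLE_ALTO, "news"):
        print(f"{word!r} at HPOS={hpos}, VPOS={vpos}")
```

Run against the sample, this reports two matches for "news", one on each text line, along with their page coordinates. A search interface can use coordinates like these to highlight a hit on the scanned page image.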

Technology Networks: Database Offers Access to 200 Million Immune Sequences From COVID-19 Patients

Technology Networks: Database Offers Access to 200 Million Immune Sequences From COVID-19 Patients. “Across the world, many laboratories are conducting research relating to the SARS-CoV-2 virus, whether it be to understand the pathophysiology of COVID-19, or to develop robust diagnostics and efficacious therapeutics for the disease. As such, the pandemic has highlighted the critical importance of data sharing within the scientific community. The iReceptor Plus consortium, a European Union (EU)- and Canadian-funded project, has gathered 200 million T and B cell receptor sequences from COVID-19 patients – it is the largest repertoire of its kind. The sequencing data is open source and available online through the iReceptor Gateway.”

VentureBeat: Mozilla Common Voice updates will help train the ‘Hey Firefox’ wakeword for voice-based web browsing

VentureBeat: Mozilla Common Voice updates will help train the ‘Hey Firefox’ wakeword for voice-based web browsing. “Mozilla today released the latest version of Common Voice, its open source collection of transcribed voice data for startups, researchers, and hobbyists to build voice-enabled apps, services, and devices. Common Voice now contains over 7,226 total hours of contributed voice data in 54 different languages, up from 1,400 hours across 18 languages in February 2019.”

Library of Congress: Selected Datasets: A New Library of Congress Collection

Library of Congress: Selected Datasets: A New Library of Congress Collection. “Friends, data wranglers, lend me your ears; The Library of Congress’ Selected Datasets Collection is now live! You can now download datasets of the Simple English Wikipedia, the Atlas of Historical County Boundaries, sports economic data, half a million emails from Enron, and urban soil lead abatement from this online collection. This initial set of 20 datasets represents the public start of an ongoing collecting program tied to the Library’s plan to support emerging styles of data-driven research, such as text mining and machine learning.”

Berkeley Haas: Open-source smartphone database offers a new tool for tracking coronavirus exposure

Berkeley Haas: Open-source smartphone database offers a new tool for tracking coronavirus exposure. “The Covid-19 Exposure Indices, created by Berkeley Haas Asst. Prof. Victor Couture and researchers from Yale, Princeton, the University of Chicago, and the University of Pennsylvania in collaboration with location data company PlaceIQ, is aimed at academic investigators studying the spread of the pandemic. The data sets allow researchers to visualize how people can potentially be exposed to those infected with the virus, based on cell-phone movements to and from businesses and other locations where a great deal of the exposure happens.”

FierceBiotech: Life science companies combine to form COVID-19 research database

FierceBiotech: Life science companies combine to form COVID-19 research database. “A group of major CRO, life science, data analytics, publishing and healthcare companies joined forces to release a pro bono research database to build up and integrate a central hub on the latest data out for COVID-19. On the technical side, it’s a secure repository of HIPAA-compliant, de-identified and limited patient-level data sets that will be ‘made available to public health and policy researchers to extract insights to help combat the COVID-19 pandemic,’ according to the group.”

Bing Blogs: Bing delivers new COVID-19 experiences including partnership with GoFundMe to help affected businesses

Bing Blogs: Bing delivers new COVID-19 experiences including partnership with GoFundMe to help affected businesses. “Bing has already released a full-page map tracker of case details by geographic area. Now, those working in academia and research can access our data on cases by geographic area at bing.com/covid/dev or on GitHub. This dataset is pulled from publicly-available sources like the World Health Organization, Centers for Disease Control, and more. We then aggregate the data and add latitude and longitude information to it, to make it easier for you to use. Since COVID-19 data is constantly evolving, we have a 24 hour delay so we can ensure the stability of the data that we include. This data is available for non-commercial, public use geared towards medical researchers, government agencies, and academic institutions.”

Analytics India: A Beginner’s Guide To Using Google Colab

Analytics India: A Beginner’s Guide To Using Google Colab. “We are all familiar with the pop-up alerts of ‘memory-error’ while trying to work with a large dataset of machine learning (ML) or deep learning algorithms on Jupyter notebooks. On top of that, owning a decent GPU from an existing cloud provider has remained out of bounds due to the financial investment it entails. The machines at our disposal, unfortunately, do not have the unlimited computational ability. But the wait is finally over as we can now build large ML models without selling our properties. The credit goes to Google for launching the Colab – an online platform that allows anyone to train models with large datasets, absolutely free.”

EdScoop: Researchers publish social media data early for pandemic response

EdScoop: Researchers publish social media data early for pandemic response. “To help represent the spread and impact of the coronavirus pandemic, researchers at the Georgia State University on Monday released a data set of more than 140 million tweets related to COVID-19 as a resource for the global research community. The work is part of research that collects and tracks social media chatter to understand mobility patterns during natural disasters, but researchers decided to release their data before finalizing their own results to assist other researchers studying the current pandemic.”