VentureBeat: Mozilla Common Voice updates will help train the ‘Hey Firefox’ wakeword for voice-based web browsing

VentureBeat: Mozilla Common Voice updates will help train the ‘Hey Firefox’ wakeword for voice-based web browsing. “Mozilla today released the latest version of Common Voice, its open source collection of transcribed voice data for startups, researchers, and hobbyists to build voice-enabled apps, services, and devices. Common Voice now contains over 7,226 total hours of contributed voice data in 54 different languages, up from 1,400 hours across 18 languages in February 2019.”

Selected Datasets: A New Library of Congress Collection (Library of Congress)

Library of Congress: Selected Datasets: A New Library of Congress Collection. “Friends, data wranglers, lend me your ears; The Library of Congress’ Selected Datasets Collection is now live! You can now download datasets of the Simple English Wikipedia, the Atlas of Historical County Boundaries, sports economic data, half a million emails from Enron, and urban soil lead abatement from this online collection. This initial set of 20 datasets represents the public start of an ongoing collecting program tied to the Library’s plan to support emerging styles of data-driven research, such as text mining and machine learning.”

Berkeley Haas: Open-source smartphone database offers a new tool for tracking coronavirus exposure

Berkeley Haas: Open-source smartphone database offers a new tool for tracking coronavirus exposure. “The Covid-19 Exposure Indices, created by Berkeley Haas Asst. Prof. Victor Couture and researchers from Yale, Princeton, the University of Chicago, and the University of Pennsylvania in collaboration with location data company PlaceIQ, is aimed at academic investigators studying the spread of the pandemic. The data sets allow researchers to visualize how people can potentially be exposed to those infected with the virus, based on cell-phone movements to and from businesses and other locations where a great deal of the exposure happens.”

FierceBiotech: Life science companies combine to form COVID-19 research database

FierceBiotech: Life science companies combine to form COVID-19 research database. “A group of major CRO, life science, data analytics, publishing and healthcare companies joined forces to release a pro bono research database to build up and integrate a central hub on the latest data out for COVID-19. On the technical side, it’s a secure repository of HIPAA-compliant, de-identified and limited patient-level data sets that will be ‘made available to public health and policy researchers to extract insights to help combat the COVID-19 pandemic,’ according to the group.”

Bing Blogs: Bing delivers new COVID-19 experiences including partnership with GoFundMe to help affected businesses

Bing Blogs: Bing delivers new COVID-19 experiences including partnership with GoFundMe to help affected businesses. “Bing has already released a full-page map tracker of case details by geographic area. Now, those working in academia and research can access our data on cases by geographic area at bing.com/covid/dev or on GitHub. This dataset is pulled from publicly-available sources like the World Health Organization, Centers for Disease Control, and more. We then aggregate the data and add latitude and longitude information to it, to make it easier for you to use. Since COVID-19 data is constantly evolving, we have a 24 hour delay so we can ensure the stability of the data that we include. This data is available for non-commercial, public use geared towards medical researchers, government agencies, and academic institutions.”

Analytics India: A Beginner’s Guide To Using Google Colab

Analytics India: A Beginner’s Guide To Using Google Colab. “We are all familiar with the pop-up alerts of ‘memory-error’ while trying to work with a large dataset of machine learning (ML) or deep learning algorithms on Jupyter notebooks. On top of that, owning a decent GPU from an existing cloud provider has remained out of bounds due to the financial investment it entails. The machines at our disposal, unfortunately, do not have the unlimited computational ability. But the wait is finally over as we can now build large ML models without selling our properties. The credit goes to Google for launching the Colab – an online platform that allows anyone to train models with large datasets, absolutely free.”

EdScoop: Researchers publish social media data early for pandemic response

EdScoop: Researchers publish social media data early for pandemic response. “To help represent the spread and impact of the coronavirus pandemic, researchers at the Georgia State University on Monday released a data set of more than 140 million tweets related to COVID-19 as a resource for the global research community. The work is part of research that collects and tracks social media chatter to understand mobility patterns during natural disasters, but researchers decided to release their data before finalizing their own results to assist other researchers studying the current pandemic.”

Los Angeles Times: To aid coronavirus fight, The Times releases database of California cases

Los Angeles Times: To aid coronavirus fight, The Times releases database of California cases. “In an effort to aid scientists and researchers in the fight against COVID-19, The Times has released its database of California coronavirus cases to the public. To follow the virus’ spread, The Times is conducting an independent survey of dozens of local health agencies across the state. The effort, run continually throughout the day, supplies the underlying data for this site’s coronavirus tracker.”

BusinessWire: Free Accelerated Data Transfer Software for COVID-19 Researchers (PRESS RELEASE)

BusinessWire: Free Accelerated Data Transfer Software for COVID-19 Researchers (PRESS RELEASE). “High-performance data transfer software that can move files ranging from megabytes to terabytes among research institutions, cloud providers, and personal computers at speeds many times faster than traditional software…. Available immediately for an initial 90-day license; requests to extend licenses will be evaluated on a case-by-case basis to facilitate continued research.”

Phys.org: How to quickly and efficiently identify huge gene data sets to help coronavirus research

Phys.org: How to quickly and efficiently identify huge gene data sets to help coronavirus research. “Thanks to the advancement of sequencing technology, it’s possible to produce massive amounts of genome sequence data on various species. It’s crucial to examine pan-genomic data—the entire set of genes possessed by all members of a particular species—particularly in areas like bacteria and virus research, investigation of drug resistance mechanisms and vaccine development. For example, why is the coronavirus resistant to common drugs? Can big data help to rapidly identify the characteristics of such novel virus strains? A group of researchers supported by the EU-funded PANGAIA project is now tackling this challenge by developing methods for comparing gigantic gene data sets.”

USC Viterbi School of Engineering: USC Researchers Release Public Coronavirus Twitter Set for Academics

USC Viterbi School of Engineering: USC Researchers Release Public Coronavirus Twitter Set for Academics. “Researchers at the USC Viterbi School of Engineering Information Sciences Institute (ISI) and the Department of Computer Science have released a public coronavirus twitter dataset for scholars. Emilio Ferrara and Kristina Lerman, the principal researchers on this project, have a history of studying social media and bots to understand how misinformation, fear and influence spread online.”

Phys.org: ‘Data feminism’ examines problems of bias and power that beset modern information

Phys.org: ‘Data feminism’ examines problems of bias and power that beset modern information. “Suppose you would like to know mortality rates for women during childbirth, by country, around the world. Where would you look? One option is the WomanStats Project, the website of an academic research effort investigating the links between the security and activities of nation-states, and the security of the women who live in them.”

Phys.org: With 30,000 surveys, researchers build the go-to dataset for smallholder farms

Phys.org: With 30,000 surveys, researchers build the go-to dataset for smallholder farms. “Top-down projects for improving the lives of poor farmers were often unsuccessful because they didn’t systematically consider the diverse [ways] rural households survive and thrive. To tap this local knowledge, scientists and development agencies began surveying households to assure that research and development schemes were on target. But the surveys were not designed to be compared with one another, lacking what scientists call ‘interoperability’—meaning one organization’s household surveys could not be compared with another’s. For big-picture analysis, much of the data was of little use.”

Analytics India Magazine: 10 Face Datasets To Start Facial Recognition Projects

Analytics India Magazine: 10 Face Datasets To Start Facial Recognition Projects. “One of the major research areas, facial recognition has been adopted by governments and organisations for a few years now. Leading phone makers like Apple, Samsung, among others, have been integrating this technology into their smartphones for providing maximum security to the users. As per research, facial recognition technology is expected to grow and reach $9.6 billion by 2020. In this article, we list down 10 face datasets which can be used to start facial recognition projects.”

Nature: Find a home for every imaging data set

Nature: Find a home for every imaging data set. “Services such as [the Electron Microscopy Public Image Archive] give researchers a central location in which to store, share and access a rapidly expanding corpus of biological images. ‘The data aren’t just one picture any more,’ says Joshua Vogelstein, a neurostatistician at Johns Hopkins University in Baltimore, Maryland. Movies, 3D images and microscope-based screening data can take up gigabytes or terabytes of storage, and can’t be e-mailed back and forth in the same way as individual TIFF or JPEG files. Moreover, grant agencies and journals increasingly require scientists to make their data available to all, but don’t necessarily offer to host them. EMPIAR and its kin fill that gap, and often provide a digital object identifier or other citation so researchers can get credit for their data.”