The Register: MIT apologizes, permanently pulls offline huge dataset that taught AI systems to use racist, misogynistic slurs

The Register: MIT apologizes, permanently pulls offline huge dataset that taught AI systems to use racist, misogynistic slurs. “The training set, built by the university, has been used to teach machine-learning models to automatically identify and list the people and objects depicted in still images. For example, if you show one of these systems a photo of a park, it might tell you about the children, adults, pets, picnic spreads, grass, and trees present in the snap. Thanks to MIT’s cavalier approach when assembling its training set, though, these systems may also label women as whores or bitches, and Black and Asian people with derogatory language. The database also contained close-up pictures of female genitalia labeled with the C-word.”

TechCrunch: Aclima and Google release a new air quality data set for researchers to investigate California pollution

TechCrunch: Aclima and Google release a new air quality data set for researchers to investigate California pollution. “As part of the Collision from Home conference, Aclima chief executive Davida Herzl released a new data set made in conjunction with Google. Free to the scientific community, the data is the culmination of four years of data collection and aggregation resulting in 42 million air quality measurements throughout the state of California.”

Centers for Medicare & Medicaid Services: Medicare COVID-19 Data Release Blog

Centers for Medicare & Medicaid Services: Medicare COVID-19 Data Release Blog. “Today, the Centers for Medicare & Medicaid Services (CMS) released preliminary data on COVID-19 derived from Medicare claims. The data provides a highly instructive picture of the impact of COVID-19 on the Medicare population, further confirming a number of long understood patterns in the disease such as the elevated risk for seniors with underlying health conditions.”

CNET: Your face mask selfies could be training the next facial recognition tool

CNET: Your face mask selfies could be training the next facial recognition tool. “Your face mask selfies aren’t just getting seen by your friends and family — they’re also getting collected by researchers looking to use them to improve facial recognition algorithms. CNET found thousands of face-masked selfies up for grabs in public data sets, with pictures taken directly from Instagram.”

Berkeley Haas: Open-source smartphone database offers a new tool for tracking coronavirus exposure

Berkeley Haas: Open-source smartphone database offers a new tool for tracking coronavirus exposure. “The Covid-19 Exposure Indices, created by Berkeley Haas Asst. Prof. Victor Couture and researchers from Yale, Princeton, the University of Chicago, and the University of Pennsylvania in collaboration with location data company PlaceIQ, is aimed at academic investigators studying the spread of the pandemic. The data sets allow researchers to visualize how people can potentially be exposed to those infected with the virus, based on cell-phone movements to and from businesses and other locations where a great deal of the exposure happens.”

FierceBiotech: Life science companies combine to form COVID-19 research database

FierceBiotech: Life science companies combine to form COVID-19 research database. “A group of major CRO, life science, data analytics, publishing and healthcare companies joined forces to release a pro bono research database to build up and integrate a central hub on the latest data out for COVID-19. On the technical side, it’s a secure repository of HIPAA-compliant, de-identified and limited patient-level data sets that will be ‘made available to public health and policy researchers to extract insights to help combat the COVID-19 pandemic,’ according to the group.”

Analytics India: A Beginner’s Guide To Using Google Colab

Analytics India: A Beginner’s Guide To Using Google Colab. “We are all familiar with the pop-up alerts of ‘memory-error’ while trying to work with a large dataset of machine learning (ML) or deep learning algorithms on Jupyter notebooks. On top of that, owning a decent GPU from an existing cloud provider has remained out of bounds due to the financial investment it entails. The machines at our disposal, unfortunately, do not have the unlimited computational ability. But the wait is finally over as we can now build large ML models without selling our properties. The credit goes to Google for launching the Colab – an online platform that allows anyone to train models with large datasets, absolutely free.”

EdScoop: Researchers publish social media data early for pandemic response

EdScoop: Researchers publish social media data early for pandemic response. “To help represent the spread and impact of the coronavirus pandemic, researchers at the Georgia State University on Monday released a data set of more than 140 million tweets related to COVID-19 as a resource for the global research community. The work is part of research that collects and tracks social media chatter to understand mobility patterns during natural disasters, but researchers decided to release their data before finalizing their own results to assist other researchers studying the current pandemic.”

Los Angeles Times: To aid coronavirus fight, The Times releases database of California cases

Los Angeles Times: To aid coronavirus fight, The Times releases database of California cases. “In an effort to aid scientists and researchers in the fight against COVID-19, The Times has released its database of California coronavirus cases to the public. To follow the virus’ spread, The Times is conducting an independent survey of dozens of local health agencies across the state. The effort, run continually throughout the day, supplies the underlying data for this site’s coronavirus tracker.”

BusinessWire: Free Accelerated Data Transfer Software for COVID-19 Researchers (PRESS RELEASE)

BusinessWire: Free Accelerated Data Transfer Software for COVID-19 Researchers (PRESS RELEASE). “High-performance data transfer software that can move files ranging from megabytes to terabytes among research institutions, cloud providers, and personal computers at speeds many times faster than traditional software…. Available immediately for an initial 90-day license; requests to extend licenses will be evaluated on a case-by-case basis to facilitate continued research.”

Phys.org: How to quickly and efficiently identify huge gene data sets to help coronavirus research

Phys.org: How to quickly and efficiently identify huge gene data sets to help coronavirus research. “Thanks to the advancement of sequencing technology, it’s possible to produce massive amounts of genome sequence data on various species. It’s crucial to examine pan-genomic data—the entire set of genes possessed by all members of a particular species—particularly in areas like bacteria and virus research, investigation of drug resistance mechanisms and vaccine development. For example, why is the coronavirus resistant to common drugs? Can big data help to rapidly identify the characteristics of such novel virus strains? A group of researchers supported by the EU-funded PANGAIA project is now tackling this challenge by developing methods for comparing gigantic gene data sets.”

USC Viterbi School of Engineering: USC Researchers Release Public Coronavirus Twitter Set for Academics

USC Viterbi School of Engineering: USC Researchers Release Public Coronavirus Twitter Set for Academics. “Researchers at the USC Viterbi School of Engineering Information Sciences Institute (ISI) and the Department of Computer Science have released a public coronavirus twitter dataset for scholars. Emilio Ferrara and Kristina Lerman, the principal researchers on this project, have a history of studying social media and bots to understand how misinformation, fear and influence spread online.”

Phys.org: ‘Data feminism’ examines problems of bias and power that beset modern information

Phys.org: ‘Data feminism’ examines problems of bias and power that beset modern information. “Suppose you would like to know mortality rates for women during childbirth, by country, around the world. Where would you look? One option is the WomanStats Project, the website of an academic research effort investigating the links between the security and activities of nation-states, and the security of the women who live in them.”

Phys.org: With 30,000 surveys, researchers build the go-to dataset for smallholder farms

Phys.org: With 30,000 surveys, researchers build the go-to dataset for smallholder farms. “Top-down projects for improving the lives of poor farmers were often unsuccessful because they didn’t systematically consider the diverse rural households survive and thrive. To tap this local knowledge, scientists and development agencies began surveying households to assure that research and development schemes were on target. But the surveys were not designed to be compared with one another, lacking what scientists call ‘interoperability’—meaning one organization’s household surveys could not be compared with another’s. For big-picture analysis, much of the data was of little use.”

Analytics India Magazine: 10 Face Datasets To Start Facial Recognition Projects

Analytics India Magazine: 10 Face Datasets To Start Facial Recognition Projects. “One of the major research areas, facial recognition has been adopted by governments and organisations for a few years now. Leading phone makers like Apple, Samsung, among others, have been integrating this technology into their smartphones for providing maximum security to the users. As per research, facial recognition technology is expected to grow and reach $9.6 billion by 2020. In this article, we list down 10 face datasets which can be used to start facial recognition projects.”