Texas Advanced Computing Center: Disaster Database Is Go-To Hub For Natural Hazard Information

New-to-me, from Texas Advanced Computing Center: Disaster Database Is Go-To Hub For Natural Hazard Information. “The Seattle mega-quake scenario is one of hundreds of data sets published on DesignSafe, a database for natural disaster information created by researchers at The University of Texas at Austin that has changed how planners, builders, policymakers and engineers prepare for and respond to hurricanes, tornadoes, earthquakes and more. The data repository gives researchers the ability to formally publish data sets related to natural disaster studies in the same way research papers are published in journals, giving them an accessible digital home.”

Indiana University: ERI launches platform to boost accessibility of environmental change data

Indiana University: ERI launches platform to boost accessibility of environmental change data. “This fall, Indiana University’s Environmental Resilience Institute (ERI), part of IU’s Prepared for Environmental Change Grand Challenge initiative, launched the ERI Data Platform, an open-data tool that allows users to explore environmental change data in new ways. The platform gives users the ability to overlay national, global, and Indiana-specific datasets, add new data, and navigate to geographic areas of interest.”

StateScoop: Minneapolis’ new website ‘turns us all into data scientists,’ CIO says

StateScoop: Minneapolis’ new website ‘turns us all into data scientists,’ CIO says. “Minneapolis DataSource contains dashboards for four categories of public data, including elections, public health, community safety, and housing and development. But [city CIO Fadi] Fadhil said the city is working to include more categories and dashboards through ‘constant automation’ of data collection around the city.”

National Library of New Zealand: Papers Past data has been set free

National Library of New Zealand: Papers Past data has been set free. “Papers Past is the National Library’s fully text searchable website containing over 150 newspapers from New Zealand and the Pacific, as well as magazines, journals and government reports. As a result of the data being released, people can now access the data from 78 New Zealand newspapers from the Albertland Gazette to the Victoria Times, all published before 1900. The data itself consists of the METS/ALTO XML files for each issue. The XML files sit in the back of Papers Past and are what allows you to locate keywords within articles.”
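
For anyone curious how ALTO files make keyword location possible, here is a minimal sketch in Python. It assumes the usual ALTO layout, where each recognized word is a String element carrying CONTENT, HPOS, and VPOS attributes. The sample fragment and the find_keyword helper are made up for illustration and are not taken from the Papers Past release; real files are far larger and may differ in namespace and structure.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written ALTO-style fragment (an assumption for illustration,
# not an actual Papers Past file).
SAMPLE_ALTO = """<alto>
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="Albertland" HPOS="120" VPOS="80" WIDTH="210" HEIGHT="32"/>
    <String CONTENT="Gazette" HPOS="340" VPOS="80" WIDTH="150" HEIGHT="32"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""


def find_keyword(alto_xml, keyword):
    """Return (word, hpos, vpos) for each String element matching keyword."""
    root = ET.fromstring(alto_xml)
    wanted = keyword.lower()
    hits = []
    for elem in root.iter():
        # Drop any XML namespace prefix so the check works across ALTO versions.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag != "String":
            continue
        word = elem.get("CONTENT", "")
        if word.lower() == wanted:
            # Coordinates give the word's position on the scanned page.
            hpos = int(float(elem.get("HPOS", "0")))
            vpos = int(float(elem.get("VPOS", "0")))
            hits.append((word, hpos, vpos))
    return hits


hits = find_keyword(SAMPLE_ALTO, "gazette")
for word, hpos, vpos in hits:
    print(f"{word} at x={hpos}, y={vpos}")
```

The match here is a case-insensitive comparison of whole words. A real search would also need to deal with OCR errors and hyphenated words, which this sketch ignores.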

The Next Web: COVID-19 made your data set worthless. Now what?

The Next Web: COVID-19 made your data set worthless. Now what? “The COVID-19 pandemic has perplexed data scientists and creators of machine learning tools as the sudden and major change in consumer behavior has made predictions based on historical data nearly useless. There is also very little point in trying to train new prediction models during the crisis, as one simply cannot predict chaos. While these challenges could shake our perception of what artificial intelligence really is (and is not), they might also foster the development of tools that could automatically adjust.”

Nature: Migrating big astronomy data to the cloud

Nature: Migrating big astronomy data to the cloud. “Astronomers typically work by asking observatories for time on a telescope and downloading the resulting data. But as the amount of data that telescopes produce grows, well, astronomically, old methods can’t keep pace. The Vera C. Rubin Observatory in Chile is geared up to collect 20 terabytes per night as part of its 10-year Legacy Survey of Space and Time (LSST), once it becomes operational in 2022. That’s as much as the Sloan Digital Sky Survey — which created the most detailed 3D maps of the Universe so far — collected in total between 2000 and 2010.”

Phys.org: An open-source data platform for researchers studying archaea

Phys.org: An open-source data platform for researchers studying archaea. “To foster scientific exchange and to advance discovery, biologists in the School of Arts & Sciences led by postdoc Stefan Schulze and professor Mecky Pohlschroder have launched the Archaeal Proteome Project (ArcPP), a web-based database to collect and make available datasets to further the work of all scientists interested in archaea, a domain of life composed of microorganisms that can dwell anywhere from deep-sea vents to the human gut.”

Bing Blogs: Extracting Covid-19 insights from Bing search data

Bing Blogs: Extracting Covid-19 insights from Bing search data. “As is true for many other topics, search engine query logs may be able to give insight into the information gaps associated with Covid-19…. We are pleased to announce that we have already made Covid-19 query data freely available on GitHub as the Bing search dataset for Coronavirus intent, with scheduled updates every month over the course of the pandemic. This dataset includes explicit Covid-19 search queries containing terms such as corona, coronavirus, and covid, as well as implicit Covid-19 queries that are used to access the same set of web page search results (using the technique of random walks on the click graph).”

Library of Congress: Selected Datasets: A New Library of Congress Collection

Library of Congress: Selected Datasets: A New Library of Congress Collection. “Friends, data wranglers, lend me your ears; The Library of Congress’ Selected Datasets Collection is now live! You can now download datasets of the Simple English Wikipedia, the Atlas of Historical County Boundaries, sports economic data, half a million emails from Enron, and urban soil lead abatement from this online collection. This initial set of 20 datasets represents the public start of an ongoing collecting program tied to the Library’s plan to support emerging styles of data-driven research, such as text mining and machine learning.”

The Register: MIT apologizes, permanently pulls offline huge dataset that taught AI systems to use racist, misogynistic slurs

The Register: MIT apologizes, permanently pulls offline huge dataset that taught AI systems to use racist, misogynistic slurs. “The training set, built by the university, has been used to teach machine-learning models to automatically identify and list the people and objects depicted in still images. For example, if you show one of these systems a photo of a park, it might tell you about the children, adults, pets, picnic spreads, grass, and trees present in the snap. Thanks to MIT’s cavalier approach when assembling its training set, though, these systems may also label women as whores or bitches, and Black and Asian people with derogatory language. The database also contained close-up pictures of female genitalia labeled with the C-word.”

TechCrunch: Aclima and Google release a new air quality data set for researchers to investigate California pollution

TechCrunch: Aclima and Google release a new air quality data set for researchers to investigate California pollution. “As part of the Collision from Home conference, Aclima chief executive Davida Herzl released a new data set made in conjunction with Google. Free to the scientific community, the data is the culmination of four years of data collection and aggregation resulting in 42 million air quality measurements throughout the state of California.”

Centers for Medicare & Medicaid Services: Medicare COVID-19 Data Release Blog

Centers for Medicare & Medicaid Services: Medicare COVID-19 Data Release Blog. “Today, the Centers for Medicare & Medicaid Services (CMS) released preliminary data on COVID-19 derived from Medicare claims. The data provides a highly instructive picture of the impact of COVID-19 on the Medicare population, further confirming a number of long understood patterns in the disease such as the elevated risk for seniors with underlying health conditions.”

CNET: Your face mask selfies could be training the next facial recognition tool

CNET: Your face mask selfies could be training the next facial recognition tool. “Your face mask selfies aren’t just getting seen by your friends and family — they’re also getting collected by researchers looking to use them to improve facial recognition algorithms. CNET found thousands of face-masked selfies up for grabs in public data sets, with pictures taken directly from Instagram.”

Berkeley Haas: Open-source smartphone database offers a new tool for tracking coronavirus exposure

Berkeley Haas: Open-source smartphone database offers a new tool for tracking coronavirus exposure. “The Covid-19 Exposure Indices, created by Berkeley Haas Asst. Prof. Victor Couture and researchers from Yale, Princeton, the University of Chicago, and the University of Pennsylvania in collaboration with location data company PlaceIQ, is aimed at academic investigators studying the spread of the pandemic. The data sets allow researchers to visualize how people can potentially be exposed to those infected with the virus, based on cell-phone movements to and from businesses and other locations where a great deal of the exposure happens.”

FierceBiotech: Life science companies combine to form COVID-19 research database

FierceBiotech: Life science companies combine to form COVID-19 research database. “A group of major CRO, life science, data analytics, publishing and healthcare companies joined forces to release a pro bono research database to build up and integrate a central hub on the latest data out for COVID-19. On the technical side, it’s a secure repository of HIPAA-compliant, de-identified and limited patient-level data sets that will be ‘made available to public health and policy researchers to extract insights to help combat the COVID-19 pandemic,’ according to the group.”