University of Michigan: Open source platform enables research on privacy-preserving machine learning

University of Michigan: Open source platform enables research on privacy-preserving machine learning. “The biggest benchmarking data set to date for a machine learning technique designed with data privacy in mind has been released open source by researchers at the University of Michigan. Called federated learning, the approach trains learning models on end-user devices, like smartphones and laptops, rather than requiring the transfer of private data to central servers.”

Scientific Data: The Multilingual Picture Database

Scientific Data: The Multilingual Picture Database. “In this paper we present the Multilingual Picture (Multipic) database, containing naming norms and familiarity scores for 500 coloured pictures, in thirty-two languages or language varieties from around the world. The data was validated with standard methods that have been used for existing picture datasets. This is the first dataset to provide naming norms, and translation equivalents, for such a variety of languages; as such, it will be of particular value to psycholinguists and other interested researchers. The dataset has been made freely available.”

LitHub: How Empirical Databases Have Changed Our Understanding of Early American Slavery

LitHub: How Empirical Databases Have Changed Our Understanding of Early American Slavery. “In historical scholarship during the early 21st century, some of these new methods and tools of truth-seeking have been put to work on a large scale in the history of slavery and race in America. Among the most important and useful of these tools are the careful construction of empirical databases. Increasingly, this work has been done by teams of scholars, who combine traditional sources with digital methods on a new scale.”

Monterey Herald: Monterey Bay Aquarium shares a treasure trove of data about young white sharks

Monterey Herald: Monterey Bay Aquarium shares a treasure trove of data about young white sharks. “The Monterey Bay Aquarium and its collaborators have released a cache of data about great white sharks they’ve been collecting for over 20 years. Earlier this month, an international team of scientists and aquarists led by John O’Sullivan, the director of collections at the Monterey Bay Aquarium and Chris Lowe of CSU Long Beach published a dataset… containing decades’ worth of information about juvenile white sharks. Researchers all over the world can now use the data to help them understand where white sharks go during their seasonal migrations, what ocean conditions they prefer and how they interact with other fish.”

NIWA: Easy access to environmental research data

National Institute of Water and Atmospheric Research (NIWA): Easy access to environmental research data. “New Zealand’s seven Crown Research Institutes (CRIs) have created the National Environmental Data Centre (NEDC) website to make the environmental information held by CRIs more accessible to all New Zealanders. The datasets include a huge range of information from climate and atmosphere, freshwater, land and oceans, including biodiversity and geological data.”

Press release: Big data in geochemistry for international research (University of Göttingen)

University of Göttingen: Press release: Big data in geochemistry for international research. “Large data sets are playing an increasingly important role in solving scientific questions in geochemistry. Now the University of Göttingen has inherited GEOROC, the largest geochemical database for rocks and minerals from the Max Planck Institute for Chemistry (Mainz). The database has been revised and modernised in its structure and made available to its global users in a new form. The ‘GEOROC’ database, the largest global data collection of rock and mineral compositions, currently contains analyses from over 20,000 individual publications (the oldest dating back to 1883) from 614,000 samples. Together, these data represent almost 32 million individual analytical values.”

Our most dangerous streets: Huge new collision database points to Toronto’s postwar suburbs (Toronto Star)

Toronto Star: Our most dangerous streets: Huge new collision database points to Toronto’s postwar suburbs. “A Star analysis of a huge new database of Toronto traffic collisions is shining a bright spotlight on a distinctly suburban problem. The new data set, much larger and more complete than any previously available records, offers a comprehensive account of nearly 500,000 collisions reported to Toronto police between 2014 and 2021, most mapped to the nearest intersection.”

Butterfly Conservation: Database brings together all known ecological facts about UK butterflies and moths for the first time

Butterfly Conservation: Database brings together all known ecological facts about UK butterflies and moths for the first time. “Butterfly Conservation and the UK Centre for Ecology & Hydrology have worked together on the database, which has collated information that previously existed in a wide range of sources such as field guides, books and journals. Until now, most of this information wasn’t available in a single location nor in a digital format. The new database has brought this information into one usable, digital resource. This involved many months of inputting data from books into spreadsheets, categorising data, and condensing the data into a suitable format for use in data analysis software such as R.”

MIT Sloan Management Review: The Data Boom Is Here — It’s Just Not Evenly Distributed

MIT Sloan Management Review: The Data Boom Is Here — It’s Just Not Evenly Distributed. “As Big Tech becomes evermore powerful thanks to the vast troves of data that the major platforms have collected, and innovation becomes increasingly data-driven, entrepreneurs and enterprises may find it difficult to seize new opportunities. Keeping the engine of innovation running will require access not only to capital but to data as well.”

Journal of Cultural Analytics: Shakespeare and Company Project Data Sets

Thanks to Esther S. for giving me a heads-up on this one. Journal of Cultural Analytics: Shakespeare and Company Project Data Sets. “This article describes three data sets from the Shakespeare and Company Project. The data sets provide information about Shakespeare and Company, Sylvia Beach’s bookshop and lending library in interwar Paris. The first data set focuses on the members of the lending library. The second, on the books that circulated in the lending library. The third, on the events—borrows, purchases, subscriptions, renewals, deposits, reimbursements—that connected members and books. Together, the three data sets promise to address and bridge concerns in modernist studies, the digital humanities, and the public humanities. Work on the data sets began in 2014. The first two versions of the data sets were released in 2020 and 2021, respectively. The current version, 1.2, was released in 2022. Over forty people have contributed to the data sets.”

American Astronomical Society: New Tool Launches for Astronomy Software Users

American Astronomical Society: New Tool Launches for Astronomy Software Users. “Astronomers rely on scientific software to analyze data sets and model complex astrophysical objects and phenomena. But as the collection of astronomy-related software grows, it becomes increasingly difficult for scientists to discover relevant packages for data analysis, determine which software version was used in a specific study, or provide credit to the developer of the software used for a scientific discovery. Asclepias combines different platforms to make these tasks possible.”