BusinessWire: Leafly Launches Unique Data-Sharing Program to Advance Cannabis Research (PRESS RELEASE)

BusinessWire: Leafly Launches Unique Data-Sharing Program to Advance Cannabis Research (PRESS RELEASE). “Leafly is the informed way to shop for weed and its cannabis data library includes tens of thousands of cannabinoid and terpene strain profiles from the Leafly-Certified Labs Program, subjective strain effects from consumer reviews, and cannabis popularity metrics.”

Inside Precision Medicine: Data Trove Released by Seattle Alzheimer’s Disease Brain Cell Atlas

Inside Precision Medicine: Data Trove Released by Seattle Alzheimer’s Disease Brain Cell Atlas. “Neuroscientists at the Allen Institute for Brain Science and their collaborators have released their first research data set on Alzheimer’s disease, in which they categorized cell types based on gene activity. The team hope this approach could ultimately identify new targets for better therapies.”

Bureau of Transportation Statistics: BTS Updates Datasets to National Transportation Atlas Database

Bureau of Transportation Statistics: BTS Updates Datasets to National Transportation Atlas Database. “The U.S. Department of Transportation’s Bureau of Transportation Statistics today released its summer 2022 update to the National Transportation Atlas Database (NTAD), a set of nationwide geographic databases of transportation facilities, networks, and associated infrastructure.”

University of Oxford: Seeing the light: researchers develop new AI system using light to learn associatively

University of Oxford: Seeing the light: researchers develop new AI system using light to learn associatively. “Researchers at Oxford University’s Department of Materials, working in collaboration with colleagues from Exeter and Munster have developed an on-chip optical processor capable of detecting similarities in datasets up to 1,000 times faster than conventional machine learning algorithms running on electronic processors.”

Informationsdienst Wissenschaft: Big data on small languages: Release of the DoReCo online database

Informationsdienst Wissenschaft: Big data on small languages: Release of the DoReCo online database. “On July 29, linguists working all over the world will gather in Berlin at Leibniz-Zentrum Allgemeine Sprachwissenschaft to celebrate the online release of the DoReCo data base, which provides access to audio recordings from more than 50 languages, along with their transcriptions, translations, and detailed linguistic analyses. Admission to the hybrid event is free, but registration is required.”

Scientific Data: The Multilingual Picture Database

Scientific Data: The Multilingual Picture Database. “In this paper we present the Multilingual Picture (Multipic) database, containing naming norms and familiarity scores for 500 coloured pictures, in thirty-two languages or language varieties from around the world. The data was validated with standard methods that have been used for existing picture datasets. This is the first dataset to provide naming norms, and translation equivalents, for such a variety of languages; as such, it will be of particular value to psycholinguists and other interested researchers. The dataset has been made freely available.”

MIT Technology Review: Inside a radical new project to democratize AI

MIT Technology Review: Inside a radical new project to democratize AI. “Unlike other, more famous large language models such as OpenAI’s GPT-3 and Google’s LaMDA, BLOOM (which stands for BigScience Large Open-science Open-access Multilingual Language Model) is designed to be as transparent as possible, with researchers sharing details about the data it was trained on, the challenges in its development, and the way they evaluated its performance. OpenAI and Google have not shared their code or made their models available to the public, and external researchers have very little understanding of how these models are trained.”

Flinders University: Historical dataset could help scientists better understand sharks

Flinders University: Historical dataset could help scientists better understand sharks. “For the first time, the longest-running historical record of human-shark interactions in Australia is now accessible online. This follows a growing trend to make scientific datasets accessible, maximising the use and impact of the data. Taronga’s Australian Shark-Incident Database (ASID) describes more than 1000 shark-human interactions that have occurred in Australia over the past 230 years.”

National Science Foundation: Citizen science project analyzes data to model treetop snowpack and predict melt

National Science Foundation: Citizen science project analyzes data to model treetop snowpack and predict melt. “Thousands of volunteers categorized 13,600 images from remote U.S. locations into images that showed snow on tree branches, images that didn’t, and images that were inconclusive. In the future, the dataset could be used to train machine learning in analyzing the images.”

Scientific Data: A Global Building Occupant Behavior Database

Scientific Data: A Global Building Occupant Behavior Database. “This paper introduces a database of 34 field-measured building occupant behavior datasets collected from 15 countries and 39 institutions across 10 climatic zones covering various building types in both commercial and residential sectors. This is a comprehensive global database about building occupant behavior.”

VentureBeat: Roboflow expands open-source datasets for better computer vision AI models

VentureBeat: Roboflow expands open-source datasets for better computer vision AI models. “In an effort to help developers more easily benefit from labeled datasets and machine learning models for computer vision, Roboflow today announced an expansion of its datasets and AI models as part of its Roboflow Universe initiative, which could well be one of the largest such open-source repositories available. Roboflow claims that it now has over 90,000 datasets that include over 66 million images in the Roboflow Universe service launched in August 2021.”

Heriot-Watt University: New project helps Amazon create dataset to advance multilingual language understanding research

Heriot-Watt University: New project helps Amazon create dataset to advance multilingual language understanding research. “Researchers at the National Robotarium, hosted by Heriot-Watt University and the University of Edinburgh, have created a Spoken Language Understanding Resource Package (SLURP) aimed at making it easier for AI and machines to understand spoken questions and commands from humans. One of the items included in the package is an open dataset in English spanning 18 domains. Amazon recently localised and translated the English-only SLURP dataset into 50 typologically diverse languages, creating a new multilingual dataset called MASSIVE.”

News@Northeastern: Want To Understand The Impact Of The Covid-19 Pandemic On Boston? Northeastern Researchers Have Built A Database

News@Northeastern: Want To Understand The Impact Of The Covid-19 Pandemic On Boston? Northeastern Researchers Have Built A Database. “Sudden disruptions to society were immediately apparent: School closures, business shutdowns, new—and in some cases, unprecedented—public health policies. But other pandemic impacts remain hidden, locked away in datasets and public records not yet meaningfully analyzed. The determination to uncover that data—and make it widely available—led a group of Northeastern researchers to construct a ‘data-support system’ from multiple information sources in and around the city of Boston that, when combined, paint a portrait of how communities and neighborhoods were impacted by the pandemic, with particular emphasis on communities and neighborhoods of color.”

SlashGear: Study Shows Robots Using Internet-Based AI Exhibit Racist And Sexist Tendencies

SlashGear: Study Shows Robots Using Internet-Based AI Exhibit Racist And Sexist Tendencies. “A new study claims robots exhibit racist and sexist stereotyping when the artificial intelligence (AI) that powers them is modeled on data from the internet. The study, which researchers say is the first to prove the concept, was led by Johns Hopkins University, the Georgia Institute of Technology, and the University of Washington, and published by the Association for Computing Machinery (ACM).”