Texas A&M: Big data-derived tool facilitates closer monitoring of recovery from natural disasters

Texas A&M: Big data-derived tool facilitates closer monitoring of recovery from natural disasters. “By analyzing peoples’ visitation patterns to essential establishments like pharmacies, religious centers and grocery stores during Hurricane Harvey, researchers at Texas A&M University have developed a framework to assess the recovery of communities after natural disasters in near real time. They said the information gleaned from their analysis would help federal agencies allocate resources equitably among communities ailing from a disaster.”

Navigating the unCHARTed: web tool explores public sequencing data for cancer research (Morgridge Institute for Research)

Morgridge Institute for Research: Navigating the unCHARTed: web tool explores public sequencing data for cancer research. “In the past, traditional RNA sequencing methods were limited to bulk gene expression profiles averaging thousands of cells; but the development of single-cell RNA sequencing technology has helped cancer biologists better understand the specific mechanisms that lead to tumor heterogeneity and drug resistance. However, these large, complex datasets are often difficult to navigate. Morgridge Postdoctoral Fellow Matthew Bernstein developed a web tool to explore these public datasets and facilitate analysis for cancer researchers.”

The Drive: Use Our New Tool To Explore Thousands Of FAA Drone And Unidentified Aircraft Incident Reports

The Drive: Use Our New Tool To Explore Thousands Of FAA Drone And Unidentified Aircraft Incident Reports. “We are excited to announce the launch of our new interactive tool that maps and makes searchable thousands of unmanned aircraft system (UAS) and unidentified aircraft incident reports. The vast dataset is drawn from information compiled by the Federal Aviation Administration. Some of the reports are highly unusual, going far beyond typical low-altitude drone mishaps.”

University of New Orleans: Literature Professor Jacinta Saffold Uses Digital Humanities Projects To Explore Black Peoples’ Influence on Pop Culture

University of New Orleans: Literature Professor Jacinta Saffold Uses Digital Humanities Projects To Explore Black Peoples’ Influence on Pop Culture. “When the coronavirus pandemic forced courses to be delivered online, University of New Orleans African American literature professor Jacinta Saffold created a research project aimed at keeping her students engaged while also conducting original research…. The result was a digital humanities dataset called, ‘The Hype Williams Effect Project,’ a literary compilation that helps document Black people’s influence on contemporary popular culture via the expansive career of hip hop music video director Harold ‘Hype’ Williams.” Professor Saffold is also working on ‘The Essence Book Project’ digital archive.

Google AI Blog: A Dataset for Studying Gender Bias in Translation

Google AI Blog: A Dataset for Studying Gender Bias in Translation. “To help facilitate progress against the common challenges on contextual translation (e.g., pronoun drop, gender agreement and accurate possessives), we are releasing the Translated Wikipedia Biographies dataset, which can be used to evaluate the gender bias of translation models. Our intent with this release is to support long-term improvements on ML systems focused on pronouns and gender in translation by providing a benchmark in which translations’ accuracy can be measured pre- and post-model changes.”

Phys.org: New AI-based tool can find rare cell populations in large single-cell datasets

Phys.org: New AI-based tool can find rare cell populations in large single-cell datasets. “Researchers at The University of Texas MD Anderson Cancer Center have developed a first-of-its-kind artificial intelligence (AI)-based tool that can accurately identify rare groups of biologically important cells from single-cell datasets, which often contain gene or protein expression data from thousands of cells. The research was published today in Nature Computational Science.”

South China Morning Post: China makes ‘world’s largest satellite image database’ to train AI better

South China Morning Post: China makes ‘world’s largest satellite image database’ to train AI better. “A satellite imaging database containing detailed information of more than a million locations has been launched in China to help reduce artificial intelligence’s errors when identifying objects from space, the Chinese Academy of Sciences said on Wednesday. The fine-grained object recognition in high-resolution remote sensing imagery (FAIR1M) database was tens or even hundreds of times larger than similar data sets used in other countries, it said.”

Johns Hopkins University: Next-generation database will democratize access to massive amounts of turbulence data

Johns Hopkins University: Next-generation database will democratize access to massive amounts of turbulence data. “Led by Johns Hopkins University, a team of 10 researchers from three institutions is using a new $4 million, five-year grant from the National Science Foundation to create a next-generation turbulence database that will enable groundbreaking research in engineering and the atmospheric and ocean sciences. This powerful tool will let researchers from all over the world access data from some of the largest world-class numerical simulations of turbulent flows. Such simulations are very costly and their outputs are traditionally very difficult to share among researchers due to the data sets’ massive size.”

The Program Era Project: Limning the depths of the Iowa Writers’ Workshop’s literary influence (University of Iowa)

University of Iowa: The Program Era Project: Limning the depths of the Iowa Writers’ Workshop’s literary influence. “The Program Era Project, or PEP, uses data visualization and other computer-assisted methods to track the aesthetic and cultural influence of the Workshop since its founding in 1936. In particular, writers affiliated with the Workshop, both as alumni and/or professors, have gone on to found or teach at many other creative writing programs around the nation…. The PEP, supported by the Digital Scholarship and Publishing Studio at UI Libraries, has compiled extensive datasets that track those networks of Workshop-affiliated writers.”

NARA: NARA Datasets on the AWS Registry of Open Data

NARA: NARA Datasets on the AWS Registry of Open Data. “The metadata index for the 1940 Census dataset is 251 megabytes, and all of the 3.7 million images from the population schedules, the enumeration district maps, and the enumeration district descriptions total over 15 terabytes. This dataset reflects the 1940 Census records that are also available on NARA’s 1940 Census website and in the National Archives Catalog.”

#Election2020: the first public Twitter dataset on the 2020 US Presidential election (PubMed)

PubMed: #Election2020: the first public Twitter dataset on the 2020 US Presidential election. “The study of online chatter is paramount, especially in the wake of important voting events like the recent November 3, 2020 U.S. Presidential election and the inauguration on January 21, 2021. Limited access to social media data is often the primary obstacle that limits our abilities to study and understand online political discourse. To mitigate this impediment and empower the Computational Social Science research community, we are publicly releasing a massive-scale, longitudinal dataset of U.S. politics- and election-related tweets. This multilingual dataset encompasses over 1.2 billion tweets and tracks all salient U.S. political trends, actors, and events from 2019 to the time of this writing.”

University of Warwick: World’s largest public scenario database for testing and assuring safe Autonomous Vehicle deployments

University of Warwick: World’s largest public scenario database for testing and assuring safe Autonomous Vehicle deployments. “The Safety Pool™ Scenario Database, the largest public repository of scenarios for testing autonomous vehicles in the world, has been launched today by WMG at the University of Warwick, and Deepen AI. The database provides a diverse set of scenarios in different operational design domains (ODDs i.e. operating conditions) that can be leveraged by governments, industry and academia alike to test and benchmark Automated Driving Systems (ADSs) and use insights to inform policy and regulatory guidelines.”