UPI: Massive data-sharing effort to help doctors diagnose rare diseases across Europe

UPI: Massive data-sharing effort to help doctors diagnose rare diseases across Europe. “Doctors and medical researchers in Europe have undertaken a massive data-sharing project they hope will aid the diagnosis of rare disease. In a series of papers, published Tuesday in the European Journal of Human Genetics, researchers demonstrated how reanalysis of genomic and phenotypic data from patients with rare diseases — when combined with wide-scale data sharing — can increase the odds of accurate diagnosis.”

BetaKit: Biobox Analytics Launches Platform To Help Scientists Analyze Genomic Data

BetaKit: Biobox Analytics Launches Platform To Help Scientists Analyze Genomic Data. “Founded in 2019 by a trio of University of Toronto graduate students including [Christopher] Li, Hamza Farooq, and Julian Mazzitelli, BioBox offers a subscription-based data analytics platform for scientists working with next-generation sequencing data. The startup’s platform allows researchers to analyze genomic information.”

Pulse: S. Korea to build national bio big data library by 2028

Pulse: S. Korea to build national bio big data library by 2028. “South Korea will spend some 1 trillion won ($891 million) for six years from 2023 on collecting health-related big data from patients by disease and volunteers and establishing a national digital library on health data by 2028. Finance Minister Hong Nam-ki said Wednesday the government will establish the so-called Bio Data Dam by 2028 by collecting biohealth information from 1 million people, including some 400,000 patients.”

From Avocet to Zebra Finch: big data study finds more than 50 billion birds in the world (Phys.org)

Phys.org: From Avocet to Zebra Finch: big data study finds more than 50 billion birds in the world. “There are roughly 50 billion individual birds in the world, a new big data study by UNSW Sydney suggests—about six birds for every human on the planet. The study—which bases its findings on citizen science observations and detailed algorithms—estimates how many birds belong to 9700 different bird species, including flightless birds like emus and penguins.”

Phys.org: New AI-based tool can find rare cell populations in large single-cell datasets

Phys.org: New AI-based tool can find rare cell populations in large single-cell datasets. “Researchers at The University of Texas MD Anderson Cancer Center have developed a first-of-its-kind artificial intelligence (AI)-based tool that can accurately identify rare groups of biologically important cells from single-cell datasets, which often contain gene or protein expression data from thousands of cells. The research was published today in Nature Computational Science.”

Digiday: Facebook is ‘not a researchers-friendly space’ say academics encountering roadblocks to analyzing its 2020 election ad data

Digiday: Facebook is ‘not a researchers-friendly space’ say academics encountering roadblocks to analyzing its 2020 election ad data. “Facebook is providing academic researchers with a massive data haul revealing how political ads during last year’s U.S. elections were targeted to people on the platform. However, researchers have been held up by an arduous process to access the data and worry the information is insufficient to provide meaningful analysis of how Facebook’s ad platform was used — and potentially misused — leading up to the election.”

South China Morning Post: China makes ‘world’s largest satellite image database’ to train AI better

South China Morning Post: China makes ‘world’s largest satellite image database’ to train AI better. “A satellite imaging database containing detailed information of more than a million locations has been launched in China to help reduce artificial intelligence’s errors when identifying objects from space, the Chinese Academy of Sciences said on Wednesday. The fine-grained object recognition in high-resolution remote sensing imagery (FAIR1M) database was tens or even hundreds of times larger than similar data sets used in other countries, it said.”

The Program Era Project: Limning the depths of the Iowa Writers’ Workshop’s literary influence (University of Iowa)

University of Iowa: The Program Era Project: Limning the depths of the Iowa Writers’ Workshop’s literary influence. “The Program Era Project, or PEP, uses data visualization and other computer-assisted methods to track the aesthetic and cultural influence of the Workshop since its founding in 1936. In particular, writers affiliated with the Workshop, both as alumni and/or professors, have gone on to found or teach at many other creative writing programs around the nation…. The PEP, supported by the Digital Scholarship and Publishing Studio at UI Libraries, has compiled extensive datasets that track those networks of Workshop-affiliated writers.”

NARA: NARA Datasets on the AWS Registry of Open Data

NARA: NARA Datasets on the AWS Registry of Open Data. “The metadata index for the 1940 Census dataset is 251 megabytes, and all of the 3.7 million images from the population schedules, the enumeration district maps, and the enumeration district descriptions total over 15 terabytes. This dataset reflects the 1940 Census records that are also available on NARA’s 1940 Census website and in the National Archives Catalog.”

#Election2020: the first public Twitter dataset on the 2020 US Presidential election (PubMed)

PubMed: #Election2020: the first public Twitter dataset on the 2020 US Presidential election. “The study of online chatter is paramount, especially in the wake of important voting events like the recent November 3, 2020 U.S. Presidential election and the inauguration on January 20, 2021. Limited access to social media data is often the primary obstacle that limits our abilities to study and understand online political discourse. To mitigate this impediment and empower the Computational Social Science research community, we are publicly releasing a massive-scale, longitudinal dataset of U.S. politics- and election-related tweets. This multilingual dataset encompasses over 1.2 billion tweets and tracks all salient U.S. political trends, actors, and events from 2019 to the time of this writing.”

VentureBeat: MIT study finds ‘systematic’ labeling errors in popular AI benchmark datasets

VentureBeat: MIT study finds ‘systematic’ labeling errors in popular AI benchmark datasets. “The field of AI and machine learning is arguably built on the shoulders of a few hundred papers, many of which draw conclusions using data from a subset of public datasets. Large, labeled corpora have been critical to the success of AI in domains ranging from image classification to audio classification. That’s because their annotations expose comprehensible patterns to machine learning algorithms, in effect telling machines what to look for in future datasets so they’re able to make predictions. But while labeled data is usually equated with ground truth, datasets can — and do — contain errors.”

Health Analytics: NIH Funds National Project to Promote COVID-19 Data Sharing

Health Analytics: NIH Funds National Project to Promote COVID-19 Data Sharing. “UC hospitals have received a $500,000 grant from NIH to enable COVID-19 data sharing on a national scale, allowing collaborations among researchers, providers, and patients. Led by the University of California, Irvine (UCI), leaders will manage a transfer of UC data on COVID-19 cases into the National COVID Cohort Collaborative’s (N3C) centralized data resource at the NIH’s National Center for Advancing Translational Sciences.”

Nature: Large socio-economic, geographic and demographic disparities exist in exposure to school closures

Nature: Large socio-economic, geographic and demographic disparities exist in exposure to school closures. “This study introduces and analyses a U.S. School Closure and Distance Learning Database that tracks in-person visits to the vast majority of K–12 public schools in the United States from January 2019 through December 2020. Specifically, we measure year-over-year change in visits to each school throughout 2020 to determine whether the school is engaged in distance learning after the onset of the pandemic.”

Vanderbilt: Vanderbilt scientists sketch rare star system using more than a century of astronomical observations

Vanderbilt: Vanderbilt scientists sketch rare star system using more than a century of astronomical observations. “Vanderbilt astronomers have painted their best picture yet of an RV Tauri variable—a rare type of stellar binary, in which two stars orbit each other within a sprawling disk of dust. To sketch its characteristics, the scientists mined a 130-year dataset that spans the widest range of light yet collected for one of these systems, from radio waves to X-rays.”