Simon Willison: Tracking PG&E outages by scraping to a git repo

Simon Willison: Tracking PG&E outages by scraping to a git repo. “PG&E have cut off power to several million people in northern California, supposedly as a precaution against wildfires. As it happens, I’ve been scraping and recording PG&E’s outage data every 10 minutes for the past 4+ months. This data got really interesting over the past two days! The original data lives in a GitHub repo (more importantly in the commit history of that repo).”
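For readers curious what "scraping to a git repo" can look like in practice, here is a minimal Python sketch. It is not Willison's actual code: the function names `write_snapshot` and `commit_snapshot`, the file name, and the JSON layout are all hypothetical. The idea it illustrates is to write each scrape in a stable format, so that the file only changes when the data does, and to commit each change so the commit history becomes the record.

```python
import json
import subprocess
from pathlib import Path


def write_snapshot(data, path):
    """Write data as stable, sorted JSON; return True if the file changed."""
    path = Path(path)
    # Sorted keys and fixed indentation keep diffs limited to real data changes.
    new_text = json.dumps(data, indent=2, sort_keys=True) + "\n"
    old_text = path.read_text() if path.exists() else None
    if new_text == old_text:
        return False
    path.write_text(new_text)
    return True


def commit_snapshot(repo_dir, filename, message):
    """Stage and commit one file in an existing git repository (assumed set up)."""
    subprocess.run(["git", "add", filename], cwd=repo_dir, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)
```

A scheduled job would fetch the outage data, call `write_snapshot`, and call `commit_snapshot` only when it returns True. Fetching is left out here because the source does not describe the endpoint.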

Harvard Business Review: Most Analytics Projects Don’t Require Much Data

Harvard Business Review: Most Analytics Projects Don’t Require Much Data. “In their headlong rush into advanced data science, big data, machine learning, and artificial intelligence, too many companies have ignored ‘small data.’ This is a huge miss. The relative ease, ubiquity, and power of small data projects carry profound implications for all employees, managers, and leaders at all levels, in every department, in every organization.”

National Institutes of Health: Five Petabytes of Sequence Read Archive Data Now in the Cloud

National Institutes of Health: Five Petabytes of Sequence Read Archive Data Now in the Cloud. “The National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM) recently moved the five petabytes of public SRA data to the cloud with support from the National Institutes of Health (NIH) Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative. These data include a variety of genomes, gene expression data, and more.”

University of Colorado Boulder: Anyone can look up school data with new online tool

University of Colorado Boulder: Anyone can look up school data with new online tool. “The database, first made available online in 2016 in a format designed mainly for researchers, is built from 350 million reading and math test scores from 3rd to 8th grade students during 2008-2016 in every public school in the nation. It also includes district-level measures of racial and socioeconomic composition, segregation patterns, and other educational conditions.”

National Library of Medicine: Enhancing Data Sharing, One Dataset at a Time

National Library of Medicine: Enhancing Data Sharing, One Dataset at a Time. “The National Institutes of Health (NIH) has an ambitious vision for a modernized, integrated biomedical data ecosystem. How we plan to achieve this vision is outlined in the NIH Strategic Plan for Data Science, and the long-term goal is to have NIH-funded data be findable, accessible, interoperable, and reusable (FAIR). To support this goal, we have made enhancing data access and sharing a central theme throughout the strategic plan.”

MIT News: A comprehensive catalogue of human digestive tract bacteria

MIT News: A comprehensive catalogue of human digestive tract bacteria. “The human digestive tract is home to thousands of different strains of bacteria. Many of these are beneficial, while others contribute to health problems such as inflammatory bowel disease. Researchers from MIT and the Broad Institute have now isolated and preserved samples of nearly 8,000 of these strains, while also clarifying their genetic and metabolic context.”

National Library of Scotland: Data Foundry launched

National Library of Scotland: Data Foundry launched. “The new Data Foundry site presents Library collections as data in a machine-readable format, widening the scope for digital research and analysis. Techniques like content mining and image analysis can now be carried out using the Library’s collections.”