National Library of New Zealand: Papers Past data has been set free

National Library of New Zealand: Papers Past data has been set free. “Papers Past is the National Library’s fully text searchable website containing over 150 newspapers from New Zealand and the Pacific, as well as magazines, journals and government reports. As a result of the data being released, people can now access the data from 78 New Zealand newspapers from the Albertland Gazette to the Victoria Times, all published before 1900. The data itself consists of the METS/ALTO XML files for each issue. The XML files sit in the back of Papers Past and are what allows you to locate keywords within articles.”
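
For readers wondering how those METS/ALTO files make keyword search possible, the sketch below (my own illustration, not Papers Past code) pulls matching words and their page coordinates out of a single ALTO file. It assumes the usual ALTO conventions of word-level String elements carrying CONTENT, HPOS and VPOS attributes; the namespace URI differs between ALTO versions, so the code ignores it, and the file name is hypothetical.

```python
# Minimal sketch: find a keyword and its page coordinates in an ALTO XML file.
# Assumes conventional ALTO structure; the example file name is hypothetical.
import xml.etree.ElementTree as ET

def find_keyword(alto_path, keyword):
    """Yield (word, x, y) for every word in the ALTO file containing `keyword`."""
    tree = ET.parse(alto_path)
    for el in tree.iter():
        # Word-level elements are called String; strip the namespace so this
        # works regardless of ALTO schema version.
        if el.tag.split("}")[-1] != "String":
            continue
        content = el.get("CONTENT", "")
        if keyword.lower() in content.lower():
            yield content, el.get("HPOS"), el.get("VPOS")

# Hypothetical usage:
# for word, x, y in find_keyword("albertland_gazette_18990101_0001.xml", "gold"):
#     print(word, x, y)
```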

ReliefWeb: Google and FAO launch new Big Data tool for all

ReliefWeb: Google and FAO launch new Big Data tool for all. “Earth Map is an innovative and free-to-use Web-based tool to provide efficient, rapid, inexpensive and analytically cogent insights, drawn from satellites as well as [Food and Agriculture Organization]’s considerable wealth of agriculturally relevant data, with a few clicks on a computer. Earth Map has also been designed to empower and provide integrative synergies with the federated FAO’s Hand-in-Hand geospatial platform, a more comprehensive tool to provide Members, their partners and donors with the means to identify and execute highly-targeted rural development initiatives with multiple goals ranging from climate adaptation and mitigation to socio-economic resilience.”

The Hustle: The company that wants to preserve our data for 500+ years

The Hustle: The company that wants to preserve our data for 500+ years. “Deep in the Norwegian arctic, on the ice-encrusted island of Spitsbergen, life stands still. The surrounding lands of the Svalbard archipelago are sparse and desolate. It is a place where there is a 1:10 polar bear to human ratio, where the sun doesn’t rise for 4 months per year, and the northern lights dance across the sky. But on the side of a mountain in Spitsbergen, there’s an abandoned coal mine. And inside — some 250 meters below the Earth’s surface — you’ll find a steel vault that contains an archive of film encoded with hundreds of thousands of open-source projects from around the world.”

The Next Web: COVID-19 made your data set worthless. Now what?

The Next Web: COVID-19 made your data set worthless. Now what? “The COVID-19 pandemic has perplexed data scientists and creators of machine learning tools as the sudden and major change in consumer behavior has made predictions based on historical data nearly useless. There is also very little point in trying to train new prediction models during the crisis, as one simply cannot predict chaos. While these challenges could shake our perception of what artificial intelligence really is (and is not), they might also foster the development of tools that could automatically adjust.”

Martin Robbins: Data Theatre: Why the Digital Dashboards of Dominic Cummings may not help with COVID

Martin Robbins: Data Theatre: Why the Digital Dashboards of Dominic Cummings may not help with COVID. “The tech industry is an increasingly metrics- and data-obsessed culture. This isn’t necessarily a bad thing: product managers who expose themselves to user research studies and engagement analytics will tend to make smarter decisions, on average, than those who ignore them. The problem, as with any technique or approach, is when data becomes the end rather than the means; when teams and managers start to develop cargo-cult attitudes toward it.”

MIT Technology Review: Explainer: What do political databases know about you?

MIT Technology Review: Explainer: What do political databases know about you? “American citizens are inundated with political messages—on social networks, in their news feeds, through email, text messages, and phone calls. It’s not an accident that people get bombarded: political groups prefer a ‘multimodal’ voter contact strategy, where they use many platforms and multiple attempts to persuade a citizen to engage with their cause or candidate. An ad is followed by an email, which is followed by a text message—all designed to reinforce the message. These strategies are employed by political campaigns, political action committees, advocacy groups, and nonprofits alike. These different groups are subject to very different rules and regulations, but they all rely on capturing and devouring data about millions of people in America.”

Phys.org: Big data delivers important new tool in conservation decision making

Phys.org: Big data delivers important new tool in conservation decision making. “The Harry Butler Institute has collaborated with researchers around the world to develop a new tool to inform conservation decisions across Europe. The research is poised to have a direct and immediate impact—on both science and practice.”

Online Journalism Blog: Here are the angles journalists use most often to tell the stories in data

Online Journalism Blog: Here are the angles journalists use most often to tell the stories in data. “In my data journalism teaching and training I often talk about common types of stories that can be found in datasets — so I thought I would take 100 pieces of data journalism and analyse them to see if it was possible to identify how often each of those story angles is used. I found that there are actually broadly seven core data story angles. Many incorporate other angles as secondary dimensions in the storytelling (a change story might go on to talk about the scale of something, for example), but all the data journalism stories I looked at took one of these as their lead.”

Nature: Migrating big astronomy data to the cloud

Nature: Migrating big astronomy data to the cloud. “Astronomers typically work by asking observatories for time on a telescope and downloading the resulting data. But as the amount of data that telescopes produce grows, well, astronomically, old methods can’t keep pace. The Vera C. Rubin Observatory in Chile is geared up to collect 20 terabytes per night as part of its 10-year Legacy Survey of Space and Time (LSST), once it becomes operational in 2022. That’s as much as the Sloan Digital Sky Survey — which created the most detailed 3D maps of the Universe so far — collected in total between 2000 and 2010.”
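
To put those figures in perspective, a quick back-of-envelope calculation (my own arithmetic, assuming observing every single night, which overstates the real cadence) shows the scale of archive the quoted LSST rate implies:

```python
# Naive upper bound on the LSST data volume from the quoted 20 TB/night figure.
tb_per_night = 20        # figure quoted in the article
nights_per_year = 365    # upper bound; weather and maintenance reduce this
survey_years = 10

total_tb = tb_per_night * nights_per_year * survey_years
print(f"~{total_tb / 1000:.0f} PB over the full survey")  # ~73 PB
```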

New York Times: Hoping to Understand the Virus, Everyone Is Parsing a Mountain of Data

New York Times: Hoping to Understand the Virus, Everyone Is Parsing a Mountain of Data. “Six months since the first cases were detected in the United States, more people have been infected by far than in any other country, and the daily rundown of national numbers on Friday was a reminder of a mounting emergency: more than 73,500 new cases, 1,100 deaths and 939,838 tests, as well as 59,670 people currently hospitalized for the virus. Americans now have access to an expanding set of data to help them interpret the coronavirus pandemic.”

Technology Networks: Database Offers Access to 200 Million Immune Sequences From COVID-19 Patients

Technology Networks: Database Offers Access to 200 Million Immune Sequences From COVID-19 Patients. “Across the world, many laboratories are conducting research relating to the SARS-CoV-2 virus, whether it be to understand the pathophysiology of COVID-19, or to develop robust diagnostics and efficacious therapeutics for the disease. As such, the pandemic has highlighted the critical importance of data sharing within the scientific community. The iReceptor Plus consortium, a European Union (EU)- and Canadian-funded project, has gathered 200 million T and B cell receptor sequences from COVID-19 patients – it is the largest repertoire of its kind. The sequencing data is open source and available online through the iReceptor Gateway.”

VentureBeat: Mozilla Common Voice updates will help train the ‘Hey Firefox’ wakeword for voice-based web browsing

VentureBeat: Mozilla Common Voice updates will help train the ‘Hey Firefox’ wakeword for voice-based web browsing. “Mozilla today released the latest version of Common Voice, its open source collection of transcribed voice data for startups, researchers, and hobbyists to build voice-enabled apps, services, and devices. Common Voice now contains over 7,226 total hours of contributed voice data in 54 different languages, up from 1,400 hours across 18 languages in February 2019.”
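
For anyone who wants to poke at the corpus, a Common Voice language download ships as audio clips plus TSV metadata. The sketch below assumes the usual release layout (a validated.tsv with path and sentence columns next to a clips/ directory); treat the paths and column names as assumptions to check against the release you download.

```python
# Minimal sketch: inspect a Common Voice language release with pandas.
# The path and column names are assumptions about the usual release layout.
import pandas as pd

df = pd.read_csv("cv-corpus/en/validated.tsv", sep="\t")

print(f"{len(df)} validated clips")
print(df[["path", "sentence"]].head())  # audio file name and its transcript
```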

Bing Blogs: Extracting Covid-19 insights from Bing search data

Bing Blogs: Extracting Covid-19 insights from Bing search data. “As is true for many other topics, search engine query logs may be able to give insight into the information gaps associated with Covid-19…. We are pleased to announce that we have already made Covid-19 query data freely available on GitHub as the Bing search dataset for Coronavirus intent, with scheduled updates every month over the course of the pandemic. This dataset includes explicit Covid-19 search queries containing terms such as corona, coronavirus, and covid, as well as implicit Covid-19 queries that are used to access the same set of web page search results (using the technique of random walks on the click graph).”
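
The dataset’s distinction between explicit and implicit queries lends itself to a simple first pass. The sketch below is an assumption-laden illustration: the file name and the Query / IsImplicitIntent column names are taken on trust from the dataset’s description, so check the repository README for the exact schema before relying on them.

```python
# Minimal sketch: split explicit vs implicit Covid-19 queries in one TSV file.
# File and column names are assumptions; verify against the repo's README.
import pandas as pd

df = pd.read_csv("QueriesByCountry_2020-07.tsv", sep="\t")

# Robust whether the flag is stored as booleans or "True"/"False" strings.
is_implicit = df["IsImplicitIntent"].astype(str).str.lower() == "true"
explicit, implicit = df[~is_implicit], df[is_implicit]

print(f"{len(explicit)} explicit vs {len(implicit)} implicit queries")
print(implicit["Query"].value_counts().head(10))
```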

Library of Congress: Selected Datasets: A New Library of Congress Collection

Library of Congress: Selected Datasets: A New Library of Congress Collection. “Friends, data wranglers, lend me your ears; The Library of Congress’ Selected Datasets Collection is now live! You can now download datasets of the Simple English Wikipedia, the Atlas of Historical County Boundaries, sports economic data, half a million emails from Enron, and urban soil lead abatement from this online collection. This initial set of 20 datasets represents the public start of an ongoing collecting program tied to the Library’s plan to support emerging styles of data-driven research, such as text mining and machine learning.”
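
As a taste of the text mining the collection is meant to enable, here is a minimal sketch of a word-frequency count over a downloaded file; the local file name is a hypothetical placeholder for whichever dataset you pull from the collection.

```python
# Minimal sketch: word-frequency count over a plain-text dataset file.
# The local file name is a hypothetical stand-in for a downloaded dataset.
from collections import Counter
import re

with open("loc_selected_dataset.txt", encoding="utf-8") as f:
    words = re.findall(r"[a-z']+", f.read().lower())

print(Counter(words).most_common(20))
```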

TechCrunch: Aclima and Google release a new air quality data set for researchers to investigate California pollution

TechCrunch: Aclima and Google release a new air quality data set for researchers to investigate California pollution. “As part of the Collision from Home conference, Aclima chief executive Davida Herzl released a new data set made in conjunction with Google. Free to the scientific community, the data is the culmination of four years of data collection and aggregation resulting in 42 million air quality measurements throughout the state of California.”
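
For researchers planning to work with the measurements, a typical first step is a coarse aggregation. The sketch below is illustrative only: the file name and the timestamp / pollutant / value column names are hypothetical placeholders to be mapped onto whatever schema the released data actually uses.

```python
# Minimal sketch: monthly median reading per pollutant from a measurements file.
# File and column names are hypothetical placeholders, not the real schema.
import pandas as pd

df = pd.read_csv("aclima_california_sample.csv", parse_dates=["timestamp"])

monthly = (
    df.groupby([df["timestamp"].dt.to_period("M"), "pollutant"])["value"]
      .median()
      .unstack("pollutant")
)
print(monthly.head())
```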