Scroll: What a fossil revolution can tell us about the history of ‘big data’

Scroll: What a fossil revolution can tell us about the history of ‘big data’. “…far from spending his time climbing dangerous cliffs and digging up dinosaurs, Jack Sepkoski spent most of his career in front of a computer, building what would become the first comprehensive database on the fossil record of life. The analysis that he and his colleagues performed revealed new understandings of phenomena such as diversification and extinction and changed the way that palaeontologists work. But he was about as different from Indiana Jones as you can get. The intertwining tales of my father and his discipline contain lessons for the current era of algorithmic analysis and artificial intelligence and points to the value-laden way in which we see data.”

The Next Web: Turning big data into sound

The Next Web: Turning big data into sound. “A collaboration between two professors – one of music and one of engineering – at Virginia Tech resulted in the creation of a new platform for data analysis that makes it possible to understand data better by turning it into sound. This is a pioneering approach to studying spatially distributed data which instead of placing information into a visual context to show patterns or correlations – meaning, data visualization – uses an aural environment to leverage the natural affordances of the space and the user’s location within the sound field.”

Phys.org: Using Twitter to discover how language changes

Phys.org: Using Twitter to discover how language changes. “Scientists at Royal Holloway, University of London, have studied more than 200 million Twitter messages to try and unravel the mystery of how language evolves and spreads. The aim of the research was to consider if the spread of language is similar to how genes pass from person-to-person. The team investigated whether language transmission, when people have a conversation, happens in a similar way to when genes are transmitted from a parent to a child.”

Introducing Onomics: Create and Embed Data Tables (Priceonomics)

Courtesy of Patron and all-around good egg Glenn M, from Priceonomics: Introducing Onomics: Create and Embed Data Tables. “We currently create most of our charts using Excel, but the formatting for tables is inconsistent and a lot of information is lost when you use an image of a chart instead of an embedded version. In the past we’ve tried custom D3 tables (absolutely beautiful, but hard to maintain over time and require programming knowledge to create) and Google Spreadsheets (not suited to pretty tables or adding your logo). Why is it so hard to make a nice looking data table? So, today we launch Onomics, our tool for creating and embedding data tables based on the D3 data visualization library. You can give it a try here and play around with sample data.”

Kaylin Walker: Tidy Text Mining Beer Reviews

Kaylin Walker: Tidy Text Mining Beer Reviews. “BeerAdvocate.com was scraped for a sample of beer reviews, resulting in a dataset of 31,550 beers and their brewery, beer style, ABV, total numerical ratings, number of text reviews, and a sample of review text. Review text was gathered only for beers with at least 5 text reviews. A minimum of 2000 characters of review text were collected for those beers, with total length ranging from 2000 to 5000 characters.”

Digital Scholarship Resource Guide: Text analysis (part 4 of 7) (Library of Congress)

Library of Congress: Digital Scholarship Resource Guide: Text analysis (part 4 of 7). “Clean OCR, good metadata, and richly encoded text open up the possibility for different kinds of computer-assisted text analysis. With instructions from humans (“code”), computers can identify information and patterns across large sets of texts that human researchers would be hard-pressed to discover unaided. For example, computers can find out which words in a corpus are used most and least frequently, which words occur near each other often, what linguistic features are typical of a particular author or genre, or how the mood of a plot changes throughout a novel. Franco Moretti describes this kind of analysis as ‘distant reading’, a play on the traditional critical method ‘close reading’. Distant reading implies not the page-by-page study of a few texts, but the aggregation and analysis of large amounts of data.”
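To make the Library of Congress examples a little more concrete, here is a minimal Python sketch of two of the tasks the guide mentions: counting which words a corpus uses most often, and counting which words occur near each other. It is not taken from the guide. The function names, the two-sentence sample corpus, and the window size of 2 are all assumptions made for illustration, and real projects would typically add stop-word handling and a proper tokenizer.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(texts):
    """Count how often each word appears across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def cooccurrences(texts, window=2):
    """Count unordered word pairs that appear within `window` tokens of each other."""
    pairs = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i, word in enumerate(tokens):
            # Look only at the next `window` tokens so each pair is counted once.
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    pairs[tuple(sorted((word, other)))] += 1
    return pairs


# Hypothetical two-sentence corpus, used only for illustration.
corpus = [
    "The whale swam past the ship.",
    "The ship chased the whale.",
]

freqs = word_frequencies(corpus)
pairs = cooccurrences(corpus)

print(freqs.most_common(3))
print(pairs.most_common(3))
```

Run over thousands of texts instead of two sentences, counts like these are the raw material for the kind of “distant reading” the guide describes.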

Harvard Business Review: How the Data That Internet Companies Collect Can Be Used for the Public Good

Harvard Business Review: How the Data That Internet Companies Collect Can Be Used for the Public Good. “We live in a quantified era. It is estimated that 90% of the world’s data was generated in the last two years — from which entirely new inferences can be extracted and applied to help address some of today’s most vexing problems. In particular, the vast streams of data generated through social media platforms, when analyzed responsibly, can offer insights into societal patterns and behaviors. These types of behaviors are hard to generate with existing social science methods. All this information poses its own problems, of complexity and noise, of risks to privacy and security, but it also represents tremendous potential for mobilizing new forms of intelligence.”