Northeastern: Northeastern Library Pioneers New Methods Of Big Data Scholarship In Effort To Digitize History. “Dan Cohen has a vision for the future of the Northeastern library. Cohen, dean of the libraries and vice provost for information collaboration, wants to transform Northeastern’s vast archive of print and photographic data into a standardized digital form that will allow scholars to use modern Big Data techniques to analyze 300-year-old information. And now, thanks to a grant from the National Endowment for the Humanities, Cohen will have the resources he needs to transform that vision into reality.”
Simon Willison: Analyzing US Election Russian Facebook Ads. “Two interesting data sources have emerged in the past few weeks concerning the Russian impact on the 2016 US elections. FiveThirtyEight published nearly 3 million tweets from accounts associated with the Russian ‘Internet Research Agency’—see my article and searchable tweet archive here. Separately, the House Intelligence Committee Minority released 3,517 Facebook ads that were reported to have been bought by the Russian Internet Research Agency as a set of redacted PDF files.” Mr. Willison built tools for exploring the data, along with some ancillary utilities.
The Conversation: The math behind Trump’s tweets. “Given the volume of Trump’s tweets and their potential political relevance, we thought it would be revealing and novel to use mathematical methods to analyze the web of interactions formed by his most frequently used keywords.”
ZDNet: Dropbox still has questions to answer after claims of improper data sharing. “In case you missed it, the highlights of a research study by Northwestern University published on Harvard Business Review revealed Dropbox had given them ‘access to project-folder-related data’ over a two-year period from about 400,000 users across 1,000 universities. The researchers initially claimed Dropbox gave them raw data, which they anonymized, but their report was updated after ZDNet reported Monday that Dropbox said it anonymized the data before handing it over.”
Newswise: Berkeley Lab-Developed Digital Library is a Game Changer for Environmental Research. “… storing, accessing and incorporating environmental data into models is challenging due to the diversity of the datasets, which include measurement of properties associated with bedrock, groundwater, soils, vegetation and atmospheric compartments of environmental systems. Now accessing archival data generated by environmental field, experimental and modeling activities has gotten much easier with the April 1 launch of ESS-DIVE (Environmental System Science – Data Infrastructure for a Virtual Ecosystem)—a digital archive that serves as a repository for hundreds of U.S. Department of Energy (DOE)-funded research projects under the agency’s Environmental System Science umbrella, which includes the Subsurface Biogeochemical Research and Terrestrial Ecosystem Sciences programs. The digital library also serves datasets that were previously stored in DOE’s Carbon Dioxide Information Analysis Center archive.”
Signal: Social Media Helps Detect Nuclear Agreement Violations. “Researchers at North Carolina (NC) State University have developed a new computational model that draws on normally incompatible data sets, such as satellite imagery and social media posts, to answer questions about what is happening in targeted locations. The model identifies violations of nuclear nonproliferation agreements. The data can include traditional sources, such as Geiger counter readings or multispectral data from satellite imagery, but many may be nontraditional and diverse, including Flickr and Twitter posts.”
Calvin News: Calvin Prof Using AI To Hear Whisper In Twitter’s Whirlwind. “When looking at Twitter, computer science professor Keith Vander Linden formerly saw noise: a continuous roar of chaotic 280-character messages. From this tumult, however, he now discerns meaningful patterns: ‘if you look at enough tweets,’ says Vander Linden, ‘with the right kind of statistical models, you can derive a signal from that, you can find out information about what people are saying about stuff, and from that you can infer what they are thinking.'”