Virginia Memory: The Elephant In The Room: Artificial Intelligence Used To Process Governor Tim Kaine’s E-Mails. “How do you eat an elephant? One bite at a time. For the past seven years, that’s how we’ve been tackling the task of processing the 1.5 million e-mails transferred to the Library of Virginia in 2010 as part of the electronic records of outgoing Governor Tim Kaine. When Kaine announced his candidacy for the U.S. Senate in 2011, the Library challenged itself to make the Kaine administration’s e-mail records available for research in time for the 2012 election. What did that entail? Basically, we had to figure out how to separate whatever portion of those 1.5 million e-mails shouldn’t be included in our online collection—either because they aren’t records of enduring value (think e-mails announcing doughnuts in the break room) or because they contain sensitive materials such as attorney-client privileged communications, privacy-protected information, or operational security details.”
Forbes: Preserving Online News In An Ephemeral Web: A Look At Four Months Of Global Digital Journalism. “What might it look like to more systematically assess the longevity of online news, recrawling every single monitored news article after 24 hours and after one week? That was the vision behind GDELT’s open Global Difference Graph, which launched at the end of August last year. Over the last four months it has recrawled 88 million online news articles spanning all countries and 65 languages. Using Google’s BigQuery platform, summarizing this massive change dataset takes just a single line of SQL and less than 6 seconds to quantify at planetary scale the lifespan of an online news article today.”
State Archives of North Carolina: $1.1M grant from Mellon Foundation will facilitate advances in email curation. “The University of North Carolina at Chapel Hill has received a grant for $1.1 million from The Andrew W. Mellon Foundation for a project to develop a toolset that will enable institutions to more quickly and efficiently process emails included in born-digital collections. The UNC School of Information and Library Science (SILS) is partnering with the State Archives of North Carolina under the NC Department of Natural and Cultural Resources (NC DNCR) for the two-year project, which will launch in January. The Review, Appraisal, and Triage of Mail (RATOM) project’s goals are particularly significant for organizations, including libraries, archives and museums (LAMs), that need to provide public access to records while protecting private information.”
Nieman Lab: Building a digital hospice. “Earlier this year, I talked with some people about setting up a new publication. We had a specific focus, a budget, and a great list of potential collaborators. What we didn’t have is a shared vision for what would be the end of the publication. How would we know that it is time to throw in the towel? And then what? It wasn’t so much that my cohort and I disagreed about what to do at the end, but that we had no answers for these questions. It was reason enough to table the discussion.”
KTVA: Anchorage Museum archiving memes, social media posts from earthquake. “The 1964 earthquake was documented in newspaper headlines, letters and photographs shot on film. After the Nov. 30 quake, historians are using words and images from social media to document the disaster. Aaron Leggett, a curator with the Anchorage Museum, said staff started collecting online items for their archive an hour after the quake hit.”
Stars and Stripes: CDs, faxes make comeback as military file-sharing service taken offline. “The shuttering of a widely used military file-sharing service last month has left the services without an online option for transferring sensitive unclassified files, so they’re turning to CDs, DVDs, postal mail and even fax machines.” Remember sneakernet?
Library of Congress: The United States Congressional Web Archive now includes content for the 113th and 114th Congresses. “The Library of Congress Web Archiving Program is dedicated to providing reliable access to historical web content from the legislative branch. To that end, the Library has just released an update to the United States Congressional Web Archive. The archive, which includes member sites from the House and Senate, as well as House and Senate Committee websites, now includes content for the 113th and 114th Congresses. The archive has also added subject facets for the 105th and 106th Congresses to enhance access to the older content in the archive.”