The Conversation: The Internet Archive has been fighting for 25 years to keep what’s on the web from disappearing – and you can help

The Conversation: The Internet Archive has been fighting for 25 years to keep what’s on the web from disappearing – and you can help. “You may not realize portions of the internet are constantly disappearing. As librarians and archivists, we strengthen collective memory by preserving materials that document the cultural heritage of society, including on the web. You can help us save the internet, too, as a citizen archivist.”

A Million Squandered: The “Million Dollar Homepage” as a Decaying Digital Artifact (John Bowers)

John Bowers at Harvard: A Million Squandered: The “Million Dollar Homepage” as a Decaying Digital Artifact. “While most of the graphical elements on the Million Dollar Homepage are promotional in nature, it seems safe to say that the buying craze was motivated by a deeper fixation on the site’s perceived importance as a digital artifact. A banner at the top of the page reads ‘Own a Piece of Internet History,’ a fair claim given the coverage that it received in the blogosphere and in the popular press….But to what extent has this history been preserved? Does the Million Dollar Homepage represent a robust digital artifact 12 years after its creation, or has it fallen prey to the ephemerality common to internet content?” If you want an Exhibit A for the problems of digital impermanence and linkrot, READ THIS.

Irish Times: Digital Irish content in danger of disappearing, specialist warns

Irish Times: Digital Irish content in danger of disappearing, specialist warns. “Ger Wilson, head of digital collections at the National Library of Ireland, said that with its research showing that as much as 50 per cent of website content can disappear within a year, it is ‘highly likely’ that some critical material has already disappeared. She was speaking following the issuing of a tender notice by the library to carry out an extensive crawl of Irish-registered domains later this year. This is part of an attempt to archive the Irish web so that historians of the future will be able to see what the Irish internet looked like in 2017.”

Forbes: Why We Need To Archive The Web In Order To Preserve Twitter

Forbes: Why We Need To Archive The Web In Order To Preserve Twitter. “As social media has become an ever-more central medium through which global society communicates, there has been considerable discussion about just how libraries and archives can work to preserve these walled gardens in the same way that web archives like the Internet Archive have worked to preserve the open web. Twitter in particular has been a keen focus of the social archiving community due to its streaming APIs and default public nature of most communications sent through the platform. Indeed, in 2010 the Library of Congress received a donation of the entire historical backfile of Twitter and continues to archive all public tweets through present day. Is this doomsday archive by itself truly sufficient to fully preserve Twitter for future generations?” Great article. Not particularly encouraging, but great.

Open Source: LinkArchiver automatically submits links to the Internet Archive

Open Source: LinkArchiver automatically submits links to the Internet Archive. “The internet is forever, except when it isn’t. ‘Link rot’—where once-valid links to websites become broken over time as pages move or sites go offline—is a real problem for people who try to do research online. The Internet Archive helps solve this problem by making submitted content available in the ‘Wayback Machine.’ The difficulty, of course, is getting people to remember to submit links for archival.”
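
For a rough idea of what "automatically submitting links" can involve, here is a minimal sketch. It is not LinkArchiver's actual code. It assumes the Wayback Machine's Save Page Now address takes the form https://web.archive.org/save/ followed by the target link, and the function names are hypothetical. It only builds the request URLs and does not contact the Archive.

```python
from urllib.parse import quote

# Assumption: Save Page Now accepts a capture request at this prefix + target link.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def build_save_url(link: str) -> str:
    """Return the URL that would ask the Wayback Machine to capture `link`."""
    if not link.startswith(("http://", "https://")):
        raise ValueError(f"not an http(s) link: {link!r}")
    # Leave URL-structural characters alone; escape spaces and similar.
    return SAVE_ENDPOINT + quote(link, safe=":/?&=#%~+,;@")


def save_urls_for(links):
    """Drop repeated links (keeping first-seen order) and build a save URL for each."""
    seen = set()
    result = []
    for link in links:
        if link not in seen:
            seen.add(link)
            result.append(build_save_url(link))
    return result
```

A bot along these lines would still need to fetch each generated URL, and space its requests out, to actually trigger a capture.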

Internet Archive Chairman Brewster Kahle: The web is ‘not fun and games any more’ (Recode)

Recode: Internet Archive Chairman Brewster Kahle: The web is ‘not fun and games any more’. “Brewster Kahle, the entrepreneur-turned-chairman of the Internet Archive, has a George Orwell saying on his mind: ‘If we allow those who control the present to control the past, then they control the future.’ This thought, pulled from Orwell’s ‘Nineteen Eighty-Four,’ guides today’s work at the nonprofit Archive, which turned 20 years old last fall. The average life of a web page is 100 days, Kahle said on the latest Recode Decode, hosted by Kara Swisher, and ‘most of the best of the web is already off the web.’”

WIRED: Diehard Coders Just Rescued NASA’s Earth Science Data

WIRED: Diehard Coders Just Rescued NASA’s Earth Science Data. “Groups like DataRefuge and the Environmental Data and Governance Initiative, which organized the Berkeley hackathon to collect data from NASA’s earth sciences programs and the Department of Energy, are doing more than archiving. Diehard coders are building robust systems to monitor ongoing changes to government websites. And they’re keeping track of what’s already been removed—because yes, the pruning has already begun.”

Quartz: Guerrilla archivists developed an app to save science data from the Trump administration

Quartz: Guerrilla archivists developed an app to save science data from the Trump administration. “The data rescue movement is growing up fast: What started as a project coordinated through group spreadsheets in Google Docs now has a workflow formalized through a custom-built app designed specifically for this purpose by [Brendan] O’Brien and Daniel Allan, a computational scientist at a national lab (Allan preferred not to indicate a specific lab, and emphasized his participation was in his free time and not on behalf of his employer). Eventually, anyone with ten minutes to spare will be able to open the app, check what government URLs have yet to be archived, see whether those can be simply fed into the Internet Archive (or needs more technical attention to scrape and download any data), and ‘attack a quick data set’ from their couch, O’Brien says. The archiving could be remote, and perpetual.”

Supreme Court of Canada Creates Archive for Cited Sources

The Supreme Court of Canada has begun archiving the online sources it cites. “…the Office of the Registrar of the SCC has located and archived the content of most online sources that had been cited by the Court between 1998 and 2016. These sources were captured with a content as close as possible to the original content cited.” From now on, sources will be captured and archived immediately.

Internet Archive: If You See Something, Save Something – 6 Ways to Save Pages In the Wayback Machine

Internet Archive: If You See Something, Save Something – 6 Ways to Save Pages In the Wayback Machine. “In recent days many people have shown interest in making sure the Wayback Machine has copies of the web pages they care about most. These saved pages can be cited, shared, linked to – and they will continue to exist even after the original page changes or is removed from the web. There are several ways to save pages and whole sites so that they appear in the Wayback Machine. Here are 6 of them.”

Internet Archive Launches New Chrome Extension to Fight Linkrot, Digital Impermanence

The Internet Archive has launched what looks like a great Chrome extension for its Wayback Machine. “By using the ‘Wayback Machine’ extension for Chrome, users are automatically offered the opportunity to view archived pages whenever any one of several error conditions, including code 404, or ‘page not found,’ are encountered. If those codes are detected, the Wayback Machine extension silently queries the Wayback Machine, in real-time, to see if an archived version is available. If one is available, a notice is displayed via Chrome, offering the user the option to see the archived page.”
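
The check described in that excerpt can be sketched in a few lines. This is an illustration and not the extension's own code. It assumes the Wayback Machine availability endpoint at https://archive.org/wayback/available and a JSON reply with an archived_snapshots.closest entry. Only 404 comes from the excerpt; the other error codes in the set are illustrative guesses, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the public availability endpoint and its "closest" snapshot reply shape.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"

# Only 404 is named in the excerpt; the rest are illustrative guesses.
ERROR_CODES = {404, 408, 410, 500, 502, 503, 504}


def availability_query(page_url: str) -> str:
    """Build the lookup URL asking whether an archived copy of `page_url` exists."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": page_url})


def archived_copy(status_code: int, availability_reply: dict):
    """Return the archived page's URL if the live page failed and a snapshot exists.

    `availability_reply` is the already-parsed JSON from the lookup above.
    Returns None when the page loaded normally or no snapshot is on file.
    """
    if status_code not in ERROR_CODES:
        return None
    closest = availability_reply.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

In a real browser extension the lookup would run in the background after the failed page load, and the returned address would feed the notice offering the archived page.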

Wikipedia, Internet Archive Team Up to Fix One Million Broken Links

Wikipedia and the Internet Archive have teamed up. “The Internet Archive, the Wikimedia Foundation, and volunteers from the Wikipedia community have now fixed more than one million broken outbound web links on English Wikipedia. This has been done by the Internet Archive’s monitoring for all new, and edited, outbound links from English Wikipedia for three years and archiving them soon after changes are made to articles. This combined with the other web archiving projects, means that as pages on the Web become inaccessible, links to archived versions in the Internet Archive’s Wayback Machine can take their place. This has now been done for the English Wikipedia and more than one million links are now pointing to preserved copies of missing web content.”