Library of Congress: Library of Congress Releases Data for Free Download and Discovery. “The Library of Congress announced today its third release of records in its online catalog for free bulk download for research and discovery. The release supports the Library’s effort to continuously expand open access to its vast collections. This MARC (Machine Readable Cataloging Records) release surpasses previous releases and adds more than 200,000 new records to the existing 25 million record database.”
GPO: GPO Makes Available Statute Compilations In USLM XML Format. “The U.S. Government Publishing Office (GPO) and its legislative data partners in the U.S. House of Representatives and the U.S. Senate have made Statute Compilations available in USLM XML format. This new format makes documents easier to use, read, and download. The public can access the compilations on GPO’s trusted digital repository govinfo, the one-stop site for information published by the Government.” USLM stands for United States Legislative Markup. You can read the schema user guide here.
The Register: Google Groups kills RSS support without notice. “Google has either turned off RSS support in Google Groups without telling anyone, or has failed to notice that RSS in Groups no longer functions. RSS, which stands for either RDF Site Summary or Really Simple Syndication, is an open content syndication protocol. It allows people to subscribe to feeds from websites and receive syndicated content from them through an app capable of reading XML-based data.”
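If you've never looked inside an RSS feed, here is a rough Python sketch of what "reading XML-based data" amounts to. The feed below is a made-up RSS 2.0 sample (not a real Google Groups feed), inlined so the example runs without a network connection, and `read_items` is a name chosen just for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, inlined so the example runs offline.
# The titles and example.com URLs are placeholders, not real feed content.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Group</title>
    <link>https://example.com/group</link>
    <description>Sample feed for illustration</description>
    <item>
      <title>First post</title>
      <link>https://example.com/group/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/group/2</link>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


items = read_items(SAMPLE_FEED)
for title, link in items:
    print(f"{title}: {link}")
```

A feed reader does essentially this on a schedule, then shows you whichever items it has not seen before.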
National Library of New Zealand: Papers Past data has been set free. “Papers Past is the National Library’s fully text searchable website containing over 150 newspapers from New Zealand and the Pacific, as well as magazines, journals and government reports. As a result of the data being released, people can now access the data from 78 New Zealand newspapers from the Albertland Gazette to the Victoria Times, all published before 1900. The data itself consists of the METS/ALTO XML files for each issue. The XML files sit in the back of Papers Past and are what allows you to locate keywords within articles.”
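To give a feel for how ALTO files make keyword location possible, here is a minimal Python sketch. ALTO records each recognized word as a `String` element with its text in `CONTENT` and its position on the page in attributes such as `HPOS` and `VPOS`. The fragment below is a simplified, invented sample rather than an actual Papers Past file: real ALTO documents declare an XML namespace and carry much more layout detail, and `find_keyword` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical ALTO-style fragment. Real ALTO files declare an XML
# namespace and include more layout elements; both are omitted for brevity.
SAMPLE_ALTO = """<alto>
  <Layout>
    <Page ID="P1">
      <TextLine>
        <String CONTENT="Auckland" HPOS="120" VPOS="340" WIDTH="95" HEIGHT="18"/>
        <String CONTENT="harbour" HPOS="222" VPOS="340" WIDTH="88" HEIGHT="18"/>
      </TextLine>
      <TextLine>
        <String CONTENT="Harbour" HPOS="120" VPOS="362" WIDTH="90" HEIGHT="18"/>
        <String CONTENT="board" HPOS="216" VPOS="362" WIDTH="60" HEIGHT="18"/>
      </TextLine>
    </Page>
  </Layout>
</alto>"""


def find_keyword(alto_xml, keyword):
    """Return (HPOS, VPOS) for every word matching keyword, ignoring case."""
    root = ET.fromstring(alto_xml)
    wanted = keyword.lower()
    hits = []
    for word in root.iter("String"):
        if word.get("CONTENT", "").lower() == wanted:
            hits.append((float(word.get("HPOS")), float(word.get("VPOS"))))
    return hits


hits = find_keyword(SAMPLE_ALTO, "harbour")
print(hits)
```

With coordinates like these, a site can draw a highlight box over each match on the scanned page image.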
Search Engine Roundtable: Google Adds New Books Schema Markup With Buy E-Book Links. “Google has added a new schema markup type for books in their developer center. Aaron Bradley posted about it on Google+ explaining that this was just added on December 3, 2016. It is currently a closed beta…”
The New York Times: The Future of the Past: Modernizing the New York Times Archive. “In 2014, we launched a redesign of our entire digital platform that gave readers a more modern, fluid, and mobile-friendly experience through improvements such as faster performance, responsive layouts, and dynamic page rendering. While our new design upgraded reader experience for new articles, engineering and resource challenges prevented us from migrating previously published articles into this new design…. Today we are thrilled to announce that, thanks to a cross-team migration effort, nearly every article published since 2004 is available to our readers in the new and improved design.” Lots of great “under the hood” stuff in this article.
The IRS has made available a huge amount of Form 990 data. The data are available on Amazon Web Services as a public data set, and it looks like it’s XML, so this is not a tool you can go searching through. It’s more like a back end of data for an API or something. Or if you were doing something small scale, you could probably use the IMPORTXML function in Google Sheets to pull in certain data, if you were able to use the URL standards to build the URL for the data you wanted.
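If you would rather script it than use a spreadsheet, the same idea (give a path, get back the matching values) can be sketched in Python. Everything about the sample below is an assumption made for illustration: the element names are invented and are not the actual IRS Form 990 schema, the figures are made up, and `extract` is a hypothetical helper. In practice you would fetch a filing's XML first and adjust the path to match the real element names.

```python
import xml.etree.ElementTree as ET

# Invented sample filing. These element names are placeholders and do NOT
# reflect the real Form 990 XML schema; the numbers are made up too.
SAMPLE_FILING = """<Return>
  <Filer>
    <Name>Example Charity</Name>
  </Filer>
  <Financials>
    <TotalRevenue>125000</TotalRevenue>
    <TotalExpenses>98000</TotalExpenses>
  </Financials>
</Return>"""


def extract(xml_text, path):
    """Return the text of every element matching path, relative to the root."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall(path)]


revenue = extract(SAMPLE_FILING, "./Financials/TotalRevenue")
name = extract(SAMPLE_FILING, "./Filer/Name")
print(name, revenue)
```

Note that `ElementTree` supports only a subset of XPath, so a query that works in a spreadsheet may need simplifying here.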
Archives.gov has a new read/write API. “The dataset for our catalog API contains all archival descriptions, authority records, digitized records (the images, videos, and so on) and their file metadata, all NARA web pages, and public contributions (tags, transcriptions, and comments). The API will allow developers to retrieve all of this metadata in specified formats (JSON or XML) for any given record or search results set.” A read API is pretty great, but wow, a read-write API?