ProPublica: How People Are Using Our Chicago Parking Ticket Data in Their Research. “A few of them pointed me to aspects of the data that we had not addressed in our coverage. Kevin Lobo, a management consultant, explained how he analyzed the behavior of the Chicago police officers who wrote the most tickets. Wesley Skogan, a professor emeritus at Northwestern University’s Institute for Policy Research, mused about the placement of parking meters throughout the city. Lots of people showed me their charts. The work I saw was rigorous, creative and heartening for the practice of sharing journalistic resources with the public at no cost.”
Geo Awesomeness: CARTO boosts public geospatial data with Google BigQuery. “When location intelligence platform CARTO built its Data Observatory, the chief idea was to create an up-to-date index of location data. The recently released Data Observatory 2.0 takes that vision forward to provide Data Scientists with a scalable platform full of rich data in the format they really need it in! CARTO is now hosting geospatial datasets on Google Cloud’s BigQuery public datasets program.” TechCrunch has also published an article with more background on CARTO.
Nature: A global wildfire dataset for the analysis of fire regimes and fire behaviour. “Here, we present and test a data mining work flow to create a global database of single fires that allows for the characterization of fire types and fire regimes worldwide. This work describes the data produced by a data mining process using MODIS burnt area product Collection 6 (MCD64A1). The entire product has been computed until the present and is available under the umbrella of the Global Wildfire Information System (GWIS).”
Mozilla Blog: Mozilla and BMZ Announce Cooperation to Open Up Voice Technology for African Languages. “Today, Mozilla and the German Ministry for Economic Cooperation and Development (BMZ) have announced to join forces in the collection of open speech data in local languages, as well as the development of local innovation ecosystems for voice-enabled products and technologies. The initiative builds on the pilot project, which our Open Innovation team and the Machine Learning Group started together with the organization ‘Digital Umuganda’ earlier this year. The Rwandan start-up collects language data in Kinyarwanda, an African language spoken by over 12 million people. Further languages in Africa and Asia are going to be added.”
Library of Congress: In the Library’s Web Archives: 1,000 U.S. Government PowerPoint Slide Decks. “PowerPoint presentations have become a nearly ubiquitous form of communication document in the digital era. At the most basic level, PowerPoint files present a sequence of slides containing text, images and multimedia. Today, we are excited to share out a dataset of 1,000 random slide decks from U.S. government websites, collected via the Library of Congress Web Archive, such as the presentation on transporting hazardous materials in Figure 1.”
Medium: Analysis of Google Political Ads using BigQuery. “Hello everyone, this is my first article on Medium. I have been interested in data science and analytics while working on my Masters project. I have tried my hand with different beginner datasets to learn some of the basics of Python, SQL, and other languages. However, I felt that repeating the same exercises got boring after a while, and I started losing interest in the subject. Then I got a hold of Google Cloud Services and the BigQuery platform.”
Library of Congress: In the Library’s Web Archives: Dig If You Will the Pictures. “The Digital Content Management section has been working on a project to extract and make available sets of files from the Library’s significant Web Archives holdings. This is another step to explore the Web Archives and make them more widely accessible and usable. Our aim in creating these sets is to identify reusable, ‘real world’ content in the Library’s digital collections, which we can provide for public access. The outcome of the project will be a series of datasets, each containing 1,000 files of related media types selected from .gov domains. We will announce and explore these datasets here on The Signal, and the data will be made available through LC Labs.”