Global Investigative Journalism Network: Video Resources for Data Investigations. “For 20 years, GIJN conferences have helped spread data journalism around the world. Our last Global Investigative Journalism Conference — GIJC21, held in November — was no different. GIJN’s first fully online conference featured a full track of data workshops and panels, ranging from analysis with spreadsheets and SQL to programming with R and Python, from tips on scraping and cleaning to data visualization and social network mapping. The sessions were led by a team of all-star trainers from seven countries. This is the second installment of GIJC21 videos, which until now have been available only to conference attendees.” All the videos I spot-checked had captions.
Towards Data Science: A short guide to analyzing public data from Google BigQuery. “In the following paragraphs, we’ll walk through a step by step process of working with Google BigQuery and churn out a nice analysis along the way. Please note, that the scope of BigQuery is quite wide, but I will start with its most basic use, which is accessing public datasets and querying it on R (without downloading on my disk).” You will also need basic knowledge of SQL.
Towards Data Science: Using Google Trends data to leverage your predictive model. “Using Google Trends data in predictive models has some pitfalls. This article describes a way of making Google Trends data usable for a model to deliver breakthrough results by the hands-on example of predicting the success of movies by using Google Search volume.”
Analytics India: 10 Free eBooks Beginners Should Read Before Diving Into Data Science. “There is no dearth of books for Data Science which can help get one started and build a career in the field. But before you begin, getting a preliminary overview of these subjects is a wise and crucial thing to do. A healthy dose of eBooks on big data, data science and R programming is a great supplement for aspiring data scientists.”
Quartz: What’s the best way to learn the programming language R? (Preferably, for free). “As data becomes an ever larger part of work, for many people spreadsheets just are not enough. Programs like Microsoft Excel and Google Sheets are powerful tools, but they have limitations in terms of the amount of data you can work with, the kind of analyses you can do, and the types of charts you can make. When data users reach these limitations, the obvious next step is learning a programming language.”
Knight Center: Learning materials for popular online course on programming language R are now available. “An online course on the complex programming language R recently ended with more than 3,300 registered students from 131 countries and all instructional materials for the course are now available. The materials are available to the general public and will act as an ongoing resource for those who are interested in learning more about R.”
David Strom: Researching The Twitter Data Feed. “A new book by UCLA professor Zachary Steinert-Threlkeld called Twitter as Data is available online free for a limited time, and I recommend you download a copy now. While written mainly for academic social scientists and other researchers, it has a great utility in other situations. Zachary has been working with analyzing Twitter data streams for several years, and basically taught himself how to program enough code in Python and R to be dangerous.”
Kaylin Walker: Tidy Text Mining Beer Reviews. “BeerAdvocate.com was scraped for a sample of beer reviews, resulting in a dataset of 31,550 beers and their brewery, beer style, ABV, total numerical ratings, number of text reviews, and a sample of review text. Review text was gathered only for beers with at least 5 text reviews. A minimum of 2000 characters of review text were collected for those beers, with total length ranging from 2000 to 5000 characters.”
Like Twitter? Like R? You’ll like this tutorial: How to set up a Twitter bot using R. “To operate this account, we wrote an R script for reading the current number of R packages on CRAN every hour and automatically sending a Tweet with the status quo. All we needed to make this work was the Twitter API as well as the twitteR package.”
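For the curious, here is a minimal sketch of the two middle steps of a bot like that: pulling a package count out of page text and composing the status message. It is written in Python rather than R, and it is not the tutorial's code. The sentence pattern for the CRAN page and the wording of the status are assumptions, the function names are hypothetical, and the fetching and posting steps are left out entirely.

```python
import re

# Assumed wording of the sentence on the CRAN package index page;
# adjust the pattern if the page reads differently.
COUNT_PATTERN = re.compile(r"features\s+([\d,]+)\s+available packages")


def extract_package_count(page_text):
    """Return the package count found in page_text, or None if absent."""
    match = COUNT_PATTERN.search(page_text)
    if match is None:
        return None
    # Drop thousands separators before converting to an integer.
    return int(match.group(1).replace(",", ""))


def compose_status(count):
    """Build the text of the hourly status update (hypothetical wording)."""
    return f"There are currently {count} R packages on CRAN."


# Made-up sample text standing in for the downloaded page.
sample = "Currently, the CRAN package repository features 9,876 available packages."
count = extract_package_count(sample)
status = compose_status(count)
```

A scheduler (cron, for example) would run this once an hour, with the real page text in place of `sample` and a Twitter API client sending `status`.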
I know some of you out there are big R fans. Here’s a free guide to using R for text mining. “Julia Silge and David Robinson are both dab hands at using R to analyze text, from tracking the happiness (or otherwise) of Jane Austen characters, to identifying whether Trump’s tweets came from him or a staffer. If you too would like to be able to make statistical sense of masses of (possibly messy) text data, check out their book Tidy Text Mining with R, available free online and soon to be published by O’Reilly.”
The Bureau of Economic Analysis (BEA) is launching a new tool to provide quick access to economic statistics for the US and Europe. “The tool is being built on the statistical programming language called ‘R’ that taps into BEA’s and Eurostat’s huge databases and provides analysts, researchers, economists, data-savvy entrepreneurs and others quick access to economic statistics – requiring only a few lines of code to do so. GDP, disposable income and employment by industry and by geographic region are among the key economic statistics that will be available as part of the new data tool.”
Like R? Like text mining? Here ya go. “I am so pleased to announce that tidytext 0.1.2 is now available on CRAN. This release of tidytext, a package for text mining using tidy data principles by Dave Robinson and me, includes some bug fixes and performance improvements, as well as some new functionality….I am even more excited to publicly announce the book that Dave and I have been working on.”
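If “tidy data principles” sounds abstract, the core idea is one token per row, so that ordinary counting and grouping tools work on text. Here is a rough Python sketch of that idea. It is a plain-Python stand-in, not the tidytext API: the function name `unnest_tokens` is borrowed from the package for familiarity, the simple regex tokenizer is an assumption, and the two reviews are made-up sample data.

```python
import re
from collections import Counter


def unnest_tokens(documents):
    """Turn {doc_id: text} into a list of (doc_id, word) rows, one token per row."""
    rows = []
    for doc_id, text in documents.items():
        # Lowercase the text and keep runs of letters and apostrophes as words.
        for word in re.findall(r"[a-z']+", text.lower()):
            rows.append((doc_id, word))
    return rows


def count_words(rows):
    """Count how often each (doc_id, word) pair occurs."""
    return Counter(rows)


# Made-up sample documents.
docs = {
    "review_1": "Hoppy and bright, very hoppy.",
    "review_2": "Dark, malty, and smooth.",
}
tidy_rows = unnest_tokens(docs)
counts = count_words(tidy_rows)
```

Once the text is in this shape, per-document word counts, stop-word filtering, and joins against a sentiment lexicon are all just operations on rows.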
Sean Hackett is building a database of results from MMA (mixed martial arts) fights. And, lucky for us, he’s writing about how he’s doing it. Read the posts from the bottom up. The first one is “Building a large database of MMA fight results I: scraping with rvest.” (rvest is a simple web-scraping package for R.)
Data wonks, scraping wonks: The Next Web has a lovely overview of Python and R. “At Springboard, we pair mentors with learners in data science. We often get questions about whether to use Python or R – and we’ve come to a conclusion thanks to insight from our community of mentors and learners.” Yes, it’s a bit commercial, but it’s also packed with information.
You hear a lot about Twitter scraping, less about Facebook scraping. Joshua Rosenberg pointed me to his blog post about an experiment scraping Facebook interactions for the remaining presidential candidates. “I came across this post on how to scrape data from Facebook pages for statistical analysis, and was motivated to give it a try. After thinking about which pages (including pages for educational organizations and communities) would be interesting to analyze, I looked at interactions with 2016 United States Presidential candidate’s Facebook pages. While the files referenced in the post used Python, you could certainly do the same in R, but I had been looking for a chance to try out Python.” He’s made the scraped data and resulting charts available on GitHub.