Hongkiat: 10 Best Web Scraping Tools to Extract Online Data. “Web Scraping tools are specifically developed for extracting information from websites. They are also known as web harvesting tools or web data extraction tools. These tools are useful for anyone trying to collect some form of data from the Internet. Web Scraping is the new data entry technique that don’t require repetitive typing or copy-pasting.”
How-To Geek: How to Scrape a List of Topics from a Subreddit Using Bash. “Reddit offers JSON feeds for each subreddit. Here’s how to create a Bash script that downloads and parses a list of posts from any subreddit you like. This is just one thing you can do with Reddit’s JSON feeds.”
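If Bash isn’t your thing, the same idea can be sketched in Python. This is only a rough illustration, not the How-To Geek script: it assumes the feed lives at https://www.reddit.com/r/NAME.json and that posts sit under data → children → data in the returned JSON. The function names, the `limit` parameter default, and the User-Agent string are made up for this example.

```python
import json
import urllib.request

# Assumption: a descriptive User-Agent string; this value is invented for the example.
USER_AGENT = "subreddit-topic-lister/0.1 (example script)"


def fetch_listing(subreddit, limit=25):
    """Download a subreddit's JSON feed and return it as a dict."""
    url = f"https://www.reddit.com/r/{subreddit}.json?limit={limit}"
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def extract_posts(listing):
    """Return (title, link) pairs from a decoded listing, skipping entries without a title."""
    children = listing.get("data", {}).get("children", [])
    posts = []
    for child in children:
        post = child.get("data", {})
        title = post.get("title")
        if title is None:
            continue
        link = "https://www.reddit.com" + post.get("permalink", "")
        posts.append((title, link))
    return posts
```

Calling `extract_posts(fetch_listing("MildlyInteresting"))` should hand back a list of title and link pairs, assuming the feed is reachable and still shaped this way. The parsing is kept separate from the download so it can be tried on a saved copy of the JSON without touching the network.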
MakeUseOf: What Is Web Scraping? How to Collect Data From Websites. “Think of a type of data and you can probably collect it by scraping the web. Real estate listings, sports data, email addresses of businesses in your area, and even the lyrics from your favorite artist can all be sought out and saved by writing a small script.” This article has a couple of good examples, but it’s mostly an overview. (That’s not a criticism; web scraping is an incredibly broad topic that nobody could cover in one article!)
Graham Cluley: Facebook knew for years scammers were harvesting users’ details with phone number searches. Did nothing. “Facebook ignored a widely-known privacy flaw for years, allowing scammers, spammers, and other malicious parties to scoop up virtually all users’ names and profile details. As I explained way back in 2012, when I was writing for the Sophos Naked Security blog, simply entering someone’s phone number or email address into Facebook’s search box would perform a reverse look-up and tell you who it belonged to, with any information they shared publicly on their Facebook profile.”
Techdirt: Court Says Scraping Websites And Creating Fake Profiles Can Be Protected By The First Amendment. “It’s no secret that the Computer Fraud and Abuse Act (CFAA) is a mess. Originally written by a confused and panicked Congress in the wake of the 1980s movie War Games, it was supposed to be an ‘anti-hacking’ law, but was written so broadly that it has been used over and over again against any sort of ‘things that happen on a computer.’ It has been (not so jokingly) referred to as ‘the law that sticks,’ because when someone has done something ‘icky’ using a computer, if no other law is found to be broken, someone can almost always find some weird way to interpret the CFAA to claim it’s been violated. The two most problematic parts of the CFAA are the fact that it applies to ‘unauthorized access’ or to ‘exceeding authorized access’ on any ‘computer… which is used in or affecting interstate or foreign commerce or communications.’ In 1986 that may have seemed limited. But, today, that means any computer on the internet. Which means basically any computer.”
Wolfram Blog: Web Scraping with the Wolfram Language, Part 1: Importing and Interpreting. “Do you want to do more with data available on the web? Meaningful data exploration requires computation—and the Wolfram Language is well suited to the tasks of acquiring and organizing data. I’ll walk through the process of importing information from a webpage into a Wolfram Notebook and extracting specific parts for basic computation.” Ooh!
Kaylin Walker: Tidy Text Mining Beer Reviews. “BeerAdvocate.com was scraped for a sample of beer reviews, resulting in a dataset of 31,550 beers and their brewery, beer style, ABV, total numerical ratings, number of text reviews, and a sample of review text. Review text was gathered only for beers with at least 5 text reviews. A minimum of 2000 characters of review text were collected for those beers, with total length ranging from 2000 to 5000 characters.”