Technical.ly: Volunteer data scrapers helped Philadelphia Lawyers for Social Equity preserve client court records

Technical.ly: Volunteer data scrapers helped Philadelphia Lawyers for Social Equity preserve client court records. “As the first state to implement the Clean Slate Law in 2018, Pennsylvania committed to sealing millions of criminal records. The law was enacted to remove educational and vocational disadvantages for people with eligible records, including those associated with certain misdemeanors and people found not guilty in court. While the law cleared barriers to housing, education and employment for individuals across the state, it indirectly created new technological barriers for Philadelphia Lawyers for Social Equity (PLSE).”

CNET: Facebook sues developer over alleged data scraping abuse

CNET: Facebook sues developer over alleged data scraping abuse. “The social network announced on Thursday that it was filing a lawsuit against Mohammad Zaghar and his website, Massroot8, claiming that the service was grabbing Facebook users’ data without permission. The lawsuit filed in the northern district of California alleged that Zaghar’s website offered its customers the ability to scrape data from their Facebook friends — including their phone numbers, gender, date of birth and email addresses.”

Towards Data Science: How to Scrape Google Shopping Prices with Web Data Extraction

Towards Data Science: How to Scrape Google Shopping Prices with Web Data Extraction. “Google Shopping is a good place to start marketing your online business and converting more sales. However, if you’re a newcomer, it is essential to watch and learn how your competitors brand and market their products on Google Shopping by using a web data extraction tool (web scraping tool).”
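
For flavor, here is a minimal Python sketch of the general approach (not the article’s specific tool): fetch a Google Shopping results page and parse out product titles and prices. The CSS selectors below are assumptions; Google changes its markup often and throttles automated requests, which is exactly why dedicated extraction tools exist.

```python
# A rough sketch, not a production scraper: the selectors are guesses
# that must be checked against the live page, and Google may block
# plain scripted requests outright.
import requests
from bs4 import BeautifulSoup

def scrape_shopping_prices(query: str) -> list[dict]:
    url = "https://www.google.com/search"
    params = {"q": query, "tbm": "shop"}       # tbm=shop selects the Shopping tab
    headers = {"User-Agent": "Mozilla/5.0"}    # bare requests are often rejected
    resp = requests.get(url, params=params, headers=headers, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    results = []
    # Hypothetical selectors; inspect the current markup and adjust.
    for item in soup.select("div.sh-dgr__content"):
        title = item.select_one("h3")
        price = item.select_one("span.a8Pemb")
        if title and price:
            results.append({"title": title.get_text(strip=True),
                            "price": price.get_text(strip=True)})
    return results

if __name__ == "__main__":
    for row in scrape_shopping_prices("mechanical keyboard"):
        print(row["price"], "-", row["title"])
```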

Digital Inspiration: How to Scrape Reddit with Google Scripts

Digital Inspiration: How to Scrape Reddit with Google Scripts. “Here’s a Google Script that will help you download all the user posts from any subreddit on Reddit to a Google Sheet. And because we are using pushshift.io instead of the official Reddit API, we are no longer capped at the first 1,000 posts. It will download everything that’s ever been posted on a subreddit.”
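
The script itself is Google Apps Script writing to a Sheet, but the pushshift.io trick translates directly. Here is a rough Python sketch of the same pagination idea, using the endpoint and parameters pushshift documented at the time (the service has changed considerably since).

```python
# Page backward through a subreddit's full history via pushshift.io,
# sidestepping the official API's ~1,000-post listing cap.
import requests

def fetch_all_posts(subreddit: str):
    url = "https://api.pushshift.io/reddit/search/submission/"
    before = None
    while True:
        params = {"subreddit": subreddit, "size": 100, "sort": "desc"}
        if before:
            params["before"] = before
        data = requests.get(url, params=params, timeout=30).json()["data"]
        if not data:
            break
        yield from data
        before = data[-1]["created_utc"]  # continue from the oldest post seen

if __name__ == "__main__":
    for post in fetch_all_posts("datasets"):
        print(post["created_utc"], post["title"])
```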

Towards Data Science: How to Scrape Tweets From Twitter

Towards Data Science: How to Scrape Tweets From Twitter. “This tutorial is meant to be a quick, straightforward introduction to scraping tweets from Twitter in Python using Tweepy’s Twitter API or Dmitry Mottl’s GetOldTweets3. To provide direction for this tutorial I decided to focus on scraping through two avenues: scraping a specific user’s tweets and scraping tweets from a general text search.”
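
As a taste of the first avenue, here is a minimal Tweepy sketch covering both cases the tutorial names: one user’s timeline and a general text search. The credential strings are placeholders you get from a Twitter developer account, and the method names follow Tweepy 3.x (api.search was renamed search_tweets in Tweepy 4).

```python
import tweepy

# Placeholder credentials; substitute your own developer-account keys.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# Avenue 1: a specific user's tweets.
for tweet in tweepy.Cursor(api.user_timeline, screen_name="nasa",
                           tweet_mode="extended").items(100):
    print(tweet.created_at, tweet.full_text[:80])

# Avenue 2: a general text search.
for tweet in tweepy.Cursor(api.search, q="web scraping -filter:retweets",
                           tweet_mode="extended").items(50):
    print(tweet.user.screen_name, tweet.full_text[:80])
```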

Hackaday: Think You Know cURL? Care To Prove It?

Hackaday: Think You Know cURL? Care To Prove It? “Do you happen to remember a browser-based game ‘You Can’t JavaScript Under Pressure’? It presented coding tasks of ever-increasing difficulty and challenged the player to complete them as quickly as possible. Inspired by that game, [Ben Cox] re-implemented it as You Can’t cURL Under Pressure!”

Pew: The Digital Pulpit: A Nationwide Analysis of Online Sermons

Pew (PEW PEW PEW PEW PEW PEW PEW!): The Digital Pulpit: A Nationwide Analysis of Online Sermons. “Frequent churchgoers may have a good sense of what kind of sermons to expect from their own clergy: how long they usually last, how much they dwell on biblical texts, whether the messages lean toward fire and brimstone or toward love and self-acceptance. But what are other Americans hearing from the pulpits in their congregations?” The methodology was as fascinating to me as the research.

Make Tech Easier: How to Use a Data-Scraping Tool to Extract Data from Webpages

Make Tech Easier: How to Use a Data-Scraping Tool to Extract Data from Webpages. “If you’re copying and pasting things off webpages and manually putting them in spreadsheets, you either don’t know what data scraping (or web scraping) is, or you do know what it is but aren’t really keen on the idea of learning how to code just to save yourself a few hours of clicking. Either way, there are a lot of no-code data-scraping tools that can help you out, and Data Miner’s Chrome extension is one of the more intuitive options.”

BBC: Sham news sites make big bucks from fake views

BBC: Sham news sites make big bucks from fake views. “There are 350 million registered domain names on the internet. Experts say it’s impossible to count how many are sham news sites. But just like legitimate websites, they earn money from the major tech companies that pay them to display ads.”

Government Technology: Vermont Attorneys Leverage Open Source Expungement Plug-In

Government Technology: Vermont Attorneys Leverage Open Source Expungement Plug-In. “A Vermont Code for America brigade, Code for BTV, designed a Google Chrome extension to scrape data from criminal dockets found on the state’s legacy court database to autofill expungement and record sealing petitions.”
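
The brigade’s tool is a Chrome extension, but the core move (pull structured fields off a legacy docket page, then map them onto a petition) is easy to sketch. The docket markup and field names below are invented for illustration; a real scraper would target the actual structure of Vermont’s court database.

```python
# Parse a (hypothetical) docket table into the field/value pairs a
# petition form would need.
from bs4 import BeautifulSoup

DOCKET_HTML = """
<table id="docket">
  <tr><td>Defendant</td><td>Jane Doe</td></tr>
  <tr><td>Docket No.</td><td>123-4-56 Cncr</td></tr>
  <tr><td>Charge</td><td>Retail theft (misdemeanor)</td></tr>
  <tr><td>Disposition</td><td>Dismissed</td></tr>
</table>
"""

def docket_to_petition(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    fields = {}
    for row in soup.select("#docket tr"):
        label, value = (td.get_text(strip=True) for td in row.find_all("td"))
        fields[label] = value
    return fields

print(docket_to_petition(DOCKET_HTML))
```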

Codementor: How to Extract Google Maps Coordinates

Codementor: How to Extract Google Maps Coordinates. “Have you ever thought you could make money by knowing how many restaurants there are in a square mile? There is no free lunch; however, if you know how to use Google Maps, you can extract and collect restaurants’ GPS coordinates and store them in your own database. With that information on hand and some math calculations, you are off to creating a big data online service. In this article, I will show you how to quickly extract Google Maps coordinates with a simple and easy method.”
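
The article demonstrates its own method; one simple trick worth knowing regardless is that a place’s latitude and longitude are embedded in the Google Maps URL after the ‘@’ sign, so a regular expression can recover them from any saved or shared link. A minimal sketch:

```python
# Pull the lat,lng pair out of a Google Maps URL of the common
# .../@<lat>,<lng>,<zoom>z/ form.
import re

MAPS_URL = ("https://www.google.com/maps/place/Liberty+Bell/"
            "@39.9496,-75.1503,17z/")

def extract_coordinates(url: str) -> tuple[float, float]:
    match = re.search(r"@(-?\d+\.\d+),(-?\d+\.\d+)", url)
    if not match:
        raise ValueError("no @lat,lng pair found in URL")
    return float(match.group(1)), float(match.group(2))

print(extract_coordinates(MAPS_URL))  # (39.9496, -75.1503)
```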

MakeUseOf: The Scrapestack API Makes It Easy to Scrape Websites for Data

MakeUseOf: The Scrapestack API Makes It Easy to Scrape Websites for Data. “Finding it time-consuming to visit all your favorite websites and read everything that matters? One solution is a web scraper, a software tool that gathers information you need from other sites. We’re going to look at the scrapestack API, a web scraping service that you can subscribe to. Once set up, you can use scrapestack to grab whatever data you want from other sites.”
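
The basic call pattern is a keyed GET request, in the style of other apilayer services. A minimal sketch, with the access key as a placeholder (check the current scrapestack docs for the exact parameters):

```python
# Fetch a page's raw HTML through the scrapestack proxy endpoint.
import requests

API_KEY = "YOUR_ACCESS_KEY"  # placeholder; issued when you subscribe

def scrape(url: str) -> str:
    resp = requests.get(
        "http://api.scrapestack.com/scrape",
        params={"access_key": API_KEY, "url": url},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.text  # HTML of the target page, fetched via the service

html = scrape("https://example.com")
print(html[:200])
```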

Simon Willison: Tracking PG&E outages by scraping to a git repo

Simon Willison: Tracking PG&E outages by scraping to a git repo. “PG&E have cut off power to several million people in northern California, supposedly as a precaution against wildfires. As it happens, I’ve been scraping and recording PG&E’s outage data every 10 minutes for the past 4+ months. This data got really interesting over the past two days! The original data lives in a GitHub repo (more importantly in the commit history of that repo).”
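
Willison calls this pattern “git scraping”: fetch the same URL on a schedule, write it to a file in a repo, and let the commit history become the time-series dataset. A minimal sketch of one polling cycle, with a placeholder feed URL rather than PG&E’s actual endpoint:

```python
# One polling cycle of a git-scraping loop; run it every 10 minutes via
# cron or a scheduled CI job, inside an initialized git repository.
import subprocess
import requests

FEED_URL = "https://example.com/outages.json"  # hypothetical outage feed

def poll_once():
    data = requests.get(FEED_URL, timeout=30).text
    with open("outages.json", "w") as f:
        f.write(data)
    subprocess.run(["git", "add", "outages.json"], check=True)
    # Commit only when the data actually changed; the history is the dataset.
    if subprocess.run(["git", "diff", "--cached", "--quiet"]).returncode != 0:
        subprocess.run(["git", "commit", "-m", "Latest outage data"], check=True)

if __name__ == "__main__":
    poll_once()
```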

Ars Technica: Web scraping doesn’t violate anti-hacking law, appeals court rules

Ars Technica: Web scraping doesn’t violate anti-hacking law, appeals court rules. “Scraping a public website without the approval of the website’s owner isn’t a violation of the Computer Fraud and Abuse Act, an appeals court ruled on Monday. The ruling comes in a legal battle that pits Microsoft-owned LinkedIn against a small data-analytics company called hiQ Labs.”

The Citizens’ Voice: Online database helps track down ‘Boozicorns’ available statewide

The Citizens’ Voice: Online database helps track down ‘Boozicorns’ available statewide. “Pennsylvania’s regulation of wine and spirits offers unique frustrations and, sometimes, opportunities. One of the opportunities is the ability to search the entire state for specific products through the state’s centralized web portal. Anyone who does this regularly will see how often odd lots of wine or spirits — a bottle or a few — may be tucked away somewhere, frustratingly, in Bryn Mawr or Cambria County. A wise-cracky techie created a program and website to scrape the data for those odd lots available in just a single store and dubbed them Boozicorns.”
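
Once the availability data is scraped, the Boozicorn hunt reduces to a simple filter: keep the products whose entire statewide stock sits in a single store. A minimal sketch over hypothetical inventory rows (the real portal would need its own scraper):

```python
# Flag products stocked by exactly one store statewide.
from collections import defaultdict

def find_boozicorns(inventory_rows):
    """inventory_rows: iterable of (product_name, store_id) pairs."""
    stores_by_product = defaultdict(set)
    for product, store in inventory_rows:
        stores_by_product[product].add(store)
    return {p: s.pop() for p, s in stores_by_product.items() if len(s) == 1}

rows = [("Obscure Amaro", "Bryn Mawr #0923"),
        ("Popular Vodka", "Bryn Mawr #0923"),
        ("Popular Vodka", "Philadelphia #5151")]
print(find_boozicorns(rows))  # {'Obscure Amaro': 'Bryn Mawr #0923'}
```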