The Register: Facebook sues to shut down alleged Instagram clone maker over scraping and sharing personal info for cash. “Facebook on Thursday sued Ensar Sahinturk, a software developer based in Istanbul, Turkey, who is alleged to have built a network of sites that scrape data from Instagram to create Insta-clones.”
ZDNet: Facebook link preview feature used as a proxy in website-scraping scheme. “The technique consisted of using Facebook developer accounts to place calls to Facebook or Facebook Messenger API servers, requesting a link preview for pages a group wanted to scrape. Facebook would fetch the data, assemble it in a link preview, and return it to the data scrappers as an API response, ready to be ingested into the scrapper’s database.” Pretty sure they mean scrapers, but I’m not going to argue with ZDNet.
The Verge: Facebook wants the NYU Ad Observer to quit collecting data about its ad targeting. “Facebook wants a New York University research project to stop collecting data about the social platform’s political ad-targeting, The Wall Street Journal reported. The Ad Observatory, a project of NYU’s engineering school with more than 6,000 volunteers, uses its AdObserver browser extension to scrape data from political ads shown on Facebook. But Facebook says the program is violates its terms of service, which bar scraping.” NYU has published a brief response.
Washington Post: Chinese firm harvests social media posts, data of prominent Americans and military. “Biographies and service records of aircraft carrier captains and up-and-coming officers in the U.S. Navy. Real-time tweets originating from overseas U.S. military installations. Profiles and family maps of foreign leaders, including their relatives and children. Records of social media chatter among China watchers in Washington. Those digital crumbs, along with millions of other scraps of social media and online data, have been systematically collected since 2017 by a small Chinese company called Shenzhen Zhenhua Data Technology for the stated purpose of providing intelligence to Chinese military, government and commercial clients, according to a copy of the database that was left unsecured on the Internet and retrieved by an Australian cybersecurity consultancy.”
Techdirt: Clearview Hires Prominent First Amendment Lawyer To Argue For Its Right To Sell Scraped Data To Cops. “Clearview — the facial recognition company selling law enforcement agencies (and others) access to billions of photos and personal info scraped from the web — is facing lawsuits over its business model, which appears to violate some states’ data privacy laws. It’s also been hit with cease-and-desist requests from a number of companies whose data has been scraped…. Now, the company appears to be going on the offensive.”
MediaPost: LinkedIn Makes Final Plea For Supreme Court To Hear Battle Over Scraping. “A recent court ruling that requires LinkedIn to allow its site to be scraped by a potential competitor will prevent web companies from protecting their users’ privacy, LinkedIn argues in new Supreme Court papers.”
Technical.ly: Volunteer data scrapers helped Philadelphia Lawyers for Social Equity preserve client court records. “As the first state to implement the Clean Slate Law in 2018, Pennsylvania committed to sealing millions of criminal records. The law was enacted to remove educational and vocational disadvantages for people with eligible records, including those associated with certain misdemeanors and people found not guilty in court. While the law cleared barriers to housing, education and employment for individuals across the state, it indirectly created new technological barriers for Philadelphia Lawyers for Social Equity (PLSE).”
CNET: Facebook sues developer over alleged data scraping abuse. “The social network announced on Thursday that it was filing a lawsuit against Mohammad Zaghar and his website, Massroot8, claiming that the service was grabbing Facebook users’ data without permission. The lawsuit filed in the northern district of California alleged that Zaghar’s website offered its customers the ability to scrape data from their Facebook friends — including their phone numbers, gender, date of birth and email addresses.”
Towards Data Science: How to Scrape Google Shopping Prices with Web Data Extraction. “Google Shopping is a good start to market your online business and convert more sales. However, if you’re a newcomer, it is essential to watch and learn how your competitors brand and market their products from Google Shopping by using a web data extraction tool (web scraping tool).”
Digital Inspiration: How to Scrape Reddit with Google Scripts. “Here’s Google script that will help you download all the user posts from any subreddit on Reddit to a Google Sheet. And because we are using pushshift.io instead of the official Reddit API, we are no longer capped to the first 1000 posts. It will download everything that’s every posted on a subreddit.”
Towards Data Science: How to Scrape Tweets From Twitter. “This tutorial is meant to be a quick straightforward introduction to scraping tweets from Twitter in Python using Tweepy’s Twitter API or Dmitry Mottl’s GetOldTweets3. To provide direction for this tutorial I decided to focus on scraping through two avenues: scraping a specific user’s tweets and scraping tweets from a general text search.”
Pew (PEW PEW PEW PEW PEW PEW PEW!): The Digital Pulpit: A Nationwide Analysis of Online Sermons. “Frequent churchgoers may have a good sense of what kind of sermons to expect from their own clergy: how long they usually last, how much they dwell on biblical texts, whether the messages lean toward fire and brimstone or toward love and self-acceptance. But what are other Americans hearing from the pulpits in their congregations?” The methodology was as fascinating to me as the research.
Make Tech Easier: How to Use a Data-Scraping Tool to Extract Data from Webpages. “If you’re copying and pasting things off webpages and manually putting them in spreadsheets, you either don’t know what data scraping (or web scraping) is, or you do know what it is but aren’t really keen on the idea of learning how to code just to save yourself a few hours of clicking. Either way, there are a lot of no-code data-scraping tools that can help you out, and Data Miner’s Chrome extension is one of the more intuitive options.”
BBC: Sham news sites make big bucks from fake views. “There are 350 million registered domain names on the internet. Experts say it’s impossible to count how many are sham news sites. But just like legitimate websites, they earn money from the major tech companies that pay them to display ads.”