The Verge: Facebook’s justification for banning third-party researchers ‘inaccurate,’ says FTC

The Verge: Facebook’s justification for banning third-party researchers ‘inaccurate,’ says FTC. “When Facebook banned the personal accounts of academics researching ad transparency and misinformation on its platform this week, it justified the decision in part by saying it was only following rules set out by the Federal Trade Commission. But the FTC itself says this is ‘inaccurate’ and that its rules require no such action, reports The Washington Post.”

Search Engine Journal: How to Use Google Sheets for Web Scraping & Campaign Building

Search Engine Journal: How to Use Google Sheets for Web Scraping & Campaign Building. “According to Google’s support page, IMPORTXML ‘imports data from any of various structured data types including XML, HTML, CSV, TSV, and RSS and ATOM XML feeds.’ Essentially, IMPORTXML is a function [that] allows you to scrape structured data from webpages — no coding knowledge required. For example, it’s quick and easy to extract data such as page titles, descriptions, or links, but also more complex information.”
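In Sheets, the function takes a URL and an XPath query, for example `=IMPORTXML("https://example.com", "//title")`. Here is a rough Python analog for readers who want to see the idea outside a spreadsheet. It uses only the standard library and pulls the page title, meta description and link targets from a hard-coded sample page. That is roughly what the queries `//title`, `//meta[@name='description']/@content` and `//a/@href` would return. This is not how IMPORTXML works internally, and the class, function and sample page are made up for illustration.

```python
from html.parser import HTMLParser


class PageSummaryParser(HTMLParser):
    """Collects the <title> text, the meta description, and every <a href>.
    Roughly mirrors the XPath queries //title,
    //meta[@name='description']/@content and //a/@href."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = ""
        self.links: list[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (a.get("name") or "").lower() == "description":
            self.description = a.get("content") or ""
        elif tag == "a" and a.get("href"):
            self.links.append(a["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title> is kept; all other text is ignored.
        if self._in_title:
            self.title += data


def summarize(html: str) -> dict:
    """Return the title, description, and links found in an HTML string."""
    parser = PageSummaryParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "description": parser.description,
        "links": parser.links,
    }


# Hypothetical sample page; a real scraper would fetch this over HTTP.
SAMPLE_PAGE = """<html><head>
<title>Example Page</title>
<meta name="description" content="A page used for testing.">
</head><body>
<a href="/about">About</a> <a href="https://example.com/">Home</a>
</body></html>"""

if __name__ == "__main__":
    print(summarize(SAMPLE_PAGE))
```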

Engadget: Facebook disables accounts of NYU team looking into political ad targeting

Engadget: Facebook disables accounts of NYU team looking into political ad targeting. “Before the US election last year, a team of researchers from New York University’s engineering school launched a project to gather more data on political ads. In particular, the team wanted to know how political advertisers choose the demographic their ads target and don’t target. Shortly after the project called the NYU Ad Observatory went live, however, Facebook notified the researchers that their efforts violate its terms of service related to bulk data collection. Now, the social network has announced that it has ‘disabled the accounts, apps, Pages and platform access associated with NYU’s Ad Observatory Project and its operators…’”

BBC: How your personal data is being scraped from social media

BBC: How your personal data is being scraped from social media. “Name, location, age, job role, marital status, headshot? The amount of information people are comfortable with posting online varies. But most people accept that whatever we put on our public profile page is out in the public domain. So, how would you feel if all your information was catalogued by a hacker and put into a monster spreadsheet with millions of entries, to be sold online to the highest paying cyber-criminal?”

Motherboard: Hackers Scrape 90,000 GETTR User Emails, Surprising No One

Motherboard: Hackers Scrape 90,000 GETTR User Emails, Surprising No One. “On Tuesday, a user of a notorious hacking forum posted a database that they claimed was a scrape of all users of GETTR, the new social media platform launched last week by Trump’s former spokesman Jason Miller, who pitched it as an alternative to ‘cancel culture.’ The data seen by Motherboard includes email addresses, usernames, status, and location.”

Techdirt: Clearview Forbids Users From Scraping Its Database Of Images It Scraped From Thousands Of Websites

Techdirt: Clearview Forbids Users From Scraping Its Database Of Images It Scraped From Thousands Of Websites. “Clearview called out Google’s apparent hypocrisy on the subject of site scraping when Google sent a cease-and-desist demanding it stop harvesting images and data from Google’s online possessions. But Clearview is apparently unable to recognize its own hypocrisy. While it’s cool with site scraping when it can benefit from it, it frowns upon others perpetrating this ‘harm’ on its own databases.”

Hongkiat: 5 Best Web Scraping Tools to Extract Online Data

Hongkiat: 5 Best Web Scraping Tools to Extract Online Data. “These software look for new data manually or automatically, fetching the new or updated data and storing them for your easy access. For example, one may collect info about products and their prices from Amazon using a scraping tool. In this post, we’re listing the use cases of web scraping tools and the top 10 web scraping tools to collect information, with zero codings.”

Washington Post: Recipeasly promised to ‘fix’ online recipes. After critics called it theft, the site shut down.

Washington Post: Recipeasly promised to ‘fix’ online recipes. After critics called it theft, the site shut down. “Lisa Lin can understand why home cooks might be interested in Recipeasly. The website allows users to collect their favorite recipes from around the Internet in one convenient location, sort of like an online recipe box. But as the founder of Healthy Nibbles, a seven-year-old website featuring hundreds of recipes, Lin doesn’t like how Recipeasly has marketed itself or how it developed a product without any apparent buy-in from the food bloggers and recipe developers who could be most affected by it.”

Pete Warden: How screen scraping and TinyML can turn any dial into an API

Pete Warden: How screen scraping and TinyML can turn any dial into an API. “I’ve already heard from multiple teams who have legacy hardware that they need to monitor, in environments as varied as oil refineries, crop fields, office buildings, cars, and homes. Some of the devices are decades old, so until now the only option to enable remote monitoring and data gathering was to replace the system entirely with a more modern version. This is often too expensive, time-consuming, or disruptive to contemplate. Pointing a small, battery-powered camera instead offers a lot of advantages. Since there’s an air gap between the camera and the dial it’s monitoring, it’s guaranteed to not affect the rest of the system, and it’s easy to deploy as an experiment, iterating to improve it.”

The Guardian: How technology unlocked the secretive power of ‘Queen’s consent’

The Guardian: How technology unlocked the secretive power of ‘Queen’s consent’. “Have you ever right-clicked on a webpage and pressed the ‘View Page Source’ button? You’ll see the HTML building blocks: the mark-up incantations used to build the page on your screen. The HTML focuses on presentation: what colour that text should be, how big that image should be, and so on. Web scraping is the art of transforming this semi-structured soup back into the structured data that produced it – in this case, who was speaking in which chamber at what time, and what did they say.”
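The Guardian piece doesn't publish its scraper, so this is only a sketch of the general idea. It walks an HTML page and turns the presentation markup back into records of who spoke, in which chamber, at what time, and what they said. The markup is entirely hypothetical: the `debate`/`speech`/`speaker` classes, the `data-chamber`/`data-time` attributes, and the names. Real parliamentary pages are laid out differently, and a working scraper would have to match their actual structure.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Optional


@dataclass
class Speech:
    chamber: str
    time: str
    speaker: str
    text: str


class SpeechParser(HTMLParser):
    """Turns HYPOTHETICAL debate markup into Speech records.

    Assumed layout: a div.debate carrying data-chamber, containing
    div.speech elements (data-time), each holding one span.speaker and
    one or more <p> paragraphs, with no further <div>s inside a speech.
    """

    def __init__(self) -> None:
        super().__init__()
        self.speeches: list[Speech] = []
        self._chamber = ""
        self._current: Optional[dict] = None  # speech currently being built
        self._field: Optional[str] = None  # "speaker" or "text" while inside one

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        classes = (a.get("class") or "").split()
        if tag == "div" and "debate" in classes:
            self._chamber = a.get("data-chamber") or ""
        elif tag == "div" and "speech" in classes:
            self._current = {"time": a.get("data-time") or "", "speaker": "", "text": ""}
        elif self._current is not None:
            if tag == "span" and "speaker" in classes:
                self._field = "speaker"
            elif tag == "p":
                self._field = "text"

    def handle_endtag(self, tag):
        if tag in ("span", "p"):
            if tag == "p" and self._current is not None:
                self._current["text"] += " "  # separate consecutive paragraphs
            self._field = None
        elif tag == "div" and self._current is not None:
            # Under the assumed layout, the first </div> after a speech opens closes it.
            self.speeches.append(Speech(
                chamber=self._chamber,
                time=self._current["time"],
                speaker=self._current["speaker"].strip(),
                text=self._current["text"].strip(),
            ))
            self._current = None

    def handle_data(self, data):
        if self._current is not None and self._field is not None:
            self._current[self._field] += data


def parse_debate(html: str) -> list[Speech]:
    parser = SpeechParser()
    parser.feed(html)
    parser.close()
    return parser.speeches


# Hypothetical sample page with made-up speakers.
SAMPLE = """<div class="debate" data-chamber="Commons">
  <div class="speech" data-time="14:32">
    <span class="speaker">Jane Smith</span>
    <p>I beg to move that the Bill be now read a second time.</p>
  </div>
  <div class="speech" data-time="14:40">
    <span class="speaker">John Doe</span>
    <p>The Government supports the amendment.</p>
  </div>
</div>"""

if __name__ == "__main__":
    for speech in parse_debate(SAMPLE):
        print(speech)
```

Once the page is records, questions like "who spoke in which chamber, and when" become a filter or a group-by rather than a read through HTML.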

ZDNet: Facebook link preview feature used as a proxy in website-scraping scheme

ZDNet: Facebook link preview feature used as a proxy in website-scraping scheme. “The technique consisted of using Facebook developer accounts to place calls to Facebook or Facebook Messenger API servers, requesting a link preview for pages a group wanted to scrape. Facebook would fetch the data, assemble it in a link preview, and return it to the data scrappers as an API response, ready to be ingested into the scrapper’s database.” Pretty sure they mean scrapers, but I’m not going to argue with ZDNet.
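ZDNet doesn't describe how Facebook builds its previews internally, and this is not its code. As a generic illustration of what "assemble it in a link preview" usually involves, here is a Python sketch. It reads a page's Open Graph `og:*` meta tags, the convention many sites use to describe themselves to link previewers, and falls back to `<title>`. It parses a hard-coded sample rather than fetching anything, and `build_preview` and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <title> plus <meta property="og:..." content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.og: dict[str, str] = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            prop = a.get("property") or ""
            content = a.get("content")
            if prop.startswith("og:") and content:
                # Keep the first value if a property repeats (e.g. several og:image tags).
                self.og.setdefault(prop[len("og:"):], content)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def build_preview(url: str, html: str) -> dict:
    """Assemble a link-preview dict, preferring Open Graph fields and
    falling back to the <title> element and the requested URL."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return {
        "url": parser.og.get("url", url),
        "title": parser.og.get("title") or parser.title.strip(),
        "description": parser.og.get("description", ""),
        "image": parser.og.get("image", ""),
    }


# Hypothetical sample page.
SAMPLE_PAGE = """<html><head>
<title>Fallback Title</title>
<meta property="og:title" content="Example Article">
<meta property="og:description" content="A short summary of the article.">
<meta property="og:image" content="https://example.com/cover.jpg">
</head><body><p>Article body</p></body></html>"""

if __name__ == "__main__":
    print(build_preview("https://example.com/article", SAMPLE_PAGE))
```

The fallback matters because plenty of pages omit Open Graph tags, so a previewer has to make do with whatever the page offers.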

Washington Post: Chinese firm harvests social media posts, data of prominent Americans and military

Washington Post: Chinese firm harvests social media posts, data of prominent Americans and military. “Biographies and service records of aircraft carrier captains and up-and-coming officers in the U.S. Navy. Real-time tweets originating from overseas U.S. military installations. Profiles and family maps of foreign leaders, including their relatives and children. Records of social media chatter among China watchers in Washington. Those digital crumbs, along with millions of other scraps of social media and online data, have been systematically collected since 2017 by a small Chinese company called Shenzhen Zhenhua Data Technology for the stated purpose of providing intelligence to Chinese military, government and commercial clients, according to a copy of the database that was left unsecured on the Internet and retrieved by an Australian cybersecurity consultancy.”