Perplexity accused of scraping websites that explicitly blocked AI scraping
Source: TechCrunch
AI startup Perplexity is crawling and scraping content from websites that have explicitly indicated they don't want to be scraped, according to internet infrastructure provider Cloudflare.
On Monday, Cloudflare published research saying it observed the AI startup ignore blocks and hide its crawling and scraping activities. The network infrastructure giant accused Perplexity of obscuring its identity when trying to scrape web pages "in an attempt to circumvent the website's preferences," Cloudflare's researchers wrote.
-snip-
Perplexity appears to be willingly circumventing these blocks by changing its bot's "user agent," meaning a signal that identifies a website visitor by their device and version type, as well as changing its autonomous system networks, or ASN, essentially a number that identifies large networks on the internet, according to Cloudflare.
"This activity was observed across tens of thousands of domains and millions of requests per day. We were able to fingerprint this crawler using a combination of machine learning and network signals," read Cloudflare's post.
-snip-
Read more: https://techcrunch.com/2025/08/04/perplexity-accused-of-scraping-websites-that-explicitly-blocked-ai-scraping/
Cloudflare also said Perplexity has been using "a generic browser intended to impersonate Google Chrome on macOS."
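For anyone wondering why swapping the user agent matters, here is a rough sketch using Python's standard urllib.robotparser. It assumes the site's block is a robots.txt rule keyed on the crawler's declared name; the robots.txt contents, the "PerplexityBot" token, the example.com URL, and the Chrome string are all illustrative assumptions, not taken from the article. It only shows that a name-based rule stops matching once the client presents itself as a generic browser; it is not how Cloudflare detected anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one declared crawler name, allow everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

# Illustrative browser-style user agent string (assumed, not from the article).
GENERIC_CHROME_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def is_allowed(user_agent: str, url: str = "https://example.com/article") -> bool:
    """Return whether the sample robots.txt permits this user agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# The declared crawler name is covered by the Disallow rule.
declared_allowed = is_allowed("PerplexityBot")

# The browser-like string matches no named rule, so the wildcard entry applies.
generic_allowed = is_allowed(GENERIC_CHROME_UA)

print("declared crawler allowed:", declared_allowed)
print("generic browser UA allowed:", generic_allowed)
```

robots.txt is only a request that polite crawlers honor, which is why a crawler that changes its name slips past name-based rules and detection has to rely on other signals.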
Very crooked company. But then, I don't know of any generative AI company that isn't based on theft and deceit.
The AI bots are doing terrible damage to the internet. Including here at DU. As EarlG explained last week
https://www.democraticunderground.com/101316061
the downtime and update then were at least partly about the bot problem, especially AI scrapers.