
Latest Breaking News


highplainsdem

(57,883 posts)
Mon Aug 4, 2025, 12:40 PM

Perplexity accused of scraping websites that explicitly blocked AI scraping

Source: TechCrunch

AI startup Perplexity is crawling and scraping content from websites that have explicitly indicated they don’t want to be scraped, according to internet infrastructure provider Cloudflare.

On Monday, Cloudflare published research saying it observed the AI startup ignore blocks and hide its crawling and scraping activities. The network infrastructure giant accused Perplexity of obscuring its identity when trying to scrape web pages “in an attempt to circumvent the website’s preferences,” Cloudflare’s researchers wrote.

-snip-

Perplexity appears to be willingly circumventing these blocks by changing its bot's “user agent,” meaning a signal that identifies a website visitor by their device and version type; as well as changing their autonomous system networks, or ASN, essentially a number that identifies large networks on the internet, according to Cloudflare.

“This activity was observed across tens of thousands of domains and millions of requests per day. We were able to fingerprint this crawler using a combination of machine learning and network signals,” read Cloudflare’s post.

-snip-

Read more: https://techcrunch.com/2025/08/04/perplexity-accused-of-scraping-websites-that-explicitly-blocked-ai-scraping/



Cloudflare also said Perplexity has been using "a generic browser intended to impersonate Google Chrome on macOS."
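
For anyone wondering why a user-agent switch matters, here is a rough sketch in Python. It assumes the block is a robots.txt rule keyed on the name a bot declares for itself, which is only one way a site can say no. The bot name "ExampleAIBot", the example.com URL and the Chrome version string are all made up for illustration; none of them come from the Cloudflare or TechCrunch posts.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one named AI bot, allows everyone else.
# "ExampleAIBot" is an invented name used only for this illustration.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str = "https://example.com/article") -> bool:
    """Return True if the robots.txt rules above permit this user agent."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# The bot announcing itself honestly is matched by the Disallow rule.
declared = is_allowed("ExampleAIBot/1.0")

# The same client presenting a generic Chrome-on-macOS string (version
# numbers are illustrative) no longer matches the rule written for it.
disguised = is_allowed(
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

print("declared bot allowed:", declared)
print("disguised bot allowed:", disguised)
```

The takeaway is that a name-based rule only works if the crawler tells the truth about who it is. robots.txt is a request, not an enforcement mechanism, which is presumably why Cloudflare says it had to fall back on fingerprinting with machine learning and network signals.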

Very crooked company. But then, I don't know of any generative AI company that isn't based on theft and deceit.

The AI bots are doing terrible damage to the internet. Including here at DU. As EarlG explained last week

https://www.democraticunderground.com/101316061

the downtime and update then were at least partly about the bot problem, especially AI scrapers.