
Perplexity accused of scraping websites that explicitly blocked AI scraping

Cloudflare published research alleging that AI startup Perplexity is ignoring robots.txt blocks and using deceptive methods to scrape content. Perplexity denies the claims.

Source: TechCrunch AI | Author: Lorenzo Franceschi-Bicchierai



8:41 AM PDT · August 4, 2025

AI startup Perplexity is crawling and scraping content from websites that have explicitly indicated they don’t want to be scraped, according to internet infrastructure provider Cloudflare.

On Monday, Cloudflare published research saying it observed the AI startup ignore blocks and hide its crawling and scraping activities. The network infrastructure giant accused Perplexity of obscuring its identity when trying to scrape web pages “in an attempt to circumvent the website’s preferences,” Cloudflare’s researchers wrote.

AI products like those offered by Perplexity rely on gobbling up large amounts of data from the internet, and AI startups have long scraped text, images, and videos, often without permission, to make their products work. In recent times, websites have tried to fight back by using the robots.txt file, a web standard that tells search engines and AI companies which pages can be indexed and which can’t. Those efforts have seen mixed results so far.
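To make the robots.txt mechanism concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The crawler name `PerplexityBot`, the example rules, and the URL are assumptions for illustration; the article does not give the exact rules Cloudflare's customers used.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

url = "https://example.com/articles/1"  # placeholder URL
print(is_allowed(ROBOTS_TXT, "PerplexityBot", url))
print(is_allowed(ROBOTS_TXT, "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)", url))
```

The first call reports the page as off limits and the second as allowed. The file is only a request: the crawler itself decides whether to run a check like this and honor the answer, which is why compliance is voluntary.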

Perplexity appears to be willfully circumventing these blocks by changing its bots’ “user agent,” a string that identifies the software making a request, typically by browser, device, and version, as well as by switching the autonomous system numbers, or ASNs, it crawls from, according to Cloudflare. An ASN is essentially a number that identifies a large network on the internet.
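The sketch below illustrates why a block keyed on a crawler's declared identity can be sidestepped by changing the user agent and the source network. It is not Cloudflare's detection method, which the post describes only as a combination of machine learning and network signals. The crawler token, the user-agent strings, and the ASNs (taken from the range reserved for documentation) are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only.
DECLARED_UA_TOKEN = "PerplexityBot"  # assumed crawler name, not given in the article
DECLARED_ASNS = {64496}              # ASN from the range reserved for documentation

@dataclass(frozen=True)
class Request:
    user_agent: str  # self-reported by the client in the User-Agent header
    asn: int         # network that the request's source IP address belongs to

def naive_block(req: Request) -> bool:
    """Return True if a rule keyed on the declared identity would block req."""
    ua_match = DECLARED_UA_TOKEN.lower() in req.user_agent.lower()
    asn_match = req.asn in DECLARED_ASNS
    return ua_match or asn_match

declared = Request("Mozilla/5.0 (compatible; PerplexityBot/1.0)", 64496)
disguised = Request(
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    64511,
)

print(naive_block(declared))   # True
print(naive_block(disguised))  # False
```

Because the user agent is whatever the client chooses to send, and the source network can be changed, neither check catches the second request.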

“This activity was observed across tens of thousands of domains and millions of requests per day. We were able to fingerprint this crawler using a combination of machine learning and network signals,” read Cloudflare’s post.

Perplexity spokesperson Jesse Dwyer dismissed Cloudflare’s blog post as a “sales pitch,” adding in an email to TechCrunch that the screenshots in the post “show that no content was accessed.” In a follow-up email, Dwyer claimed the bot named in the Cloudflare blog “isn’t even ours.”

Cloudflare said it first noticed the behavior after its customers complained that Perplexity was crawling and scraping their sites even after they had added rules to their robots.txt files and rules specifically blocking Perplexity’s known bots. Cloudflare said it then ran its own tests and confirmed that Perplexity was circumventing these blocks.


“We observed that Perplexity uses not only their declared user-agent, but also a generic browser intended to impersonate Google Chrome on macOS when their declared crawler was blocked,” according to Cloudflare.

The company also said that it has de-listed Perplexity’s bots from its verified list and added new techniques to block them.

Cloudflare has recently taken a public stance against AI crawlers. Last month, Cloudflare announced the launch of a marketplace allowing website owners and publishers to charge AI scrapers who visit their sites. Cloudflare’s chief executive Matthew Prince sounded the alarm at the time, saying AI is breaking the business model of the internet, particularly for publishers. Last year, Cloudflare also launched a free tool to prevent bots from scraping websites to train AI.

This is not the first time Perplexity has been accused of scraping without authorization.

Last year, news outlets, such as Wired, alleged Perplexity was plagiarizing their content. Weeks later, Perplexity’s CEO Aravind Srinivas was unable to immediately answer when asked to provide the company’s definition of plagiarism during an interview with TechCrunch’s Devin Coldewey at the Disrupt 2024 conference.

