Perplexity accused of scraping websites that explicitly blocked AI scraping
Cloudflare published research alleging that AI startup Perplexity is ignoring robots.txt blocks and using deceptive methods to scrape content. Perplexity denies the claims.
Image Credits: Kimberly White/Getty Images for TechCrunch
Lorenzo Franceschi-Bicchierai
8:41 AM PDT · August 4, 2025
AI startup Perplexity is crawling and scraping content from websites that have explicitly indicated they don’t want to be scraped, according to internet infrastructure provider Cloudflare.
On Monday, Cloudflare published research saying it observed the AI startup ignore blocks and hide its crawling and scraping activities. The network infrastructure giant accused Perplexity of obscuring its identity when trying to scrape web pages “in an attempt to circumvent the website’s preferences,” Cloudflare’s researchers wrote.
AI products like those offered by Perplexity rely on gobbling up large amounts of data from the internet, and AI startups have long scraped text, images, and videos from the internet, often without permission, to make their products work. In recent times, websites have tried to fight back by using the robots.txt file, a web standard that tells search engines and AI companies which pages can be indexed and which shouldn’t be. Those efforts have seen mixed results so far.
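For readers unfamiliar with how a robots.txt check works in practice, here is a minimal sketch in Python using the standard library's `urllib.robotparser`. The robots.txt contents, the bot name `ExampleAIBot`, and the `example.com` URLs are all hypothetical and chosen for illustration only; they are not taken from Cloudflare's research or from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: it blocks one named AI crawler
# everywhere and keeps every other visitor out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed("ExampleAIBot", "https://example.com/articles/1"))   # False
print(is_allowed("SomeSearchBot", "https://example.com/articles/1"))  # True
print(is_allowed("SomeSearchBot", "https://example.com/private/x"))   # False
```

Note that the check runs on the crawler's side: robots.txt states a site's preferences, and it only works if the crawler chooses to consult and honor it.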
Perplexity appears to be deliberately circumventing these blocks by changing its bots’ “user agent,” a string that identifies a website visitor’s browser, device, and version, as well as by changing their autonomous system numbers, or ASNs, which identify large networks on the internet, according to Cloudflare.
“This activity was observed across tens of thousands of domains and millions of requests per day. We were able to fingerprint this crawler using a combination of machine learning and network signals,” read Cloudflare’s post.
Perplexity spokesperson Jesse Dwyer dismissed Cloudflare’s blog post as a “sales pitch,” adding in an email to TechCrunch that the screenshots in the post “show that no content was accessed.” In a follow-up email, Dwyer claimed the bot named in the Cloudflare blog “isn’t even ours.”
Cloudflare said it first noticed the behavior after its customers complained that Perplexity was crawling and scraping their sites, even after they had added rules to their robots.txt files and specifically blocked Perplexity’s known bots. Cloudflare said it then ran tests and confirmed that Perplexity was circumventing these blocks.
“We observed that Perplexity uses not only their declared user-agent, but also a generic browser intended to impersonate Google Chrome on macOS when their declared crawler was blocked,” according to Cloudflare.
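To illustrate why a block keyed only to a crawler's declared name can be sidestepped this way, here is a rough sketch in Python. It is not Cloudflare's detection method, and every name and string in it is an assumption made up for the example: `ExampleAIBot` is a hypothetical crawler, and the browser-style string is only a generic illustration of what a Chrome-on-macOS user agent looks like.

```python
# Hypothetical block list: lowercase tokens a site owner wants to refuse.
BLOCKED_AGENT_TOKENS = ("exampleaibot",)


def is_blocked(user_agent_header: str) -> bool:
    """Naive filter: block a request if its User-Agent names a listed bot."""
    ua = user_agent_header.lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


# A crawler that honestly declares itself (hypothetical string).
declared = "Mozilla/5.0 (compatible; ExampleAIBot/1.0; +https://example.com/bot)"

# A generic browser-style string of the kind an ordinary Chrome user on
# macOS might send (illustrative only).
browser_like = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

print(is_blocked(declared))      # True: the declared crawler is refused
print(is_blocked(browser_like))  # False: the same filter lets this through
```

Because the user agent is supplied by the visitor, a filter like this depends on the visitor being honest, which is why identifying a disguised crawler requires other signals.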
The company also said that it has de-listed Perplexity’s bots from its verified list and added new techniques to block them.
Cloudflare has recently taken a public stance against AI crawlers. Last month, Cloudflare announced the launch of a marketplace allowing website owners and publishers to charge AI scrapers who visit their sites. Cloudflare’s chief executive Matthew Prince sounded the alarm at the time, saying AI is breaking the business model of the internet, particularly for publishers. Last year, Cloudflare also launched a free tool to prevent bots from scraping websites to train AI.
This is not the first time Perplexity has been accused of scraping without authorization.
Last year, news outlets, such as Wired, alleged Perplexity was plagiarizing their content. Weeks later, Perplexity’s CEO Aravind Srinivas was unable to immediately answer when asked to provide the company’s definition of plagiarism during an interview with TechCrunch’s Devin Coldewey at the Disrupt 2024 conference.
Lorenzo Franceschi-Bicchierai
Senior Reporter, Cybersecurity
Lorenzo Franceschi-Bicchierai is a Senior Writer at TechCrunch, where he covers hacking, cybersecurity, surveillance, and privacy.
You can contact or verify outreach from Lorenzo by emailing [email protected], via encrypted message at +1 917 257 1382 on Signal, and @lorenzofb on Keybase/Telegram.