Reddit Sues Perplexity AI, Alleging ‘Industrial-Scale’ Data Theft

The social platform accused Perplexity and its data partners of unlawfully harvesting user content to train AI systems.

Oct 23, 2025

Social media platform Reddit sued Perplexity AI in federal court on Wednesday, alleging that the artificial intelligence company and its data partners orchestrated an “industrial-scale” scheme to scrape the platform’s user-generated content.

Reddit alleges that the other defendants, SerpApi, Oxylabs, and AWM Proxy, developed and sold tools specifically designed to break the security measures protecting its content, enabling large-scale scraping of Reddit data from search results.

The tools were allegedly built to bypass two layers of protection: first, by evading Reddit’s own anti-scraping systems, and second, by circumventing Google’s controls to extract Reddit content directly from its search engine results.

The data companies operated as “data-scraping service providers” and “circumvented Google’s technological control measures and automatedly accessed, without authorization, almost three billion search engine results pages,” a copy of the lawsuit reads.

Reddit claims Perplexity used data from the three firms for its answer engine even after receiving a cease-and-desist letter in May 2024.

A Perplexity representative responded by sharing the company’s full statement, which it posted on Reddit.

Perplexity intentionally posted its response on Reddit “to illustrate a simple point: it’s a public Reddit link accessible to anyone, yet by the logic of Reddit’s lawsuit, if you refer to it in any way, they just might sue you too,” the representative told Decrypt.

Perplexity described the lawsuit as “a sad example of what happens when public data becomes a big part of a public company’s business model.”

“Reddit thinks that’s their right. But it is the opposite of an open internet,” Perplexity stated.

A representative from SerpApi told Decrypt they did not receive “any communication or service from Reddit” on the matter, adding that they “strongly disagree with Reddit's allegations” and intend to seek legal recourse.

“No company should claim ownership of public data that does not belong to them. It is possible that it is just an attempt to sell the same public data at an inflated price,” Denas Grybauskas, chief governance and strategy officer at Oxylabs, told Decrypt in an emailed statement.

Reddit similarly “made no attempt to speak” with Oxylabs, Grybauskas said.

Decrypt has reached out to Reddit, Google, and AWM Proxy for comment and will update this article should they respond.

A legal tangle

In cases like this, courts would first need to examine whether a platform’s terms of service “explicitly addresses AI training, data scraping, and commercial use,” Andrew Rossow, public affairs attorney and director of strategic partnerships at video search and content intelligence platform Oriane, told Decrypt.

If a user agreed to terms that “grant the platform a broad, perpetual, royalty-free license to their content,” that license “generally governs the relationship between the user and the platform,” Rossow explained.

But it doesn’t “automatically grant the AI company a license” to do the same, unless the terms permitted the platform “to sublicense or sell the data for that purpose,” he added.

Courts would then have to “distinguish between the user's copyright in their expression (the text of the post) and the use of the content for data mining (extracting patterns, facts, and language models),” he explained.

Still, the supposed “knowledge” behind a large language model (LLM) “is the product of millions of users' time, effort, and creative expression,” Rossow argued.

“Treating this human-generated content as a free, raw, undifferentiated resource is a form of labor exploitation that devalues online contributions,” Rossow opined, adding that AI companies need to “respect digital citizenship and community norms,” given how these are “the implicit and explicit rules of the digital public spaces they ingest.”
