Social media site Reddit has filed a lawsuit against AI startup Anthropic, alleging that it used Reddit’s site data without an agreement.

Reddit shares surged after the news of the lawsuit came out.

Why is Reddit suing Anthropic?

In the lawsuit, which was filed in San Francisco on Wednesday, the social media company said Anthropic has been training its AI models on Reddit content without obtaining permission or signing a partnership agreement with the company.

Reddit further accused Anthropic of using the personal data of the site’s users. The company added that it has been harmed by the unauthorized use of its content.

The company said Anthropic’s conduct contradicts its image of itself as a “white knight of the AI industry.”

In the complaint, Reddit said it had tried and failed to reach a deal with Anthropic. It added that Anthropic’s bots continued to access its servers even after they were blocked.

Reddit added that “other giants in the AI space understand and respect Reddit’s rules,” and cited this as the reason the site has made agreements with OpenAI and Google to share its data for training their respective AI models.

Anthropic unveiled its latest AI model, Claude 4, in May.

Why AI companies crave Reddit’s data

Reddit’s data is highly valuable for AI training. With roughly two decades of user-generated content, Reddit has amassed a vast archive of authentic human conversations across virtually every topic imaginable.

This “real-world” dialogue, often raw and unvarnished, is a goldmine for AI models seeking to understand nuanced language, slang, humor, and the informal ways humans communicate.

Unlike curated datasets or traditional news articles, Reddit’s content provides a unique blend of diverse perspectives, community-driven moderation (through upvotes and downvotes), and candid discussions.

This allows AI models to learn not just factual information but also sentiment, context, and the dynamic flow of human interaction.

For a large language model (LLM) striving for natural, conversational capabilities, the sheer volume and quality of Reddit’s discussions are invaluable for improving coherence, relevance, and the ability to respond to complex, open-ended queries.

Furthermore, Reddit’s structure, organized into thousands of subreddits dedicated to specific subjects, offers a well-categorized and topically rich data source.

This organized nature makes it easier for AI developers to target specific domains of knowledge and ensure their models gain expertise in niche areas.

In 2024, Reddit took steps to prevent AI models from scraping its website data. It created a public content policy governing its publicly accessible user data.

Previous deals: setting the precedent

Recognizing the immense value of its data, Reddit has actively pursued licensing agreements with major AI players.

These deals are crucial for Reddit’s revenue diversification following the company’s listing in 2024.

Reddit struck deals with OpenAI and Google, which will allow these companies to use the site’s data.

The social media company signed a roughly $60 million deal with Google in February 2024, giving Google access to Reddit content for training models such as Gemini.

This partnership enables Google to leverage Reddit’s extensive discussions to enhance its search capabilities and train its LLMs on up-to-date, human-generated information.

Reddit signed a similar deal with OpenAI in May 2024. The deal also specified that OpenAI will become an advertising partner for Reddit.

OpenAI’s CEO, Sam Altman, was a board member of Reddit in the past. He still holds a stake in the company, and that stake is currently valued at over $1 billion.

On Wednesday, Reddit shares surged 7% to $118.81.

The post Reddit sues Anthropic for allegedly using site’s data without consent appeared first on Invezz