
Elsevier Sues Meta Over Use of Pirated Research Papers to Train AI

Elsevier, alongside other publishers and authors, has filed a lawsuit against Meta, accusing the company of using copyrighted research papers from pirate sites like Sci-Hub to train its Llama large language model. This marks the first time a major academic publisher has taken legal action over AI copyright infringement.


Key points

  • Plaintiffs allege Meta used Common Crawl and pirated databases LibGen and Sci-Hub to access protected papers for Llama training.
  • Meta argues that training on the material is fair use, citing transformative-use precedent.
  • The case follows earlier rulings, including one that allowed Anthropic to train on legally purchased books.



In a landmark escalation of the AI copyright wars, academic publishing giant Elsevier has filed a lawsuit against Meta, alleging that the tech company illegally scraped and copied millions of copyrighted research papers to train its Llama large language model. The suit, filed on May 5 in the U.S. District Court for the Southern District of New York, also names Meta CEO Mark Zuckerberg as a defendant.

Joining Elsevier are French publishing group Hachette, British publisher Macmillan, and American novelist and attorney Scott Turow, forming a coalition that includes some of the world’s largest publishers. Their core accusation is that Meta obtained protected works from two kinds of sources: the Common Crawl web-scraping dataset, which likely includes substantial amounts of paywalled academic articles, and the notorious pirate repositories LibGen and Sci-Hub, which offer free access to millions of copyrighted papers and textbooks.

Evidence for the lawsuit reportedly draws on internal Meta emails disclosed in a prior case, Kadrey v. Meta, in which authors sued the company over similar claims. Meta has responded that using copyrighted works for AI training constitutes “fair use” under U.S. copyright law. A Meta spokesperson stated that “AI is driving transformative innovation and courts have recognized that training AI on copyrighted content can be fair use.”

This case is significant as the first instance of a major academic publisher directly challenging an AI company over training data. Previous lawsuits have largely been filed by individual authors or news organizations, such as The New York Times, against AI developers. The outcome could set a precedent for how AI models can be trained on academic literature.

Notably, fair use defenses have succeeded in similar contexts. In 2025, a U.S. court ruled that Anthropic, the company behind Claude, could use legally purchased books to train its AI without explicit author permission, citing transformative use. That decision marked the first judicial recognition of AI training as a fair use activity. The current case will test whether that principle extends to pirated content used by tech giants.