
The New York Times Co. is asking a federal judge to deny OpenAI’s request that it turn over reporters’ notes, interview memos, and other materials journalists used to produce the stories that the media company alleges were used to help train the tech company’s flagship artificial intelligence models.
Lawyers for the companies staked out their positions in dueling memos filed during the Fourth of July holiday week in U.S. District Court for the Southern District of New York, where the New York Times Co. filed a copyright infringement lawsuit against both OpenAI and its partner Microsoft in December 2023.
“OpenAI’s claim that it needs all ‘reporter’s notes, interview memos, records of materials cited, or other “files” for each asserted work’—purportedly to determine whether The Times’s works are in fact protectable intellectual property—is unprecedented and turns copyright law on its head,” Times Co. lawyers wrote July 3.
In a July 1 memo, lawyers for OpenAI told U.S. District Judge Sidney H. Stein that this avenue of legal discovery is directly relevant to the Times’ copyright claims and to OpenAI’s potential defenses, including the doctrine of fair use.
“The Times can only assert infringement over those portions of the works that are (a) original to the author, and (b) owned or exclusively licensed to the Times,” the OpenAI lawyers wrote, adding that the materials are “necessary to determine whether and to what extent the Times is pursuing claims for infringement of works that are not protected, in part or in full, by copyrights the Times owns.”
The New York Times responded, in part, that “OpenAI cites no caselaw permitting such invasive discovery, and for good reason. It is far outside the scope of what’s allowed under the Federal Rules and serves no purpose other than harassment and retaliation for The Times’s decision to file this lawsuit.”
Microsoft does not appear to be publicly involved in this aspect of the dispute.
The suit has the potential to set a legal precedent for the use of publicly available materials by AI and tech companies. The New York Times alleges that Microsoft and OpenAI wrongly used vast amounts of copyrighted material from the newspaper to train the large language models that power ChatGPT and other AI products.
Microsoft AI CEO Mustafa Suleyman recently made headlines for his comments at the Aspen Ideas Festival about the use of web content to train AI models.
“I think that with respect to content that’s already on the open web, the social contract of that content since the ’90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it. That has been ‘freeware,’ if you like. That’s been the understanding.
“There’s a separate category where a website or publisher or news organization had explicitly said, ‘Do not scrape or crawl me for any other reason than indexing me so that other people can find that content.’ But that’s the gray area. And I think that’s going to work its way through the courts.”
In addition to pursuing unspecified financial damages, the suit seeks an injunction against Microsoft and OpenAI to halt the alleged practice of using the Times’ copyrighted material, and a court order for the destruction “of all GPT or other LLM models and training sets” that incorporate its copyrighted work.
See the full text of the latest New York Times and OpenAI memos below.
New York Times responds to OpenAI discovery request by GeekWire on Scribd
OpenAI request for New York Times reporters' source materials by GeekWire on Scribd