OpenAI says it has found evidence that Chinese artificial intelligence start-up DeepSeek used the US company’s proprietary models to train its own open-source competitor, as concerns grow over a potential breach of intellectual property.
The San Francisco-based ChatGPT maker told the Financial Times it had seen some evidence of “distillation”, a technique used by developers to obtain better performance on smaller models by using outputs from larger, more capable models. This allows them to achieve similar results on specific tasks at a much lower cost.
OpenAI declined to comment further on details of its evidence. Its terms of service state users cannot “copy” any of its services or “use output to develop models that compete with OpenAI”.
DeepSeek’s release of its R1 reasoning model has surprised markets, as well as investors and technology companies in Silicon Valley, with its impressive performance on cognitive tasks. Its models have attained high rankings and results comparable to those of leading US models.
One person close to OpenAI said that distillation was a common practice in the industry and highlighted that the company offers developers a way to do this using its own platform, but said: “The issue is when you are doing it to create your own model for your own purposes.”
DeepSeek did not immediately respond to a request for comment.
Earlier, President Donald Trump’s AI and crypto tsar David Sacks said “it is possible” that IP theft had occurred.
“There’s a technique in AI called distillation . . . when one model learns from another model [and] kind of sucks the knowledge out of the parent model,” Sacks told Fox News on Tuesday.
“And there’s substantial evidence that what DeepSeek did here is they distilled the knowledge out of OpenAI models, and I don’t think OpenAI is very happy about this,” Sacks added, although he did not provide evidence.
DeepSeek said it used just 2,048 Nvidia H800 graphics cards and $5.6mn to train its V3 model with 671bn parameters, a fraction of what OpenAI and Google spent to train comparably sized models. Some experts pointed out that the model generated responses indicating it had been trained on outputs from OpenAI’s GPT-4, which would violate OpenAI’s terms of service.
Industry insiders say that, in reality, it is common practice for AI labs, both in China and the US, to use outputs from leading companies such as OpenAI.
Industry leaders such as OpenAI have invested in hiring people to teach their models how to produce responses that sound more human. This is expensive and labour-intensive, and industry insiders say it is common for smaller players to piggyback off their work.
“It is a very common practice for start-ups and academics to use outputs from human-aligned commercial LLMs, like ChatGPT, to train another model,” said Ritwik Gupta, a PhD candidate in AI at the University of California, Berkeley.
“That means you get this human feedback step for free. It is not surprising to me that DeepSeek supposedly would be doing the same. If they were, stopping this practice precisely may be difficult,” he added.
The practice also points to an emerging financial conundrum for frontier companies doing cutting-edge AI research: how to defend their technical edge when other groups can piggyback off their models.
Chinese companies have quickly absorbed lessons from their US counterparts while innovating approaches to maximise their limited number of chips, making it cheaper to train and run the models.
“We know [China]-based companies — and others — are constantly trying to distil the models of leading US AI companies,” OpenAI added in a statement.
“We engage in countermeasures to protect our IP, including a careful process for which frontier capabilities to include in released models, and believe as we go forward that it is critically important that we are working closely with the US government to best protect the most capable models from efforts by adversaries and competitors to take US technology.”
OpenAI is currently battling allegations of its own copyright infringement from newspapers and content creators, including lawsuits from The New York Times and prominent authors, who accuse the company of training its models on their articles and books without permission.