A new study by AI detection firm Copyleaks reports that 74.2% of the DeepSeek-generated texts it analyzed show striking stylistic similarities to output from OpenAI’s ChatGPT. The findings raise questions about whether DeepSeek used ChatGPT-generated outputs in its training.
Using advanced classifiers, the study found that while models like Claude, Gemini, and Llama were easily distinguishable, DeepSeek’s text closely resembled OpenAI’s.
Copyleaks’ Shai Nisan compared the approach to identifying a manuscript’s author from its writing style. While this similarity doesn’t prove that DeepSeek is a direct derivative of ChatGPT, it suggests potential issues in how DeepSeek was developed.
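The article does not describe how Copyleaks' classifiers work, so the following is only a toy sketch of the general idea behind stylistic attribution, not the study's method. It assumes a style can be roughly summarized by how often a text uses a few common function words, and it attributes a text to whichever known source has the most similar profile. All names here (`FUNCTION_WORDS`, `style_vector`, `build_profiles`, `attribute`, `attribution_rate`) are hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: a handful of common function words.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that",
                  "is", "it", "however", "moreover", "overall")


def style_vector(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)  # avoid division by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_profiles(samples_by_source):
    """Average the style vectors of each source's sample texts."""
    profiles = {}
    for name, texts in samples_by_source.items():
        vectors = [style_vector(t) for t in texts]
        profiles[name] = [sum(col) / len(vectors) for col in zip(*vectors)]
    return profiles


def attribute(text, profiles):
    """Return the source whose profile is most similar to the text."""
    vec = style_vector(text)
    return max(profiles, key=lambda name: cosine(vec, profiles[name]))


def attribution_rate(texts, profiles, target):
    """Fraction of texts attributed to the target source."""
    if not texts:
        return 0.0
    hits = sum(1 for t in texts if attribute(t, profiles) == target)
    return hits / len(texts)
```

Under these assumptions, a figure like the one reported in the study would correspond to calling `attribution_rate` on a set of texts from one model with another model's name as `target`. A production classifier would use far richer features and a trained model, so this sketch only illustrates the shape of the computation.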
If DeepSeek used OpenAI’s outputs without authorization, it could violate intellectual property rights. This situation highlights the need for clearer AI regulations and transparency in training data.
The study also has market implications for DeepSeek. The company has claimed to offer a more efficient approach to training than one built on expensive AI processors such as Nvidia’s, but unauthorized use of OpenAI data could lead to significant financial and legal consequences.
Although shared training datasets could account for some overlap, Copyleaks’ study suggests deeper structural similarities between DeepSeek and ChatGPT. Nisan emphasized that AI models have distinct writing styles, shaped by factors such as architecture and fine-tuning methods.
The findings could influence the future of AI regulations, pushing for greater transparency and clearer IP protections.