What is DeepSeek?
DeepSeek is an AI model (a chatbot) that functions similarly to ChatGPT, enabling users to perform tasks like coding, reasoning and mathematical problem-solving. It is powered by the R1 model, which boasts 671 billion parameters, making it one of the largest open-source large language models as of Jan. 28, 2025.
DeepSeek has developed two notable models: V3 and R1. DeepSeek’s R1 model excels in reasoning by producing responses incrementally, mimicking human thought processes. This approach also reduces memory usage, making it more cost-effective than many competitors. DeepSeek stands out among AI-powered chatbots for its cost efficiency: It is said to have cost just $6 million to develop, a fraction of the $100-million-plus price tag of OpenAI’s GPT-4.
The exact methods DeepSeek used to create this model remain unclear. DeepSeek’s founder reportedly stockpiled Nvidia A100 chips, which have been restricted from export to China since September 2022, for the high end of his AI system. This cache, potentially exceeding 50,000 units, combined with less advanced but more affordable H800 chips at the lower end, reportedly enabled the development of a powerful but lower-cost AI model.
With the ability to activate only a subset of the model’s parameters for any given query, and a training cost that is a fraction of the investment made by industry giants, DeepSeek has stood out among competitors such as ChatGPT, Google Gemini, Grok AI and Claude AI.
DeepSeek has made R1’s code and model weights open-source, though it still keeps the training data proprietary. This transparency allows for verification of the company’s claims. Moreover, the model’s computational efficiency promises faster and more affordable AI research, opening doors for broader exploration. This accessibility may also facilitate deeper investigations into the mechanics of large language models (LLMs).
Key architectural innovations of the DeepSeek-V2 model
DeepSeek-V2 introduces several key architectural advancements. It employs a novel mixture-of-experts (MoE) architecture and a multi-head latent attention (MLA) mechanism.
Let’s learn more about these crucial components of the DeepSeek-V2 model:
- Mixture-of-experts (MoE) architecture: The MoE architecture used in DeepSeek activates only a subset of the model’s parameters for each query, minimizing the computational resources required to process it. In simple terms, instead of one single, massive neural network, the model consists of multiple smaller “expert” networks, each specializing in different aspects of the input. During processing, only a few of these experts are activated per input, making the computation more efficient (see the first sketch after this list).
- Multi-head latent attention (MLA): MLA is a novel attention mechanism that significantly reduces the memory footprint of the model. Traditional attention mechanisms require caching large amounts of key-value information for every token, which is computationally expensive. MLA compresses this information into a much smaller “latent” representation, allowing the model to process long inputs more efficiently (see the second sketch after this list).
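To make the routing idea concrete, below is a minimal sketch of top-k expert routing in plain NumPy. The hidden size, expert count and random weights are toy assumptions for illustration, not DeepSeek’s actual configuration:

```python
# Minimal sketch of mixture-of-experts routing; all sizes are toy values.
import numpy as np

rng = np.random.default_rng(0)
D, N_EXPERTS, TOP_K = 16, 8, 2  # hidden size, experts, experts used per token

# Each "expert" is reduced to a single weight matrix for the sketch.
experts = [rng.normal(size=(D, D)) for _ in range(N_EXPERTS)]
router = rng.normal(size=(D, N_EXPERTS))  # learned in practice, random here

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token vector x through only its top-k experts."""
    logits = x @ router                    # router score for every expert
    top = np.argsort(logits)[-TOP_K:]      # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the chosen experts
    # Only TOP_K of N_EXPERTS matrices are touched: the source of the savings.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=D)
print(moe_layer(token).shape)  # (16,) - same output shape, ~TOP_K/N of the compute
```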
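The latent-compression idea behind MLA can be sketched the same way: cache one small latent vector per past token and reconstruct keys and values from it on the fly. Again, every size and weight below is a toy assumption; real MLA also handles multiple heads and positional embeddings, omitted here:

```python
# Sketch of a latent KV cache: store a compressed vector per past token
# and expand it back into keys and values only when attending.
import numpy as np

rng = np.random.default_rng(1)
D, LATENT = 64, 8                      # model dim vs. much smaller latent dim

W_down = rng.normal(size=(D, LATENT))  # compress token states into latents
W_up_k = rng.normal(size=(LATENT, D))  # reconstruct keys from the latent
W_up_v = rng.normal(size=(LATENT, D))  # reconstruct values from the latent

tokens = rng.normal(size=(10, D))      # hidden states of 10 past tokens
kv_cache = tokens @ W_down             # cache shape (10, 8), not (10, 128)

query = rng.normal(size=D)
keys = kv_cache @ W_up_k               # expanded on the fly when attending
values = kv_cache @ W_up_v
scores = np.exp(keys @ query / np.sqrt(D))
attn_out = (scores / scores.sum()) @ values
print(kv_cache.size, "cached floats vs.", tokens.size * 2, "for plain K/V")
```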
Moreover, these models improve their performance through reinforcement learning, a trial-and-error paradigm loosely analogous to how humans learn.
DeepSeek’s ability to balance sophisticated AI capabilities with cost-effective development reflects a strategic approach that could influence the future of large language models. Venture capitalist Marc Andreessen has described the release of DeepSeek R1 as a “Sputnik moment” for US AI, signaling a serious challenge to American AI dominance.
A Sputnik moment refers to an event that triggers a sudden awareness of a technological or scientific gap between one country or entity and another, leading to a renewed focus on research, development and innovation.
Did you know? AI expert Tom Goldstein, a professor at the University of Maryland, estimated that ChatGPT costs around $100,000 daily and a staggering $3 million monthly to keep running. His calculations were based on the expenses associated with Azure Cloud, the platform that provides the necessary server infrastructure.
Who developed DeepSeek?
DeepSeek was founded in 2023 by Liang Wenfeng and released its first large language models within a year. Liang, an alumnus of Zhejiang University with degrees in electronic information engineering and computer science, has emerged as a key figure in the AI industry worldwide.
Contrary to many Silicon Valley-based AI entrepreneurs, Liang has a notable background in finance. He is the CEO of High-Flyer, a hedge fund specializing in quantitative trading, which leverages AI to analyze financial data and make investment decisions. In 2019, High-Flyer became China’s first quant hedge fund to raise over 100 billion yuan (roughly $13 billion).
Liang established DeepSeek as a separate entity from High-Flyer, but the hedge fund remains a significant investor. DeepSeek primarily focuses on developing and deploying advanced artificial intelligence models, particularly LLMs.
Now dubbed the “Sam Altman of China,” Liang has been vocal about China’s need to innovate in AI rather than imitate. In 2019, he emphasized the need for China to advance its quantitative trading sector to rival the US. He believed that the true challenge for Chinese AI was transitioning from imitation to innovation, a shift that required original thinking.
Why is everyone talking about DeepSeek?
The significance of DeepSeek lies in its potential to dramatically transform AI’s technological and financial landscape. While tech leaders in the US were busy investing in nuclear energy to keep their power-guzzling data centers running, DeepSeek achieved comparable objectives without the fuss.
AI development consumes immense resources, exemplified by Meta’s planned $65-billion investment in AI development. OpenAI CEO Sam Altman has said the AI industry needs trillions of dollars to develop the advanced chips required for the energy-intensive data centers that such models depend on.
DeepSeek demonstrates how comparable AI capabilities can be achieved at significantly lower cost and with less sophisticated hardware. This breakthrough has challenged the prevailing idea that developing AI models requires exorbitant investment.
The availability of AI models at a fraction of the cost, running on less sophisticated chips, could multiply their adoption across industries, enhance productivity and foster unprecedented innovation.
Did you know? Microsoft has heavily invested in OpenAI, initially putting in $1 billion and later adding another $10 billion. This strategic move seems to be paying off, as Bing has seen a 15% increase in daily traffic since integrating ChatGPT.
DeepSeek vs. ChatGPT: How do they compare?
ChatGPT and DeepSeek are both advanced AI tools, but they serve different objectives. DeepSeek is designed for problem-solving in the tech domain, making it ideal for users who need an efficient tool for niche tasks. ChatGPT, on the other hand, is a versatile AI known for its ease of use and creativity, making it suitable for everything from casual conversations to content creation.
When it comes to architecture, DeepSeek R1 uses a resource-efficient MoE framework, while ChatGPT uses a dense transformer-based approach that activates all of its parameters for every token. Transformers are a type of deep learning model that revolutionized natural language processing by using attention mechanisms to weigh the importance of different parts of the input sequence when processing information, as the sketch below illustrates.
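As a rough illustration of that weighting, here is scaled dot-product attention in a few lines of NumPy; the sequence length and dimensions are arbitrary toy values:

```python
# Scaled dot-product attention: scores between positions become softmax
# weights, and the output is a weighted mix of the value vectors.
import numpy as np

rng = np.random.default_rng(2)
seq_len, d = 5, 16
Q = rng.normal(size=(seq_len, d))  # queries
K = rng.normal(size=(seq_len, d))  # keys
V = rng.normal(size=(seq_len, d))  # values

scores = Q @ K.T / np.sqrt(d)                   # pairwise similarity
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
output = weights @ V                            # importance-weighted values
print(weights[0].round(2))  # how strongly token 0 attends to each position
```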
DeepSeek’s MoE design has 671 billion parameters in total but activates only 37 billion per token, enhancing computational efficiency. ChatGPT reportedly relies on a monolithic design of around 1.8 trillion parameters, suitable for versatile language generation and creative tasks.
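A quick back-of-the-envelope check of those figures shows why sparse activation matters:

```python
# Sanity check of the sparsity figures quoted above.
total_params = 671e9   # reported total parameters
active_params = 37e9   # reported parameters activated per token

print(f"Active share: {active_params / total_params:.1%}")             # ~5.5%
print(f"Rough compute saving vs. dense: {total_params / active_params:.0f}x")
```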
Reinforcement learning (RL) post-training in DeepSeek achieves humanlike “chain-of-thought” problem-solving without heavy reliance on supervised data sets. ChatGPT (o1 model) is optimized for multi-step reasoning, particularly in STEM fields like mathematics and coding.
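Below is a hedged sketch of what rule-based rewards for such RL post-training might look like. DeepSeek has described rewarding correct final answers and well-formed reasoning, but the function, tags and weights here are illustrative assumptions, not its actual reward code:

```python
# Illustrative rule-based reward: reward well-formed reasoning tags plus a
# correct final answer. Tag names and weights are assumptions for the sketch.
import re

def reward(response: str, reference_answer: str) -> float:
    score = 0.0
    # Format reward: the model is asked to wrap its reasoning in <think> tags.
    if re.search(r"<think>.*?</think>", response, flags=re.DOTALL):
        score += 0.5
    # Accuracy reward: compare the text after the reasoning block to the reference.
    final = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()
    if final == reference_answer.strip():
        score += 1.0
    return score

print(reward("<think>2 + 2 is 4</think>4", "4"))  # 1.5: formatted and correct
print(reward("5", "4"))                           # 0.0: no tags, wrong answer
```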
DeepSeek is built to handle complex queries efficiently, offering precise solutions quickly and cost-effectively. While ChatGPT is powerful, its primary strength lies in general content generation rather than technical problem-solving. ChatGPT stands out when it comes to creative tasks. It can help users generate ideas, write stories, craft poems, and produce marketing content.
Cost is another key difference. DeepSeek offers a more affordable pricing model, especially for users who require AI assistance for technical tasks. ChatGPT, with its broader range of applications, comes at a higher cost for those seeking premium features or enterprise solutions. While ChatGPT offers a free tier with paid upgrades, DeepSeek is completely free to use; only API access is paid, and at rates more affordable than ChatGPT’s.
DeepSeek reportedly trained its base model in 55 days on 2,048 Nvidia H800 GPUs for about $5.5 million, less than one-tenth of what ChatGPT’s underlying models cost. Training GPT-4 required massive computational resources, with an estimated price tag of around $100 million.
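Those numbers are roughly self-consistent. Assuming a rental rate of about $2 per H800 GPU-hour (an assumption, though it matches the rate cited in DeepSeek’s technical report), the quoted GPU count and duration land close to the quoted bill:

```python
# Back-of-the-envelope check of the reported training bill.
gpus = 2048
days = 55
price_per_gpu_hour = 2.0  # USD; assumed H800 rental rate

gpu_hours = gpus * days * 24
cost = gpu_hours * price_per_gpu_hour
print(f"{gpu_hours:,} GPU-hours -> ${cost / 1e6:.1f}M")  # 2,703,360 -> $5.4M
```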
Here is a brief comparison of DeepSeek and ChatGPT:

| | DeepSeek R1 | ChatGPT |
| --- | --- | --- |
| Architecture | Mixture-of-experts (MoE) | Monolithic transformer (reportedly ~1.8 trillion parameters) |
| Active parameters | 37 billion of 671 billion per token | All parameters per token |
| Strengths | Technical problem-solving, reasoning, cost efficiency | Creative and general-purpose content generation |
| Reported training cost | ~$5.5 million | ~$100 million |
| Pricing | Free chatbot; low-cost API | Free tier; paid premium and enterprise plans |
Did you know? Grok AI’s direct access to real-time X data gives it a key advantage: the ability to churn out information on current events and trends, something other AI solutions can’t match.
Limitations and criticisms of DeepSeek
Like other Chinese AI models, such as Baidu’s Ernie and ByteDance’s Doubao, DeepSeek is programmed to avoid politically sensitive topics. When asked about events like the 1989 Tiananmen Square incident, DeepSeek refuses to respond, stating that it is designed to provide only “helpful and harmless” answers. This built-in censorship may limit DeepSeek’s appeal outside of China.
Security concerns have also been raised regarding DeepSeek. Australia’s science minister, Ed Husic, expressed reservations about the app, emphasizing the need to scrutinize data privacy, content quality and consumer preferences. He advised caution, stating that these issues require careful evaluation before widespread adoption.
In terms of privacy policy, DeepSeek is data-intensive, with a focus on commercialization and potential for broader data sharing, including with advertising partners. Concerns have also been raised about data security and privacy because user data is stored on servers in China.
By contrast, OpenAI is transparent about data collection and usage, with a stronger emphasis on user privacy, data security and anonymization of data before it is used for AI training.
Here is a simplified comparison between the privacy policies of both rivals:

| | DeepSeek | OpenAI |
| --- | --- | --- |
| Data collection | Data-intensive, with a focus on commercialization | Transparent about what is collected and how it is used |
| Data sharing | Potential for broader sharing, including with advertising partners | Stronger emphasis on user privacy |
| Data storage | Stored in China, raising security and privacy concerns | Anonymized before being used for AI training |
Thus, while DeepSeek offers advanced AI capabilities at a lower cost, this affordability brings both opportunities and risks: Cheap, advanced AI is also available to bad actors at both the state and non-state level, which could compromise global security. There is a need to balance innovation against potential geopolitical and security concerns.