Company Background and Research Direction
Hangzhou DeepSeek AI Basic Technology Research Co., Ltd. (DeepSeek) was founded in July 2023 by Liang Wenfeng, founder of the quantitative investment firm High-Flyer (Huanfang Quantitative) (news.sciencenet.cn; stdaily.com). The company focuses on research and development of open-source large language models (LLMs), with core areas including general natural language processing, mathematical reasoning, and code generation. DeepSeek adheres to an "open source + efficiency" strategy and has never publicly raised funds or launched consumer-facing (2C) products (stdaily.com). Six months after its founding, the company released its first open-source coding model, DeepSeek Coder (V1), and in May 2024 it released its second-generation MoE large model, DeepSeek-V2 (news.sciencenet.cn). DeepSeek emphasizes algorithmic and engineering efficiency, using a mixture-of-experts (MoE) architecture, low-precision training, and algorithmic optimization to achieve leading performance with limited resources.
Main Models and Technical Progress
DeepSeek-V2 Series
- **Model architecture**: DeepSeek-V2 uses a sparse mixture-of-experts (MoE) architecture with 236B total parameters, of which 21B are activated per token (arxiv.org). It introduces an innovative multi-head latent attention (MLA) mechanism and the DeepSeekMoE sparse computation structure, which significantly compress the KV cache and improve inference efficiency (arxiv.org; stdaily.com).
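The routing idea behind an MoE layer can be sketched as follows. This is a minimal illustration with made-up dimensions and toy "experts", not DeepSeek's actual implementation; it only shows why activated parameters are far fewer than total parameters.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Minimal top-k MoE layer: each token is routed to its k
    highest-scoring experts, and their outputs are mixed by the
    renormalized gate probabilities. Only k experts run per token."""
    logits = x @ gate_w                                  # [tokens, n_experts]
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        top = np.argsort(probs[t])[-k:]                  # top-k expert ids
        weights = probs[t, top] / probs[t, top].sum()    # renormalize gates
        for eid, w in zip(top, weights):
            out[t] += w * experts[eid](x[t])
    return out

# Toy demo: 4 "experts" that each scale the input differently.
rng = np.random.default_rng(0)
experts = [lambda v, s=s: s * v for s in (0.5, 1.0, 1.5, 2.0)]
x = rng.normal(size=(5, 8))          # 5 tokens, hidden dim 8
gate_w = rng.normal(size=(8, 4))     # router weights
y = moe_forward(x, gate_w, experts, k=2)
print(y.shape)  # (5, 8)
```

With k=2 of 4 experts, each token pays the compute of only half the experts, which is the same principle that lets V2 activate 21B of its 236B parameters.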
- **Training data and methods**: DeepSeek-V2 was pre-trained on 8.1T tokens of high-quality multi-source corpus, then further optimized with supervised fine-tuning (SFT) and reinforcement learning (RL) (arxiv.org). Techniques such as mixed-precision training and multi-machine expert parallelism were used to reduce cost. The official report states that, compared with the previous-generation DeepSeek 67B model, V2 saved 42.5% of training cost, reduced the KV cache by 93.3%, and increased generation throughput by 5.76× (arxiv.org).
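To see why compressing the KV cache matters for serving cost, here is a back-of-the-envelope calculation. The dimensions below are illustrative assumptions, not V2's actual configuration, so the resulting percentage will not match the reported 93.3% exactly; it only shows the order of the effect.

```python
def kv_bytes_standard(n_layers, n_heads, head_dim, seq_len, bytes_per_val=2):
    # Standard attention caches a full K and a full V vector
    # per head, per layer, per token.
    return 2 * n_layers * n_heads * head_dim * seq_len * bytes_per_val

def kv_bytes_latent(n_layers, latent_dim, seq_len, bytes_per_val=2):
    # An MLA-style scheme caches only one compressed latent
    # vector per layer per token.
    return n_layers * latent_dim * seq_len * bytes_per_val

# Illustrative numbers for a 32K-token context (fp16 values):
full = kv_bytes_standard(n_layers=60, n_heads=128, head_dim=128, seq_len=32_768)
compressed = kv_bytes_latent(n_layers=60, latent_dim=512, seq_len=32_768)
print(f"standard: {full / 2**30:.1f} GiB, "
      f"latent: {compressed / 2**30:.1f} GiB, "
      f"saving: {1 - compressed / full:.1%}")
```

Shrinking the per-token cache is what allows longer contexts and larger batches on the same hardware, which in turn drives up generation throughput.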
- **Performance**: DeepSeek reports that on knowledge benchmarks such as MMLU, DeepSeek-V2 achieves top-tier performance among open-source models with only 21B activated parameters (arxiv.org). On code and math benchmarks (e.g., HumanEval, MATH) it performs on par with other strong open-source models such as Mixtral 8×22B (arxiv.org). For example, after the June 2024 update, DeepSeek-V2 Chat reached 84.76% on HumanEval, 71.02% on MATH, and 83.40% on BBH (api-docs.deepseek.com); the merged DeepSeek-V2.5 reaches 89% on HumanEval (api-docs.deepseek.com).
- **Open source**: DeepSeek-V2 is fully open: code, weights, and the technical report are publicly available (code under the MIT license; weights under a model license that permits commercial use) (51cto.com; arxiv.org). The official GitHub provides model checkpoints, which users can download from platforms such as Hugging Face (51cto.com; arxiv.org).
DeepSeek-Coder Series
- **Positioning and architecture**: The DeepSeek-Coder series focuses on code generation and mathematical reasoning. DeepSeek-Coder-V2 (released in June 2024) inherits the V2 architecture: it is likewise an MoE model with 236B total parameters and 21B activated (51cto.com). A 16B-parameter "Lite" version is also provided, supporting fill-in-the-middle (FIM) completion for deployment.
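Fill-in-the-middle lets a code model complete a gap given the code on both sides, which is what editor autocompletion needs. A schematic of how such a prompt is assembled; the sentinel token names below are placeholders for illustration, not DeepSeek's actual tokenizer vocabulary:

```python
# Placeholder sentinel tokens; real FIM-capable models define their own
# special tokens in the tokenizer vocabulary.
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """The model sees the code before and after the gap, then
    generates the missing middle after the FIM_MIDDLE sentinel."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

prompt = build_fim_prompt(
    prefix="def mean(xs):\n    ",
    suffix="\n    return total / len(xs)",
)
print(prompt)
```

The model's completion (here, something like `total = sum(xs)`) is spliced back between the prefix and suffix by the editor integration.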
- **Performance**: Officially, DeepSeek-Coder-V2 ranks second in the world on multiple code and math benchmarks, behind only OpenAI's GPT-4o and surpassing GPT-4 Turbo (51cto.com). In practical tests, DeepSeek-Coder-V2 reaches GPT-4 Turbo level in code generation, understanding, and repair, while also showing strong mathematical and reasoning capabilities (api-docs.deepseek.com; 51cto.com). For example, the merged V2.5 model achieves 89% accuracy on HumanEval and 41% on LiveCodeBench (api-docs.deepseek.com).
- **Open source**: The DeepSeek-Coder series is also fully open source: model weights, training code, and papers are published on GitHub, with free commercial use permitted (51cto.com). The official site provides a multi-language interface, and users can call DeepSeek-Coder-V2 to generate code with a 32K context (51cto.com).
DeepSeek-V3 Series
- **Model architecture**: DeepSeek-V3 (open-sourced in December 2024) scales further, to 671B total parameters with 37B activated per token. It uses the MLA and DeepSeekMoE architectures, and pioneers an auxiliary-loss-free load-balancing strategy and a multi-token prediction training objective to improve performance (arxiv.org).
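The auxiliary-loss-free idea can be sketched like this: each expert carries a routing bias that is nudged after every batch, down if the expert was overloaded and up if underloaded, so load balances without adding a balancing term to the loss. The toy below is a simplification for demonstration; the update rule and constants are assumptions, not V3's exact recipe.

```python
import numpy as np

def route_topk(scores, bias, k=2):
    """Select top-k experts per token. The bias influences *which*
    experts are chosen; gate weights would still use raw scores."""
    return np.argsort(scores + bias, axis=1)[:, -k:]

def update_bias(bias, chosen, n_experts, gamma=0.01):
    """Nudge bias down for overloaded experts, up for underloaded ones."""
    counts = np.bincount(chosen.ravel(), minlength=n_experts)
    target = chosen.size / n_experts
    return bias - gamma * np.sign(counts - target)

rng = np.random.default_rng(1)
n_tokens, n_experts = 256, 8
base = np.linspace(1.0, 0.0, n_experts)   # skewed: early experts dominate

def load_counts(bias):
    scores = base + 0.1 * rng.normal(size=(n_tokens, n_experts))
    return np.bincount(route_topk(scores, bias, 2).ravel(), minlength=n_experts)

before = load_counts(np.zeros(n_experts))  # unbalanced without correction
bias = np.zeros(n_experts)
for _ in range(200):
    scores = base + 0.1 * rng.normal(size=(n_tokens, n_experts))
    bias = update_bias(bias, route_topk(scores, bias, 2), n_experts)
after = load_counts(bias)                  # much more even
print(before, after)
```

The bias absorbs the skew in the router's scores, so token load spreads across experts while the training loss itself stays untouched.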
- **Training resources**: V3 was pre-trained on about 14.8T tokens of diverse corpus, followed by SFT and RL tuning (arxiv.org). According to the official report, the full DeepSeek-V3 training took only about 2.79M H800 GPU-hours (≈US$5.576 million), far below the roughly $100 million reported for models such as GPT-4o (arxiv.org; news.sciencenet.cn). The training process was stable, with no rollbacks required.
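The quoted dollar figure follows directly from the GPU-hour count at an assumed rental rate of $2 per H800-hour, which is the accounting rate the report uses; actual cloud prices vary.

```python
gpu_hours = 2.788e6        # reported total H800 GPU-hours (text rounds to 2.79M)
usd_per_gpu_hour = 2.0     # assumed rental rate used for the cost accounting
total_cost = gpu_hours * usd_per_gpu_hour
print(f"${total_cost / 1e6:.3f}M")  # $5.576M
```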
- **Performance**: DeepSeek-V3 matches the best closed-source models (such as Anthropic's Claude 3.5 Sonnet) on common multilingual tasks (e.g., MMLU and GPQA), and significantly surpasses other open- and closed-source models on math-competition tasks (AIME and CNMO) (news.sciencenet.cn). Generation speed has also risen substantially, to 60 tokens per second, more than double that of V2.5 (news.sciencenet.cn). Researchers such as Yuandong Tian of Meta AI have spoken highly of the model's overall progress (news.sciencenet.cn).
- **Open source**: DeepSeek-V3 and its conversational version are both open source, and the weights can be obtained from the official GitHub (arxiv.org). The company simultaneously launched a chat web interface, mobile apps, and API services, and users can try the V3 model for free (deepseek.com; news.sciencenet.cn).
DeepSeek-R1 Series
- **Model positioning**: DeepSeek-R1, released in January 2025, is a model designed specifically to strengthen reasoning. Its distinguishing feature is heavy use of reinforcement learning in the post-training stage (similar in spirit to OpenAI's o1), making the model excel at logical reasoning and problem solving (finance.sina.com.cn). Officials say R1's performance is close to OpenAI o1, and it has been made available on AWS and other major cloud platforms (ikala.ai; finance.sina.com.cn).
- **Technological innovation**: R1 adopts the same MoE/MLA architecture as V3, but centers its design on reinforcement-learning algorithms such as Group Relative Policy Optimization (GRPO) (github.com). The documentation indicates that the RL stage of R1 training substantially elicits and strengthens the model's reasoning capabilities.
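The core of GRPO is to score each sampled response relative to its own group, replacing a learned value model with the group's mean and standard deviation. A minimal sketch of the advantage computation (a simplification of the full objective, which also includes policy-ratio clipping and a KL penalty):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages: for a group of responses sampled from
    the same prompt, advantage_i = (r_i - mean) / std. Responses better
    than their siblings get positive advantage, worse get negative."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards)
    if sigma == 0:                      # all responses tied: no signal
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Four sampled answers to one math problem, scored 1 if correct else 0:
advs = grpo_advantages([1, 0, 1, 0])
print(advs)  # [1.0, -1.0, 1.0, -1.0]
```

Because the baseline is computed within each group, no separate critic network has to be trained, which is part of what makes large-scale RL post-training affordable.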
- **Performance**: DeepSeek-R1 performs strongly on mathematical and natural-language reasoning tasks; it is said to approach or even exceed GPT-4o on mathematical reasoning benchmarks. Prominent researchers, including Turing Award laureates, have remarked that its reasoning ability raises the bar for open-source models (news.sciencenet.cn). DeepSeek-R1 is currently widely regarded as one of the most advanced models in the open-source camp (finance.sina.com.cn).
Comparison with Competitors
- **Cost-effectiveness**: DeepSeek emphasizes "using a small amount of computing power to do big things." DeepSeek-V2's inference cost is only 1 yuan per million tokens, roughly 1/7 that of the open-source Llama 3 70B and 1/70 that of GPT-4 Turbo (stdaily.com). DeepSeek-V3's training cost of about US$5.576 million is far below the hundreds of millions of dollars invested in models such as GPT-4o (news.sciencenet.cn). Industry observers believe DeepSeek's algorithmic optimization reduces large models' dependence on massive computing power and offers a new path for large-model development (stdaily.com).
- **Open-source strategy**: All of DeepSeek's core models are freely open source (including API, SDK, and Docker images), supporting secondary development and commercial use (51cto.com; news.sciencenet.cn). By contrast, OpenAI's and Anthropic's large models are commercialized mainly through closed-source APIs with non-public weights; Google has released some open-weight models (the Gemma family), but with notable usage restrictions. DeepSeek's openness has lowered technical barriers and attracted considerable developer attention.
- **Professional capabilities**: DeepSeek has advantages in code, mathematical reasoning, and Chinese-language processing: it has released dedicated Coder and Math models, its code capabilities surpass mainstream open-source models (ranking among the world's best) (51cto.com), and its DeepSeekMath 7B model achieves over 50% few-shot accuracy on the MATH benchmark. DeepSeek's Chinese understanding, dialogue, and generation are also among the best in China (news.sciencenet.cn). Although OpenAI's GPT-4o supports multimodality and many languages, its Chinese performance and localization controllability are somewhat weaker; Google and Anthropic have improved recently, but within the Chinese ecosystem DeepSeek retains a local-data advantage.
Commercial Application Scenarios
- **To B (enterprise)**: DeepSeek provides flexible products and services for enterprises, for example:
    - **Intelligent customer service / Q&A**: integration into customer-service systems and knowledge bases for human-machine dialogue and information retrieval.
    - **Code-generation assistants**: software vendors can embed the models to help developers automatically generate, complete, or review code.
    - **Data analysis**: combined with proprietary data, the model can produce analysis reports and extract key conclusions.
    - **Content creation**: helping e-commerce, media, and other sectors quickly generate product descriptions, news summaries, and similar texts.
    - **Industry solutions**: DeepSeek has partnered with cross-border e-commerce platforms (for example, joining with Shushang Cloud to make B2B malls intelligent) and has run pilots in finance, government affairs, automotive, and internet domains (finance.sina.com.cn).

    Many cloud platforms (Baidu Cloud, Huawei Cloud, Alibaba Cloud, Tencent Cloud, 360 Security Cloud, etc.) have launched the DeepSeek large models, and developers can try them at low cost via cloud APIs (finance.sina.com.cn; qbitai.com).
- **To C (individual)**: DeepSeek provides mobile and web AI assistants supporting writing, translation, study guidance, Q&A, and other functions (deepseek.com). In education, it offers students Q&A and programming-learning assistance; in content creation, it can generate personal blogs, stories, poems, and more; in entertainment, it can power intelligent chat companions and game NPCs. The DeepSeek app and web version are currently online and permanently free, so users can experience the latest models.
Commercialization Status and Ecosystem
- **Products and services**: DeepSeek has launched online chat products (a web chat and a mobile app) and an open-platform API that developers can subscribe to. The API supports a 64K context (DeepSeek-V3) and programmatic features such as function calling (api-docs.deepseek.com). Enterprise-grade private-deployment packages are also offered: the standard price is about RMB 450,000 per year, including inference servers (8-card H20 or Huawei 910B), software suites, and five person-days of technical support (51cto.com).
-
- **Pricing strategy**: The official API is charged by usage (in US dollars). For example, DeepSeek-V3 (model ID `deepseek-chat`) costs $0.07 per 1M input tokens on cache hits and $1.10 per 1M output tokens at standard rates (api-docs.deepseek.com). This corresponds to a few RMB per million tokens, which is highly competitive in the industry. ScienceNet reported that V3 costs 0.5 RMB per million input tokens on cache hits and 8 RMB per million output tokens (news.sciencenet.cn). Many platforms have also launched discounts (Baidu Cloud and Alibaba Cloud offered free or discounted access at launch) (qbitai.com; finance.sina.com.cn).
- **Ecosystem and cooperation**: DeepSeek works with many parties to expand its ecosystem. The National Supercomputing Network platform launched an "AI Ecosystem Partner Acceleration Plan," providing computing-power support for partners and opening the DeepSeek API free for three months (app.dahecube.com). Among cloud services, Baidu Qianfan and Alibaba PAI can deploy DeepSeek-V3/R1 with one click (finance.sina.com.cn); Huawei Cloud launched a DeepSeek inference service on its self-developed inference accelerators (finance.sina.com.cn); Tencent Cloud supports one-click deployment and invocation (finance.sina.com.cn); 360 Security, Yunzhou Technology, and others have joined and announced security-focused large models built on DeepSeek (finance.sina.com.cn). Even more notable, Nvidia, Amazon AWS, and Microsoft Azure all introduced the DeepSeek-R1 model in 2025 (finance.sina.com.cn), indicating growing international recognition of DeepSeek's technology.
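The per-token API prices quoted above translate into request costs as follows. This quick estimator uses the article's quoted standard rates and assumes every input token is a cache hit, the best case; cache misses are billed at a higher input rate.

```python
# Quoted standard rates for deepseek-chat (USD per 1M tokens), from the text:
PRICE_INPUT_CACHE_HIT = 0.07
PRICE_OUTPUT = 1.10

def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Best-case cost of one chat request, assuming all input
    tokens hit the context cache."""
    return (input_tokens * PRICE_INPUT_CACHE_HIT
            + output_tokens * PRICE_OUTPUT) / 1_000_000

# e.g. a 4K-token prompt producing a 1K-token answer:
print(f"${request_cost_usd(4_000, 1_000):.6f}")  # roughly a tenth of a cent
```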
Challenges and Future Prospects
**Challenges**: Despite its technological lead, DeepSeek faces fierce competition and pressure to monetize. On one hand, giants such as OpenAI and Google command vast resources and diversified product ecosystems, and rapid iteration of new technology could quickly overtake it; on the other, DeepSeek is committed to the open-source route, so its profit model relies mainly on cloud-platform cooperation and value-added services, and it still needs to find sustainable revenue. Data security and compliance are also concerns: open models are prone to abuse, and enterprises must ensure private deployment and security hardening. In addition, the DeepSeek team remains small (about a hundred people; stdaily.com) and faces organizational challenges in supporting multiple models, large-scale deployment, and customer support.
**Future direction**: DeepSeek emphasizes algorithmic innovation over simply piling up computing power. Industry experts note that future large models should not merely pursue linear scaling of parameters and compute, but should also improve performance through post-training (e.g., RL and expert models) (stdaily.com). DeepSeek validated the idea of using reinforcement learning to improve logical reasoning in R1, and plans to keep iterating models across diverse tasks (multimodal understanding, ultra-long context, tool use, etc.). Possible directions include stronger merged models (such as V2.5, which integrates Chat and Coder capabilities), vision-language models (DeepSeek-VL), domain-specific models (e.g., medicine and finance editions), and more efficient small models for low-latency services. Commercially, it will deepen collaboration with cloud vendors and industry partners to drive adoption of domestic large models in enterprise applications. Overall, DeepSeek will continue to focus on the twin themes of "cost-effectiveness" and "sustainable innovation," striving to maintain its leading position in the global open-source ecosystem by continuously optimizing architecture and training processes.
References: DeepSeek official website and technical reports (arxiv.org), official API documentation (api-docs.deepseek.com), Chinese media reports (news.sciencenet.cn; stdaily.com; finance.sina.com.cn), industry analyses, etc.