Complete Manual for Local Deployment of DeepSeek R1


I. Introduction

DeepSeek R1 is a high-performance general-purpose large language model that supports complex reasoning, multimodal processing, and technical document generation. This manual provides a comprehensive guide to deploying DeepSeek R1 locally, covering hardware configuration, domestic (Chinese) chip adaptation, quantization schemes, cloud-based alternatives, and a complete walkthrough for deploying the 671B MoE model with Ollama.

Key Notes:

  • **Individual Users:** Deploying models of 32B or larger is not recommended due to high hardware costs and complex maintenance.
  • **Enterprise Users:** Professional team support is required, and ROI (return on investment) should be evaluated before deployment.

II. Core Configuration Requirements for Local Deployment

1. Model Parameters and Hardware Requirements

| Model Parameters | Windows Configuration | Mac Configuration | Use Case |
| --- | --- | --- | --- |
| 1.5B | RAM: 4GB; GPU: integrated graphics/modern CPU; Storage: 5GB | Memory: 8GB (M1/M2/M3); Storage: 5GB | Simple text generation, basic code completion |
| 7B | RAM: 8-10GB; GPU: GTX 1660 (4-bit quantization); Storage: 8GB | Memory: 16GB (M2 Pro/M3); Storage: 8GB | Medium-complexity Q&A, code debugging |
| 14B | RAM: 24GB; GPU: RTX 3090 (24GB VRAM); Storage: 20GB | Memory: 32GB (M3 Max); Storage: 20GB | Complex reasoning, technical document generation |
| 32B+ | Enterprise-level deployment (multi-GPU required) | Not supported | Scientific research, large-scale data processing |
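
For the 1.5B-14B tiers above, the quickest way to try a model on either OS is Ollama's distilled builds. A minimal sketch, assuming the `deepseek-r1` tags published in the public Ollama model library match the parameter sizes in the table (verify the exact tags on the library page before relying on them):

```bash
# Pull and chat with the 7B distilled model; swap the tag for
# deepseek-r1:1.5b or deepseek-r1:14b to match your hardware tier.
ollama run deepseek-r1:7b
```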

2. Computing Power Requirements

| Model | Parameter Size | Precision | Minimum VRAM | Minimum Computing Power |
| --- | --- | --- | --- | --- |
| DeepSeek-R1 (671B) | 671B | FP8 | ≥890GB | 2× XE9680 (16× H20 GPUs) |
| DeepSeek-R1-Distill-70B | 70B | BF16 | ≥180GB | 4× L20 or 2× H20 GPUs |
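
As a rough sanity check on these figures, weight memory alone is the parameter count times bytes per parameter; the headroom above that covers the KV cache, activations, and framework overhead:

$$
671 \times 10^9 \times 1\,\mathrm{byte\ (FP8)} \approx 671\,\mathrm{GB} \;\Rightarrow\; \geq 890\,\mathrm{GB\ with\ cache\ and\ overhead}
$$

$$
70 \times 10^9 \times 2\,\mathrm{bytes\ (BF16)} \approx 140\,\mathrm{GB} \;\Rightarrow\; \geq 180\,\mathrm{GB\ with\ cache\ and\ overhead}
$$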

III. Domestic Chip and Hardware Adaptation

1. Domestic Ecosystem Partner Updates

| Company | Adaptation Content | Performance Benchmark (vs NVIDIA) |
| --- | --- | --- |
| Huawei Ascend | Native support for the R1 series, end-to-end inference optimization | Equivalent to A100 (FP16) |
| Moore Threads | MXN series supports 70B model BF16 inference; VRAM utilization increased by 30% | Equivalent to RTX 3090 |
| Hygon DCU | Adapted for V3/R1 models; performance benchmarked against NVIDIA A100 | Equivalent to A100 (BF16) |

2. Recommended Domestic Hardware Configurations

| Model Parameters | Recommended Solution | Use Case |
| --- | --- | --- |
| 1.5B | Taichu T100 accelerator card | Prototype validation for individual developers |
| 14B | Kunlun K200 cluster | Enterprise-level complex task inference |
| 32B | Bichen computing platform + Ascend 910B cluster | Scientific research and multimodal processing |

IV. Cloud Deployment Alternatives

1. Recommended Domestic Cloud Service Providers

| Platform | Core Advantages | Use Case |
| --- | --- | --- |
| Silicon Flow | Official API, low latency, supports multimodal models | Enterprise-level high-concurrency inference |
| Tencent Cloud | One-click deployment + limited-time free trial; supports VPC privatization | Rapid deployment of small to medium-scale models |
| PPIO Cloud | Priced at 1/20 of OpenAI; 50 million free tokens upon registration | Low-cost testing and experimentation |
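
Most of these platforms expose OpenAI-compatible endpoints, so moving between local and cloud inference is typically just a base-URL and API-key change. A minimal sketch, assuming Silicon Flow's documented endpoint and model ID (both are assumptions here; confirm them against the provider's current docs):

```bash
# Call DeepSeek R1 through an OpenAI-compatible chat completions API.
# Endpoint and model ID are assumed from Silicon Flow's docs; substitute
# your provider's values and set SILICONFLOW_API_KEY in the environment.
curl https://api.siliconflow.cn/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $SILICONFLOW_API_KEY" \
  -d '{
        "model": "deepseek-ai/DeepSeek-R1",
        "messages": [{"role": "user", "content": "Summarize MoE routing in two sentences."}]
      }'
```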

2. International Access Channels (requires VPN or enterprise network environment)

  • **NVIDIA NIM:** Enterprise-level GPU cluster deployment (https://build.nvidia.com/deepseek-ai/deepseek-r1)
  • **Groq:** Ultra-low-latency inference (https://groq.com/)

V. Complete 671B MoE Model Deployment (Ollama + Unsloth)

1. Quantization Schemes and Model Selection

| Quantization Version | File Size | Minimum Memory + VRAM | Use Case |
| --- | --- | --- | --- |
| DeepSeek-R1-UD-IQ1_M | 158 GB | ≥200 GB | Consumer-grade hardware (e.g., Mac Studio) |
| DeepSeek-R1-Q4_K_M | 404 GB | ≥500 GB | High-performance servers/cloud GPUs |

**Download Links:**

  • HuggingFace Model Library
  • UnslothAI Official Documentation
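
The dynamic quants ship as multi-part GGUF shards, so the download step is a matter of pattern-matching the right files. A sketch using the HuggingFace CLI, assuming the shards live in Unsloth's `unsloth/DeepSeek-R1-GGUF` repository (the repo name and file pattern are assumptions; confirm them on the model page):

```bash
# Install the HuggingFace CLI, then fetch only the IQ1_M shards.
pip install -U "huggingface_hub[cli]"
huggingface-cli download unsloth/DeepSeek-R1-GGUF \
  --include "*UD-IQ1_M*" \
  --local-dir ./DeepSeek-R1-GGUF
```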

2. Hardware Configuration Recommendations

| Hardware Type | Recommended Configuration | Performance (Short Text Generation) |
| --- | --- | --- |
| Consumer-grade | Mac Studio (192GB unified memory) | 10+ tokens/sec |
| High-performance server | 4× RTX 4090 (96GB VRAM + 384GB RAM) | 7-8 tokens/sec (mixed inference) |

3. Deployment Steps (Linux Example)

  1. **Install Dependencies:**

     ```bash
     # llama.cpp provides the llama-gguf-split tool used in the next step
     brew install llama.cpp
     ```

  2. **Download and Merge Model Shards:**

     ```bash
     # Merge the multi-part GGUF into a single file; the output name must
     # match the FROM path in the Modelfile created below
     llama-gguf-split --merge DeepSeek-R1-UD-IQ1_M-00001-of-00004.gguf DeepSeek-R1-UD-IQ1_M.gguf
     ```

  3. **Install Ollama:**

     ```bash
     curl -fsSL https://ollama.com/install.sh | sh
     ```
  4. **Create a Modelfile** (saved here as `DeepSeek01_Modelfile`):

     ```
     FROM /path/to/DeepSeek-R1-UD-IQ1_M.gguf
     # Number of layers offloaded to the GPU; tune to available VRAM
     PARAMETER num_gpu 28
     # Context window size
     PARAMETER num_ctx 2048
     PARAMETER temperature 0.6
     TEMPLATE "<｜User｜>{{ .Prompt }}<｜Assistant｜>"
     ```

  5. **Create and Run the Model:**

     ```bash
     ollama create DeepSeek-R1-UD-IQ1_M -f DeepSeek01_Modelfile
     ollama run DeepSeek-R1-UD-IQ1_M --verbose
     ```
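
Once `ollama run` responds, it is worth confirming the model also answers over Ollama's local REST API, since most client tooling connects that way. Ollama serves this API on port 11434 by default; the model name must match the one created above:

```bash
# One-off, non-streaming generation request to the local Ollama server.
curl http://localhost:11434/api/generate -d '{
  "model": "DeepSeek-R1-UD-IQ1_M",
  "prompt": "Say hello in one sentence.",
  "stream": false
}'
```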

4. Performance Tuning and Testing

  • **Low GPU utilization:** Upgrade to high-bandwidth memory (e.g., DDR5-5600 or faster); with most layers running on the CPU, system memory bandwidth is the main throughput bottleneck.
  • **Expand swap space** (verify it is active with the check below):

    ```bash
    # Allocate a 100 GB swap file
    sudo fallocate -l 100G /swapfile
    # Restrict permissions to root, as required for swap files
    sudo chmod 600 /swapfile
    # Format the file as swap and enable it
    sudo mkswap /swapfile
    sudo swapon /swapfile
    ```
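
After enabling swap, confirm it is registered and watch how heavily it is used during inference; sustained swap traffic means the memory headroom from the quantization table is not being met and generation speed will suffer:

```bash
# Confirm the swap file is active
swapon --show
# One-time snapshot of RAM and swap usage
free -h
# Refresh memory statistics every second while the model runs
watch -n 1 free -h
```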

VI. Precautions and Risk Warnings

1. Cost Warnings:

  • **70B Model:** Requires three or more 80GB-VRAM GPUs (e.g., A100 80GB); not feasible for single-GPU users.
  • **671B Model:** Requires an 8× H100 cluster; realistically deployable only by supercomputing centers.

2. Alternatives:

  • **Individual Users:** Cloud APIs (e.g., Silicon Flow) are the recommended maintenance-free, compliant alternative.

3. Domestic Hardware Compatibility: Requires customized frameworks (e.g., Ascend CANN, Moore Threads MXMLLM).

VII. Appendix: Technical Support and Resources

  • **Huawei Ascend:** Ascend Cloud Services
  • **Moore Threads:** Free API Trial
  • **Li Xihan's Blog:** Complete Deployment Tutorial

Conclusion

Local deployment of DeepSeek R1 demands significant hardware investment and technical expertise. Individual users should proceed with caution, while enterprise users should thoroughly evaluate their needs and costs. Domestic chip adaptation and cloud services can substantially reduce risk and improve efficiency; plan rationally to balance capability against cost.

**Manual Updates and Feedback:** For additions or corrections, please contact the document author. For detailed access instructions, refer to the Silicon Flow community documentation.

Global Enterprise and Individual Channels

  1. **Meta Search:** https://metaso.cn
  2. **360 Nano AI Search:** https://www.n.cn/
  3. **Silicon Flow:** https://cloud.siliconflow.cn/i/OBklluwO
  4. **ByteDance Volcano Engine:** https://console.volcengine.com/ark/region:ark+cn-beijing/experience
  5. **Baidu Cloud Qianfan:** https://console.bce.baidu.com/qianfan/modelcenter/model/buildln/list
  6. **NVIDIA NIM:** https://build.nvidia.com/deepseek-ai/deepseek-r1
  7. **Groq:** https://groq.com/
  8. **Fireworks:** https://fireworks.ai/models/fireworks/deepseek-r1
  9. **Chutes:** https://chutes.ai/app/chute/
  10. **GitHub:** https://github.com/marketplace/models/azureml-deepseek/DeepSeek-R1/playground
  11. **POE:** https://poe.com/DeepSeek-R1
  12. **Cursor:** https://cursor.sh/
  13. **Monica:** https://monica.im/invitation?c=ACZ7WJJ9
  14. **Lambda:** https://lambdalabs.com/
  15. **Cerebras:** https://cerebras.ai
  16. **Perplexity:** https://www.perplexity.ai
  17. **Together AI:** https://api.together.ai/playground/chat/deepseek-ai/DeepSeek-R1

**Note:** Requires VPN or enterprise network environment.

Domestic AI Chip Company Support

| Date | Company | Announcement Title |
| --- | --- | --- |
| February 1 | Huawei | First Release! Silicon Flow x Huawei Cloud Jointly Launch DeepSeek R1 & V3 Inference Services Based on Ascend Cloud! |
| February 1 | Moore Threads | Gitee AI Jointly Launches Full Suite of DeepSeek R1 Distilled Models with Moore Threads, Free Trial Available! |
| February 4 | Hygon | Hygon DCU Successfully Adapts DeepSeek V3 and R1 Models, Officially Launched! |
| February 4 | Huawei | Ascend Native: Luchen Tech Launches DeepSeek R1 Series Inference API and Cloud Image Services Based on Ascend Computing Power |
| February 5 | Moore Threads | DeepSeek-V3 Full Version Launched on Domestic Moore Threads GPU for First Experience! |
| February 5 | Hygon | Hygon DCU Successfully Adapts DeepSeek-Janus-Pro Multimodal Large Model |
| February 5 | Bichen Tech | DeepSeek R1 Launched on Bichen Domestic AI Computing Platform, Empowering Developer Innovation with Full Series Models |
| February 5 | Taichu Yuanqi | DeepSeek-R1 Series Models Adapted on Taichu T100 Accelerator Card in 2 Hours, Free API Service Available! |
| February 5 | Yuntian Lifey | DeepEdge10 Completes Adaptation of DeepSeek R1 Series Models |
| February 6 | Suiyuan Tech | Suiyuan Tech Achieves Full Deployment of DeepSeek Inference Services Across National AI Computing Centers |
| February 6 | Kunlun Core | Domestic AI Card Fully Adapts DeepSeek Training and Inference Versions, Outstanding Performance, One-Click Deployment Available (Document Download Included) |

Cloud and AI Computing Company Support

| Date | Company | Announcement Title |
| --- | --- | --- |
| January 28 | WuWen XinQiong | WuWen XinQiong Infini-AI Heterogeneous Cloud Now Offers DeepSeek-R1-Distill, Perfect Combination of Domestic Models and Heterogeneous Cloud |
| January 28 | PPIO Cloud | Big News! DeepSeek-R1 Launched on PPIO Computing Cloud |
| January 28 | Silicon Flow | Silicon Cloud Launches DeepSeek Multimodal Model: Janus-Pro-7B is Here! |
| February 1 | Huawei Cloud | First Release! Silicon Flow x Huawei Cloud Jointly Launch DeepSeek R1 & V3 Inference Services Based on Ascend Cloud! |
| February 1 | Silicon Flow | First Release! Silicon Flow x Huawei Cloud Jointly Launch DeepSeek R1 & V3 Inference Services Based on Ascend Cloud! |
| February 1 | China Telecom Cloud | Mysterious "Eastern Power" Gathers! DeepSeek-R1 Model Launched on China Telecom Cloud! |
| February 2 | Tencent Cloud | One-Click Deployment, 3-Minute Call! DeepSeek-R1 Lands on Tencent Cloud |
| February 2 | ZStack | First Release! ZStack Smart Tower Supports DeepSeek V3/R1/Janus Pro, Multiple Domestic CPU/GPU Available for Private Deployment |
| February 2 | PPIO Cloud | PPIO Computing Cloud Integrates Full DeepSeek Models, Price Only 1/20 of OpenAI, 50 Million Tokens Free Upon Registration! |
| February 3 | Alibaba Cloud | 3 Steps, 0 Code! One-Click Deployment of DeepSeek-V3 and DeepSeek-R1 |
| February 3 | Baidu Smart Cloud | Baidu Smart Cloud Qianfan Fully Supports DeepSeek-R1/V3 Calls, Ultra-Low Price |
| February 3 | SCNet | Supercomputing Internet Launches DeepSeek Series Models, Provides Super Intelligent Fusion Computing Power Support |
| February 4 | Tencent Cloud | One-Click Deployment + Limited-Time Free Trial! Tencent Cloud Launches DeepSeek Series Models |
| February 4 | Silicon Flow | Full Package Arrives! Silicon Flow Launches Accelerated Version of DeepSeek-R1 Distilled Model |
| February 4 | Volcano Engine | Full-Size DeepSeek Models Land on Volcano Engine! |
| February 4 | QingCloud | Limited-Time Free, One-Click Deployment! Jishi Computing Officially Launches DeepSeek-R1 Series Models |
| February 4 | Computing Interconnect | Domestic GPU and DeepSeek Accelerated Adaptation, Computing Interconnect Collaborates with Hygon to Launch DeepSeek-R1 Model Services |
| February 4 | JD Cloud | One-Click Deployment! JD Cloud Fully Launches DeepSeek-R1/V3 |
| February 4 | SCNet | New Arrival! Try DeepSeek on Supercomputing Internet! |
| February 5 | China Unicom Cloud | "Nezha Stirs the Sea"! China Unicom Cloud Launches DeepSeek-R1 Series Models! |
| February 5 | PPIO Cloud | PPIO Holiday Report: 99.9% Availability! Overnight Support for Full Version of DeepSeek, Helping Customers Easily Handle Traffic Peaks |
| February 5 | Bingji Tech |  |
| February 5 | UCloud | UCloud Adapts Full DeepSeek Series Models Based on Domestic Chips |
| February 5 | China Mobile Cloud | Full Version, Full Size, Full Function! China Mobile Cloud Fully Launches DeepSeek |
| February 6 | QingCloud | Continuous Launch of DeepSeek! Jishi Computing Janus-Pro-7B Text-to-Image Model Arrives |
| February 6 | Digital China | 3-Minute Deployment of High-Performance AI Model DeepSeek, Digital China Helps Enterprises Transform with Intelligence |
| February 6 | China Telecom Cloud | New Breakthrough in Domestic AI Ecosystem! "Xirang" + DeepSeek Super Combination Arrives! |
| February 6 | Parallel Tech | Server Busy? Parallel Tech Helps You DeepSeek Freely! |
| February 6 | UCloud | UCloud Private Cloud Launches DeepSeek Series Models |
| February 7 | Inspur Cloud | Inspur Cloud First Releases 671B DeepSeek Large Model All-in-One Solution |
| February 7 | Beijing Supercomputing | Beijing Supercomputing x DeepSeek: Dual Engines Ignite, Driving Trillion-Level AI Innovation Storm |
