DeepSeek Releases V4 Models With 9.5x Lower Memory Requirements and Huawei Ascend Support

DeepSeek has introduced two new open-weight models in preview: DeepSeek V4, a mixture-of-experts model with 284 billion parameters and 13 billion active parameters, and DeepSeek V4-Pro, a 1.6 trillion parameter model with 49 billion active parameters. Both are available for download on Hugging Face as well as through DeepSeek’s API and web service.

V4-Pro was trained on 33 trillion tokens. The company claims it outperforms all open-weight large language models and rivals leading proprietary Western models across its benchmark suite. However, since these claims are self-reported, they should be considered with caution and evaluated against independent testing.

Architectural Changes Behind DeepSeek V4’s Efficiency Gains

The most notable technical update in V4 is a hybrid attention mechanism that combines Compressed Sparse Attention and Heavy Compressed Attention. The combination reduces the computation required during inference and compresses the key-value caches used to track the model’s state. DeepSeek reports a context window of one million tokens, with memory requirements reduced by a factor of 9.5 to 13.7 compared with DeepSeek V3.2.
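To get a feel for what a 9.5x reduction in cache memory means at a one-million-token context, here is a rough back-of-envelope sketch. The per-token cache size and layer count below are illustrative assumptions, not figures published by DeepSeek:

```python
# Back-of-envelope KV-cache sizing. The per-layer byte count and layer
# count are illustrative assumptions, not DeepSeek's published numbers.

def kv_cache_gib(context_tokens, bytes_per_token_per_layer, layers):
    """Total KV-cache size in GiB for a given context length."""
    return context_tokens * bytes_per_token_per_layer * layers / 2**30

CONTEXT = 1_000_000          # one-million-token context window
LAYERS = 60                  # assumed transformer layer count
BASELINE_BYTES = 1152        # assumed uncompressed KV bytes per token, per layer

baseline = kv_cache_gib(CONTEXT, BASELINE_BYTES, LAYERS)
compressed = baseline / 9.5  # lower bound of the reported 9.5x-13.7x range

print(f"baseline:   {baseline:.1f} GiB")
print(f"compressed: {compressed:.1f} GiB")
```

Even under these made-up assumptions, the point is visible: a full-length context that would need tens of gigabytes of cache shrinks to single digits, which is what makes million-token serving plausible on commodity accelerator memory.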

Both V4 models use a mix of FP8 and FP4 precision, with quantization-aware training applied to the mixture-of-experts weights. FP4 roughly halves the memory needed to store model weights compared with FP8. DeepSeek V4 also adopts the Muon optimizer, which aims to accelerate convergence and improve training stability.
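The weight-storage claim is simple unit arithmetic. Using the 284 billion parameter count from the article (and ignoring the fact that in practice only some tensors are quantized to FP4), the halving looks like this:

```python
# Rough weight-storage arithmetic. Parameter count is from the article;
# treating *all* weights as uniformly FP8 or FP4 is a simplification.

def weights_gib(params, bits_per_param):
    """Storage needed for model weights, in GiB, at a given precision."""
    return params * bits_per_param / 8 / 2**30

V4_PARAMS = 284e9            # DeepSeek V4 total parameters

fp8 = weights_gib(V4_PARAMS, 8)   # 8 bits per parameter
fp4 = weights_gib(V4_PARAMS, 4)   # 4 bits per parameter

print(f"FP8: {fp8:.0f} GiB   FP4: {fp4:.0f} GiB")
```

In a real deployment the saving is somewhat smaller than a clean 2x, since embeddings, norms, and other sensitive tensors typically stay at higher precision.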

DeepSeek V4 Hardware Support and API Pricing vs GPT-5.5

DeepSeek V4 runs on both Nvidia GPUs and Huawei Ascend NPUs; the paper notes that the model’s expert-parallel scheme was validated on both hardware types. It remains unclear whether Huawei accelerators were used during training or only for inference.

DeepSeek V4 costs $0.14 per million input tokens and $0.28 per million output tokens for uncached requests. The V4-Pro version is priced at $1.74 per million input tokens and $3.48 per million output tokens. In comparison, OpenAI’s GPT-5.5 is priced at $5 per million input tokens and $30 per million output tokens. Both models, including base and instruction-tuned versions, are now available in preview through the DeepSeek API and Hugging Face.
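To put the per-million-token prices in concrete terms, here is a small sketch that costs out a hypothetical workload. The prices come from the article; the 10M-input/2M-output workload is an arbitrary example:

```python
# Cost comparison using the uncached per-million-token prices quoted
# in the article. The sample workload sizes are arbitrary.

PRICES = {  # model: (input $/1M tokens, output $/1M tokens)
    "DeepSeek V4":     (0.14, 0.28),
    "DeepSeek V4-Pro": (1.74, 3.48),
    "GPT-5.5":         (5.00, 30.00),
}

def job_cost(model, input_tokens, output_tokens):
    """Dollar cost of one job under a model's token pricing."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# Hypothetical workload: 10M input tokens, 2M output tokens.
for model in PRICES:
    print(f"{model}: ${job_cost(model, 10_000_000, 2_000_000):.2f}")
```

On that workload the quoted prices work out to roughly $2 for V4, $24 for V4-Pro, and $110 for GPT-5.5, though real bills depend heavily on cache hit rates and the input/output mix.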

The post DeepSeek Releases V4 Models With 9.5x Lower Memory Requirements and Huawei Ascend Support appeared first on gHacks.