Benchmarks

See how DeepSeek V4 performs on industry-standard benchmarks and real-world tasks.

DeepSeek V4 Performance Benchmarks

Open Source Spirit. Commercial Grade Power.
DeepSeek V4 outscores the open-source and closed-source models it is compared against on all three benchmarks reported below: SWE-bench, HumanEval, and MATH.

SWE-bench

Software Engineering

DeepSeek V4: 42.5%
Competitor O*: 22.1%
Competitor A*: 20.3%

HumanEval

Coding

DeepSeek V4: 95.2%
Competitor O*: 90.1%
Competitor A*: 88%

MATH

Reasoning

DeepSeek V4: 88.9%
Competitor O*: 84.5%
Competitor A*: 82.1%
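
To make the size of each lead explicit, the scores listed above can be compared in percentage points. The following is a minimal sketch, not the evaluation code this page refers to: the `scores` dictionary and the `margin_over_best_competitor` helper are hypothetical names introduced for illustration, the numbers are copied from the lists above, and the 88% HumanEval entry is assumed to mean 88.0.

```python
# Scores in percent, copied from the benchmark lists on this page.
# Competitor names are anonymized in the source; 88% is assumed to be 88.0.
scores = {
    "SWE-bench": {"DeepSeek V4": 42.5, "Competitor O*": 22.1, "Competitor A*": 20.3},
    "HumanEval": {"DeepSeek V4": 95.2, "Competitor O*": 90.1, "Competitor A*": 88.0},
    "MATH": {"DeepSeek V4": 88.9, "Competitor O*": 84.5, "Competitor A*": 82.1},
}


def margin_over_best_competitor(results, model="DeepSeek V4"):
    """Return the lead of `model` over the best other entry, in percentage points."""
    best_other = max(value for name, value in results.items() if name != model)
    # Round to one decimal to match the precision of the reported scores.
    return round(results[model] - best_other, 1)


margins = {
    benchmark: margin_over_best_competitor(results)
    for benchmark, results in scores.items()
}

for benchmark, margin in margins.items():
    print(f"{benchmark}: +{margin} percentage points")
```

With the figures above, the leads over the nearest competitor are 20.4 points on SWE-bench, 5.1 on HumanEval, and 4.4 on MATH. These are differences in percentage points, not relative improvements.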

All benchmarks are fully reproducible. View evaluation code here