NVIDIA has trained a 12B-parameter language model on 10 trillion tokens, with the bulk of the compute in 4-bit precision (NVFP4).
It's the first demonstrated stable, large-scale 4-bit pretraining run with essentially no accuracy loss, and it could change how AI models are built.
Here’s what makes NVFP4 revolutionary 👇
- ⚡ 2–3× faster math throughput than FP8
- 💾 50% less memory usage
- 🎯 Accuracy nearly identical: FP8 MMLU-Pro = 62.62%, NVFP4 = 62.58%
- 🧩 Stability solved using Random Hadamard transforms, stochastic rounding, and two-dimensional scaling (a rough sketch of the first two follows this list)
- 🖥️ Trained entirely on NVIDIA Blackwell GPUs — the first 4-bit model stable across 10 trillion tokens
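For intuition, here's a minimal NumPy sketch of two of those ingredients: NVFP4-style 16-element block scaling with stochastic rounding, and a random-sign Hadamard rotation applied before quantization and undone afterwards. This is not NVIDIA's implementation (the function names and details are my own simplification); real training runs these numerics in hardware on Blackwell tensor cores.

```python
import numpy as np

# FP4 (E2M1) magnitudes used by NVFP4: sign * {0, 0.5, 1, 1.5, 2, 3, 4, 6}.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])


def random_hadamard(x, rng, dim=16):
    """Apply a random-sign Hadamard rotation to consecutive blocks of `dim`
    elements (dim must be a power of two dividing x.size). The rotation
    spreads outliers across each block before quantization; multiplying
    by R.T afterwards undoes it exactly."""
    H = np.array([[1.0]])
    while H.shape[0] < dim:                      # Sylvester construction
        H = np.block([[H, H], [H, -H]])
    H /= np.sqrt(dim)                            # orthonormal Hadamard matrix
    R = H * rng.choice([-1.0, 1.0], size=dim)    # H @ diag(random signs)
    return (x.reshape(-1, dim) @ R).reshape(-1), R


def quantize_nvfp4(x, rng, block_size=16, stochastic=True):
    """Fake-quantize a 1-D array NVFP4-style: each 16-element block gets its
    own scale (block max lands on the largest FP4 magnitude, 6.0), then
    values snap to the FP4 grid. Stochastic rounding picks the upper or
    lower grid point with probability proportional to proximity, so the
    rounding error is zero-mean."""
    out = np.empty_like(x, dtype=np.float64)
    for start in range(0, x.size, block_size):
        blk = x[start:start + block_size].astype(np.float64)
        scale = max(np.abs(blk).max() / FP4_GRID[-1], 1e-12)
        mag = np.abs(blk) / scale
        idx = np.clip(np.searchsorted(FP4_GRID, mag), 1, FP4_GRID.size - 1)
        lo, hi = FP4_GRID[idx - 1], FP4_GRID[idx]
        if stochastic:
            go_up = rng.random(mag.shape) < (mag - lo) / (hi - lo)
        else:
            go_up = (mag - lo) >= (hi - mag)     # round-to-nearest baseline
        out[start:start + block_size] = np.sign(blk) * np.where(go_up, hi, lo) * scale
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
xt, R = random_hadamard(x, rng)                          # rotate
xq = quantize_nvfp4(xt, rng)                             # quantize in rotated basis
x_rec = (xq.reshape(-1, R.shape[0]) @ R.T).reshape(-1)   # rotate back
print("relative quantization error:", np.linalg.norm(x_rec - x) / np.linalg.norm(x))
```

Even this toy version hints at why the recipe holds up: the rotation flattens outliers so a single large value doesn't blow up a block's scale, and stochastic rounding keeps the quantization error unbiased over trillions of gradient updates.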
This is a major milestone for efficient AI training: it shows you can reach frontier-level performance with half the bits of FP8, roughly half the memory, and a substantial cut in compute cost and energy.
💡 The takeaway:
4-bit training is no longer theoretical — it’s real, stable, and production-ready.
The next generation of frontier models will be faster, cheaper, and greener — without compromise.

