OpenAI is doubling down on hardware control, pairing Nvidia GPUs for model training with Broadcom-designed custom chips for inference, according to The Wall Street Journal.
The goal? Cut costs, improve latency, and secure chip supply — the three pillars of scaling AGI infrastructure sustainably.
The roadmap targets 10 gigawatts of new compute by 2030, on top of the roughly 2 GW running today and another 16 GW already committed through earlier deals. These systems rely on high-bandwidth memory (HBM) and dense optical networking to keep massive data centers running efficiently.
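Adding up the figures above (a quick sanity-check sketch; only these three numbers come from the post, the labels are mine):

```python
# Tally of the compute figures cited above; the buckets are
# the post's own numbers, nothing else is assumed.
current_gw = 2      # capacity running today
committed_gw = 16   # already committed through earlier deals
roadmap_gw = 10     # new target by 2030

print(f"Total footprint if everything lands: {current_gw + committed_gw + roadmap_gw} GW")  # -> 28 GW
```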
Here’s the breakdown 👇
- 🧠 Training stays on Nvidia’s flexible accelerators (Nvidia still holds ~70% of the training market).
- ⚡ Inference shifts to custom silicon optimized for known model workloads — crucial since serving LLMs dominates cost at global scale.
- 💾 Chips integrate large on-package memory to prevent compute stalls, plus mixture-of-experts routing that activates only the parts of the model each token needs, cutting power use dramatically (sketched after this list).
- 🔗 Broadcom’s switching and optical interconnects ensure that thousands of accelerators act as a single, ultra-fast machine.
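On the expert-routing bullet: the standard technique is mixture-of-experts gating, where a small router scores every expert and only the top few actually run per token. A minimal NumPy sketch; the sizes and top-k value are made-up illustration numbers, and OpenAI’s actual routing design isn’t public:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up illustration sizes; real models use thousands of dims and many experts.
NUM_EXPERTS, HIDDEN, TOP_K = 8, 16, 2

experts = rng.normal(size=(NUM_EXPERTS, HIDDEN, HIDDEN))  # one FFN matrix per expert
router = rng.normal(size=(HIDDEN, NUM_EXPERTS))           # tiny gating network

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token through only TOP_K of NUM_EXPERTS experts."""
    logits = x @ router                # score every expert for this token
    top = np.argsort(logits)[-TOP_K:]  # keep the k highest-scoring experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()           # softmax over just the chosen k
    # Only TOP_K matmuls execute; the other experts sit idle,
    # which is where the power savings come from.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=HIDDEN)
print(moe_forward(token).shape)  # (16,)
```

Compute scales with the k experts actually run, not the total expert count, so capacity grows without a matching jump in per-token power.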
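And on the interconnect bullet: one classic way thousands of devices act as a single machine is a ring all-reduce, where each device only ever exchanges data with its neighbors. A toy pure-Python simulation (real fabrics move tensor chunks over switched optical links, which is where Broadcom’s gear comes in):

```python
def ring_allreduce(values: list[float]) -> list[float]:
    """After n-1 steps, every rank holds the sum of all ranks' values."""
    n = len(values)
    totals = list(values)   # running sum held at each rank
    passing = list(values)  # contribution currently being forwarded
    for _ in range(n - 1):
        # All ranks simultaneously send passing[i] to rank (i + 1) % n.
        passing = [passing[(i - 1) % n] for i in range(n)]
        totals = [t + p for t, p in zip(totals, passing)]
    return totals

# Four "accelerators", each starting with a local gradient value.
print(ring_allreduce([1.0, 2.0, 3.0, 4.0]))  # [10.0, 10.0, 10.0, 10.0]
```

Each step moves a fixed amount of data per link no matter how big the ring gets, but every step is a network round-trip, which is why low-latency switching matters at this scale.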
Even OpenAI’s new Pulse product, which uses AI agents to compile a personalized daily briefing for each user, currently sits behind the $200/month Pro tier, a sign of how inference costs shape product pricing.
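To see why serving cost leaks into pricing, a toy unit-economics sketch. Every number below is an invented placeholder, not an OpenAI figure:

```python
# Hypothetical unit economics for a daily agentic briefing.
# All numbers are invented placeholders, not OpenAI figures.
tokens_per_briefing = 500_000       # assumed: agents read many pages per day
dollars_per_million_tokens = 10.0   # assumed blended serving cost
days_per_month = 30

monthly_cost = tokens_per_briefing * days_per_month * dollars_per_million_tokens / 1_000_000
print(f"Serving cost per user: ${monthly_cost:.2f}/month")  # -> $150.00/month
```

With placeholder numbers like these, a heavy agentic workload can eat most of a $200 subscription, exactly the cost pressure custom inference silicon is meant to relieve.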
💡 Bottom line: OpenAI is evolving from a model company into a vertically integrated AI hardware player, with custom inference silicon defining the next phase of scale.
📄 Source: WSJ

