NVIDIA’s GTC 2026 keynote was, as usual, a two-hour showcase of infrastructure ambition. Most of the coverage has focused on chip specs and data centre architecture. For anyone working in AI video production, the relevant number is this: the Vera Rubin platform is designed to deliver inference at up to 10x lower cost per token than the previous generation, Blackwell.
That’s an infrastructure cost reduction that will flow through to every AI video tool built on NVIDIA hardware – which is nearly all of them. Runway, Luma, Pika, and the rest run on NVIDIA GPUs. When the cost of the underlying compute drops by an order of magnitude, the price of generating a minute of AI video drops with it. Not immediately, and not uniformly, but structurally.
The Vera Rubin NVL72 rack – a single liquid-cooled unit containing 72 Rubin GPUs and 36 custom Vera CPUs – produces 700 million tokens per second. The previous generation managed 22 million from an entire 1GW data centre.

NVIDIA also announced DLSS 5, which it’s calling “neural rendering”: a hybrid approach that combines traditional 3D graphics data with generative AI to produce photorealistic output in real time on local RTX hardware. Local performance is also improving for creators working with tools like ComfyUI, where NVIDIA showed 2x acceleration for text-to-video generation and keyframe upscaling to 4K on current RTX cards.
For the AI video industry, the long-term trajectory is clear. The hardware that powers generation is getting dramatically cheaper and faster. Local generation on desktop hardware is becoming viable for professional work, not just experimentation. And the gap between “submit a prompt and wait” and “work in real time” is closing faster than most production workflows have accounted for.
The Vera Rubin platform itself isn’t available to consumers yet – Vera Rubin-derived consumer GPUs are expected in 2027. But the data centre hardware is shipping now, which means the cloud tools running on it will start reflecting the performance gains well before the desktop cards arrive.



