Wednesday, March 19th 2025

NVIDIA Unveils Vera CPU and Rubin Ultra AI GPU, Announces Feynman Architecture
NVIDIA at GTC 2025 announced its next-generation flagship AI GPU, the Rubin Ultra. A successor to the Blackwell Ultra unveiled this year, Rubin Ultra is slated for the second half of 2027. A single Rubin Ultra package contains four AI GPU dies joined at the hip with die-to-die bonding and a fast interconnect that enables cache coherency. The package also features a whopping 1 TB of HBM4e memory. NVIDIA is claiming a performance target of 100 petaFLOPs FP4 per package.
The company also unveiled its next-generation CPU for AI supercomputers, called simply the Vera CPU. A successor to Grace, Vera comes with 88 Arm CPU cores. These are custom high-performance cores designed by NVIDIA, and aren't carried over from the reference Arm Cortex family. The cores support SMT, giving the CPU 176 logical processors. The chip comes with a 1.8 TB/s NVLink C2C connection. Lastly, the company announced that the architecture succeeding Rubin will be codenamed Feynman, after Richard Feynman. The company is looking to debut the first silicon based on Feynman in 2028.
Source:
VideoCardz
13 Comments on NVIDIA Unveils Vera CPU and Rubin Ultra AI GPU, Announces Feynman Architecture
Feynman in 2028 seems to be the RTX 6000 series.
At least that's what I'm reading from this news. Next year they will likely release a Blackwell-based Super series, possibly with 3 GB GDDR7 modules, and maybe based on Blackwell Ultra.
It's unlikely that Blackwell Ultra (B300/GB300) will be made into a consumer design in its raw form. I haven't seen any details about what changed between B100 and B300, so we don't know whether those changes already made it to the consumer GB202, for example.
When it comes to the Super refresh, it depends. If NVIDIA is content with just a memory capacity upgrade (assuming we even get it), then yes, it will be the same Blackwell we have now. If they also plan on increasing performance, then it would make sense to use Blackwell Ultra for 2026.
Rubin Ultra likely follows a similar scheme: the exact same architecture, but with double the number of chips in a bigger package and denser (maybe faster?) memory.
A Super variant in the consumer space may either use just 3 GB modules, or pair those with faster modules as well. The current product stack uses 28 Gbps modules, but AFAIK there are 30 Gbps+ modules already out there; doesn't the 5080 already make use of one of those faster variants?
So yeah, bump memory capacity, make memory faster, and keep the prices the same or even give it some discount, in a similar fashion to what we saw with the 4080 Super.
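The bandwidth side of that speculation is simple arithmetic: per-pin speed times bus width. A minimal sketch, assuming a 256-bit bus and the module speeds mentioned above (none of this is a confirmed spec):

```python
# Hypothetical GDDR7 bandwidth math for the speculated Super refresh.
# The 256-bit bus and module speeds are assumptions for illustration.
def gddr_bandwidth_gbs(pin_speed_gbps: float, bus_width_bits: int) -> float:
    """Aggregate bandwidth in GB/s: per-pin rate times bus width, in bytes."""
    return pin_speed_gbps * bus_width_bits / 8

# 28 Gbps modules on a 256-bit bus:
print(gddr_bandwidth_gbs(28, 256))  # 896.0 GB/s
# Swapping in 30 Gbps modules on the same bus:
print(gddr_bandwidth_gbs(30, 256))  # 960.0 GB/s
```

So a module-speed bump alone buys a proportional bandwidth increase without touching the bus width.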
www.nvidia.com/en-us/data-center/gb200-nvl72/?ncid=no-ncid
www.nvidia.com/en-us/data-center/gb300-nvl72/
As you can see, the raw perf numbers are pretty much the same. The major difference is the increased VRAM.
I also just noticed that the CPU is now using SOCAMM, interesting.
www.nvidia.com/en-us/data-center/dgx-gb300/
Anyhow, the ratio between FP32:FP16:FP8/INT8:FP4/INT4 is always 1:2:4:8. The 360 PFLOPS sparse FP8 figure from the slides still matches the numbers from GB200, so no difference in this regard.
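The ratio claim above can be sanity-checked with a little arithmetic. The base FP32 figure here is back-derived from the quoted 360 PFLOPS sparse FP8 number, not an official spec:

```python
# Back-derive per-precision throughput from the quoted 360 PFLOPS sparse FP8
# figure, using the 1:2:4:8 FP32:FP16:FP8:FP4 ratio from the comment above.
RATIOS = {"FP32": 1, "FP16": 2, "FP8/INT8": 4, "FP4/INT4": 8}

fp8_pflops = 360  # sparse FP8, per the slides
fp32_pflops = fp8_pflops / RATIOS["FP8/INT8"]  # 90 PFLOPS base

for dtype, ratio in RATIOS.items():
    print(f"{dtype}: {fp32_pflops * ratio:.0f} PFLOPS (sparse)")
# FP32: 90, FP16: 180, FP8/INT8: 360, FP4/INT4: 720
```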
It wouldn't make any sense for the perf ratio between data types to change. More memory and better networking are EXTREMELY relevant for larger models. GPU clusters often sit at 20~40% utilization due to the communication bottleneck. More memory means you can fit bigger models on the same number of nodes, or use far fewer nodes for the same model (which means less communication overhead).
It's the exact same thing between the H100 and H200, just a memory subsystem upgrade (although the H200 got both more memory AND extra memory bandwidth).
It seems that the slides were indeed misleading as the DGX specs don't have the preliminary indicator and match your argument. Thanks for linking it.