How well does the NVIDIA GB200 Superchip scale across multiple GPUs?

Answer

Headline specifications for the NVIDIA GB200 Superchip: 4,500 FP16 TFLOPS, 150 FP32 TFLOPS, 16,000 GB/s of memory bandwidth, and 384 GB of VRAM.

In practical terms: training a 7B-parameter LLM in FP16 with reasonable batch sizes typically saturates compute before memory bandwidth, whereas real-time serving of the same model is usually bandwidth-bound and tracks the 16,000 GB/s figure. Diffusion image generation sits between the two: its compute-heavy steps utilise the tensor cores well, while its attention blocks still put pressure on bandwidth.
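The compute-bound versus bandwidth-bound distinction can be made concrete with a simple roofline estimate. The sketch below is illustrative only: it uses the headline figures quoted above as theoretical peaks, and it assumes roughly 2 bytes per parameter for FP16 weights and that batch-1 token generation must read every weight once per token. The function names are hypothetical helpers, not part of any NVIDIA tool, and the estimate ignores KV-cache traffic, kernel efficiency, and any multi-GPU communication.

```python
# Roofline-style back-of-the-envelope estimate using the headline figures.
# All numbers are theoretical peaks; real workloads will land below them.

PEAK_FP16_FLOPS = 4_500e12   # 4,500 FP16 TFLOPS (headline figure)
PEAK_BANDWIDTH = 16_000e9    # 16,000 GB/s (headline figure)


def ridge_point(peak_flops: float = PEAK_FP16_FLOPS,
                peak_bw: float = PEAK_BANDWIDTH) -> float:
    """Arithmetic intensity (FLOP/byte) at which compute and bandwidth balance."""
    return peak_flops / peak_bw


def attainable_flops(intensity: float,
                     peak_flops: float = PEAK_FP16_FLOPS,
                     peak_bw: float = PEAK_BANDWIDTH) -> float:
    """Roofline bound: the lower of the compute roof and the bandwidth roof."""
    return min(peak_flops, intensity * peak_bw)


def decode_tokens_per_second(num_params: float,
                             bytes_per_param: float = 2.0,
                             peak_bw: float = PEAK_BANDWIDTH) -> float:
    """Upper bound on batch-1 generation speed.

    Assumption: each generated token requires streaming all weights from
    memory once, so throughput is bandwidth divided by model size in bytes.
    """
    return peak_bw / (num_params * bytes_per_param)


if __name__ == "__main__":
    print(f"Ridge point: {ridge_point():.2f} FLOP/byte")
    print(f"7B FP16 decode ceiling: {decode_tokens_per_second(7e9):.0f} tokens/s")
```

Under these assumptions the ridge point works out to about 281 FLOP/byte. Large-batch training reuses each weight across many tokens and can exceed that intensity, so it hits the compute roof; batch-1 serving sits near 1 FLOP/byte and is capped by bandwidth at roughly 1,143 tokens/s for a 7B FP16 model.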

See the NVIDIA GB200 Superchip page for the full spec sheet and comparisons to related GPUs.
