Blackwell is NVIDIA’s data-center GPU architecture that succeeds Hopper, named after mathematician David Blackwell. Its headline data-center parts are the B100 and B200 accelerators and the GB200 Grace-Blackwell superchip, which pairs Blackwell GPUs with a Grace Arm CPU over a high-bandwidth chip-to-chip link. The defining engineering move is the multi-die design: a Blackwell GPU joins two reticle-limited dies with a fast die-to-die interconnect so they behave as a single, cache-coherent accelerator. For anyone renting capacity, that translates into far more memory and tensor throughput per logical GPU than a single-die Hopper part could offer.

When you see Blackwell instances in the comparison above, you are looking at the current top tier of NVIDIA AI silicon, aimed squarely at trillion-parameter-scale training and very large-model inference rather than everyday experimentation.

Memory, bandwidth and interconnect

The traits that matter most when you rent a Blackwell GPU are its memory system and how the GPUs talk to each other:

HBM3e memory with very large per-GPU capacity. A B200 carries roughly 180 to 192 GB of HBM3e depending on the configuration, dramatically more than the 80 GB common on Hopper-class H100 parts. That extra capacity lets bigger models, longer context windows and larger batches stay resident without aggressive sharding.
Multi-terabyte-per-second memory bandwidth. Blackwell’s HBM3e stacks deliver bandwidth in the multiple-TB/s range per GPU (about 8 TB/s for the B200), which is the single biggest reason it accelerates memory-bound inference and attention-heavy workloads.
Fifth-generation NVLink. Blackwell GPUs interconnect over the newest NVLink generation, with per-GPU NVLink bandwidth that materially exceeds Hopper’s. Combined with NVLink Switch fabric, this lets many GPUs in a rack act as one large memory-coherent pool, which is what makes very large training jobs scale efficiently.
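
To make the bandwidth point concrete, here is a minimal back-of-the-envelope sketch of why memory-bound decoding tracks HBM bandwidth. It assumes each generated token has to stream all model weights from memory once and that nothing else is the bottleneck, which is a simplification. The function name and the 70 GB model size are hypothetical; the 8,000 GB/s figure is the B200 bandwidth listed in this guide's comparison table.

```python
def decode_ceiling_tokens_per_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Rough upper bound on single-stream decode speed.

    Assumes each generated token streams the full set of weights from HBM once
    and that nothing else (compute, KV cache reads, interconnect) is the limit.
    """
    if bandwidth_gb_s <= 0 or weights_gb <= 0:
        raise ValueError("bandwidth and weight size must be positive")
    return bandwidth_gb_s / weights_gb


# 8,000 GB/s is the B200 figure listed in this guide's comparison table.
# 70 GB is a hypothetical model: 70 billion parameters at 1 byte each (FP8).
b200_ceiling = decode_ceiling_tokens_per_s(8000.0, 70.0)
print(f"B200 single-stream ceiling: {b200_ceiling:.0f} tokens/s")
```

Batching lets many requests share each pass over the weights, so aggregate throughput on a real server can sit well above this per-stream bound. Treat the result as a sanity check, not a benchmark.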

Because of that interconnect, Blackwell is most often rented in tightly coupled multi-GPU forms (8-GPU nodes, and rack-scale NVLink domains in GB200 NVL configurations) rather than as isolated single cards. If your workload genuinely spans many GPUs, the fabric is a real differentiator; if you only need one GPU, much of Blackwell’s advantage goes unused.

Compute, precision support and the FP4 story

Blackwell’s second-generation Transformer Engine is built around low-precision math. It supports the established AI precisions and adds finer-grained ones:

FP4 and FP6 support, alongside FP8, BF16/FP16 and INT8. The new 4-bit floating-point path is the architecture’s signature inference feature, roughly doubling effective throughput and shrinking memory footprint versus FP8 when models tolerate it.
Second-generation Transformer Engine that manages mixed-precision and dynamic scaling automatically, so frameworks can exploit FP8/FP4 without hand-tuning every layer.
Decompression and security engines that help with data pipelines and confidential computing, relevant if you are renting for regulated or sensitive datasets.

The practical takeaway is that Blackwell’s quoted “exaflop-class” figures rely on these very low precisions. For FP4/FP8 inference of large language models the speedups are dramatic; for workloads locked to FP32 or FP64 (some scientific and engineering codes), the generational jump is smaller, and a Blackwell rental may be overkill relative to its cost.
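
The memory side of the precision story is simple arithmetic, sketched below. The bytes-per-parameter values follow directly from the bit widths (16, 8 and 4 bits); the 70-billion-parameter model is a hypothetical example, and the function name is illustrative. The estimate covers weights only and ignores scaling-factor metadata, activations and KV cache, all of which add to the real footprint.

```python
# Bytes needed to store one parameter at each precision (bit width / 8).
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5}


def weight_footprint_gb(num_params: float, precision: str) -> float:
    """Approximate weight-only memory in GB (1 GB = 1e9 bytes)."""
    if precision not in BYTES_PER_PARAM:
        raise ValueError(f"unknown precision: {precision}")
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# Hypothetical 70-billion-parameter model.
for precision in ("FP16", "FP8", "FP4"):
    gb = weight_footprint_gb(70e9, precision)
    print(f"{precision}: {gb:.0f} GB of weights")
```

Each halving of the bit width halves the weight footprint, which is why a model that needs sharding at FP16 may fit on one GPU at FP4, provided its accuracy holds up at that precision.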

Which rental workloads Blackwell fits

Blackwell earns its premium on a specific band of jobs:

Frontier and large-model training where many GPUs must share gradients fast — the NVLink fabric and HBM capacity shorten wall-clock time and reduce the number of nodes needed.
High-throughput LLM inference, especially serving very large models or long contexts at scale, where FP4/FP8 and big VRAM cut the GPU count per replica.
Large-batch and high-concurrency serving, where memory bandwidth and capacity raise tokens-per-second per dollar despite the higher hourly rate.

It is usually underused for single-GPU fine-tuning of small models, classic 3D rendering, or modest batch inference — workloads where an older data-center card or even a high-end workstation GPU delivers the result at a fraction of the cost. Match the card to the job rather than defaulting to the newest silicon.
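
One way to test whether a job falls in that band is to estimate how many GPUs a single model replica needs on each class of card. The sketch below is a rough capacity calculation only. The 300 GB working set (weights plus KV cache) and the 90% usable-memory fraction are hypothetical assumptions, and the function name is illustrative; 80 GB and 192 GB are the per-GPU capacities quoted in this guide for H100-class and B200 parts.

```python
import math


def gpus_per_replica(working_set_gb: float, gpu_capacity_gb: float,
                     usable_fraction: float = 0.9) -> int:
    """Minimum GPU count for one replica, by memory capacity alone.

    usable_fraction is an assumed allowance for framework overhead and
    fragmentation; it is not a measured value.
    """
    if working_set_gb <= 0 or gpu_capacity_gb <= 0:
        raise ValueError("sizes must be positive")
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    usable_gb = gpu_capacity_gb * usable_fraction
    return math.ceil(working_set_gb / usable_gb)


# Hypothetical replica needing 300 GB for weights plus KV cache.
print("80 GB GPUs needed:", gpus_per_replica(300.0, 80.0))
print("192 GB GPUs needed:", gpus_per_replica(300.0, 192.0))
```

In practice, parallelism schemes often round the count up to a power of two, so the real number may be higher than this capacity-only minimum. If the result is already 1 on an 80 GB card, that is a strong hint the larger GPU is unnecessary.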

Renting Blackwell: cost, scarcity and availability

Blackwell sits at the very top of the cloud GPU cost spectrum — above Hopper-class H100/H200 on a per-hour basis — because it is the newest, highest-density part and supply is still constrained. A few patterns are worth checking against the comparison above:

Availability is tight. Blackwell capacity is frequently sold through reservations, committed terms or waitlists rather than always-on, click-to-launch on-demand pools. Spot or interruptible Blackwell is rarer than for older generations.
It is usually sold in multiples. Expect 8-GPU nodes or rack-scale units; single-GPU Blackwell rentals are less common, so factor whole-node pricing into your math.
Judge it on throughput, not sticker price. Because Blackwell can replace several older GPUs for the right job, the relevant figure is cost per token or per training step, not the headline hourly rate. Use the live comparison above to confirm current pricing and what is actually in stock — these move quickly.
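
The cost-per-token comparison can be sketched in a few lines. Every number below is a hypothetical placeholder, not a quoted price or a measured throughput; substitute your own benchmark results and the current rates from the live comparison. The function name is illustrative.

```python
def cost_per_million_tokens(hourly_rate_usd: float, tokens_per_second: float) -> float:
    """Dollars per one million generated tokens at a sustained throughput."""
    if hourly_rate_usd < 0 or tokens_per_second <= 0:
        raise ValueError("rate must be non-negative and throughput positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_rate_usd / tokens_per_hour * 1_000_000


# Hypothetical inputs: an older GPU at $2.00/hr sustaining 1,000 tokens/s
# versus a newer GPU at $5.00/hr sustaining 4,000 tokens/s.
older = cost_per_million_tokens(2.00, 1000.0)
newer = cost_per_million_tokens(5.00, 4000.0)
print(f"older GPU: ${older:.3f} per million tokens")
print(f"newer GPU: ${newer:.3f} per million tokens")
```

With these made-up inputs the pricier card is cheaper per token because its throughput advantage outpaces its price premium. The conclusion flips whenever the speedup on your actual workload is smaller than the ratio of the hourly rates.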

Frequently asked questions

How is Blackwell different from Hopper for renters?

Blackwell uses a dual-die design with more HBM3e capacity and bandwidth per GPU, connects GPUs over faster fifth-generation NVLink, and adds FP4/FP6 precision support through a second-generation Transformer Engine. In practice that means higher throughput per GPU for large-model training and low-precision inference, at a higher hourly rate and tighter availability than Hopper-class H100/H200 instances.

Do I need Blackwell, or is an older GPU enough?

If your model fits comfortably on a single 80 GB GPU, or your job is small-scale fine-tuning, rendering, or low-concurrency inference, an older data-center card is usually more cost-effective. Blackwell pays off when you train very large models across many GPUs, or when you serve large models at high throughput and its memory and FP4/FP8 speedups reduce the GPU count.

Can I rent a single Blackwell GPU?

Sometimes, but Blackwell is most often offered in multi-GPU nodes or rack-scale NVLink configurations because its biggest advantage is fast GPU-to-GPU communication. Check the comparison above for the smallest configuration each option offers, since whole-node pricing changes the effective cost.
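
The effect of whole-node pricing is easy to quantify. The sketch below assumes you pay for the full node regardless of how many GPUs your job uses; the $40/hr node rate is a hypothetical placeholder, the function name is illustrative, and the 8-GPU default reflects the node size mentioned in this guide.

```python
def effective_rate_per_used_gpu(node_hourly_usd: float, gpus_needed: int,
                                gpus_in_node: int = 8) -> float:
    """Hourly cost per GPU actually used when the whole node is billed."""
    if node_hourly_usd < 0:
        raise ValueError("node rate must be non-negative")
    if not 1 <= gpus_needed <= gpus_in_node:
        raise ValueError("gpus_needed must be between 1 and gpus_in_node")
    return node_hourly_usd / gpus_needed


# Hypothetical 8-GPU node billed at $40/hr, with a job that uses only 2 GPUs.
nominal = 40.0 / 8
effective = effective_rate_per_used_gpu(40.0, 2)
print(f"nominal per-GPU rate: ${nominal:.2f}/hr")
print(f"effective per-used-GPU rate: ${effective:.2f}/hr")
```

The gap between the nominal and effective rates is the price of idle GPUs, so a job that cannot fill the node should be compared against smaller configurations on older hardware.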

Why does Blackwell pricing vary so much between options?

Newer architecture, constrained supply, differing reservation versus on-demand terms, and node size all push prices apart. Because rates and stock shift frequently, treat the live comparison above as the source of truth rather than any fixed dollar figure, and compare on cost per unit of work rather than per hour.

GB200 Superchip vs B200 vs B100: top picks from this guide

	GB200 Superchip (Blackwell, 384 GB)	B200 (Blackwell, 192 GB)	B100 (Blackwell, 192 GB)
Specifications
Manufacturer	NVIDIA	NVIDIA	NVIDIA
Architecture	Blackwell	Blackwell	Blackwell
VRAM	384 GB HBM3e	192 GB HBM3e	192 GB HBM3e
Memory bandwidth	16,000 GB/s	8,000 GB/s	8,000 GB/s
FP16 (Tensor)	4,500 TFLOPS	2,250 TFLOPS	1,750 TFLOPS
FP32	150 TFLOPS	75 TFLOPS	60 TFLOPS
TDP	2,700 W	1,000 W	700 W
Release year	2024	2024	2024
Segment	Data center	Data center	Data center
Cloud pricing
Cheapest on-demand	n/a	$1.99/hr	n/a
Providers	0	2	0
