Best Cloud GPU Providers with NVIDIA H200

The NVIDIA H200 builds on the H100 with 141 GB of HBM3e memory and roughly 1.4x the memory bandwidth (about 4.8 TB/s versus 3.35 TB/s on the H100 SXM), making it particularly effective for large language model inference, where model weights and the KV cache must fit in GPU memory. Fewer providers currently offer H200 instances than H100 instances, making availability a key differentiator. This guide helps you find and compare cloud GPU providers with H200 access.

Updated June 2026. Showing 6 GPU providers offering the H200.
  • DigitalOcean (HQ: United States): Trustpilot rating 4.6 from 2,427 reviews (+13 in 7d, +47 in 30d, +142 in 90d); starting price $0.76/hr; max VRAM 192 GB; max GPUs 8; billing per-second.
  • Vast.ai (HQ: United States): Trustpilot rating 4.1 from 237 reviews (+0 in 7d, +8 in 30d, +26 in 90d); starting price $0.06/hr; max VRAM 192 GB; max GPUs 8; billing per-second.
  • Latitude.sh (HQ: Brazil): Trustpilot rating 3.7 from 3 reviews (+0 in 7d, +0 in 30d, +0 in 90d); starting price $0.35/hr; max VRAM 96 GB; max GPUs 8; billing per-hour.
  • RunPod (HQ: United States): Trustpilot rating 3.4 from 245 reviews (+1 in 7d, +13 in 30d, +36 in 90d); starting price $0.06/hr; max VRAM 288 GB; max GPUs 8; billing per-second.
  • Massed Compute (HQ: United States): Trustpilot rating 3.2 from 1 review (+0 in 7d, +0 in 30d, +1 in 90d); starting price $0.35/hr; max VRAM 141 GB; max GPUs 8; billing per-minute.
  • Vultr (HQ: United States): Trustpilot rating 1.7 from 557 reviews (+1 in 7d, +4 in 30d, +19 in 90d); starting price $0.47/hr; max VRAM 288 GB; max GPUs 16; billing per-hour.

What the NVIDIA H200 actually is

The H200 is NVIDIA’s memory-upgraded member of the Hopper data-center generation, the same architecture family as the H100. It keeps the Hopper GPU compute engine but pairs it with a substantially larger and faster memory subsystem, which is the single most important reason people rent it specifically rather than settling for an H100. If your workload is bottlenecked on how much of the model and KV cache you can hold in one GPU’s memory, the H200 directly addresses that pain point without forcing you to jump to a newer, scarcer Blackwell-class part.

Because it shares Hopper’s compute design, the per-clock math throughput is in the same ballpark as the H100. The H200 is not primarily a “more FLOPS” upgrade. It is a “more memory and more memory bandwidth” upgrade, and that distinction should drive whether you filter the comparison above for it.

Hardware characteristics that matter when renting

  • Memory capacity: the H200 ships with 141 GB of HBM3e, a large jump over the 80 GB of HBM3 on the standard H100. That extra headroom lets a single GPU hold larger weights, longer context windows, and bigger inference batches before you are forced into multi-GPU sharding.
  • Memory bandwidth: HBM3e raises peak bandwidth to about 4.8 TB/s, roughly 1.4x the 3.35 TB/s of the H100 SXM’s HBM3. For memory-bound inference and any workload that streams large tensors, this bandwidth is often the real performance multiplier, not raw tensor-core peak.
  • Compute and precisions: as a Hopper part it carries fourth-generation Tensor Cores with the Transformer Engine, and supports FP16, BF16, INT8, TF32, and importantly FP8. FP8 matters because it lets you push throughput and fit larger effective batch sizes for both training and inference when your framework supports it.
  • Interconnect: the SXM form factor exposes fourth-generation NVLink and, in an 8-GPU server, NVSwitch, giving high-bandwidth GPU-to-GPU communication. This is what makes tensor-parallel and pipeline-parallel jobs scale well across 2, 4, or 8 GPUs. A PCIe variant exists but offers lower inter-GPU bandwidth, so check which form factor an instance in the list above is actually using.
  • Power and thermal class: this is a roughly 700 W-class data-center accelerator that lives in air- or liquid-cooled servers. You will never run one outside a proper data center, which is exactly why renting is the sensible path for almost everyone.
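
To make the bandwidth point concrete, here is a minimal back-of-the-envelope sketch of a decode-speed ceiling for a memory-bound model. It assumes every generated token requires reading all model weights from HBM once and ignores KV-cache reads, batching, and kernel overheads, so real throughput will be lower. The bandwidth constants are NVIDIA's published peak figures for the SXM parts; the 70B FP8 model and the function name are hypothetical, chosen only for illustration.

```python
# Optimistic ceiling on single-stream decode speed for a memory-bound model.
# Assumption: each generated token reads every weight from HBM exactly once;
# KV-cache traffic, batching effects, and kernel overheads are ignored.

H100_SXM_BW_TBPS = 3.35  # published peak memory bandwidth, H100 SXM (HBM3)
H200_SXM_BW_TBPS = 4.8   # published peak memory bandwidth, H200 SXM (HBM3e)


def decode_tokens_per_sec_ceiling(params_billion: float,
                                  bytes_per_param: float,
                                  bandwidth_tbps: float) -> float:
    """Return bandwidth divided by weight bytes: an upper bound on tokens/s."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_tbps * 1e12 / weight_bytes


if __name__ == "__main__":
    # Hypothetical 70B-parameter model stored in FP8 (1 byte per parameter).
    for name, bw in [("H100 SXM", H100_SXM_BW_TBPS),
                     ("H200 SXM", H200_SXM_BW_TBPS)]:
        ceiling = decode_tokens_per_sec_ceiling(70, 1.0, bw)
        print(f"{name}: at most ~{ceiling:.0f} tokens/s per stream")
```

Under these assumptions the ratio between the two ceilings is simply the bandwidth ratio, about 1.4x, regardless of model size.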

How the bigger memory changes the math

The practical effect of 141 GB is that models which previously needed two or more H100s can sometimes fit on a single H200, or run with far more comfortable batch sizes and longer context. For large language model inference, the key-value cache grows with sequence length and concurrency; more VRAM directly buys you longer prompts and more simultaneous users per GPU. That can translate into needing fewer GPUs overall, which is often where the real cost saving shows up even when the per-hour rate is higher.
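
As a rough illustration of that arithmetic, the sketch below estimates weight and KV-cache memory and checks whether the total fits on one GPU. The model shape (80 layers, 8 KV heads, head dimension 128), the FP8 weights, the FP16 cache, and the 90% usable-memory headroom are all assumptions picked for the example, not properties of any specific model or provider; activations and framework overhead are ignored.

```python
# Estimate whether weights plus KV cache fit in one GPU's memory.
# All model dimensions and the headroom factor are illustrative assumptions.

GB = 1e9  # decimal gigabytes, matching how GPU memory is marketed


def weight_bytes(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * 1e9 * bytes_per_param


def kv_cache_bytes(total_tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_elem: int) -> int:
    # Factor 2 covers the separate key and value tensors per layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * total_tokens


def fits(total_bytes: float, capacity_gb: float, headroom: float = 0.9) -> bool:
    """True if the total fits within an assumed usable fraction of capacity."""
    return total_bytes <= capacity_gb * GB * headroom


if __name__ == "__main__":
    # Hypothetical 70B model, FP8 weights, FP16 KV cache,
    # 16 concurrent sequences of 8,192 tokens each.
    weights = weight_bytes(70, 1.0)
    cache = kv_cache_bytes(16 * 8192, layers=80, kv_heads=8,
                           head_dim=128, bytes_per_elem=2)
    total = weights + cache
    print(f"weights {weights / GB:.1f} GB + KV cache {cache / GB:.1f} GB "
          f"= {total / GB:.1f} GB")
    print("fits on 80 GB card: ", fits(total, 80))
    print("fits on 141 GB card:", fits(total, 141))
```

Because the cache term scales linearly with total tokens held at once, doubling either context length or concurrency doubles it, which is why extra capacity converts so directly into longer prompts or more users per GPU.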

Which workloads the H200 genuinely fits

  • High-throughput LLM inference: this is the H200’s sweet spot. Memory-bound serving of large models benefits directly from both the capacity and the bandwidth, and FP8 helps you squeeze more tokens per second.
  • Fine-tuning and training of mid-to-large models: the extra memory reduces the need for aggressive offloading, gradient checkpointing, or sharding, simplifying your training recipe.
  • Long-context and large-batch jobs: anything where you keep hitting out-of-memory errors on 80 GB cards is a natural candidate.
  • Memory-bound HPC and scientific computing: workloads dominated by data movement rather than pure FP64 throughput can benefit from the bandwidth uplift.

Where it is overkill or a poor fit: small models, light experimentation, classic computer-vision training that fits comfortably in 24-48 GB, most game-style rendering, and real-time inference of compact models. For those, a smaller or older card from the comparison above will be far cheaper and just as capable. Renting an H200 to serve a 7B model at low concurrency usually wastes most of what you are paying for.

Rental availability, cost spectrum, and scarcity

On the cost ladder, the H200 sits at the premium end, generally above the H100 and below the newest Blackwell-generation parts. It is a flagship-tier rental, so expect it to be one of the more expensive options in any list. Live per-hour rates move constantly and differ by region, commitment, and form factor, so use the comparison above for current numbers rather than any figure quoted in prose.
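
One way to reason about the premium is to compare total node cost rather than per-GPU rate. The sketch below computes the H200 hourly rate at which a smaller H200 configuration costs the same as a larger H100 configuration, assuming both finish the job in the same wall-clock time. The rates shown are placeholders, not quotes from any provider; substitute live numbers from the comparison above.

```python
# Break-even comparison between an H100 setup and a smaller H200 setup.
# Assumption: both configurations finish the job in the same wall-clock time.
# All dollar rates below are hypothetical placeholders.


def hourly_cost(gpu_count: int, rate_per_gpu_hour: float) -> float:
    return gpu_count * rate_per_gpu_hour


def break_even_h200_rate(h100_count: int, h100_rate: float,
                         h200_count: int) -> float:
    """H200 per-GPU rate at which both setups cost the same per hour."""
    return h100_count * h100_rate / h200_count


if __name__ == "__main__":
    h100_rate, h200_rate = 2.50, 3.50  # placeholder $/GPU-hour values
    print("2x H100:", hourly_cost(2, h100_rate), "$/hr")
    print("1x H200:", hourly_cost(1, h200_rate), "$/hr")
    print("H200 break-even rate:", break_even_h200_rate(2, h100_rate, 1), "$/hr")
```

If the H200's actual rate is below the break-even value, the single-GPU option is cheaper per hour even though its per-GPU price is higher.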

A few things worth checking before you commit:

  • On-demand vs interruptible: spot or preemptible H200 capacity can be markedly cheaper but may be reclaimed mid-job, which is fine for checkpointed training and batch inference but risky for stateful, long-running tasks.
  • Single GPU vs 8-GPU node: confirm whether you are renting one SXM GPU or a full NVLink/NVSwitch node, because multi-GPU scaling efficiency depends on that interconnect.
  • Scarcity: as a sought-after, recent part, H200 availability fluctuates and capacity can be regional. If a configuration shows as available in the list above, that availability can be time-sensitive.
  • Form factor: SXM versus PCIe changes both bandwidth and multi-GPU behavior, so it is not just a packaging detail.
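
For the on-demand versus interruptible decision, a simple first-order estimate can help. The sketch below assumes interruptions arrive at a constant average rate, that each one loses on average half a checkpoint interval of work plus a fixed restart overhead, and that lost time is billed at the spot rate. Every number in it (rates, interruption frequency, checkpoint interval) is hypothetical; it is a way to structure the comparison, not a prediction for any provider.

```python
# First-order estimate of spot vs on-demand cost for a checkpointed job.
# Assumptions: a constant average interruption rate, half a checkpoint
# interval of work lost per interruption, and a fixed restart overhead.
# All numeric inputs below are hypothetical.


def expected_billed_hours(work_hours: float,
                          interruptions_per_hour: float,
                          checkpoint_interval_hours: float,
                          restart_overhead_hours: float) -> float:
    lost_per_interruption = checkpoint_interval_hours / 2 + restart_overhead_hours
    return work_hours * (1 + interruptions_per_hour * lost_per_interruption)


def job_cost(billed_hours: float, rate_per_hour: float) -> float:
    return billed_hours * rate_per_hour


if __name__ == "__main__":
    work = 100.0  # hours of useful compute the job needs
    on_demand = job_cost(work, 4.00)
    spot_hours = expected_billed_hours(work, interruptions_per_hour=0.05,
                                       checkpoint_interval_hours=1.0,
                                       restart_overhead_hours=0.25)
    spot = job_cost(spot_hours, 2.40)
    print(f"on-demand: ${on_demand:.2f}")
    print(f"spot:      ${spot:.2f} over {spot_hours:.2f} billed hours")
```

Shortening the checkpoint interval reduces the work lost per interruption, which is why spot capacity suits checkpointed training and batch inference better than stateful, long-running tasks.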

Frequently asked questions

How much memory does the NVIDIA H200 have, and why does it matter?

The H200 has 141 GB of HBM3e, compared with 80 GB on the standard H100. That larger capacity lets a single GPU hold bigger models, longer context windows, and larger inference batches, often reducing how many GPUs you need to rent for a given job.

Is the H200 faster than the H100?

For pure compute, the H200 is in the same Hopper performance class as the H100; the gains come from higher memory bandwidth (roughly 1.4x) and much larger capacity. So for memory-bound workloads like large-model inference you will often see meaningful speedups, while compute-bound tasks may look similar.

When is renting an H200 worth the premium over cheaper GPUs?

It is worth it when your workload is constrained by VRAM or memory bandwidth, such as serving large language models at high concurrency or fine-tuning models that do not fit comfortably on 80 GB. For small models or light experimentation, a cheaper card from the comparison above is the better value.

Can I scale across multiple H200 GPUs?

Yes. SXM-based H200 servers use fourth-generation NVLink and NVSwitch for high-bandwidth GPU-to-GPU communication, which makes tensor- and pipeline-parallel jobs scale efficiently. Verify the instance in the list above is the SXM form factor and a full NVLink node if multi-GPU scaling matters to you.

DigitalOcean vs Vast.ai - Comparison of Top Firms in This Guide

DigitalOcean vs Vast.ai - GPU Provider Comparison (June 2026)

Head-to-head comparison of DigitalOcean and Vast.ai. Compare GPU models, hourly pricing, billing granularity, spot instances, VRAM, infrastructure, developer tools, Kubernetes support, and compliance before choosing a provider. Data refreshed June 2026.

Bottom Line: DigitalOcean vs Vast.ai

DigitalOcean and Vast.ai are closely matched: each leads in several categories, so the right pick depends on your priorities.

Where DigitalOcean leads

  • Trustpilot Rating (4.6 vs 4.1)
  • Regions (5 vs 2)
  • Frameworks (7 vs 5)
  • Kubernetes Support (Yes vs No)

Where Vast.ai leads

  • Starting Price ($0.06/hr vs $0.76/hr)
  • GPU Models (35 vs 6)
  • Spot/Preemptible (Yes vs No)

Choose DigitalOcean if Trustpilot rating matters most to you. Choose Vast.ai if the lowest starting price does.

Frequently Asked Questions

Is DigitalOcean or Vast.ai better?

It is close: DigitalOcean and Vast.ai each lead in several categories. Compare the points that matter most to you below.

Which has a better Trustpilot Rating, DigitalOcean or Vast.ai?

DigitalOcean (4.6 vs 4.1).

Which has a better Starting Price ($/hr), DigitalOcean or Vast.ai?

Vast.ai ($0.06/hr vs $0.76/hr).

DigitalOcean: Simple, scalable GPU cloud for AI/ML
Vast.ai: Instant GPUs. Transparent Pricing.

Overview

  • Trustpilot Rating: DigitalOcean 4.6; Vast.ai 4.1
  • Headquarters: DigitalOcean United States; Vast.ai United States
  • Provider Type: DigitalOcean N/A; Vast.ai GPU Marketplace
  • Best For (DigitalOcean): AI training, inference, fine-tuning, LLM deployment, LLM serving, computer vision, startups, generative AI, research
  • Best For (Vast.ai): AI training, inference, fine-tuning, Stable Diffusion, batch processing, research, LLM serving, generative AI

GPU Hardware

  • GPU Models (DigitalOcean): RTX 4000 Ada, RTX 6000 Ada, L40S, MI300X, H100 SXM, H200
  • GPU Models (Vast.ai): B200, H200, H100 SXM, H100 NVL, A100 SXM, A100 PCIe, RTX 5090, RTX 5080, RTX 5070 Ti, RTX 6000 Pro, RTX 6000 Ada, RTX 4500 Ada, RTX A6000, RTX A5000, RTX A4000, L40S, L40, A40, A10, RTX 4090, RTX 4080, RTX 4070 Ti, RTX 4070, RTX 4060 Ti, RTX 4060, RTX 3090 Ti, RTX 3090, RTX 3080 Ti, RTX 3080, RTX 3070 Ti, RTX 3070, Tesla V100, Tesla T4, A2, GTX 1080
  • Max VRAM (GB): DigitalOcean 192; Vast.ai 192
  • Max GPUs/Instance: DigitalOcean 8; Vast.ai 8
  • Interconnect: DigitalOcean NVLink; Vast.ai NVLink, InfiniBand

Pricing

  • Starting Price ($/hr): DigitalOcean $0.76/hr; Vast.ai $0.06/hr
  • Billing Granularity: DigitalOcean Per-second; Vast.ai Per-second
  • Spot/Preemptible: DigitalOcean No; Vast.ai Yes
  • Reserved Discounts: DigitalOcean N/A; Vast.ai Up to 50% (1-6 month reserved)
  • Free Credits: DigitalOcean $200 free credit for 60 days; Vast.ai Small test credit on signup
  • Egress Fees: DigitalOcean None (included in plan); Vast.ai Varies by host ($/TB)
  • Storage (DigitalOcean): 500-720 GiB NVMe boot (included), 5 TiB NVMe scratch on larger configs, Volumes at $0.10/GiB/mo
  • Storage (Vast.ai): Varies by host ($/GB/hr, charged while instance exists)

Infrastructure

  • Regions (DigitalOcean): New York (NYC2), Toronto (TOR1), Atlanta (ATL1), Richmond (RIC1), Amsterdam (AMS3)
  • Regions (Vast.ai): 500+ locations, 40+ data centers
  • Uptime SLA: DigitalOcean 99%; Vast.ai No formal SLA (host reliability scores visible)

Developer Experience

  • Frameworks (DigitalOcean): PyTorch, TensorFlow, Jupyter, Miniconda, CUDA, ROCm, Hugging Face
  • Frameworks (Vast.ai): PyTorch, TensorFlow, CUDA, vLLM, ComfyUI
  • Docker Support: DigitalOcean Yes; Vast.ai Yes
  • SSH Access: DigitalOcean Yes; Vast.ai Yes
  • Jupyter Notebooks: DigitalOcean Yes; Vast.ai Yes
  • API / CLI: DigitalOcean Yes; Vast.ai Yes
  • Setup Time: DigitalOcean Minutes; Vast.ai Seconds
  • Kubernetes Support: DigitalOcean Yes; Vast.ai No

Business Terms

  • Min Commitment: DigitalOcean None; Vast.ai None
  • Compliance (DigitalOcean): SOC 2 Type II, SOC 3, HIPAA (with BAA), CSA STAR Level 1
  • Compliance (Vast.ai): SOC 2 Type 2, HIPAA, GDPR, CCPA
