Best Cloud GPU Providers with NVIDIA H100

The NVIDIA H100 is the industry standard for large-scale AI training and high-throughput inference. Built on the Hopper architecture with 80 GB of HBM3 memory and support for FP8 precision, the H100 delivers, by NVIDIA's figures, up to 4x the training performance of the A100 on large language models. This guide lists cloud GPU providers that offer H100 instances, so you can compare pricing, availability, and multi-GPU configurations across platforms.

Updated June 2026. Showing 7 GPU providers that offer the H100.
| Provider | HQ | Trustpilot Rating | Trustpilot Reviews | New reviews (7d / 30d / 90d) | Starting Price | Max VRAM | Max GPUs | Billing |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DigitalOcean | United States | 4.6 | 2,427 | +13 / +47 / +141 | $0.76/hr | 192 GB | 8 | Per-second |
| Vast.ai | United States | 4.1 | 237 | +0 / +8 / +26 | $0.06/hr | 192 GB | 8 | Per-second |
| Latitude.sh | Brazil | 3.7 | 3 | +0 / +0 / +0 | $0.35/hr | 96 GB | 8 | Per-hour |
| RunPod | United States | 3.4 | 245 | +1 / +13 / +36 | $0.06/hr | 288 GB | 8 | Per-second |
| Massed Compute | United States | 3.2 | 1 | +0 / +0 / +1 | $0.35/hr | 141 GB | 8 | Per-minute |
| Novita AI | United States | 2.9 | 7 | +0 / +0 / +2 | $0.11/hr | 80 GB | 8 | Per-second |
| Vultr | United States | 1.7 | 557 | +1 / +4 / +18 | $0.47/hr | 288 GB | 16 | Per-hour |

What the NVIDIA H100 actually is

The H100 is NVIDIA’s data-center accelerator built on the Hopper architecture, the generation that sits between the older Ampere A100 and the newer Blackwell parts. It is the GPU most teams reach for when they want serious large-model training or high-throughput inference without dropping all the way down to consumer cards. When you rent an H100 instance from the comparison above, you are renting a purpose-built AI accelerator rather than a repurposed gaming GPU, and that shapes both what it can do and what it costs.

The key hardware traits that matter for a renter:

  • Memory: the SXM5 variant ships with 80 GB of HBM3, while the PCIe variant uses HBM2e, also at 80 GB on the mainstream part. A later refresh, the H100 NVL, raises capacity to 94 GB per card. The large, fast HBM is the single biggest reason to pick this card over GDDR6-based options.
  • Memory bandwidth: HBM3 on the SXM5 part delivers roughly 3 TB/s, far above what consumer GDDR6/GDDR6X cards reach. Bandwidth, not raw FLOPs, is what keeps large transformer layers fed.
  • Tensor cores and precisions: fourth-generation tensor cores support FP16, BF16, TF32, INT8, and — new with Hopper — FP8. FP8 is the headline feature for modern LLM workloads, roughly doubling throughput versus FP16 on supported kernels while keeping accuracy acceptable with the right scaling.
  • Transformer Engine: Hopper pairs FP8 hardware with software that dynamically manages precision per layer, which is why H100 throughput on transformer training and inference can pull well ahead of the previous generation on the same model.
  • Interconnect: SXM5 boards use fourth-generation NVLink and NVSwitch for high-bandwidth GPU-to-GPU links inside a node, while PCIe cards rely on the PCIe bus (with optional NVLink bridges on some variants). This distinction matters enormously for multi-GPU jobs.
  • Power class: the SXM5 card is a roughly 700 W part in an 8-GPU server chassis; the PCIe card sits lower, around 350 W. You do not pay the power bill directly when renting, but it explains why these instances are dense, hot, and priced accordingly.
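
The memory and bandwidth figures above lend themselves to a quick back-of-the-envelope check before renting. The sketch below is a rough estimate only: it counts weight memory, reserves an assumed 20% of VRAM for KV cache and activations, and uses the roughly 3 TB/s figure quoted above to bound single-stream decode speed. The function names, the headroom fraction, and the model sizes are illustrative assumptions, not measurements.

```python
# Rough sizing sketch. Model sizes, headroom, and function names are
# hypothetical; real memory use also depends on KV cache, activations,
# optimizer state, and framework overhead.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def weight_gb(params_billion: float, precision: str) -> float:
    """Memory for the weights alone, in GB (10^9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits(params_billion: float, precision: str,
         vram_gb: float = 80.0, headroom: float = 0.2) -> bool:
    """True if the weights fit while leaving `headroom` of VRAM free."""
    return weight_gb(params_billion, precision) <= vram_gb * (1 - headroom)


def decode_ceiling_tokens_per_s(params_billion: float, precision: str,
                                bandwidth_tb_s: float = 3.0) -> float:
    """Bandwidth-bound upper limit for batch-1 decoding.

    Assumes every generated token reads all weights from HBM once,
    so the ceiling is bandwidth divided by weight size.
    """
    return bandwidth_tb_s * 1000 / weight_gb(params_billion, precision)


if __name__ == "__main__":
    for size, prec in [(13, "fp16"), (70, "fp16"), (70, "fp8")]:
        print(size, prec, weight_gb(size, prec), "GB, fits:", fits(size, prec))
```

Under these assumptions a 13B-parameter model in FP16 needs 26 GB for weights and fits on one card, with a decode ceiling of about 115 tokens per second, while a 70B model in FP16 needs 140 GB and has to be sharded across GPUs.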

Which workloads the H100 genuinely fits

The H100 earns its keep on memory-bound, throughput-hungry jobs. It is a strong match for:

  • Large-model training: 80 GB of HBM lets you hold bigger batches and larger parameter shards per GPU, and NVLink makes multi-GPU scaling within a node efficient for data- and tensor-parallel training.
  • High-throughput LLM inference: FP8 and the Transformer Engine make it excellent for serving large language models at high request volume, where you care about tokens per second per dollar.
  • Fine-tuning of mid-to-large models: full fine-tunes and parameter-efficient methods on multi-billion-parameter models fit comfortably where smaller-VRAM cards force aggressive offloading.
  • Scientific and HPC compute: strong FP64 throughput (unlike consumer cards, which are deliberately weak here) makes it viable for simulation and numerical work, not just AI.

It is overkill for small-model experimentation, light inference of compact models, classical ML, notebooks, and most rendering or visualization work — those run fine on far cheaper cards, and renting an H100 for them mostly wastes money. It is rarely underpowered for single-node work; the main thing that pushes teams past it is needing more aggregate memory than a single node of these cards provides, or wanting the newest-generation efficiency gains.

SXM vs PCIe — the variant you rent matters

This is the most overlooked detail when renting. SXM5 boards offer higher bandwidth, full NVLink/NVSwitch fabric, and higher sustained clocks, which is why serious multi-GPU training almost always uses them. PCIe cards are cheaper to host and fine for single-GPU inference or smaller jobs, but their inter-GPU links are slower. If the listing above does not state the variant, treat that as a question to resolve before committing to a multi-GPU training run.
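
Once an instance is running, the variant can usually be checked from the device name that the driver reports. The sketch below queries `nvidia-smi --query-gpu=name --format=csv,noheader` and classifies the result. The name patterns it matches (for example "NVIDIA H100 80GB HBM3" for SXM5 and "NVIDIA H100 PCIe" for the PCIe card) are assumptions based on commonly reported strings and may differ by driver version, and `classify_h100` is a hypothetical helper.

```python
import shutil
import subprocess


def classify_h100(name: str) -> str:
    """Guess the H100 variant from a reported device name.

    The substring patterns are assumptions, not an official mapping.
    """
    n = name.upper()
    if "H100" not in n:
        return "not an H100"
    if "NVL" in n:
        return "NVL"
    if "PCIE" in n:
        return "PCIe"
    if "HBM3" in n or "SXM" in n:
        return "SXM5"
    return "unknown"


def gpu_names() -> list[str]:
    """Return device names from nvidia-smi, or [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

Calling `[classify_h100(n) for n in gpu_names()]` on the rented machine gives one label per GPU. For multi-GPU jobs, `nvidia-smi topo -m` additionally shows whether GPU pairs are connected over NVLink or only over PCIe.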

Rental cost, availability, and scarcity

In the cloud GPU cost spectrum, the H100 sits near the top — above A100 and far above consumer cards like the RTX 4090 — though it has been displaced from the absolute ceiling by newer Blackwell-class hardware. Because exact rates move constantly and differ by provider, region, commitment term, and variant, you should read live numbers from the comparison above rather than trust any fixed figure. A few qualitative points hold steady:

  • On-demand vs spot: interruptible or spot H100 capacity can be substantially cheaper than on-demand, which suits checkpointed training and fault-tolerant batch inference, but is risky for long single-shot jobs without good checkpointing.
  • Scarcity: H100 supply has historically been tight, so availability — not just price — varies by region and provider. The cheapest listing is worthless if the capacity is sold out in your region.
  • Per-GPU vs full-node: many providers rent single H100s, but the best multi-GPU performance comes from full 8-GPU NVLink nodes; check whether you are getting fractional, single, or full-node access.
  • Billing granularity: per-second or per-minute billing favors short fine-tunes and bursty inference; hourly minimums favor long runs. Match the model to your workload pattern.
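
The effect of billing granularity is easy to quantify. The sketch below assumes a provider rounds usage up to the next whole billing unit, which is a simplification since actual rounding and minimum-charge policies vary. The $3.00/hr rate in the example is hypothetical, not a quote from any listing.

```python
import math

# Seconds per billing unit for the three models used in the table above.
GRANULARITY_SECONDS = {"per-second": 1, "per-minute": 60, "per-hour": 3600}


def billed_cost(rate_per_hour: float, run_seconds: float, granularity: str) -> float:
    """Cost of a run when usage is rounded up to whole billing units."""
    unit = GRANULARITY_SECONDS[granularity]
    billed_seconds = math.ceil(run_seconds / unit) * unit
    return rate_per_hour * billed_seconds / 3600


if __name__ == "__main__":
    rate = 3.00  # hypothetical $/hr
    for label, seconds in [("10-minute job", 600), ("61-minute job", 3660)]:
        for granularity in GRANULARITY_SECONDS:
            cost = billed_cost(rate, seconds, granularity)
            print(f"{label:14s} {granularity:10s} ${cost:.2f}")
```

At that hypothetical rate, a 10-minute job costs $0.50 with per-second billing but $3.00 with hourly billing, and a 61-minute job is billed as two full hours under hourly billing.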

Frequently asked questions

How much VRAM does a cloud H100 have?

The mainstream H100 carries 80 GB of high-bandwidth memory: HBM3 on the SXM5 variant and HBM2e on the PCIe variant. A later H100 NVL refresh increases per-card capacity to 94 GB. Always confirm the exact variant and memory figure in the listing above, since that determines the largest model and batch size you can run without offloading.

Is the H100 worth renting over an A100?

For transformer training and inference, usually yes: Hopper’s FP8 support and Transformer Engine can deliver materially higher throughput per GPU than Ampere’s A100, which often offsets the higher hourly rate on a cost-per-token or cost-per-step basis. For small or memory-light jobs that do not exploit FP8, a cheaper A100 or consumer card can be the better value.
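
The cost-per-token comparison can be made concrete with a small calculation. In the sketch below, the hourly rates and throughput numbers are hypothetical placeholders; substitute your own benchmark results and the live prices from the listing.

```python
def cost_per_million_tokens(rate_per_hour: float, tokens_per_second: float) -> float:
    """Dollars per one million tokens at a sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return rate_per_hour / tokens_per_hour * 1_000_000


def breakeven_speedup(rate_new: float, rate_old: float) -> float:
    """Throughput ratio the pricier GPU must reach to match cost per token."""
    return rate_new / rate_old


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    a100 = cost_per_million_tokens(rate_per_hour=2.00, tokens_per_second=1000)
    h100 = cost_per_million_tokens(rate_per_hour=3.00, tokens_per_second=2000)
    print(f"A100: ${a100:.3f} per 1M tokens")
    print(f"H100: ${h100:.3f} per 1M tokens")
    print(f"Break-even speedup: {breakeven_speedup(3.00, 2.00):.2f}x")
```

With these placeholder figures, the H100 costs 50% more per hour but only needs 1.5x the throughput to break even, so a 2x speedup makes it cheaper per token.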

Should I choose the SXM5 or PCIe H100?

Pick SXM5 for multi-GPU training and tightly coupled distributed jobs, because its NVLink/NVSwitch fabric and higher bandwidth scale far better across GPUs. PCIe is fine and often cheaper for single-GPU inference or smaller workloads. If the variant is not specified, ask before booking a multi-GPU run.

Can I save money with spot or interruptible H100 instances?

Yes, interruptible capacity is typically much cheaper than on-demand, and it works well for training with frequent checkpoints or for batch inference that tolerates restarts. Avoid it for long, un-checkpointed jobs where a mid-run eviction would waste hours of paid compute. Compare both pricing modes in the table above.
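
A simple model shows why checkpointing decides whether spot capacity pays off. The sketch below assumes each eviction lands, on average, halfway between two checkpoints and adds a fixed restart overhead. The rates, eviction count, and intervals are hypothetical inputs, and real eviction behavior varies by provider.

```python
def on_demand_cost(job_hours: float, rate_per_hour: float) -> float:
    """Cost of running the job uninterrupted at the on-demand rate."""
    return rate_per_hour * job_hours


def spot_cost(job_hours: float, spot_rate: float, evictions: int,
              checkpoint_interval_h: float, restart_overhead_h: float = 0.0) -> float:
    """Estimated spot cost including work redone after evictions.

    Assumes each eviction loses half a checkpoint interval on average,
    plus a fixed restart overhead.
    """
    lost_per_eviction = checkpoint_interval_h / 2 + restart_overhead_h
    return spot_rate * (job_hours + evictions * lost_per_eviction)


if __name__ == "__main__":
    # Hypothetical 24-hour job, $3.00/hr on-demand, $1.50/hr spot, 4 evictions.
    print("on-demand:", on_demand_cost(24, 3.00))
    print("spot, 30-min checkpoints:", spot_cost(24, 1.50, 4, 0.5, 0.25))
    print("spot, no checkpoints:", spot_cost(24, 1.50, 4, 24, 0.25))
```

In this example the checkpointed spot run comes to $39.00 against $72.00 on-demand, while the un-checkpointed run, modeled as a single 24-hour interval, rises to $109.50.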

DigitalOcean vs Vast.ai - Comparison of Top Firms in This Guide

DigitalOcean vs Vast.ai - GPU Provider Comparison (June 2026)

Head-to-head comparison of DigitalOcean and Vast.ai. Compare GPU models, hourly pricing, billing granularity, spot instances, VRAM, infrastructure, developer tools, Kubernetes support, and compliance before choosing a provider. Data refreshed June 2026.

Bottom Line: DigitalOcean vs Vast.ai

DigitalOcean and Vast.ai are closely matched — each leads in several categories, so the right pick depends on your priorities.

Where DigitalOcean leads

  • Trustpilot Rating (4.6 vs 4.1)
  • Regions (5 vs 2)
  • Frameworks (7 vs 5)
  • Kubernetes Support

Where Vast.ai leads

  • Starting Price ($0.06/hr vs $0.76/hr)
  • GPU Models (35 vs 6)
  • Spot/Preemptible

Choose DigitalOcean if a higher Trustpilot rating matters most to you. Choose Vast.ai if the lowest starting price matters most.

Frequently Asked Questions

Is DigitalOcean or Vast.ai better?

It is close: DigitalOcean and Vast.ai each lead in several categories. Compare the points that matter most to you below.

Which has the better Trustpilot rating, DigitalOcean or Vast.ai?

DigitalOcean (4.6 vs 4.1).

Which has the lower starting price, DigitalOcean or Vast.ai?

Vast.ai ($0.06/hr vs $0.76/hr).

DigitalOcean: Simple, scalable GPU cloud for AI/ML
Vast.ai: Instant GPUs. Transparent Pricing.

Overview

| Attribute | DigitalOcean | Vast.ai |
| --- | --- | --- |
| Trustpilot Rating | 4.6 | 4.1 |
| Headquarters | United States | United States |
| Provider Type | N/A | GPU Marketplace |
| Best For | AI training, inference, fine-tuning, LLM deployment, LLM serving, computer vision, startups, generative AI, research | AI training, inference, fine-tuning, Stable Diffusion, batch processing, research, LLM serving, generative AI |

GPU Hardware

| Attribute | DigitalOcean | Vast.ai |
| --- | --- | --- |
| GPU Models | RTX 4000 Ada, RTX 6000 Ada, L40S, MI300X, H100 SXM, H200 | B200, H200, H100 SXM, H100 NVL, A100 SXM, A100 PCIe, RTX 5090, RTX 5080, RTX 5070 Ti, RTX 6000 Pro, RTX 6000 Ada, RTX 4500 Ada, RTX A6000, RTX A5000, RTX A4000, L40S, L40, A40, A10, RTX 4090, RTX 4080, RTX 4070 Ti, RTX 4070, RTX 4060 Ti, RTX 4060, RTX 3090 Ti, RTX 3090, RTX 3080 Ti, RTX 3080, RTX 3070 Ti, RTX 3070, Tesla V100, Tesla T4, A2, GTX 1080 |
| Max VRAM (GB) | 192 | 192 |
| Max GPUs/Instance | 8 | 8 |
| Interconnect | NVLink | NVLink, InfiniBand |

Pricing

| Attribute | DigitalOcean | Vast.ai |
| --- | --- | --- |
| Starting Price ($/hr) | $0.76/hr | $0.06/hr |
| Billing Granularity | Per-second | Per-second |
| Spot/Preemptible | No | Yes |
| Reserved Discounts | N/A | Up to 50% (1-6 month reserved) |
| Free Credits | $200 free credit for 60 days | Small test credit on signup |
| Egress Fees | None (included in plan) | Varies by host ($/TB) |
| Storage | 500-720 GiB NVMe boot (included), 5 TiB NVMe scratch on larger configs, Volumes at $0.10/GiB/mo | Varies by host ($/GB/hr, charged while instance exists) |

Infrastructure

| Attribute | DigitalOcean | Vast.ai |
| --- | --- | --- |
| Regions | New York (NYC2), Toronto (TOR1), Atlanta (ATL1), Richmond (RIC1), Amsterdam (AMS3) | 500+ locations, 40+ data centers |
| Uptime SLA | 99% | No formal SLA (host reliability scores visible) |

Developer Experience

| Attribute | DigitalOcean | Vast.ai |
| --- | --- | --- |
| Frameworks | PyTorch, TensorFlow, Jupyter, Miniconda, CUDA, ROCm, Hugging Face | PyTorch, TensorFlow, CUDA, vLLM, ComfyUI |
| Docker Support | Yes | Yes |
| SSH Access | Yes | Yes |
| Jupyter Notebooks | Yes | Yes |
| API / CLI | Yes | Yes |
| Setup Time | Minutes | Seconds |
| Kubernetes Support | Yes | No |

Business Terms

| Attribute | DigitalOcean | Vast.ai |
| --- | --- | --- |
| Min Commitment | None | None |
| Compliance | SOC 2 Type II, SOC 3, HIPAA (with BAA), CSA STAR Level 1 | SOC 2 Type 2, HIPAA, GDPR, CCPA |
