Best Cloud GPU Providers with NVIDIA GH200

The NVIDIA GH200 Grace Hopper Superchip combines an ARM-based Grace CPU with an H100 GPU on a single module, connected via NVLink-C2C with 900 GB/s bandwidth. With 96GB HBM3 GPU memory plus up to 480GB LPDDR5x CPU memory, the GH200 is optimized for workloads that require massive unified memory, such as large-scale recommendation models and graph neural networks. This guide lists providers offering GH200 instances.

Updated July 2026 Showing 2 GPU providers GH200

Trustpilot Rating

3.7

Trustpilot Reviews

+0 (7d) +0 (30d) +0 (90d)

Starting Price

$0.35/hr

Max VRAM

96 GB

Max GPUs

Billing

Per-hour

Compare

🌐 Visit Website

Trustpilot Rating

1.7

Trustpilot Reviews

557

+1 (7d) +4 (30d) +18 (90d)

Starting Price

$0.47/hr

Max VRAM

288 GB

Max GPUs

Billing

Per-hour

Compare

🌐 Visit Website

What the NVIDIA GH200 Grace Hopper Superchip actually is

The GH200 is not a conventional add-in GPU card. It is a superchip that fuses an NVIDIA Hopper-generation H100-class GPU with a 72-core Arm Neoverse-based Grace CPU on a single module, joined by NVLink-C2C — a coherent chip-to-chip interconnect that links the CPU and GPU at far higher bandwidth than a standard PCIe slot. When you rent a GH200 instance from the comparison above, you are renting that whole CPU+GPU package, not just a GPU you bolt onto someone else’s host CPU.

That architecture is the whole point. Because the Grace CPU and Hopper GPU share a coherent memory space over NVLink-C2C, the GPU can reach into the CPU’s large LPDDR5X memory pool without the usual PCIe bottleneck. For workloads where a model or dataset spills past the GPU’s own high-bandwidth memory, this changes what is practical to run on a single node.

The hardware that matters when you rent it

GPU memory: the Hopper GPU side carries HBM3 in the original GH200 and HBM3e in the higher-capacity variant. You will see configurations advertised around 96 GB of HBM3 and around 141 GB of HBM3e per superchip — check the exact figure in the list above, because the memory capacity is the single biggest differentiator between GH200 SKUs.
System memory: the Grace CPU adds a large bank of LPDDR5X (commonly cited around 480 GB), coherently accessible to the GPU. Combined with HBM, a single GH200 exposes a very large unified address space — useful for big embeddings, KV caches, and graph or recommendation workloads.
Memory bandwidth: HBM3/HBM3e delivers multiple terabytes per second to the GPU, while NVLink-C2C provides hundreds of GB/s of coherent CPU-GPU bandwidth — roughly an order of magnitude beyond a PCIe 5.0 x16 link.
Compute and precision: as a Hopper part, the GPU includes fourth-generation Tensor Cores and the Transformer Engine, with hardware support for FP8, BF16, FP16, TF32, INT8, and FP64. FP8 in particular is what makes Hopper attractive for both LLM training and high-throughput inference.
Multi-GPU scaling: GH200 modules can be linked with NVLink and, in NVL32-style rack designs, NVLink Switch fabric, so multiple superchips behave closer to one large accelerator than a loose PCIe cluster.
Power and thermal class: this is a data-center-only, high-TDP module (commonly configurable up to roughly 1000 W including the CPU). It is liquid- or heavily air-cooled in the host, which is why availability is tied to specialized hosts rather than commodity servers.

Workloads the GH200 genuinely fits

The GH200 earns its place where the coherent CPU-GPU memory and HBM bandwidth do real work:

Large-model and long-context inference: big KV caches and large weights benefit from the combination of HBM plus the spillover into Grace’s LPDDR5X, letting a single node serve models that would otherwise need sharding across several discrete GPUs.
LLM fine-tuning and training: FP8 and BF16 via the Transformer Engine make it strong for transformer training and parameter-efficient fine-tuning, especially when datasets or optimizer states are memory-hungry.
Recommendation, GNN, and vector workloads: anything that streams large tables or graphs between CPU and GPU memory benefits directly from NVLink-C2C coherence.
HPC and scientific computing: native FP64 plus tight CPU-GPU coupling suits simulation and data-analytics pipelines that mix scalar CPU work with GPU kernels.

It is overkill for small-model experimentation, classic computer-vision fine-tuning that fits comfortably in 24–48 GB, batch jobs that are not memory-bound, and most real-time inference of small models — a cheaper Ada or Ampere card from the broader catalog will be more cost-effective there. The GH200 is also a poor fit for graphics, gaming, or workstation rendering: it has no display outputs and is built for compute, not rasterization.

Rental context: cost, availability, and what to compare

The GH200 sits near the top of the cloud-GPU cost spectrum, in the same premium tier as H100-class instances, because you are paying for HBM, a bundled Grace CPU, and high-bandwidth interconnect. Supply is genuinely scarce relative to demand, so it tends to appear on AI-focused and specialist hosts rather than every general-purpose cloud, and on-demand capacity can sell out. Spot or interruptible pricing exists in places but is less common and less deep than for older accelerators. For live, provider-specific rates, rely on the comparison above rather than any fixed figure, since pricing moves and varies by region and commitment term.

When you read the table, weigh these dimensions:

HBM capacity (96 GB HBM3 vs ~141 GB HBM3e) — this dictates the largest model you can hold resident.
On-demand vs reserved vs spot availability, and minimum commitment.
Interconnect topology for multi-node training (NVLink fabric vs plain Ethernet/InfiniBand between nodes).
Storage throughput and egress fees, which often dominate total cost for data-heavy training.
Billing granularity (per-second vs per-hour) if your jobs are short or bursty.

Frequently asked questions

Is the GH200 the same as an H100?

Not quite. The GH200 contains a Hopper GPU closely related to the H100, but it ships as a superchip combined with a 72-core Grace Arm CPU and NVLink-C2C coherent interconnect. A standalone H100 is just the GPU paired with whatever x86 host the provider supplies, so the GH200’s advantage is the tightly coupled, large coherent CPU-GPU memory pool.

How much memory does a GH200 have?

It depends on the variant. The GPU side carries roughly 96 GB of HBM3 or about 141 GB of HBM3e, and the Grace CPU adds a large LPDDR5X bank (commonly cited near 480 GB) that the GPU can access coherently. Confirm the exact HBM figure in the list above, as it varies by SKU and is the spec that limits model size.

What precisions does the GH200 support for AI?

Being a Hopper part, it supports FP8, BF16, FP16, TF32, INT8, and FP64 through fourth-generation Tensor Cores and the Transformer Engine. FP8 is the headline feature for efficient large-model training and high-throughput inference.

When is renting a GH200 not worth it?

If your model fits comfortably in 24–48 GB, your workload is not memory-bandwidth bound, or you only need small-model real-time inference, a cheaper Ada or Ampere instance will deliver better value. The GH200 pays off specifically when large memory, HBM bandwidth, or coherent CPU-GPU data movement is the bottleneck.

Latitude.sh vs Vultr - Comparison of Top Firms in This Guide

Latitude.sh vs Vultr - GPU Provider Comparison (July 2026)

Head-to-head comparison of Latitude.sh and Vultr. Compare GPU models, hourly pricing, billing granularity, spot instances, VRAM, infrastructure, developer tools, Kubernetes support, and compliance before choosing a provider. Data refreshed July 2026.

Bottom Line: Latitude.sh vs Vultr

Vultr comes out ahead overall, leading in 9 of 12 compared categories.

Where Latitude.sh leads

Trustpilot Rating (3.7 vs 1.7)
Starting Price ($/hr) ($0.35/hr vs $0.47/hr)
Regions (8 vs 5)

Where Vultr leads

Max VRAM (GB) (288 vs 96)
Uptime SLA (100% vs 99.9%)
Max GPUs/Instance (16 vs 8)
GPU Models (12 vs 9)
Spot/Preemptible
Frameworks (7 vs 4)

Choose Latitude.sh for AI training, inference, bare metal GPU. Choose Vultr for AI training, inference, video rendering.

Frequently Asked Questions

Is Latitude.sh or Vultr better?

Vultr leads in 9 of 12 compared categories. The right choice still depends on the factors that matter most to you.

Which has a better Trustpilot Rating, Latitude.sh or Vultr?

Latitude.sh (3.7 vs 1.7).

Which has a better Starting Price ($/hr), Latitude.sh or Vultr?

Latitude.sh ($0.35/hr vs $0.47/hr).

Latitude.sh vs Vultr - GPU Provider Comparison (July 2026)
	Latitude.sh Bare metal GPU cloud across 23 global locations Visit Latitude.sh	Vultr High-performance cloud GPU across 32 global regions Visit Vultr
Overview
Trustpilot Rating	3.7	1.7
Headquarters	Brazil	United States
Provider Type	Bare Metal	Multi-Cloud
Best For	AI training inference bare metal GPU fine-tuning research dedicated workloads generative AI	AI training inference video rendering HPC Stable Diffusion game development generative AI fine-tuning research
GPU Hardware
GPU Models	A30 RTX A5000 RTX A6000 L40S RTX 6000 Ada A100 SXM H100 SXM GH200 RTX PRO 6000	A16 A40 L40S A100 PCIe GH200 A100 SXM H100 SXM B200 B300 MI300X MI325X MI355X
Max VRAM (GB)	96	288
Max GPUs/Instance	8	16
Interconnect	NVLink	NVLink
Pricing
Starting Price ($/hr)	$0.35/hr	$0.47/hr
Billing Granularity	Per-hour	Per-hour
Spot/Preemptible	No	Yes
Reserved Discounts	N/A	N/A
Free Credits	$200 via referral program	Up to $300 free credit for 30 days
Egress Fees	None	Standard (varies by plan)
Storage	Local NVMe included (up to 4x 3.8TB), Block Storage $0.10/GB/mo, Filesystem Storage $0.05/GB/mo	350 GB - 61 TB NVMe (included), Block Storage at $0.10/GB/mo, S3-compatible Object Storage
Infrastructure
Regions	23 locations: US (8 cities), LATAM (5), Europe (5), APAC (4), Mexico City. GPU in Dallas, Frankfurt, Sydney, Tokyo	32 regions across 6 continents (Americas, Europe, Asia, Australia, Africa)
Uptime SLA	99.9%	100%
Developer Experience
Frameworks	ML-optimized images PyTorch TensorFlow (user-installed) CUDA	PyTorch TensorFlow CUDA cuDNN ROCm Hugging Face NVIDIA NGC
Docker Support	Yes	Yes
SSH Access	Yes	Yes
Jupyter Notebooks	No	Yes
API / CLI	Yes	Yes
Setup Time	Seconds	Minutes
Kubernetes Support	No	Yes
Business Terms
Min Commitment	None	None
Compliance	Single-tenant isolation DPA available	SOC 2+ (HIPAA) PCI ISO 27001 ISO 27017 ISO 27018 ISO 20000-1 CSA STAR Level 1

Latitude.sh

Vultr

Build your own comparison

Select any 2-6 firms from this guide and open them in the full comparison table.

Latitude.sh Rating 3.7 | Brazil Vultr Rating 1.7 | United States

Tip: if you do not select any firms we will start with the top 2 from this guide.