Best Cloud GPU Providers with NVIDIA GH200
The NVIDIA GH200 Grace Hopper Superchip combines an ARM-based Grace CPU with an H100 GPU on a single module, connected via NVLink-C2C with 900 GB/s bandwidth. With 96GB HBM3 GPU memory plus up to 480GB LPDDR5x CPU memory, the GH200 is optimized for workloads that require massive unified memory, such as large-scale recommendation models and graph neural networks. This guide lists providers offering GH200 instances.
Brazil
United States What the NVIDIA GH200 Grace Hopper Superchip actually is
The GH200 is not a conventional add-in GPU card. It is a superchip that fuses an NVIDIA Hopper-generation H100-class GPU with a 72-core Arm Neoverse-based Grace CPU on a single module, joined by NVLink-C2C — a coherent chip-to-chip interconnect that links the CPU and GPU at far higher bandwidth than a standard PCIe slot. When you rent a GH200 instance from the comparison above, you are renting that whole CPU+GPU package, not just a GPU you bolt onto someone else’s host CPU.
That architecture is the whole point. Because the Grace CPU and Hopper GPU share a coherent memory space over NVLink-C2C, the GPU can reach into the CPU’s large LPDDR5X memory pool without the usual PCIe bottleneck. For workloads where a model or dataset spills past the GPU’s own high-bandwidth memory, this changes what is practical to run on a single node.
The hardware that matters when you rent it
- GPU memory: the Hopper GPU side carries HBM3 in the original GH200 and HBM3e in the higher-capacity variant. You will see configurations advertised around 96 GB of HBM3 and around 141 GB of HBM3e per superchip — check the exact figure in the list above, because the memory capacity is the single biggest differentiator between GH200 SKUs.
- System memory: the Grace CPU adds a large bank of LPDDR5X (commonly cited around 480 GB), coherently accessible to the GPU. Combined with HBM, a single GH200 exposes a very large unified address space — useful for big embeddings, KV caches, and graph or recommendation workloads.
- Memory bandwidth: HBM3/HBM3e delivers multiple terabytes per second to the GPU, while NVLink-C2C provides hundreds of GB/s of coherent CPU-GPU bandwidth — roughly an order of magnitude beyond a PCIe 5.0 x16 link.
- Compute and precision: as a Hopper part, the GPU includes fourth-generation Tensor Cores and the Transformer Engine, with hardware support for FP8, BF16, FP16, TF32, INT8, and FP64. FP8 in particular is what makes Hopper attractive for both LLM training and high-throughput inference.
- Multi-GPU scaling: GH200 modules can be linked with NVLink and, in NVL32-style rack designs, NVLink Switch fabric, so multiple superchips behave closer to one large accelerator than a loose PCIe cluster.
- Power and thermal class: this is a data-center-only, high-TDP module (commonly configurable up to roughly 1000 W including the CPU). It is liquid- or heavily air-cooled in the host, which is why availability is tied to specialized hosts rather than commodity servers.
Workloads the GH200 genuinely fits
The GH200 earns its place where the coherent CPU-GPU memory and HBM bandwidth do real work:
- Large-model and long-context inference: big KV caches and large weights benefit from the combination of HBM plus the spillover into Grace’s LPDDR5X, letting a single node serve models that would otherwise need sharding across several discrete GPUs.
- LLM fine-tuning and training: FP8 and BF16 via the Transformer Engine make it strong for transformer training and parameter-efficient fine-tuning, especially when datasets or optimizer states are memory-hungry.
- Recommendation, GNN, and vector workloads: anything that streams large tables or graphs between CPU and GPU memory benefits directly from NVLink-C2C coherence.
- HPC and scientific computing: native FP64 plus tight CPU-GPU coupling suits simulation and data-analytics pipelines that mix scalar CPU work with GPU kernels.
It is overkill for small-model experimentation, classic computer-vision fine-tuning that fits comfortably in 24–48 GB, batch jobs that are not memory-bound, and most real-time inference of small models — a cheaper Ada or Ampere card from the broader catalog will be more cost-effective there. The GH200 is also a poor fit for graphics, gaming, or workstation rendering: it has no display outputs and is built for compute, not rasterization.
Rental context: cost, availability, and what to compare
The GH200 sits near the top of the cloud-GPU cost spectrum, in the same premium tier as H100-class instances, because you are paying for HBM, a bundled Grace CPU, and high-bandwidth interconnect. Supply is genuinely scarce relative to demand, so it tends to appear on AI-focused and specialist hosts rather than every general-purpose cloud, and on-demand capacity can sell out. Spot or interruptible pricing exists in places but is less common and less deep than for older accelerators. For live, provider-specific rates, rely on the comparison above rather than any fixed figure, since pricing moves and varies by region and commitment term.
When you read the table, weigh these dimensions:
- HBM capacity (96 GB HBM3 vs ~141 GB HBM3e) — this dictates the largest model you can hold resident.
- On-demand vs reserved vs spot availability, and minimum commitment.
- Interconnect topology for multi-node training (NVLink fabric vs plain Ethernet/InfiniBand between nodes).
- Storage throughput and egress fees, which often dominate total cost for data-heavy training.
- Billing granularity (per-second vs per-hour) if your jobs are short or bursty.
Frequently asked questions
Is the GH200 the same as an H100?
Not quite. The GH200 contains a Hopper GPU closely related to the H100, but it ships as a superchip combined with a 72-core Grace Arm CPU and NVLink-C2C coherent interconnect. A standalone H100 is just the GPU paired with whatever x86 host the provider supplies, so the GH200’s advantage is the tightly coupled, large coherent CPU-GPU memory pool.
How much memory does a GH200 have?
It depends on the variant. The GPU side carries roughly 96 GB of HBM3 or about 141 GB of HBM3e, and the Grace CPU adds a large LPDDR5X bank (commonly cited near 480 GB) that the GPU can access coherently. Confirm the exact HBM figure in the list above, as it varies by SKU and is the spec that limits model size.
What precisions does the GH200 support for AI?
Being a Hopper part, it supports FP8, BF16, FP16, TF32, INT8, and FP64 through fourth-generation Tensor Cores and the Transformer Engine. FP8 is the headline feature for efficient large-model training and high-throughput inference.
When is renting a GH200 not worth it?
If your model fits comfortably in 24–48 GB, your workload is not memory-bandwidth bound, or you only need small-model real-time inference, a cheaper Ada or Ampere instance will deliver better value. The GH200 pays off specifically when large memory, HBM bandwidth, or coherent CPU-GPU data movement is the bottleneck.
Latitude.sh vs Vultr - Comparison of Top Firms in This Guide
Latitude.sh vs Vultr - GPU Provider Comparison (July 2026)
Head-to-head comparison of Latitude.sh and Vultr. Compare GPU models, hourly pricing, billing granularity, spot instances, VRAM, infrastructure, developer tools, Kubernetes support, and compliance before choosing a provider. Data refreshed July 2026.
Bottom Line: Latitude.sh vs Vultr
Vultr comes out ahead overall, leading in 9 of 12 compared categories.
Where Latitude.sh leads
- Trustpilot Rating (3.7 vs 1.7)
- Starting Price ($/hr) ($0.35/hr vs $0.47/hr)
- Regions (8 vs 5)
Where Vultr leads
- Max VRAM (GB) (288 vs 96)
- Uptime SLA (100% vs 99.9%)
- Max GPUs/Instance (16 vs 8)
- GPU Models (12 vs 9)
- Spot/Preemptible
- Frameworks (7 vs 4)
Choose Latitude.sh for AI training, inference, bare metal GPU. Choose Vultr for AI training, inference, video rendering.
Frequently Asked Questions
Is Latitude.sh or Vultr better?
Which has a better Trustpilot Rating, Latitude.sh or Vultr?
Which has a better Starting Price ($/hr), Latitude.sh or Vultr?
|
Latitude.sh
Bare metal GPU cloud across 23 global locations
|
Vultr
High-performance cloud GPU across 32 global regions
|
|
|---|---|---|
| Overview | ||
| Trustpilot Rating | 3.7 | 1.7 |
| Headquarters | Brazil | United States |
| Provider Type | Bare Metal | Multi-Cloud |
| Best For | AI training inference bare metal GPU fine-tuning research dedicated workloads generative AI | AI training inference video rendering HPC Stable Diffusion game development generative AI fine-tuning research |
| GPU Hardware | ||
| GPU Models | A30 RTX A5000 RTX A6000 L40S RTX 6000 Ada A100 SXM H100 SXM GH200 RTX PRO 6000 | A16 A40 L40S A100 PCIe GH200 A100 SXM H100 SXM B200 B300 MI300X MI325X MI355X |
| Max VRAM (GB) | 96 | 288 |
| Max GPUs/Instance | 8 | 16 |
| Interconnect | NVLink | NVLink |
| Pricing | ||
| Starting Price ($/hr) | $0.35/hr | $0.47/hr |
| Billing Granularity | Per-hour | Per-hour |
| Spot/Preemptible | No | Yes |
| Reserved Discounts | N/A | N/A |
| Free Credits | $200 via referral program | Up to $300 free credit for 30 days |
| Egress Fees | None | Standard (varies by plan) |
| Storage | Local NVMe included (up to 4x 3.8TB), Block Storage $0.10/GB/mo, Filesystem Storage $0.05/GB/mo | 350 GB - 61 TB NVMe (included), Block Storage at $0.10/GB/mo, S3-compatible Object Storage |
| Infrastructure | ||
| Regions | 23 locations: US (8 cities), LATAM (5), Europe (5), APAC (4), Mexico City. GPU in Dallas, Frankfurt, Sydney, Tokyo | 32 regions across 6 continents (Americas, Europe, Asia, Australia, Africa) |
| Uptime SLA | 99.9% | 100% |
| Developer Experience | ||
| Frameworks | ML-optimized images PyTorch TensorFlow (user-installed) CUDA | PyTorch TensorFlow CUDA cuDNN ROCm Hugging Face NVIDIA NGC |
| Docker Support | Yes | Yes |
| SSH Access | Yes | Yes |
| Jupyter Notebooks | No | Yes |
| API / CLI | Yes | Yes |
| Setup Time | Seconds | Minutes |
| Kubernetes Support | No | Yes |
| Business Terms | ||
| Min Commitment | None | None |
| Compliance | Single-tenant isolation DPA available | SOC 2+ (HIPAA) PCI ISO 27001 ISO 27017 ISO 27018 ISO 20000-1 CSA STAR Level 1 |
Latitude.sh
Vultr
Build your own comparison
Select any 2-6 firms from this guide and open them in the full comparison table.
Tip: if you do not select any firms we will start with the top 2 from this guide.