Best Cloud GPU Providers with NVIDIA A40

The NVIDIA A40 is a data center GPU with 48GB GDDR6 memory designed for visual computing and inference workloads. It supports hardware ray tracing and is commonly used for virtual desktop infrastructure (VDI), rendering, and AI inference at scale. This guide lists cloud GPU providers with A40 availability.

Updated June 2026. Showing 4 GPU providers with the A40.

Provider	Trustpilot Rating	Trustpilot Reviews	Review change (7d / 30d / 90d)	Starting Price	Max VRAM	Max GPUs	Billing
Cherry Servers	4.6	146	+0 / +1 / +8	$0.16/hr	80 GB	2	Per-hour
Vast.ai	4.1	237	+0 / +8 / +26	$0.06/hr	192 GB	8	Per-second
RunPod	3.4	245	+1 / +13 / +36	$0.06/hr	288 GB	not listed	Per-second
Vultr	1.7	557	+1 / +4 / +19	$0.47/hr	288 GB	not listed	Per-hour

What the NVIDIA A40 actually is

The A40 is an Ampere-generation data-center GPU built on the GA102 die — the same silicon family that powers NVIDIA’s high-end Ampere workstation and consumer cards. It sits in NVIDIA’s professional visualization line rather than its compute-first A100 line, which shapes everything about how it rents and what it is good at. The headline feature for anyone renting it is memory capacity: the A40 carries 48 GB of GDDR6 with ECC, a large pool that lets you load sizeable models and datasets without resorting to multi-GPU sharding.

Because it uses GDDR6 rather than the HBM2e found on the A100, its raw memory bandwidth is lower: roughly 696 GB/s, against about 1.6 TB/s to 2 TB/s on the A100 depending on the variant. That single difference is the most important thing to understand when reading the comparison above: the A40 gives you abundant VRAM at modest bandwidth, which is a very different value proposition from a bandwidth-optimized HBM card.
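To make the bandwidth trade-off concrete, here is a minimal sketch of the usual back-of-envelope ceiling for single-stream text generation, assuming decoding is memory-bandwidth bound and every weight is read once per token. The bandwidth figures are datasheet peaks used as assumptions, the function name is illustrative, and real throughput will be lower once KV-cache reads, kernel overheads, and batching effects are included.

```python
# Rough ceiling on single-stream decode speed when generation is
# memory-bandwidth bound: each new token reads every weight once.

def max_decode_tokens_per_s(bandwidth_gb_s: float,
                            params_billion: float,
                            bytes_per_param: float) -> float:
    """Peak bandwidth divided by the gigabytes of weights read per token."""
    weight_gb = params_billion * bytes_per_param  # 1e9 params * bytes = GB
    return bandwidth_gb_s / weight_gb

# Assumed datasheet peak bandwidths in GB/s; sustained figures are lower.
CARDS = {
    "A40 (GDDR6)": 696.0,
    "A100 80GB SXM (HBM2e)": 2039.0,
}

for name, bandwidth in CARDS.items():
    ceiling = max_decode_tokens_per_s(bandwidth, params_billion=13, bytes_per_param=2)
    print(f"{name}: about {ceiling:.0f} tokens/s ceiling for a 13B FP16 model")
```

The ratio between the two ceilings is simply the ratio of the bandwidths, which is why the A40 tends to look best on jobs limited by capacity rather than by per-token latency.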

Compute and precision support

The A40 includes Ampere’s third-generation Tensor Cores and second-generation RT cores. For AI work, the relevant precisions are well covered:

FP16 and BF16 for mixed-precision training and inference, with Tensor Core acceleration.
TF32, Ampere’s tensor format that speeds up FP32-style training math with minimal code changes.
INT8 and INT4 for quantized, high-throughput inference.
Structural sparsity support, which can roughly double effective Tensor Core throughput on models trained to exploit it.

What it does not have is FP8, which arrived with the later Hopper generation (H100) and the Ada and Blackwell families. If your workflow specifically targets FP8 training or inference kernels, the A40 is the wrong card and you should filter for a newer generation in the list above.
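The practical effect of these precisions on a 48 GB card can be sketched with simple arithmetic on weight storage. The example below is a rough estimate only: the 20% headroom reserved for KV cache and activations is an assumption, and the helper names are hypothetical rather than part of any library.

```python
# Approximate weight footprint per precision and a single-card fit check.

A40_VRAM_GB = 48

# Storage cost of one parameter at each precision, in bytes.
BYTES_PER_PARAM = {"FP16/BF16": 2.0, "INT8": 1.0, "INT4": 0.5}

def weight_footprint_gb(params_billion: float, precision: str) -> float:
    """Gigabytes needed for the weights alone."""
    return params_billion * BYTES_PER_PARAM[precision]

def fits_on_a40(params_billion: float, precision: str,
                headroom_fraction: float = 0.2) -> bool:
    """True if the weights fit while leaving the assumed headroom free."""
    usable_gb = A40_VRAM_GB * (1 - headroom_fraction)
    return weight_footprint_gb(params_billion, precision) <= usable_gb

for size in (7, 13, 34, 70):
    verdicts = ", ".join(
        f"{precision}: {'fits' if fits_on_a40(size, precision) else 'too big'}"
        for precision in BYTES_PER_PARAM
    )
    print(f"{size}B -> {verdicts}")
```

Under these assumptions a 34B model needs INT8 or INT4 to sit on one card, which is consistent with quantized inference being the A40's strongest AI use case.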

Interconnect, scaling and power

The A40 connects over PCIe Gen4 and supports NVLink to bridge a pair of cards, which can present a combined 96 GB pool for memory-hungry jobs. It does not scale across many GPUs with the same dense NVLink fabric (NVSwitch) that the SXM A100/H100 use, so it is best thought of as a strong single- or dual-GPU card rather than the foundation of an eight-way training cluster. It is a passively cooled, server-oriented board in roughly the 300 W class, which is part of why providers can pack it densely and offer it at attractive rates.

Which workloads the A40 genuinely fits

The A40’s profile — lots of VRAM, moderate bandwidth, full Ampere AI features, plus real RT cores — makes it a versatile mid-tier rental. It is a strong match for:

Fine-tuning and LoRA/QLoRA of mid-sized language models, where 48 GB lets you keep optimizer states and longer sequences resident.
High-throughput batch inference for models in the 7B–34B parameter range (especially quantized), where capacity matters more than peak bandwidth.
Rendering, 3D, virtual workstations and visualization, where the RT cores and large frame buffer are directly useful — this is the card’s home territory.
Computer vision, diffusion-model image generation, and general ML experimentation that fits comfortably in 48 GB.

It is overkill for tiny models or light dev work where a smaller, cheaper card would do, and it is underpowered for frontier-scale pretraining of very large models, where you want HBM bandwidth, FP8, and dense multi-GPU interconnect. For latency-critical real-time inference at extreme throughput, a newer inference-optimized card will usually win on tokens per second per dollar even if it has less VRAM.
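A rough memory budget shows why adapter-based fine-tuning is the natural fit here. The sketch below uses the common mixed-precision Adam rule of thumb of 16 bytes per trainable parameter (FP16 weights and gradients, FP32 master weights, two FP32 optimizer moments) and assumes a 4-bit frozen base model for QLoRA. The 1% adapter fraction is an illustrative assumption, activations are ignored, and 8-bit optimizers or offloading would change the numbers.

```python
# Back-of-envelope training memory, excluding activations.

A40_VRAM_GB = 48

def full_finetune_gb(params_billion: float) -> float:
    """Mixed-precision Adam: 2 (weights) + 2 (grads) + 4 (master) + 8 (moments) bytes."""
    return params_billion * 16

def qlora_gb(params_billion: float, adapter_fraction: float = 0.01) -> float:
    """4-bit frozen base plus trainable adapters sized as a fraction of the base."""
    base_gb = params_billion * 0.5                          # 4 bits per parameter
    adapter_gb = params_billion * adapter_fraction * 16     # adapters train with Adam
    return base_gb + adapter_gb

for size in (3, 7, 13, 34):
    full = full_finetune_gb(size)
    lora = qlora_gb(size)
    print(f"{size}B: full fine-tune ~{full:.0f} GB, QLoRA ~{lora:.1f} GB "
          f"(QLoRA fits in {A40_VRAM_GB} GB: {lora <= A40_VRAM_GB})")
```

On this accounting, full fine-tuning reaches the 48 GB limit at around 3B parameters before activations are counted, while QLoRA leaves room for much larger base models and longer sequences.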

Renting the A40: cost and availability context

In the rental market the A40 generally sits in the mid tier — meaningfully cheaper per hour than A100/H100-class HBM cards, but above entry-level consumer GPUs. Its appeal is the cost-per-gigabyte-of-VRAM: when your bottleneck is fitting the model in memory rather than crunching it at maximum bandwidth, the A40 is often the most economical way to get 48 GB. Because it is a mature, widely deployed card, on-demand availability tends to be good and it is less prone to the scarcity and waitlists that hit the newest accelerators.

Many providers also offer it on spot or interruptible tiers at a further discount, which suits checkpointed fine-tuning and batch jobs that tolerate restarts. Exact rates move constantly and differ by provider, region, and commitment, so use the comparison above for live per-hour pricing rather than any fixed figure — and when you compare, weigh on-demand versus spot, billing granularity, and whether NVLink pairing is offered if you need the 96 GB combined pool.
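Billing granularity is easy to compare with a short calculation. The sketch below assumes that per-hour billing charges every started hour in full, which varies by provider, and it uses a hypothetical rate of $0.40/hr that is not taken from the listing above.

```python
import math

def billed_cost(runtime_s: float, rate_per_hr: float, granularity: str) -> float:
    """Cost of one job under a given billing granularity."""
    if granularity == "per-hour":
        # Assumption: any started hour is billed in full.
        billed_hours = math.ceil(runtime_s / 3600)
    elif granularity == "per-second":
        billed_hours = runtime_s / 3600
    else:
        raise ValueError(f"unknown granularity: {granularity}")
    return billed_hours * rate_per_hr

def cost_per_gb_hr(rate_per_hr: float, vram_gb: float) -> float:
    """Hourly price divided by VRAM, for comparing capacity value across cards."""
    return rate_per_hr / vram_gb

HYPOTHETICAL_RATE = 0.40   # $/hr, illustrative only
JOB_SECONDS = 95 * 60      # a 95-minute fine-tuning run

for granularity in ("per-hour", "per-second"):
    cost = billed_cost(JOB_SECONDS, HYPOTHETICAL_RATE, granularity)
    print(f"{granularity}: ${cost:.2f}")
print(f"${cost_per_gb_hr(HYPOTHETICAL_RATE, 48):.4f} per GB of VRAM per hour")
```

The gap between the two totals grows with the number of short jobs you run, so per-second billing matters most for bursty or interruptible workloads.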

Frequently asked questions

How much VRAM does the NVIDIA A40 have?

The A40 has 48 GB of GDDR6 memory with ECC. That large capacity is its main draw for rental, letting you fine-tune mid-sized models or run quantized large-model inference on a single card.

Is the A40 good for training large language models?

It is well suited to fine-tuning and training mid-sized models thanks to its 48 GB pool, but it is not ideal for frontier-scale pretraining. For that you want HBM bandwidth, FP8 support, and dense multi-GPU interconnect found on newer Hopper or Blackwell cards — filter the list above for those if pretraining is your goal.

How does the A40 compare to the A100?

Both are Ampere, but the A100 uses high-bandwidth HBM2e memory for far greater bandwidth and supports dense NVSwitch scaling, making it stronger for large-scale training. The A40 trades that bandwidth for a large 48 GB GDDR6 pool plus RT cores, and rents for less — a better fit when capacity and cost matter more than peak throughput.

Does the A40 support NVLink for multi-GPU jobs?

Yes, two A40s can be bridged with NVLink to present a combined 96 GB memory pool, which helps for models that do not fit on a single card. It does not, however, scale across many GPUs with the dense fabric used by SXM data-center cards, so think of it as a single- or dual-GPU rental.

Cherry Servers vs Vast.ai - Comparison of Top Firms in This Guide

Cherry Servers vs Vast.ai - GPU Provider Comparison (June 2026)

Head-to-head comparison of Cherry Servers and Vast.ai. Compare GPU models, hourly pricing, billing granularity, spot instances, VRAM, infrastructure, developer tools, Kubernetes support, and compliance before choosing a provider. Data refreshed June 2026.

Bottom Line: Cherry Servers vs Vast.ai

Vast.ai comes out ahead overall, leading in 7 of 10 compared categories.

Where Cherry Servers leads

Trustpilot Rating (4.6 vs 4.1)
Regions (6 vs 2)
Kubernetes Support

Where Vast.ai leads

Starting Price ($0.06/hr vs $0.16/hr)
Max VRAM (192 GB vs 80 GB)
Max GPUs per instance (8 vs 2)
GPU Models (35 vs 6)
Spot/Preemptible
Frameworks (5 vs 3)

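Choose Cherry Servers if Trustpilot rating, Kubernetes support, or a formal uptime SLA matters most to you. Choose Vast.ai if the lowest starting price, spot instances, or a wider choice of GPU models matters most.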

Frequently Asked Questions

Is Cherry Servers or Vast.ai better?

Vast.ai leads in 7 of 10 compared categories. The right choice still depends on the factors that matter most to you.

Which has a better Trustpilot Rating, Cherry Servers or Vast.ai?

Cherry Servers (4.6 vs 4.1).

Which has a lower starting price, Cherry Servers or Vast.ai?

Vast.ai ($0.06/hr vs $0.16/hr).

Cherry Servers vs Vast.ai - GPU Provider Comparison (June 2026)
	Cherry Servers	Vast.ai
Overview
Tagline	Bare metal GPU servers with 24 years of hosting experience and full hardware-level control.	Instant GPUs. Transparent Pricing.
Trustpilot Rating	4.6	4.1
Headquarters	Lithuania	United States
Provider Type	N/A	GPU Marketplace
Best For	AI training, inference, fine-tuning, rendering, research, HPC, generative AI, deep learning	AI training, inference, fine-tuning, Stable Diffusion, batch processing, research, LLM serving, generative AI
GPU Hardware
GPU Models	A100, A40, A16, A10, A2, Tesla P4	B200, H200, H100 SXM, H100 NVL, A100 SXM, A100 PCIe, RTX 5090, RTX 5080, RTX 5070 Ti, RTX 6000 Pro, RTX 6000 Ada, RTX 4500 Ada, RTX A6000, RTX A5000, RTX A4000, L40S, L40, A40, A10, RTX 4090, RTX 4080, RTX 4070 Ti, RTX 4070, RTX 4060 Ti, RTX 4060, RTX 3090 Ti, RTX 3090, RTX 3080 Ti, RTX 3080, RTX 3070 Ti, RTX 3070, Tesla V100, Tesla T4, A2, GTX 1080
Max VRAM (GB)	80	192
Max GPUs/Instance	2	8
Interconnect	PCIe	NVLink, InfiniBand
Pricing
Starting Price ($/hr)	$0.16/hr	$0.06/hr
Billing Granularity	Per-hour	Per-second
Spot/Preemptible	No	Yes
Reserved Discounts	N/A	Up to 50% (1-6 month reserved)
Free Credits	None	Small test credit on signup
Egress Fees	N/A	Varies by host ($/TB)
Storage	NVMe SSD, Elastic Block Storage ($0.071/GB/mo)	Varies by host ($/GB/hr, charged while instance exists)
Infrastructure
Regions	Lithuania, Netherlands, Germany, Sweden, US, Singapore (6 locations)	500+ locations, 40+ data centers
Uptime SLA	99.97%	No formal SLA (host reliability scores visible)
Developer Experience
Frameworks	PyTorch, TensorFlow, CUDA (bare metal, full stack control)	PyTorch, TensorFlow, CUDA, vLLM, ComfyUI
Docker Support	Yes	Yes
SSH Access	Yes	Yes
Jupyter Notebooks	No	Yes
API / CLI	Yes	Yes
Setup Time	Minutes	Seconds
Kubernetes Support	Yes	No
Business Terms
Min Commitment	None	None
Compliance	ISO 27001, ISO 20000-1, GDPR, PCI DSS	SOC 2 Type 2, HIPAA, GDPR, CCPA

Providers in this guide: Cherry Servers (rating 4.6, Lithuania), Vast.ai (rating 4.1, United States), RunPod (rating 3.4, United States), Vultr (rating 1.7, United States).
