Best Cloud GPU Providers with NVIDIA A40
The NVIDIA A40 is a data center GPU with 48GB GDDR6 memory designed for visual computing and inference workloads. It supports hardware ray tracing and is commonly used for virtual desktop infrastructure (VDI), rendering, and AI inference at scale. This guide lists cloud GPU providers with A40 availability.
What the NVIDIA A40 actually is
The A40 is an Ampere-generation data-center GPU built on the GA102 die — the same silicon family that powers NVIDIA’s high-end Ampere workstation and consumer cards. It sits in NVIDIA’s professional visualization line rather than its compute-first A100 line, which shapes everything about how it rents and what it is good at. The headline feature for anyone renting it is memory capacity: the A40 carries 48 GB of GDDR6 with ECC, a large pool that lets you load sizeable models and datasets without resorting to multi-GPU sharding.
Because it uses GDDR6 rather than the HBM2e found on the A100, its raw memory bandwidth is lower: roughly 696 GB/s, against about 1.6 to 2 TB/s on the A100 depending on the variant. That single difference is the most important thing to understand when reading the comparison above: the A40 gives you abundant VRAM at modest bandwidth, which is a very different value proposition from a bandwidth-optimized HBM card.
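To put a number on that trade-off, here is a rough Python sketch. It assumes single-stream token generation is memory-bound, so each generated token reads all model weights from VRAM once, and the ceiling on tokens per second is bandwidth divided by weight size. The bandwidth constants are approximate datasheet values that should be checked against NVIDIA's current specifications, and the 13B FP16 model is a hypothetical example. Real throughput will be lower, and batching changes the picture considerably.

```python
# Rough upper bound on single-stream decode speed for a memory-bound model.
# Assumption: every generated token reads all weights once from VRAM.

A40_BANDWIDTH_GBPS = 696.0          # approximate datasheet value, GB/s
A100_80GB_BANDWIDTH_GBPS = 2039.0   # approximate SXM datasheet value, GB/s


def max_tokens_per_second(weights_gb: float, bandwidth_gbps: float) -> float:
    """Bandwidth-limited ceiling: GB/s divided by GB read per token."""
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gbps / weights_gb


# Hypothetical 13B-parameter model in FP16: 13e9 params * 2 bytes = 26 GB.
weights_gb = 13e9 * 2 / 1e9

a40_ceiling = max_tokens_per_second(weights_gb, A40_BANDWIDTH_GBPS)
a100_ceiling = max_tokens_per_second(weights_gb, A100_80GB_BANDWIDTH_GBPS)

print(f"A40 ceiling:  {a40_ceiling:.1f} tokens/s")
print(f"A100 ceiling: {a100_ceiling:.1f} tokens/s")
```

Both cards hold the model comfortably; the gap in the ceilings comes entirely from bandwidth, which is the point of the comparison.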
Compute and precision support
The A40 includes Ampere’s third-generation Tensor Cores and second-generation RT cores. For AI work, the relevant precisions are well covered:
- FP16 and BF16 for mixed-precision training and inference, with Tensor Core acceleration.
- TF32, Ampere’s tensor format that speeds up FP32-style training math with minimal code changes.
- INT8 and INT4 for quantized, high-throughput inference.
- Structural sparsity support, which can roughly double effective Tensor Core throughput on models trained to exploit it.
What it does not have is FP8, which arrived with the later Hopper generation (H100) and the Ada and Blackwell families. If your workflow specifically targets FP8 training or inference kernels, the A40 is the wrong card and you should filter for a newer generation in the list above.
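The precision picture can be sketched as a lookup keyed on CUDA compute capability. This is a simplified illustration, not an exhaustive table: the thresholds below are assumptions that cover only the formats discussed here, INT4 and sparsity are left out, and the A40's capability is taken to be 8.6 (GA102). In PyTorch you would typically obtain the tuple from `torch.cuda.get_device_capability()`; the sketch itself needs no GPU.

```python
from __future__ import annotations

# Simplified map from CUDA compute capability to Tensor Core formats.
# Thresholds are an illustrative assumption, not an exhaustive table.
MIN_CAPABILITY = {
    "fp16": (7, 0),  # Volta and later
    "int8": (7, 5),  # Turing and later
    "bf16": (8, 0),  # Ampere and later
    "tf32": (8, 0),  # Ampere and later
    "fp8": (8, 9),   # Ada (8.9), Hopper (9.0) and later
}

A40_CAPABILITY = (8, 6)   # GA102, assumed value for the A40
H100_CAPABILITY = (9, 0)


def tensor_core_formats(capability: tuple[int, int]) -> list[str]:
    """Return the formats whose minimum capability the device meets."""
    return sorted(
        name for name, minimum in MIN_CAPABILITY.items() if capability >= minimum
    )


print("A40: ", tensor_core_formats(A40_CAPABILITY))
print("H100:", tensor_core_formats(H100_CAPABILITY))
```

A check like this is a cheap guard at the top of a training script: if the job expects FP8 kernels and the rented card reports 8.6, fail early instead of discovering it mid-run.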
Interconnect, scaling and power
The A40 connects over PCIe Gen4 and supports NVLink to bridge a pair of cards, which can present a combined 96 GB pool for memory-hungry jobs. It does not scale across many GPUs with the same dense NVLink fabric (NVSwitch) that the SXM A100/H100 use, so it is best thought of as a strong single- or dual-GPU card rather than the foundation of an eight-way training cluster. It is a passively cooled, server-oriented board in roughly the 300 W class, which is part of why providers can pack it densely and offer it at attractive rates.
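As a quick way to apply the single-or-dual framing, the sketch below decides whether a job's memory requirement fits on one A40, on an NVLink pair, or on neither. The function name and the 10% headroom are hypothetical choices for illustration. It also assumes your framework can split the model across both GPUs (for example with tensor or pipeline parallelism), since a bridge does not merge the two cards' memory automatically.

```python
import math

A40_VRAM_GB = 48        # per card, as stated in the text
MAX_BRIDGED_CARDS = 2   # NVLink bridges a pair of A40s


def a40_cards_needed(required_gb: float, headroom: float = 0.10):
    """Return 1 or 2 if the job fits on one card or a bridged pair, else None.

    headroom is a hypothetical safety fraction reserved for the CUDA
    context, activations and fragmentation.
    """
    usable_per_card = A40_VRAM_GB * (1.0 - headroom)
    cards = max(1, math.ceil(required_gb / usable_per_card))
    return cards if cards <= MAX_BRIDGED_CARDS else None


for need in (40, 68, 100):
    print(f"{need} GB -> {a40_cards_needed(need)}")
```

A result of `None` is the signal to look at cards with denser interconnect instead of trying to stretch the A40 beyond a pair.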
Which workloads the A40 genuinely fits
The A40’s profile — lots of VRAM, moderate bandwidth, full Ampere AI features, plus real RT cores — makes it a versatile mid-tier rental. It is a strong match for:
- Fine-tuning and LoRA/QLoRA of mid-sized language models, where 48 GB lets you keep optimizer states and longer sequences resident.
- High-throughput batch inference for models in the 7B–34B parameter range (especially quantized), where capacity matters more than peak bandwidth.
- Rendering, 3D, virtual workstations and visualization, where the RT cores and large frame buffer are directly useful — this is the card’s home territory.
- Computer vision, diffusion-model image generation, and general ML experimentation that fits comfortably in 48 GB.
It is overkill for tiny models or light dev work where a smaller, cheaper card would do, and it is underpowered for frontier-scale pretraining of very large models, where you want HBM bandwidth, FP8, and dense multi-GPU interconnect. For latency-critical real-time inference at extreme throughput, a newer inference-optimized card will usually win on tokens per second per dollar even if it has less VRAM.
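The fit claims above follow from simple arithmetic on parameter counts. The sketch below estimates weight-only footprints at common precisions for the 7B to 34B range, and applies a widely used rule of thumb of about 16 bytes per parameter for full fine-tuning with mixed-precision Adam. Both are approximations: KV cache, activations and framework overhead are ignored, and the function names are illustrative.

```python
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}
A40_VRAM_GB = 48


def weights_gb(params_billion: float, precision: str) -> float:
    """Weight-only footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


def full_finetune_gb(params_billion: float) -> float:
    """Rule of thumb for mixed-precision Adam: about 16 bytes per parameter
    (FP16 weights and gradients plus FP32 master weights and two moments)."""
    return params_billion * 16.0


for size in (7, 13, 34):
    row = {p: weights_gb(size, p) for p in BYTES_PER_PARAM}
    fits = [p for p, gb in row.items() if gb < A40_VRAM_GB]
    print(f"{size}B weights (GB): {row}; under 48 GB: {fits}")

print(f"7B full fine-tune estimate: {full_finetune_gb(7):.0f} GB")
```

Under these assumptions a 34B model only fits on one card when quantized, and even a 7B model exceeds 48 GB for full fine-tuning, which is why LoRA and QLoRA are the natural fine-tuning methods on this card.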
Renting the A40: cost and availability context
In the rental market the A40 generally sits in the mid tier — meaningfully cheaper per hour than A100/H100-class HBM cards, but above entry-level consumer GPUs. Its appeal is the cost-per-gigabyte-of-VRAM: when your bottleneck is fitting the model in memory rather than crunching it at maximum bandwidth, the A40 is often the most economical way to get 48 GB. Because it is a mature, widely deployed card, on-demand availability tends to be good and it is less prone to the scarcity and waitlists that hit the newest accelerators.
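Cost per gigabyte of VRAM is easy to compute once you have live prices. The sketch below ranks offers by that metric. Every price in it is a made-up placeholder, not a quote from any provider, so the resulting order is only an artifact of those placeholders; substitute the current per-hour rates from the comparison before drawing conclusions.

```python
def cost_per_gb_hour(price_per_hour: float, vram_gb: float) -> float:
    """Hourly price divided by VRAM capacity, in $/GB/hr."""
    return price_per_hour / vram_gb


# Hypothetical (price $/hr, VRAM GB) pairs for illustration only.
offers = {
    "A40 48GB": (0.40, 48),
    "A100 80GB": (1.50, 80),
    "24GB consumer card": (0.30, 24),
}

ranked = sorted(offers, key=lambda name: cost_per_gb_hour(*offers[name]))

for name in ranked:
    price, vram = offers[name]
    print(f"{name}: ${cost_per_gb_hour(price, vram):.4f} per GB-hour")
```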
Many providers also offer it on spot or interruptible tiers at a further discount, which suits checkpointed fine-tuning and batch jobs that tolerate restarts. Exact rates move constantly and differ by provider, region, and commitment, so use the comparison above for live per-hour pricing rather than any fixed figure — and when you compare, weigh on-demand versus spot, billing granularity, and whether NVLink pairing is offered if you need the 96 GB combined pool.
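When weighing on-demand against spot, the discount has to be set against work lost to interruptions. The following sketch models that in the simplest possible way: each interruption costs a fixed number of redone hours since the last checkpoint. The prices, the interruption count and the lost time are all hypothetical inputs, and the function name is illustrative.

```python
def expected_job_cost(price_per_hour: float, compute_hours: float,
                      interruptions: int = 0,
                      lost_hours_per_interruption: float = 0.0) -> float:
    """Total cost when each interruption forces some work to be redone."""
    billed_hours = compute_hours + interruptions * lost_hours_per_interruption
    return price_per_hour * billed_hours


# Hypothetical figures for illustration only; not real quotes.
on_demand = expected_job_cost(0.40, compute_hours=20)
spot = expected_job_cost(0.20, compute_hours=20, interruptions=4,
                         lost_hours_per_interruption=0.5)

print(f"on-demand: ${on_demand:.2f}, spot: ${spot:.2f}")
```

Frequent checkpointing shrinks the lost time per interruption, which is what makes spot tiers attractive for fine-tuning and batch jobs.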
Frequently asked questions
How much VRAM does the NVIDIA A40 have?
The A40 has 48 GB of GDDR6 memory with ECC. That large capacity is its main draw for rental, letting you fine-tune mid-sized models or run quantized large-model inference on a single card.
Is the A40 good for training large language models?
It is well suited to fine-tuning and training mid-sized models thanks to its 48 GB pool, but it is not ideal for frontier-scale pretraining. For that you want HBM bandwidth, FP8 support, and dense multi-GPU interconnect found on newer Hopper or Blackwell cards — filter the list above for those if pretraining is your goal.
How does the A40 compare to the A100?
Both are Ampere, but the A100 uses high-bandwidth HBM2e memory for far greater bandwidth and supports dense NVSwitch scaling, making it stronger for large-scale training. The A40 trades that bandwidth for a large 48 GB GDDR6 pool plus RT cores, and rents for less — a better fit when capacity and cost matter more than peak throughput.
Does the A40 support NVLink for multi-GPU jobs?
Yes, two A40s can be bridged with NVLink to present a combined 96 GB memory pool, which helps for models that do not fit on a single card. It does not, however, scale across many GPUs with the dense fabric used by SXM data-center cards, so think of it as a single- or dual-GPU rental.
Cherry Servers vs Vast.ai - Comparison of Top Firms in This Guide
Cherry Servers vs Vast.ai - GPU Provider Comparison (June 2026)
Head-to-head comparison of Cherry Servers and Vast.ai. Compare GPU models, hourly pricing, billing granularity, spot instances, VRAM, infrastructure, developer tools, Kubernetes support, and compliance before choosing a provider. Data refreshed June 2026.
Bottom Line: Cherry Servers vs Vast.ai
Vast.ai comes out ahead overall, leading in 7 of 10 compared categories.
Where Cherry Servers leads
- Trustpilot Rating (4.6 vs 4.1)
- Regions (6 vs 2)
- Kubernetes Support
Where Vast.ai leads
- Starting Price ($/hr) ($0.06/hr vs $0.16/hr)
- Max VRAM (GB) (192 vs 80)
- Max GPUs/Instance (8 vs 2)
- GPU Models (35 vs 6)
- Spot/Preemptible
- Frameworks (5 vs 3)
Choose Cherry Servers if Trustpilot rating matters most to you. Choose Vast.ai if the lowest starting price ($/hr) is the priority.
Frequently Asked Questions
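Is Cherry Servers or Vast.ai better?
Vast.ai comes out ahead overall in this comparison, while Cherry Servers leads on Trustpilot rating, regions, and Kubernetes support.
Which has a better Trustpilot Rating, Cherry Servers or Vast.ai?
Cherry Servers, at 4.6 versus 4.1 for Vast.ai.
Which has a better Starting Price ($/hr), Cherry Servers or Vast.ai?
Vast.ai, starting at $0.06/hr versus $0.16/hr for Cherry Servers.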
| | Cherry Servers | Vast.ai |
|---|---|---|
| Overview | ||
| Tagline | Bare metal GPU servers with 24 years of hosting experience and full hardware-level control. | Instant GPUs. Transparent Pricing. |
| Trustpilot Rating | 4.6 | 4.1 |
| Headquarters | Lithuania | United States |
| Provider Type | N/A | GPU Marketplace |
| Best For | AI training, inference, fine-tuning, rendering, research, HPC, generative AI, deep learning | AI training, inference, fine-tuning, Stable Diffusion, batch processing, research, LLM serving, generative AI |
| GPU Hardware | ||
| GPU Models | A100, A40, A16, A10, A2, Tesla P4 | B200, H200, H100 SXM, H100 NVL, A100 SXM, A100 PCIe, RTX 5090, RTX 5080, RTX 5070 Ti, RTX 6000 Pro, RTX 6000 Ada, RTX 4500 Ada, RTX A6000, RTX A5000, RTX A4000, L40S, L40, A40, A10, RTX 4090, RTX 4080, RTX 4070 Ti, RTX 4070, RTX 4060 Ti, RTX 4060, RTX 3090 Ti, RTX 3090, RTX 3080 Ti, RTX 3080, RTX 3070 Ti, RTX 3070, Tesla V100, Tesla T4, A2, GTX 1080 |
| Max VRAM (GB) | 80 | 192 |
| Max GPUs/Instance | 2 | 8 |
| Interconnect | PCIe | NVLink, InfiniBand |
| Pricing | ||
| Starting Price ($/hr) | $0.16/hr | $0.06/hr |
| Billing Granularity | Per-hour | Per-second |
| Spot/Preemptible | No | Yes |
| Reserved Discounts | N/A | Up to 50% (1-6 month reserved) |
| Free Credits | None | Small test credit on signup |
| Egress Fees | N/A | Varies by host ($/TB) |
| Storage | NVMe SSD, Elastic Block Storage ($0.071/GB/mo) | Varies by host ($/GB/hr, charged while instance exists) |
| Infrastructure | ||
| Regions | Lithuania, Netherlands, Germany, Sweden, US, Singapore (6 locations) | 500+ locations, 40+ data centers |
| Uptime SLA | 99.97% | No formal SLA (host reliability scores visible) |
| Developer Experience | ||
| Frameworks | PyTorch, TensorFlow, CUDA (bare metal: full stack control) | PyTorch, TensorFlow, CUDA, vLLM, ComfyUI |
| Docker Support | Yes | Yes |
| SSH Access | Yes | Yes |
| Jupyter Notebooks | No | Yes |
| API / CLI | Yes | Yes |
| Setup Time | Minutes | Seconds |
| Kubernetes Support | Yes | No |
| Business Terms | ||
| Min Commitment | None | None |
| Compliance | ISO 27001, ISO 20000-1, GDPR, PCI DSS | SOC 2 Type 2, HIPAA, GDPR, CCPA |
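The billing granularity row matters more than it looks for short or irregular jobs. The sketch below compares per-hour and per-second billing at the same hourly rate, assuming the provider rounds usage up to the next billing unit. That rounding rule, the $0.50/hr rate and the 95-minute runtime are illustrative assumptions; actual billing terms, including any minimum charges, should be confirmed with each provider.

```python
import math


def billed_cost(rate_per_hour: float, runtime_seconds: int,
                granularity_seconds: int) -> float:
    """Cost when usage is rounded up to whole billing units."""
    units = math.ceil(runtime_seconds / granularity_seconds)
    return units * granularity_seconds * rate_per_hour / 3600


RATE = 0.50          # hypothetical $/hr, identical for both cases
RUNTIME = 95 * 60    # a 95-minute job, in seconds

per_hour = billed_cost(RATE, RUNTIME, granularity_seconds=3600)
per_second = billed_cost(RATE, RUNTIME, granularity_seconds=1)

print(f"per-hour billing:   ${per_hour:.2f}")
print(f"per-second billing: ${per_second:.2f}")
```

The gap narrows as jobs get longer, so granularity matters most for bursts of short experiments and least for multi-day runs.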