Cloud GPU Providers with Spot / Preemptible Instances
Spot or preemptible GPU instances offer 50-90% savings compared to on-demand pricing, in exchange for the possibility of interruption during high-demand periods. They are ideal for fault-tolerant workloads like distributed training with checkpointing, batch inference, and hyperparameter sweeps. This guide lists cloud GPU providers that offer spot pricing, helping you significantly reduce your GPU compute costs.
What spot and preemptible GPU instances actually are
A spot or preemptible GPU instance is rented from a provider’s pool of spare capacity at a steep discount in exchange for one critical condition: the provider can reclaim the machine at any time, usually with little or no warning. The hardware is identical to the on-demand version of the same GPU — the same VRAM, the same tensor cores, the same interconnect — but the contract around its availability is different. You are buying compute that is cheap precisely because it is interruptible. Every provider in the list above marked as offering spot or preemptible capacity exposes this trade in some form, though the names differ: spot, preemptible, interruptible, community, or surplus instances all describe the same underlying idea.
The discount exists because data centers rarely run at 100% utilization. Idle GPUs earn nothing, so providers sell that slack at a fraction of the standard rate and accept that they may need to take it back the moment a full-price customer wants it, or when their own scheduling needs change. For the renter, that means the headline savings are real, but they come with operational obligations you do not have on a guaranteed on-demand node.
Why interruptible pricing matters for real workloads
The reason spot capacity is worth understanding is that GPU rental is expensive, and the discount on interruptible instances is often large enough to change which projects are economically viable. The catch is that not every workload tolerates being killed mid-run. The deciding factor is almost always how well your job checkpoints and resumes.
- Excellent fit: long training and fine-tuning runs that save checkpoints to durable storage every few minutes, large batch-inference or embedding jobs, offline rendering, hyperparameter sweeps where individual trials are independent, and any pipeline already built around fault tolerance.
- Poor fit: real-time or low-latency inference serving a live application, interactive development sessions where losing the box means losing unsaved work, and tightly synchronized multi-GPU training that cannot recover gracefully when one node disappears.
The mental model is simple: if losing the instance costs you only the minutes since your last checkpoint, spot is almost always the right call. If losing it costs you a request, a customer, or hours of unsaved state, the on-demand premium buys you peace of mind that is worth paying for.
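To put rough numbers on that mental model, the sketch below estimates what a checkpointed job costs on spot once interruptions are accounted for. It is a simplified model, not a provider formula: the function name and all parameters are hypothetical, it assumes an interruption lands on average halfway through a checkpoint interval, and it assumes restart time is billed.

```python
def effective_spot_cost(work_hours, spot_rate, interruptions_per_hour,
                        checkpoint_interval_min, restart_overhead_min):
    """Estimate the total spot bill for a job that needs `work_hours` of useful compute.

    Assumptions (illustrative only):
      - each interruption wastes half a checkpoint interval plus the restart overhead
      - wasted time is billed at the same hourly rate as useful time
    """
    # Hours of wall-clock time lost per interruption.
    lost_hours = (checkpoint_interval_min / 2 + restart_overhead_min) / 60
    # Fraction of every wall-clock hour that does no useful work.
    wasted_fraction = interruptions_per_hour * lost_hours
    if wasted_fraction >= 1:
        raise ValueError("interruptions are so frequent that the job never progresses")
    wall_hours = work_hours / (1 - wasted_fraction)
    return wall_hours * spot_rate


# Illustrative inputs, not real quotes from any provider.
spot_cost = effective_spot_cost(
    work_hours=100, spot_rate=1.00, interruptions_per_hour=0.1,
    checkpoint_interval_min=10, restart_overhead_min=5,
)
on_demand_cost = 100 * 2.50
print(f"spot: ${spot_cost:.2f}  on-demand: ${on_demand_cost:.2f}")
```

With these made-up inputs the interruption overhead adds under 2% to the spot bill, which is why frequent checkpoints make the discount so hard to beat. Lengthening the checkpoint interval or the restart time shows how quickly that margin erodes.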
The trade-offs to weigh
Interruption is the obvious cost, but it is not the only one. When you compare providers on this dimension, keep the full picture in mind:
- Reclaim behavior: some providers give a short termination notice (often a couple of minutes) so your job can save state and exit cleanly; others can pull the machine instantly. A grace period is enormously valuable because it lets you trigger a final checkpoint.
- Availability variance: spot pools fluctuate. The exact GPU you want at the price you saw can be unavailable for stretches, and the most in-demand accelerators are reclaimed more aggressively than older or less popular cards.
- Storage that outlives the instance: if your checkpoints live only on the instance’s local disk, an interruption wipes them. Spot only works safely when your data and checkpoints sit on persistent or network storage that survives the node.
- Restart friction: after a reclaim you must re-acquire capacity, re-pull your container image and data, and resume — so cold-start time and image size affect your effective throughput and cost.
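The value of a grace period can be illustrated with a small sketch of a work loop that takes a final checkpoint when a termination notice arrives. It assumes the provider delivers the notice as a SIGTERM to your process, which is common but not universal; some platforms use a metadata endpoint or a different signal instead. `StopFlag`, `install_sigterm_handler`, and `run_steps` are hypothetical names.

```python
import signal


class StopFlag:
    """Mutable flag that a signal handler can set from outside the work loop."""

    def __init__(self):
        self.requested = False


def install_sigterm_handler(flag):
    """Set the flag when SIGTERM arrives (must be called from the main thread)."""
    def _handler(signum, frame):
        flag.requested = True
    signal.signal(signal.SIGTERM, _handler)


def run_steps(total_steps, save_checkpoint, flag, do_step,
              checkpoint_every=100, start_step=0):
    """Run work steps, checkpointing periodically and once more on a stop request."""
    step = start_step
    while step < total_steps:
        do_step(step + 1)          # one unit of work, e.g. a training step
        step += 1
        if flag.requested:
            save_checkpoint(step)  # final checkpoint inside the grace period
            return step
        if step % checkpoint_every == 0:
            save_checkpoint(step)  # routine periodic checkpoint
    return step
```

In a real job you would call `install_sigterm_handler(flag)` once at startup and pass `start_step` from the last saved checkpoint. The handler only sets a flag, so the save itself happens at a safe point between steps rather than in the middle of one.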
What to check before renting spot capacity
Because the same word can mean different things across providers, use the comparison above to confirm the specifics rather than assuming. Before committing a workload to interruptible instances, work through this checklist:
- Notice window: does the provider warn you before reclaiming, and how long is the grace period? Even 30–120 seconds changes how you design your checkpointing.
- Reclaim aggressiveness: are spot machines taken back only under genuine capacity pressure, or also for routine rebalancing? Frequent, low-pressure reclaims erode the savings.
- Checkpoint plumbing: can you write checkpoints to durable object or network storage cheaply, and is egress to retrieve them reasonable? This is the single most important enabler of safe spot use.
- Automatic re-acquisition: does the platform automatically requeue and restart your job when capacity returns, or must you script that yourself? Managed requeue makes spot far less hands-on.
- Multi-GPU and multi-node behavior: if you need several GPUs together, losing one can stall the whole job. Check whether the provider can hold a group atomically or only offers single-GPU spot.
- Billing granularity: per-second or per-minute billing pairs well with spot because you only pay for the time you actually ran before a reclaim, rather than rounding up.
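The billing-granularity point is easy to check with arithmetic. The sketch below compares what a run that was reclaimed partway through would cost under different rounding units. The function name, the hourly rate, and the run length are illustrative assumptions, and real providers may apply minimum charges this model ignores.

```python
import math


def billed_cost(run_seconds, hourly_rate, granularity_seconds):
    """Cost of a run when usage is rounded up to whole billing units."""
    billed_units = math.ceil(run_seconds / granularity_seconds)
    return billed_units * granularity_seconds * hourly_rate / 3600


# A run reclaimed after 37 minutes 10 seconds at an illustrative $2.00/hr.
run_seconds = 37 * 60 + 10
for label, unit in [("per-second", 1), ("per-minute", 60), ("per-hour", 3600)]:
    print(f"{label:>10}: ${billed_cost(run_seconds, 2.00, unit):.2f}")
```

Under hourly rounding the interrupted run is billed for the full hour, so frequent reclaims multiply the rounding loss. Per-second and per-minute billing keep the bill close to the time actually used.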
A practical pattern many teams adopt is a hybrid setup: run the bulk of throughput-oriented, checkpointable work on spot to capture the discount, while keeping a small on-demand footprint for anything latency-sensitive or stateful. That blend captures most of the savings without exposing the parts of your pipeline that genuinely cannot tolerate interruption.
Frequently asked questions
Will I lose my work when a spot GPU instance is reclaimed?
You lose any state that exists only on the instance at the moment it is reclaimed — including unsaved progress and anything on local-only disk. You do not lose work that you have already written to persistent or network storage. This is why frequent checkpointing to durable storage is the core discipline of using spot capacity safely; with good checkpointing you lose at most the few minutes since your last save.
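A minimal sketch of that discipline follows, assuming your durable storage is mounted as an ordinary directory (the directory, file naming scheme, and function names are hypothetical). Each checkpoint is written to a temporary file and then renamed into place, so a reclaim in the middle of a save does not leave a half-written checkpoint as the latest one. The rename is atomic on a local POSIX filesystem; guarantees on network filesystems vary, so treat this as a starting point.

```python
import json
import os
import tempfile
from pathlib import Path


def save_checkpoint(ckpt_dir, step, state):
    """Write a checkpoint via a temp file plus rename so readers never see a partial file."""
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    final_path = ckpt_dir / f"step_{step:09d}.json"
    fd, tmp_name = tempfile.mkstemp(dir=ckpt_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump({"step": step, "state": state}, f)
        f.flush()
        os.fsync(f.fileno())           # push the bytes to disk before renaming
    os.replace(tmp_name, final_path)   # swap the finished file into place
    return final_path


def load_latest_checkpoint(ckpt_dir):
    """Return the newest checkpoint as a dict, or None if there is none yet."""
    # Zero-padded step numbers make lexical order match numeric order.
    paths = sorted(Path(ckpt_dir).glob("step_*.json"))
    if not paths:
        return None
    with open(paths[-1]) as f:
        return json.load(f)
```

On restart, a job would call `load_latest_checkpoint` first and resume from the returned step, starting from scratch only when it returns `None`. Real training state would typically be saved in the framework's own format rather than JSON, but the same write-then-rename pattern applies.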
Is the GPU hardware different on spot versus on-demand instances?
No. Spot and on-demand instances draw from the same physical hardware, so the GPU, its VRAM, its tensor cores, and its interconnect are identical. The only difference is the contract around availability and price: spot is cheaper but interruptible, while on-demand costs more and is not reclaimed out from under you. You are paying for guaranteed continuity, not for faster silicon.
How much can spot instances actually save compared to on-demand?
The discount is typically substantial and is the main reason to choose interruptible capacity, but the exact figure varies by provider, GPU model, region, and current demand, and it moves constantly. Rather than rely on a single number, check the live comparison above for the current spot versus on-demand spread on the specific GPU you want.
Which workloads should never run on spot instances?
Avoid spot for anything that cannot survive a sudden disappearance: live, low-latency inference behind a production application, interactive sessions holding unsaved work, and tightly coupled multi-GPU jobs that cannot recover when one node is lost. For those, the on-demand premium is worth it. Everything that checkpoints cleanly and tolerates restarts — training, fine-tuning, batch inference, rendering, and sweeps — is well suited to spot.
Vast.ai vs RunPod - Comparison of Top Firms in This Guide
Vast.ai vs RunPod - GPU Provider Comparison (June 2026)
Head-to-head comparison of Vast.ai and RunPod. Compare GPU models, hourly pricing, billing granularity, spot instances, VRAM, infrastructure, developer tools, Kubernetes support, and compliance before choosing a provider. Data refreshed June 2026.
Bottom Line: Vast.ai vs RunPod
Vast.ai comes out ahead overall, leading in 4 of 5 compared categories.
Where Vast.ai leads
- Trustpilot Rating (4.1 vs 3.4)
- GPU Models (35 vs 30)
- Regions (2 vs 1)
- Compliance (4 vs 1)
Where RunPod leads
- Max VRAM (GB) (288 vs 192)
Choose Vast.ai if the Trustpilot rating matters most to you. Choose RunPod if you need the highest maximum VRAM.
Frequently Asked Questions
Is Vast.ai or RunPod better?
Which has a better Trustpilot Rating, Vast.ai or RunPod?
Which has a better Max VRAM (GB), Vast.ai or RunPod?
| | Vast.ai | RunPod |
|---|---|---|
| Tagline | Instant GPUs. Transparent Pricing. | The cloud built for AI: deploy and scale GPU workloads from serverless inference to instant multi-node clusters on demand. |
| Overview | ||
| Trustpilot Rating | 4.1 | 3.4 |
| Headquarters | United States | United States |
| Provider Type | GPU Marketplace | GPU-Focused |
| Best For | AI training, inference, fine-tuning, Stable Diffusion, batch processing, research, LLM serving, generative AI | AI training, inference, fine-tuning, Stable Diffusion, batch processing, rendering, research, LLM serving, generative AI |
| GPU Hardware | ||
| GPU Models | B200, H200, H100 SXM, H100 NVL, A100 SXM, A100 PCIe, RTX 5090, RTX 5080, RTX 5070 Ti, RTX 6000 Pro, RTX 6000 Ada, RTX 4500 Ada, RTX A6000, RTX A5000, RTX A4000, L40S, L40, A40, A10, RTX 4090, RTX 4080, RTX 4070 Ti, RTX 4070, RTX 4060 Ti, RTX 4060, RTX 3090 Ti, RTX 3090, RTX 3080 Ti, RTX 3080, RTX 3070 Ti, RTX 3070, Tesla V100, Tesla T4, A2, GTX 1080 | B300, B200, H200, H100 SXM, H100 PCIe, H100 NVL, MI300X, A100 SXM, A100 PCIe, RTX 5090, RTX PRO 6000, L40S, L40, RTX 6000 Ada, RTX 5000 Ada, RTX A6000, RTX A5000, RTX 4090, RTX 4080 SUPER, RTX 4080, RTX 4070 Ti, RTX 3090 Ti, RTX 3090, RTX 3080 Ti, RTX 3080, RTX 3070, A40, A30, A2, L4 |
| Max VRAM (GB) | 192 | 288 |
| Max GPUs/Instance | 8 | 8 |
| Interconnect | NVLink, InfiniBand | NVLink |
| Pricing | ||
| Starting Price ($/hr) | $0.06/hr | $0.06/hr |
| Billing Granularity | Per-second | Per-second |
| Spot/Preemptible | Yes | Yes |
| Reserved Discounts | Up to 50% (1-6 month reserved) | 15-29% (1-month to 1-year plans) |
| Free Credits | Small test credit on signup | $5-$500 bonus after first $10 spend |
| Egress Fees | Varies by host ($/TB) | None (Free) |
| Storage | Varies by host ($/GB/hr, charged while instance exists) | Container/Volume ($0.10/GB/mo), Idle Volume ($0.20/GB/mo), Network Storage ($0.07/GB/mo 1TB) |
| Infrastructure | ||
| Regions | 500+ locations, 40+ data centers | 31 global regions |
| Uptime SLA | No formal SLA (host reliability scores visible) | 99.99% |
| Developer Experience | ||
| Frameworks | PyTorch, TensorFlow, CUDA, vLLM, ComfyUI | PyTorch, TensorFlow, JAX, ONNX, CUDA |
| Docker Support | Yes | Yes |
| SSH Access | Yes | Yes |
| Jupyter Notebooks | Yes | Yes |
| API / CLI | Yes | Yes |
| Setup Time | Seconds | Instant |
| Kubernetes Support | No | No |
| Business Terms | ||
| Min Commitment | None | None |
| Compliance | SOC 2 Type II, HIPAA, GDPR, CCPA | SOC 2 Type II |