Spot pricing (also called interruptible, preemptible, or surplus capacity pricing) lets you rent a GPU instance from a provider’s unused inventory at a steep discount in exchange for one condition: the provider can reclaim that capacity at short notice when a full-price customer wants it, or when supply tightens. The hardware is identical to the on-demand version of the same instance — the same H100, A100, L40S, or RTX-class card with the same VRAM and bandwidth — but you give up guaranteed continuity in return for a lower hourly or per-second rate. Every provider in the comparison above marked as offering spot pricing exposes some form of this discounted, reclaimable tier.

The discount is real and often large, but it is not free money. The right mental model is that you are buying compute that someone else has first claim on. That trade-off is excellent for some workloads and actively dangerous for others, and knowing which is the whole game.

Why interruptibility matters, and which workloads tolerate it

When capacity is reclaimed, your instance is typically given a brief termination warning and then stopped. The length of that warning, what state survives, and whether you can be reassigned a new node all vary by provider — so the single most important thing to compare is not just the headline discount but how each provider handles interruption.

Workloads split cleanly according to whether they can absorb a sudden stop:

Fits spot well: long training and fine-tuning runs that checkpoint frequently, large batch-inference and offline embedding jobs, hyperparameter sweeps where individual trials are disposable, rendering farms that queue independent frames, and any fault-tolerant pipeline built around a job queue.
Fits spot poorly: real-time inference endpoints with latency SLAs, interactive notebook sessions you are actively typing in, a single long run with no checkpointing, and anything where a mid-run kill loses hours of irreplaceable state.

The defining habit of teams who use spot effectively is checkpointing. If your training loop writes model and optimizer state to persistent storage every few minutes, an interruption costs you only the work since the last checkpoint plus the time to acquire a replacement node. Without checkpointing, a reclaim can wipe out an entire run, and the discount you chased becomes a net loss.
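To put rough numbers on that cost, the sketch below estimates the time lost per interruption. It assumes a reclaim lands at a uniformly random moment between two checkpoints, so on average half a checkpoint interval of work is redone. The function name and the example figures are illustrative assumptions, not measurements from any provider.

```python
def expected_loss_minutes(checkpoint_interval_min: float, reacquire_min: float) -> float:
    """Expected wall-clock minutes lost per interruption.

    Assumes the reclaim hits at a uniformly random point between two
    checkpoints, so on average half an interval of work must be redone,
    plus the time spent waiting for a replacement node.
    """
    return checkpoint_interval_min / 2 + reacquire_min


# Hypothetical case: checkpoint every 10 minutes, 5 minutes to get a new node.
with_checkpoints = expected_loss_minutes(checkpoint_interval_min=10, reacquire_min=5)

# No checkpointing behaves like one giant interval: here a 12-hour (720 min) run.
without_checkpoints = expected_loss_minutes(checkpoint_interval_min=720, reacquire_min=5)

print(with_checkpoints)     # 10.0
print(without_checkpoints)  # 365.0
```

Under these assumptions the same reclaim costs about ten minutes with checkpointing and about six hours without it.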

Engineering you should have in place first

Frequent, atomic checkpoints written to durable storage that outlives the instance — not to the local NVMe scratch disk, which usually disappears with the node.
Automatic resume logic so a freshly acquired instance reloads the latest checkpoint and continues without manual intervention.
A handler for the termination signal so you flush state in the warning window rather than being killed cold.
An orchestration or queue layer that re-requests capacity and re-schedules work when a node vanishes, ideally with fallback to a different instance type or region.

What to check on the spot dimension before you commit

“Offers spot pricing” is a yes/no flag in the table above, but underneath it the implementations differ enough that two providers with the same nominal discount can deliver very different reliability. Compare these specifics:

Interruption notice: how much warning you get before the node is reclaimed, and whether that window is long enough to checkpoint and drain.
Price behavior: whether the spot rate is fixed at a discount or floats with demand, and whether you can set a maximum bid above which you would rather be interrupted than keep paying.
Reclaim frequency: how often a given GPU class actually gets pulled, which tracks scarcity — scarce, in-demand parts like top-tier training GPUs are reclaimed more aggressively than mid-range cards.
Storage persistence: whether your volume and data survive an interruption or are destroyed with the instance, since this decides how painful a reclaim is.
Billing granularity: per-second versus per-hour billing, which matters a lot when nodes are short-lived — being billed a full hour for a node reclaimed after ten minutes erodes the discount.
Capacity depth and regions: whether the provider has enough surplus of the GPU you want that you can actually re-acquire one quickly after a reclaim, rather than waiting in a starved pool.
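The billing-granularity point is easy to check with arithmetic. The sketch below assumes usage is rounded up to whole billing units, which is a common scheme but not guaranteed for every provider. The $1.20/hr spot rate is a made-up figure for illustration.

```python
import math


def billed_cost(runtime_seconds: int, hourly_rate: float, granularity_seconds: int) -> float:
    """Cost when usage is rounded up to whole billing units."""
    units = math.ceil(runtime_seconds / granularity_seconds)
    return units * granularity_seconds * hourly_rate / 3600


# Hypothetical node reclaimed after ten minutes at a $1.20/hr spot rate.
runtime = 10 * 60
per_second = billed_cost(runtime, hourly_rate=1.20, granularity_seconds=1)
per_hour = billed_cost(runtime, hourly_rate=1.20, granularity_seconds=3600)

print(f"{per_second:.2f}")  # 0.20
print(f"{per_hour:.2f}")    # 1.20
```

In this example hourly billing charges six times as much for the same ten minutes of work, which can cancel the discount entirely when nodes are reclaimed often.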

Refer to the comparison above for which providers expose spot tiers and at what live rates; the discount itself moves constantly with supply, so treat any number you see there as a current snapshot rather than a fixed price.

Spot versus on-demand and reserved capacity

Spot sits at one end of a reliability-versus-cost spectrum. On-demand gives you a node that runs until you stop it, at the highest hourly rate. Reserved or committed capacity locks in hardware for a fixed term at a discount in exchange for a usage commitment. Spot is the cheapest per hour but the least guaranteed. A common and sensible pattern is to mix them: serve latency-sensitive production inference on on-demand or reserved instances, and push the elastic, fault-tolerant work (training sweeps, batch jobs, overnight rendering) onto spot to capture the savings where interruptions are cheap. Used this way, spot is less a gamble and more a deliberate cost-optimization lever for the part of your workload that can tolerate interruption.
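One way to reason about a mix is to compute a blended cost per useful GPU-hour. The sketch below is a simplified model under stated assumptions: all rates are hypothetical, and `spot_overhead` is an assumed fraction of paid spot time lost to redone work and waiting for replacement nodes.

```python
def blended_hourly_cost(
    on_demand_rate: float,
    spot_rate: float,
    spot_share: float,
    spot_overhead: float = 0.0,
) -> float:
    """Average cost per useful GPU-hour when a share of the work runs on spot.

    spot_overhead is the fraction of paid spot time that produces no useful
    work (0.20 means 20% is lost to rework and reacquisition).
    """
    if not 0 <= spot_share <= 1:
        raise ValueError("spot_share must be between 0 and 1")
    if not 0 <= spot_overhead < 1:
        raise ValueError("spot_overhead must be in [0, 1)")
    effective_spot_rate = spot_rate / (1 - spot_overhead)
    return (1 - spot_share) * on_demand_rate + spot_share * effective_spot_rate


# Hypothetical rates: $2.00/hr on-demand, $0.80/hr spot, half the work on spot,
# 20% of spot time lost to interruptions.
cost = blended_hourly_cost(2.00, 0.80, spot_share=0.5, spot_overhead=0.20)
print(f"{cost:.2f}")  # 1.50
```

The model also shows when spot stops paying off: once the effective spot rate (the spot rate divided by the useful fraction of time) reaches the on-demand rate, moving more work to spot no longer lowers the blended cost.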

Frequently asked questions

Is the GPU hardware different on a spot instance?

No. A spot instance gives you the same physical GPU, VRAM, and bandwidth as the equivalent on-demand instance of that type. The only difference is the commercial terms: a lower price in exchange for the provider’s right to reclaim the capacity. You are not getting a slower or cut-down card.

How much can spot pricing actually save me?

Discounts off the on-demand rate for the same GPU are typically substantial, and they vary by provider, GPU model, region, and current demand. Because those figures float with supply, check the live comparison above rather than relying on a fixed percentage — and weigh any quoted saving against the engineering cost of making your workload interruption-tolerant.

What happens to my work when a spot instance is reclaimed?

The provider issues a termination signal and then stops the node, usually after a short warning. Anything held only in GPU memory or on local scratch disk is lost unless you have written it to persistent storage. This is why frequent checkpointing and automatic resume are essential: with them, a reclaim costs you minutes; without them, it can cost you the whole run.

Should I run a production inference API on spot?

Generally not on spot alone. Real-time endpoints with latency or uptime commitments need predictable capacity, and an abrupt reclaim can drop live traffic. If you do want spot’s savings for serving, pair it with an on-demand or reserved baseline that absorbs traffic during interruptions, and keep purely spot-backed capacity for batch or offline inference where a pause is harmless.

B200 vs H200 SXM vs H100 SXM: top picks from this guide

	B200 Blackwell · 192 GB	H200 SXM Hopper · 141 GB	H100 SXM Hopper · 80 GB
Specifications
Manufacturer	NVIDIA	NVIDIA	NVIDIA
Architecture	Blackwell	Hopper	Hopper
VRAM	192 GB HBM3e	141 GB HBM3e	80 GB HBM3
Memory bandwidth	8,000 GB/s	4,800 GB/s	3,350 GB/s
FP16 (Tensor)	2,250 TFLOPS	990 TFLOPS	990 TFLOPS
FP32	75 TFLOPS	67 TFLOPS	67 TFLOPS
TDP	1000 W	700 W	700 W
Release year	2024	2024	2023
Segment	Data center	Data center	Data center
Cloud pricing
Cheapest on-demand	$1.99/hr	$2.05/hr	$1.57/hr
Providers	2	3	7

Cloud GPUs with Spot Pricing - June 2026

What spot pricing means when you rent cloud GPUs