Cloud GPUs with Spot Pricing: June 2026

GPUs available at spot/preemptible pricing, typically 50-80% cheaper than on-demand rates, for fault-tolerant workloads.

Updated June 2026. Showing 12 GPU models available at spot pricing.

What spot pricing means when you rent cloud GPUs

Spot pricing (also called interruptible, preemptible, or surplus capacity pricing) lets you rent a GPU instance from a provider’s unused inventory at a steep discount in exchange for one condition: the provider can reclaim that capacity at short notice when a full-price customer wants it, or when supply tightens. The hardware is identical to the on-demand version of the same instance — the same H100, A100, L40S, or RTX-class card with the same VRAM and bandwidth — but you give up guaranteed continuity in return for a lower hourly or per-second rate. Every provider in the comparison above marked as offering spot pricing exposes some form of this discounted, reclaimable tier.

The discount is real and often large, but it is not free money. The right mental model is that you are buying compute that someone else has first claim on. That trade-off is excellent for some workloads and actively dangerous for others, and knowing which is the whole game.

Why interruptibility matters, and which workloads tolerate it

When capacity is reclaimed, your instance is typically given a brief termination warning and then stopped. The length of that warning, what state survives, and whether you can be reassigned a new node all vary by provider, so the most important thing to compare is how each provider handles interruption, not just the headline discount.

Workloads split cleanly along whether they can absorb a sudden stop:

  • Fits spot well: long training and fine-tuning runs that checkpoint frequently, large batch-inference and offline embedding jobs, hyperparameter sweeps where individual trials are disposable, rendering farms that queue independent frames, and any fault-tolerant pipeline built around a job queue.
  • Fits spot poorly: real-time inference endpoints with latency SLAs, interactive notebook sessions you are actively typing in, a single long run with no checkpointing, and anything where a mid-run kill loses hours of irreplaceable state.

The defining habit of teams who use spot effectively is checkpointing. If your training loop writes model and optimizer state to persistent storage every few minutes, an interruption costs you only the work since the last checkpoint plus the time to acquire a replacement node. Without checkpointing, a reclaim can wipe out an entire run, and the discount you chased becomes a net loss.
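To make "the work since the last checkpoint plus the time to acquire a replacement node" concrete, here is a rough back-of-the-envelope sketch. It assumes each interruption lands, on average, halfway through a checkpoint interval; that assumption, the function name, and every number in the example are illustrative, not measured provider figures.

```python
def interruption_overhead(interruptions, checkpoint_interval_min, reacquire_min):
    """Estimate the overhead that interruptions add to a checkpointed run.

    Returns (redo_hours, wait_hours):
      redo_hours: compute you pay for twice, because work done since the
                  last checkpoint is lost and must be repeated.
      wait_hours: wall-clock delay spent acquiring replacement nodes.

    Assumption: an interruption arrives, on average, halfway through a
    checkpoint interval, so half an interval of work is lost each time.
    """
    redo_hours = interruptions * (checkpoint_interval_min / 2) / 60
    wait_hours = interruptions * reacquire_min / 60
    return redo_hours, wait_hours


# Hypothetical run: 6 interruptions, checkpoints every 10 minutes,
# 5 minutes to get a replacement node each time.
redo, wait = interruption_overhead(
    interruptions=6, checkpoint_interval_min=10, reacquire_min=5
)
print(f"repeated compute: {redo:.2f} h, waiting for nodes: {wait:.2f} h")
```

With these made-up inputs the run repeats half an hour of compute and waits another half hour, which is small next to a multi-day job. Stretching the checkpoint interval to several hours, or dropping checkpoints entirely, is what turns the same number of reclaims into a net loss.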

Engineering you should have in place first

  • Frequent, atomic checkpoints written to durable storage that outlives the instance — not to the local NVMe scratch disk, which usually disappears with the node.
  • Automatic resume logic so a freshly acquired instance reloads the latest checkpoint and continues without manual intervention.
  • A handler for the termination signal so you flush state in the warning window rather than being killed cold.
  • An orchestration or queue layer that re-requests capacity and re-schedules work when a node vanishes, ideally with fallback to a different instance type or region.
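The first three items above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production implementation: it assumes the provider delivers SIGTERM as the termination signal (some expose a metadata endpoint or a different signal instead), the checkpoint is a small JSON file standing in for real model and optimizer state, and `CKPT_PATH`, `train`, and the other names are hypothetical. The write-to-temp-then-rename pattern is atomic on a local POSIX filesystem; network or object storage may need a different approach.

```python
import json
import os
import signal
import tempfile

# Hypothetical location; in practice point this at storage that outlives the node.
CKPT_PATH = os.environ.get("CKPT_PATH", "checkpoint.json")

stop_requested = False


def _on_terminate(signum, frame):
    """Signal handler: only set a flag, let the training loop flush state."""
    global stop_requested
    stop_requested = True


def install_termination_handler():
    # Assumption: the termination warning arrives as SIGTERM.
    signal.signal(signal.SIGTERM, _on_terminate)


def save_checkpoint(state, path=CKPT_PATH):
    """Write atomically: temp file in the same directory, fsync, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # readers see either the old or the new file
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


def load_checkpoint(path=CKPT_PATH):
    """Resume from the latest checkpoint, or start fresh if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0}


def train(total_steps, checkpoint_every=100, path=CKPT_PATH):
    state = load_checkpoint(path)
    while state["step"] < total_steps and not stop_requested:
        state["step"] += 1  # stand-in for one real training step
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)  # final flush, also reached on termination
    return state
```

A job would call `install_termination_handler()` once at startup and then `train(...)`. Because `train` always begins with `load_checkpoint`, launching the same command on a replacement node continues from the last saved step without manual intervention. Re-requesting capacity, the fourth item, belongs to the scheduler or queue layer and is not shown here.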

What to check on the spot dimension before you commit

“Offers spot pricing” is a yes/no flag in the table above, but underneath it the implementations differ enough that two providers with the same nominal discount can deliver very different reliability. Compare these specifics:

  • Interruption notice: how much warning you get before the node is reclaimed, and whether that window is long enough to checkpoint and drain.
  • Price behavior: whether the spot rate is fixed at a discount or floats with demand, and whether you can set a maximum bid above which you would rather be interrupted than keep paying.
  • Reclaim frequency: how often a given GPU class actually gets pulled, which tracks scarcity — scarce, in-demand parts like top-tier training GPUs are reclaimed more aggressively than mid-range cards.
  • Storage persistence: whether your volume and data survive an interruption or are destroyed with the instance, since this decides how painful a reclaim is.
  • Billing granularity: per-second versus per-hour billing, which matters a lot when nodes are short-lived — being billed a full hour for a node reclaimed after ten minutes erodes the discount.
  • Capacity depth and regions: whether the provider has enough surplus of the GPU you want that you can actually re-acquire one quickly after a reclaim, rather than waiting in a starved pool.

Refer to the comparison above for which providers expose spot tiers and at what live rates; the discount itself moves constantly with supply, so treat any number you see there as a current snapshot rather than a fixed price.
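The billing-granularity point is easy to check with arithmetic. The sketch below assumes usage is rounded up to the next billing increment, which is a common scheme but not a universal one, and uses a made-up rate of $2.00/hr; neither figure comes from the comparison.

```python
import math


def billed_cost(runtime_seconds, hourly_rate, granularity_seconds):
    """Cost of a node when usage is rounded up to the billing increment.

    Assumption: the provider rounds partial increments up.
    """
    units = math.ceil(runtime_seconds / granularity_seconds)
    return units * granularity_seconds / 3600 * hourly_rate


rate = 2.00  # hypothetical spot rate in $/hr
ten_minutes = 600

per_second = billed_cost(ten_minutes, rate, granularity_seconds=1)
per_hour = billed_cost(ten_minutes, rate, granularity_seconds=3600)

print(f"per-second billing: ${per_second:.2f}")
print(f"per-hour billing:   ${per_hour:.2f}")
```

Under these assumptions, a node reclaimed after ten minutes costs about $0.33 with per-second billing and $2.00 with per-hour billing, so the hourly-billed node is effectively six times more expensive for the work it actually did.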

Spot versus on-demand and reserved capacity

Spot sits at one end of a reliability-versus-cost spectrum. On-demand gives you a node that runs until you stop it, at the highest hourly rate. Reserved or committed capacity locks in hardware for a fixed term at a discount in exchange for a usage commitment. Spot is the cheapest per hour but the least guaranteed. A common and sensible pattern is to mix them: serve latency-sensitive production inference on on-demand or reserved instances, and push the elastic, fault-tolerant work — training sweeps, batch jobs, overnight rendering — onto spot to capture the savings where interruptions are cheap. Used this way, spot is less a gamble and more a deliberate cost-optimization lever for the right half of your workload.
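The mixing pattern can be written down as a simple placement rule. The function below is only an illustration of the reasoning in this section; the name `choose_tier`, its parameters, and the three tier labels are hypothetical, and a real policy would also weigh price, region, and capacity.

```python
def choose_tier(latency_sensitive, interruption_tolerant, steady_long_term=False):
    """Pick a capacity tier for a workload.

    latency_sensitive:     serves live traffic with latency or uptime commitments.
    interruption_tolerant: checkpoints frequently or consists of disposable units.
    steady_long_term:      usage is predictable enough to justify a commitment.
    """
    if latency_sensitive:
        return "reserved" if steady_long_term else "on-demand"
    if interruption_tolerant:
        return "spot"
    # Not latency-bound, but a mid-run kill would lose irreplaceable state.
    return "on-demand"


workloads = {
    "production inference API": choose_tier(True, False, steady_long_term=True),
    "hyperparameter sweep": choose_tier(False, True),
    "single long run, no checkpoints": choose_tier(False, False),
}
for name, tier in workloads.items():
    print(f"{name}: {tier}")
```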

Frequently asked questions

Is the GPU hardware different on a spot instance?

No. A spot instance gives you the same physical GPU, VRAM, and bandwidth as the equivalent on-demand instance of that type. The only difference is the commercial terms: a lower price in exchange for the provider’s right to reclaim the capacity. You are not getting a slower or cut-down card.

How much can spot pricing actually save me?

Discounts off the on-demand rate for the same GPU are typically in the 50-80% range, and they vary by provider, GPU model, region, and current demand. Because those figures float with supply, check the live comparison above rather than relying on a fixed percentage, and weigh any quoted saving against the engineering cost of making your workload interruption-tolerant.

What happens to my work when a spot instance is reclaimed?

The provider issues a termination signal and then stops the node, usually after a short warning. Anything held only in GPU memory or on local scratch disk is lost unless you have written it to persistent storage. This is why frequent checkpointing and automatic resume are essential: with them, a reclaim costs you minutes; without them, it can cost you the whole run.

Should I run a production inference API on spot?

Generally not on spot alone. Real-time endpoints with latency or uptime commitments need predictable capacity, and an abrupt reclaim can drop live traffic. If you do want spot’s savings for serving, pair it with an on-demand or reserved baseline that absorbs traffic during interruptions, and keep purely spot-backed capacity for batch or offline inference where a pause is harmless.

B200 vs H200 SXM vs H100 SXM: top picks from this guide

                      B200            H200 SXM        H100 SXM
Specifications
Manufacturer          NVIDIA          NVIDIA          NVIDIA
Architecture          Blackwell       Hopper          Hopper
VRAM                  192 GB HBM3e    141 GB HBM3e    80 GB HBM3
Bandwidth             8,000 GB/s      4,800 GB/s      3,350 GB/s
FP16 (Tensor)         2,250 TFLOPS    990 TFLOPS      990 TFLOPS
FP32                  75 TFLOPS       67 TFLOPS       67 TFLOPS
TDP                   1000 W          700 W           700 W
Release year          2024            2024            2023
Segment               Data center     Data center     Data center
Cloud pricing
Cheapest on-demand    $1.99/hr        $2.05/hr        $1.57/hr
Providers             2               3               7
