Cloud GPU Providers with Per-Second Billing

Per-second billing means you pay only for the compute time you actually consume, which is particularly valuable for short experiments, iterative development, and inference jobs that complete in minutes. Compared to hourly billing, per-second granularity can save 30-50% on typical development workflows. This guide lists cloud GPU providers that offer per-second or sub-minute billing.

Updated July 2026. Showing 4 GPU providers with per-second billing.
Provider | Trustpilot Rating | Trustpilot Reviews | Change (7d / 30d / 90d) | HQ | Starting Price | Max VRAM | Max GPUs | Billing
DigitalOcean | 4.6 | 2,429 | +15 / +47 / +143 | United States | $0.76/hr | 192 GB | 8 | Per-second
Vast.ai | 4.1 | 237 | +0 / +8 / +25 | United States | $0.06/hr | 192 GB | 8 | Per-second
RunPod | 3.4 | 245 | +1 / +12 / +36 | United States | $0.06/hr | 288 GB | 8 | Per-second
Novita AI | 2.9 | 7 | +0 / +0 / +2 | United States | $0.11/hr | 80 GB | 8 | Per-second

What per-second billing actually means for cloud GPU rentals

Per-second billing means the provider meters your GPU instance in one-second increments and charges only for the seconds the instance is running, rather than rounding up to the next full hour or charging a fixed hourly block. When you launch an instance and tear it down 23 minutes later, you pay for roughly 1,380 seconds of compute, not a full hour. The underlying advertised rate is still usually expressed as a price per GPU-hour, but the meter ticks far more finely, so the gap between what you use and what you pay shrinks to near zero.

This sounds like a small accounting detail, but it materially changes the economics of bursty, automated, and experimental workloads. The list above filters specifically for providers that bill at this granularity, which is the dimension that separates a platform built for short, frequent jobs from one optimized for long-lived, steady-state instances.

Per-second versus per-hour and per-minute

Billing granularity sits on a spectrum, and the differences compound at scale:

  • Per-hour, rounded up: a 90-second job and a 59-minute job both cost a full hour. This is the worst case for short tasks and punishes frequent start/stop cycles.
  • Per-minute: better, but a 5-second inference call or a 20-second container start still rounds up to 60 seconds, which adds up across thousands of invocations.
  • Per-second: you pay for what the wall clock actually records, often with a small minimum charge (commonly the first minute) to discourage abusive churn.

For a single long training run, the granularity barely matters: with hourly round-up, the overage on a 40-hour job is less than one hour, under 2.5% of the bill. For an autoscaling inference fleet that spins instances up and down hundreds of times a day, or a hyperparameter sweep that launches and kills containers constantly, per-second metering can be the difference between paying for compute you used and paying for rounded-up idle time.
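The spectrum above can be made concrete with a small calculation. The sketch below is illustrative only: the function names, the $2.00 per GPU-hour rate, the 60-second minimum, and the job lengths are assumptions chosen for the example, not figures from any provider in this guide.

```python
import math

def billed_seconds(runtime_s: float, increment_s: int, minimum_s: int = 0) -> int:
    """Seconds charged when usage is rounded up to `increment_s`,
    with an optional minimum billable period."""
    rounded = math.ceil(runtime_s / increment_s) * increment_s
    return max(rounded, minimum_s)

def cost(runtime_s: float, rate_per_hour: float, increment_s: int, minimum_s: int = 0) -> float:
    """Dollar cost of one run at an hourly rate under the given granularity."""
    return billed_seconds(runtime_s, increment_s, minimum_s) * rate_per_hour / 3600

RATE = 2.00  # hypothetical $/GPU-hour, not a real provider price

jobs = [("90-second job", 90), ("23-minute session", 23 * 60), ("59-minute job", 59 * 60)]
for label, runtime in jobs:
    per_hour = cost(runtime, RATE, increment_s=3600)
    per_minute = cost(runtime, RATE, increment_s=60)
    per_second = cost(runtime, RATE, increment_s=1, minimum_s=60)  # assumed 60 s floor
    print(f"{label}: hourly ${per_hour:.2f}, per-minute ${per_minute:.2f}, per-second ${per_second:.2f}")
```

Under these assumptions all three jobs cost the same $2.00 with hourly round-up, while per-second metering charges each one in proportion to its runtime.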

Which workflows benefit most

Per-second billing rewards anything spiky, automated, or short-lived:

  • Burst inference and serverless-style scaling: workloads that scale GPU capacity to match request volume and release it within minutes capture the most savings, because idle rounding is eliminated on every scale event.
  • CI/CD and automated testing: GPU-backed test suites or model-validation jobs that run for a couple of minutes per commit avoid paying full-hour blocks on every pipeline trigger.
  • Hyperparameter search and experimentation: launching dozens of short trials, killing the losers early, and only keeping promising configurations is far cheaper when each killed trial costs only the seconds it actually ran.
  • Interactive notebook sessions: a researcher who fires up a GPU for a ten-minute debugging session and shuts it down pays for ten minutes, not an hour.
  • Batch jobs of unpredictable length: rendering frames, running a batch of embeddings, or transcoding clips where the runtime varies from seconds to minutes per task.
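To see how this plays out for a sweep with early stopping, here is a rough sketch. It assumes each trial runs on its own instance and is billed separately; the trial counts, runtimes, and the $2.00 per GPU-hour rate are hypothetical, and `sweep_cost` is a name invented for this example.

```python
import math

def sweep_cost(trial_runtimes_s, rate_per_hour, increment_s):
    """Total cost of a sweep when every trial is billed on its own,
    with each runtime rounded up to a multiple of `increment_s`."""
    billed = sum(math.ceil(t / increment_s) * increment_s for t in trial_runtimes_s)
    return billed * rate_per_hour / 3600

# Hypothetical sweep: 16 trials killed after 3 minutes, 4 survivors run 45 minutes.
runtimes = [180] * 16 + [2700] * 4
RATE = 2.00  # hypothetical $/GPU-hour

hourly = sweep_cost(runtimes, RATE, increment_s=3600)
per_second = sweep_cost(runtimes, RATE, increment_s=1)
print(f"hourly round-up: ${hourly:.2f}, per-second: ${per_second:.2f}")
```

With these made-up numbers the sweep costs $40.00 under hourly round-up and $7.60 under per-second metering, because the 16 killed trials stop costing money the moment they stop.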

Conversely, if you keep a GPU pinned for days of continuous training, the billing granularity is close to irrelevant and you should weight other factors — interconnect, VRAM, spot discounts, and storage — far more heavily.

The trade-offs and the fine print

Per-second billing is almost always a positive, but it does not exist in isolation, and a few details determine whether the headline advantage is real:

  • Minimum charge: many providers apply a minimum billable period, frequently the first 60 seconds. If your jobs are sub-minute, that floor matters, so check whether a minimum applies and how long it is.
  • What the clock includes: confirm whether billing starts at instance provisioning, at boot, or at the moment the GPU is ready. Slow cold starts, image pulls, and driver initialization can all fall inside the metered window, so fine-grained billing paired with a slow boot can erase the savings.
  • Storage and IP charges: the GPU compute may stop billing the instant you terminate, but attached persistent volumes, snapshots, and reserved IPs often keep accruing. Per-second compute does not make storage free.
  • Egress and data transfer: these are typically billed by volume, not time, and are unaffected by granularity — a separate line item to compare.
  • Spot and interruptible pricing: per-second metering pairs naturally with interruptible instances, since you are not penalized for a node that lives only a few minutes before reclamation. Together they suit fault-tolerant, checkpointed work.
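The first three items above can be folded into one rough estimate. This is a sketch under stated assumptions: the meter is assumed to start at provisioning (so cold-start time is billed), a month is treated as 30 days, and every number and name (`estimate_bill`, `JobBill`) is hypothetical rather than taken from a provider's billing rules.

```python
from dataclasses import dataclass

@dataclass
class JobBill:
    compute: float  # GPU compute charge in dollars
    storage: float  # persistent-volume charge in dollars

    @property
    def total(self) -> float:
        return self.compute + self.storage

def estimate_bill(useful_s: float, cold_start_s: float, rate_per_hour: float,
                  minimum_s: float, volume_gib: float,
                  storage_per_gib_month: float, volume_days: float) -> JobBill:
    """Estimate one job's bill, assuming cold start is metered and the
    volume keeps accruing for `volume_days` regardless of GPU runtime."""
    metered_s = max(useful_s + cold_start_s, minimum_s)
    compute = metered_s * rate_per_hour / 3600
    storage = volume_gib * storage_per_gib_month * volume_days / 30  # 30-day month assumed
    return JobBill(compute=compute, storage=storage)

# Hypothetical job: 30 s of work, 45 s cold start, 100 GiB volume kept for a week.
bill = estimate_bill(useful_s=30, cold_start_s=45, rate_per_hour=2.00, minimum_s=60,
                     volume_gib=100, storage_per_gib_month=0.10, volume_days=7)
print(f"compute ${bill.compute:.4f}, storage ${bill.storage:.2f}, total ${bill.total:.2f}")
```

In this example the leftover volume costs far more than the GPU seconds, which is why storage deserves its own line in any comparison.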

What to check in the comparison above

When reading the list, treat per-second billing as one axis among several rather than a single deciding factor:

  1. Confirm the granularity is genuinely per-second, not per-minute marketed loosely as “by the second.”
  2. Find the minimum billable period and the point at which the meter starts.
  3. Estimate your typical job duration and start/stop frequency — the finer granularity pays off in proportion to how short and how frequent your jobs are.
  4. Separate compute billing from storage, networking, and idle-resource charges, which granularity does not address.
  5. Cross-reference against the GPU model and on-demand versus spot availability so you are not optimizing seconds on hardware that is wrong for the workload.

Live rates change constantly and vary by region and instance type, so use the comparison above for current per-second pricing rather than any fixed figure.

Frequently asked questions

Does per-second billing make GPUs meaningfully cheaper?

It lowers your effective cost only to the extent that your jobs are short or frequently cycled. For long, continuous runs the savings versus per-hour billing are negligible; for bursty, autoscaling, or experimental workloads with many short instances, the elimination of rounding can produce a real reduction in your bill.

Is there usually a minimum charge with per-second billing?

Often yes. Many providers bill a minimum period — commonly the first minute — even when an instance runs for only a few seconds. This is meant to prevent rapid churn abuse. If your jobs are sub-minute, confirm the minimum before assuming you pay for literal seconds.

When does the per-second meter start and stop?

It varies. Some providers begin metering at provisioning, others at boot or when the GPU becomes usable. Cold-start time, image pulls, and driver setup may all fall inside the billed window, so a low rate with a slow startup can cost more than a slightly higher rate that boots quickly.
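A quick way to compare two offers is to price a whole job, startup included. The sketch below assumes startup time is fully metered on both offers; the rates, startup times, and function names are hypothetical.

```python
def cost_per_job(rate_per_hour: float, startup_s: float, work_s: float) -> float:
    """Cost of one job when startup time falls inside the billed window."""
    return (startup_s + work_s) * rate_per_hour / 3600

def break_even_work_s(low_rate: float, slow_startup_s: float,
                      high_rate: float, fast_startup_s: float) -> float:
    """Work duration at which both offers cost the same.
    Derived from low_rate * (slow + w) == high_rate * (fast + w);
    requires high_rate > low_rate."""
    return (low_rate * slow_startup_s - high_rate * fast_startup_s) / (high_rate - low_rate)

# Hypothetical offers: $1.80/hr with a 120 s startup vs $2.20/hr with a 10 s startup.
cheap_slow = cost_per_job(1.80, startup_s=120, work_s=60)
pricier_fast = cost_per_job(2.20, startup_s=10, work_s=60)
threshold = break_even_work_s(1.80, 120, 2.20, 10)
print(f"60 s job: ${cheap_slow:.4f} vs ${pricier_fast:.4f}; break-even at {threshold:.0f} s of work")
```

For jobs shorter than the break-even duration, the faster-booting offer is cheaper despite its higher hourly rate; for longer jobs the lower rate wins.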

Does per-second billing apply to storage and data transfer too?

No. Per-second granularity typically covers GPU compute only. Persistent volumes, snapshots, reserved IPs, and egress are usually billed separately — by capacity or by volume — and continue to accrue even after the GPU instance is terminated.

DigitalOcean vs Vast.ai - Comparison of Top Firms in This Guide

DigitalOcean vs Vast.ai - GPU Provider Comparison (July 2026)

Head-to-head comparison of DigitalOcean and Vast.ai. Compare GPU models, hourly pricing, billing granularity, spot instances, VRAM, infrastructure, developer tools, Kubernetes support, and compliance before choosing a provider. Data refreshed July 2026.

Bottom Line: DigitalOcean vs Vast.ai

DigitalOcean and Vast.ai are closely matched — each leads in several categories, so the right pick depends on your priorities.

Where DigitalOcean leads

  • Trustpilot Rating (4.6 vs 4.1)
  • Regions (5 vs 2)
  • Frameworks (7 vs 5)
  • Kubernetes Support

Where Vast.ai leads

  • Starting Price ($/hr) ($0.06/hr vs $0.76/hr)
  • GPU Models (35 vs 6)
  • Spot/Preemptible

Choose DigitalOcean if Trustpilot rating matters most to you. Choose Vast.ai if the lowest starting price does.

Frequently Asked Questions

Is DigitalOcean or Vast.ai better?

It is close: DigitalOcean and Vast.ai each lead in several categories. Compare the points that matter most to you below.

Which has a better Trustpilot Rating, DigitalOcean or Vast.ai?

DigitalOcean (4.6 vs 4.1).

Which has a better Starting Price ($/hr), DigitalOcean or Vast.ai?

Vast.ai ($0.06/hr vs $0.76/hr).

DigitalOcean: Simple, scalable GPU cloud for AI/ML
Vast.ai: Instant GPUs. Transparent Pricing.

Attribute | DigitalOcean | Vast.ai

Overview
Trustpilot Rating | 4.6 | 4.1
Headquarters | United States | United States
Provider Type | N/A | GPU Marketplace
Best For | AI training, inference, fine-tuning, LLM deployment, LLM serving, computer vision, startups, generative AI, research | AI training, inference, fine-tuning, Stable Diffusion, batch processing, research, LLM serving, generative AI

GPU Hardware
GPU Models | RTX 4000 Ada, RTX 6000 Ada, L40S, MI300X, H100 SXM, H200 | B200, H200, H100 SXM, H100 NVL, A100 SXM, A100 PCIe, RTX 5090, RTX 5080, RTX 5070 Ti, RTX 6000 Pro, RTX 6000 Ada, RTX 4500 Ada, RTX A6000, RTX A5000, RTX A4000, L40S, L40, A40, A10, RTX 4090, RTX 4080, RTX 4070 Ti, RTX 4070, RTX 4060 Ti, RTX 4060, RTX 3090 Ti, RTX 3090, RTX 3080 Ti, RTX 3080, RTX 3070 Ti, RTX 3070, Tesla V100, Tesla T4, A2, GTX 1080
Max VRAM (GB) | 192 | 192
Max GPUs/Instance | 8 | 8
Interconnect | NVLink | NVLink, InfiniBand

Pricing
Starting Price ($/hr) | $0.76/hr | $0.06/hr
Billing Granularity | Per-second | Per-second
Spot/Preemptible | No | Yes
Reserved Discounts | N/A | Up to 50% (1-6 month reserved)
Free Credits | $200 free credit for 60 days | Small test credit on signup
Egress Fees | None (included in plan) | Varies by host ($/TB)
Storage | 500-720 GiB NVMe boot (included), 5 TiB NVMe scratch on larger configs, Volumes at $0.10/GiB/mo | Varies by host ($/GB/hr, charged while instance exists)

Infrastructure
Regions | New York (NYC2), Toronto (TOR1), Atlanta (ATL1), Richmond (RIC1), Amsterdam (AMS3) | 500+ locations, 40+ data centers
Uptime SLA | 99% | No formal SLA (host reliability scores visible)

Developer Experience
Frameworks | PyTorch, TensorFlow, Jupyter, Miniconda, CUDA, ROCm, Hugging Face | PyTorch, TensorFlow, CUDA, vLLM, ComfyUI
Docker Support | Yes | Yes
SSH Access | Yes | Yes
Jupyter Notebooks | Yes | Yes
API / CLI | Yes | Yes
Setup Time | Minutes | Seconds
Kubernetes Support | Yes | No

Business Terms
Min Commitment | None | None
Compliance | SOC 2 Type II, SOC 3, HIPAA (with BAA), CSA STAR Level 1 | SOC 2 Type 2, HIPAA, GDPR, CCPA
