Best 12+ GB VRAM Cloud GPUs — June 2026

Every cloud GPU with at least 12 GB of VRAM — the floor for running modern small-to-mid LLMs, image-gen, and most fine-tuning jobs.

Updated June 2026 · Showing 43 GPU models · 12 GB+ VRAM

NVIDIA GB200 Superchip · Blackwell · 384 GB HBM3e
NVIDIA B300 · Blackwell Ultra · 288 GB HBM3e
AMD MI350X · CDNA 4 · 288 GB HBM3e
AMD MI355X · CDNA 4 · 288 GB HBM3e · $2.59/hr
AMD MI325X · CDNA 3 · 256 GB HBM3e · $2.00/hr
NVIDIA B200 · Blackwell · 192 GB HBM3e · $1.99/hr
NVIDIA B100 · Blackwell · 192 GB HBM3e
AMD MI300X · CDNA 3 · 192 GB HBM3 · $1.85/hr
NVIDIA H200 SXM · Hopper · 141 GB HBM3e · $2.05/hr
NVIDIA GH200 Superchip · Hopper · 96 GB HBM3
NVIDIA H100 SXM · Hopper · 80 GB HBM3 · $1.57/hr
NVIDIA A100 SXM (80GB) · Ampere · 80 GB HBM2e · $1.10/hr
NVIDIA A16 · Ampere · 64 GB GDDR6 · $0.47/hr
NVIDIA L40S · Ada Lovelace · 48 GB GDDR6 · $0.55/hr
NVIDIA L40 · Ada Lovelace · 48 GB GDDR6
NVIDIA A40 · Ampere · 48 GB GDDR6 · $0.30/hr
NVIDIA A100 SXM (40GB) · Ampere · 40 GB HBM2e · $0.80/hr
NVIDIA A30 · Ampere · 24 GB HBM2e · $0.25/hr
NVIDIA L4 · Ada Lovelace · 24 GB GDDR6 · $0.39/hr
NVIDIA A10G · Ampere · 24 GB GDDR6
NVIDIA V100 · Volta · 16 GB HBM2 · $0.13/hr
NVIDIA T4 · Turing · 16 GB GDDR6 · $0.08/hr
NVIDIA A2 · Ampere · 16 GB GDDR6 · $0.22/hr
NVIDIA RTX PRO 6000 · Blackwell · 96 GB GDDR7 · $1.71/hr
NVIDIA RTX 6000 Ada · Ada Lovelace · 48 GB GDDR6 · $0.47/hr
NVIDIA RTX A6000 · Ampere · 48 GB GDDR6 · $0.30/hr
NVIDIA RTX 5000 Ada · Ada Lovelace · 32 GB GDDR6
NVIDIA RTX A5000 · Ampere · 24 GB GDDR6
NVIDIA RTX 4500 Ada · Ada Lovelace · 24 GB GDDR6
NVIDIA RTX 4000 Ada · Ada Lovelace · 20 GB GDDR6 · $0.76/hr
NVIDIA RTX A4000 · Ampere · 16 GB GDDR6
NVIDIA RTX 5090 · Blackwell · 32 GB GDDR7 · $0.34/hr
NVIDIA RTX 4090 · Ada Lovelace · 24 GB GDDR6X · $0.28/hr
NVIDIA RTX 3090 · Ampere · 24 GB GDDR6X · $0.12/hr
NVIDIA RTX 3090 Ti · Ampere · 24 GB GDDR6X
NVIDIA RTX 5080 · Blackwell · 16 GB GDDR7
NVIDIA RTX 4080 SUPER · Ada Lovelace · 16 GB GDDR6X
NVIDIA RTX 4080 · Ada Lovelace · 16 GB GDDR6X
NVIDIA RTX 5070 Ti · Blackwell · 16 GB GDDR7
NVIDIA RTX 4060 Ti · Ada Lovelace · 16 GB GDDR6
NVIDIA RTX 4070 Ti · Ada Lovelace · 12 GB GDDR6X
NVIDIA RTX 3080 Ti · Ampere · 12 GB GDDR6X
NVIDIA RTX 4070 · Ada Lovelace · 12 GB GDDR6X

What the 12 GB VRAM threshold actually buys you

Filtering for 12 GB or more of video memory is a deliberate floor, not a ceiling. It marks the point where a rented cloud GPU stops being a toy for tiny demos and becomes capable of real fine-tuning, comfortable inference on mid-sized models, and most single-GPU rendering and computer-vision pipelines. Below 12 GB you spend your time fighting out-of-memory errors and shrinking batch sizes; at 12 GB and up you have enough headroom that the GPU’s compute, rather than its memory wall, usually becomes the limiting factor.

The 12 GB tier is well populated because it spans several generations of consumer and data-center silicon. You will see cards built on GDDR6 and GDDR6X memory in this band, as well as the entry rungs of data-center accelerators. That diversity is exactly why the comparison above matters: two instances both labeled “12 GB” can differ enormously in memory bandwidth, tensor throughput, and supported precisions even though their raw capacity number is identical.

Why VRAM capacity is the first number to check

For most AI and graphics work, VRAM capacity is the hard gate. A model, its activations, the optimizer state during training, and your working batch all have to fit in memory simultaneously. If they do not, the job simply will not run, no matter how fast the chip is. That is why VRAM is a useful facet to filter on first, then refine by speed and price.
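
As a rough illustration of that fit check, the sketch below estimates a training footprint from parameter count. The function names are hypothetical, and the per-parameter byte counts assume a common rule of thumb for mixed-precision training with Adam (2 bytes for weights, 2 for gradients, 12 for FP32 master weights plus two optimizer moments); real frameworks differ, and activations depend heavily on batch size and sequence length, so they are passed in as a separate guess.

```python
def training_memory_gb(params_billion: float,
                       bytes_weights: int = 2,
                       bytes_grads: int = 2,
                       bytes_optimizer: int = 12,
                       activations_gb: float = 0.0) -> float:
    """Rough training footprint in GB (1 GB = 1e9 bytes).

    Assumed rule of thumb, not a measurement: each parameter costs
    weights + gradients + optimizer state, plus whatever activations need.
    """
    per_param_bytes = bytes_weights + bytes_grads + bytes_optimizer
    return params_billion * per_param_bytes + activations_gb


def fits(required_gb: float, vram_gb: float) -> bool:
    """True if the estimated footprint is within the card's capacity."""
    return required_gb <= vram_gb


need = training_memory_gb(1.0)  # a hypothetical 1B-parameter model
print(f"~{need:.0f} GB needed; fits in 12 GB: {fits(need, 12)}; in 24 GB: {fits(need, 24)}")
```

Under these assumptions even a 1B-parameter model overflows 12 GB when fully fine-tuned, which is why the parameter-efficient methods described in this guide matter at this tier.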

What a 12 GB+ card comfortably handles:

  • Inference on models up to roughly 7B parameters when quantized to 4-bit or 8-bit (INT4 / INT8), which fits the weights plus a usable context window into 12 GB.
  • Fine-tuning with parameter-efficient methods such as LoRA and QLoRA, where only a small adapter is trained and the frozen base model is loaded in reduced precision.
  • Stable Diffusion and other image-generation pipelines, including higher resolutions and modest batch sizes, since these typically need well under 12 GB for inference.
  • Computer vision training and rendering, where 12 GB accommodates respectable batch sizes for detection, segmentation, and 3D viewport or offline render scenes.
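
The arithmetic behind the first two bullets can be sketched as follows. Both helper names are hypothetical; the weight estimate ignores quantization metadata such as scales and zero points, and the LoRA count uses the standard low-rank factorization of one linear layer, so treat the numbers as approximations.

```python
def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters a LoRA adapter adds to one d_in x d_out linear layer."""
    return rank * (d_in + d_out)


for bits in (16, 8, 4):
    print(f"7B weights at {bits}-bit: {weights_gb(7, bits):.1f} GB")

# A 4096 x 4096 layer holds 16,777,216 weights; a rank-8 adapter trains far fewer.
print(lora_params(4096, 4096, rank=8))
```

At 16-bit the 7B weights alone exceed 12 GB, while the 8-bit and 4-bit versions leave room for the context window and framework overhead.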

Where 12 GB starts to hurt is unquantized 16-bit (FP16/BF16) training or fine-tuning of larger models, long-context inference that inflates the key-value cache, and any workload that wants large batches for throughput. For those, you climb past this tier into 24 GB, 40 GB, or 80 GB cards, often with the high-bandwidth memory (HBM) and NVLink interconnect that this entry tier usually lacks.
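
To see how quickly long contexts inflate the key-value cache, here is a minimal sketch of the usual size formula: two tensors (keys and values) per layer, each holding one vector per KV head per token. The function name is hypothetical, and the model shape used in the example (32 layers, 32 KV heads, head dimension 128) is an illustrative assumption rather than any specific model's published configuration.

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 seq_len: int, batch: int = 1, bytes_per_elem: int = 2) -> float:
    """Key-value cache size in GiB for a decoder-only transformer.

    The leading factor of 2 covers keys and values; bytes_per_elem=2
    assumes the cache is stored in FP16 or BF16.
    """
    elems = 2 * n_layers * n_kv_heads * head_dim * seq_len * batch
    return elems * bytes_per_elem / 2**30


for tokens in (4096, 32768):
    print(f"{tokens} tokens: {kv_cache_gib(32, 32, 128, tokens):.1f} GiB")
```

With this assumed shape, a 4,096-token context costs about 2 GiB of cache and a 32,768-token context about 16 GiB, more than the whole card before any weights are loaded. Models that use grouped-query attention have fewer KV heads and a proportionally smaller cache.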

Bandwidth and precision matter as much as the gigabytes

Capacity tells you whether a job fits; bandwidth and tensor capability tell you how fast it runs. Cards in the 12 GB band rely mostly on GDDR-class memory, which delivers solid but not HBM-level bandwidth, so memory-bound inference can be throttled even when capacity is fine. On the compute side, check which precisions the silicon accelerates: tensor cores that support FP16 and BF16 are common across this tier, while newer FP8 acceleration and the fastest INT8 paths appear only on certain generations. If your workload leans on a specific reduced precision, confirm the underlying architecture in the list above rather than trusting the capacity number alone.
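
A back-of-the-envelope way to see the bandwidth ceiling: in single-stream token generation, every new token has to read roughly all of the model weights from memory once, so bandwidth divided by weight size gives an upper bound on tokens per second. The sketch below assumes that simplified model; the function name and the 500 GB/s figure are illustrative assumptions, while 8,000 GB/s is the bandwidth this guide lists for the B300 and MI350X.

```python
def decode_tokens_per_s_upper_bound(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Optimistic tokens/s ceiling for batch-1 decoding.

    Assumes each generated token streams all weights from VRAM once and
    ignores KV-cache reads, kernel overhead, and compute limits.
    """
    return bandwidth_gb_s / weights_gb


weights = 3.5  # roughly a 7B model at 4-bit, in GB
for label, bw in (("hypothetical 500 GB/s GDDR card", 500.0),
                  ("8,000 GB/s HBM3e card", 8000.0)):
    print(f"{label}: at most ~{decode_tokens_per_s_upper_bound(bw, weights):.0f} tokens/s")
```

Real throughput lands well below this ceiling, but the ratio between two cards is a useful first guess at how memory-bound inference will compare.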

Rental and availability context for the 12 GB tier

This is one of the most cost-effective and widely available segments of the cloud GPU market. Because the band includes mature consumer-class cards alongside entry data-center parts, supply is generally healthy and on-demand instances are easy to find without joining a waitlist. That makes the 12 GB tier the natural home for spot and interruptible instances, where you trade a guaranteed lifetime for a meaningfully lower rate.

Practical things to weigh when renting at this level:

  • On-demand versus spot: for short fine-tunes, inference endpoints, and experiments that checkpoint frequently, interruptible instances in this tier stretch a budget further; for anything that cannot tolerate a sudden eviction, pay for on-demand.
  • Billing granularity: per-second or per-minute billing rewards the bursty, iterative workflows that fit a 12 GB card, so it is worth checking each provider's billing increment before you commit.
  • Generation gap: a newer 12 GB card can outrun an older one with the same capacity thanks to faster memory and better tensor support, so let architecture, not just gigabytes, break ties.
  • Headroom: leave a margin below the full 12 GB for the framework, CUDA context, and fragmentation, which can quietly consume a gigabyte or more.
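
The billing-granularity point is easy to quantify. The sketch below assumes a provider rounds usage up to its billing increment; the function name is hypothetical, and the $0.28/hr rate is the RTX 4090 figure from the list in this guide, used only as an example.

```python
import math


def billed_cost(runtime_s: float, rate_per_hr: float, increment_s: int) -> float:
    """Cost of a run when usage is rounded up to the billing increment."""
    units = math.ceil(runtime_s / increment_s)
    return units * increment_s / 3600 * rate_per_hr


run = 10 * 60  # a 10-minute experiment, in seconds
for label, inc in (("per-second", 1), ("per-minute", 60), ("per-hour", 3600)):
    print(f"{label}: ${billed_cost(run, 0.28, inc):.4f}")
```

For one short run the difference is cents, but across dozens of iterations per day hourly rounding can multiply the bill several times over.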

Read the comparison above as a shortlist of everything that clears the 12 GB floor, then sort by the dimension your job actually cares about: bandwidth for inference, tensor throughput for training, and price model for budget. Live per-hour rates are shown there because they move frequently and vary by provider and region.
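
That shortlist-then-sort workflow can be sketched in a few lines. The `Gpu` class and `shortlist` helper are hypothetical names, and the rows are a small snapshot of rates copied from the list in this guide; they will drift, so treat them as sample data only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    vram_gb: int
    usd_per_hr: float


# Sample rows taken from this guide's list (June 2026 snapshot).
GPUS = [
    Gpu("T4", 16, 0.08), Gpu("RTX 3090", 24, 0.12), Gpu("L4", 24, 0.39),
    Gpu("RTX 5090", 32, 0.34), Gpu("A40", 48, 0.30), Gpu("H100 SXM", 80, 1.57),
]


def shortlist(gpus, min_vram_gb, key=lambda g: g.usd_per_hr):
    """Keep GPUs that clear the VRAM floor, sorted by the chosen dimension."""
    return sorted((g for g in gpus if g.vram_gb >= min_vram_gb), key=key)


# Cheapest per hour among cards with at least 24 GB.
print([g.name for g in shortlist(GPUS, 24)])

# Same floor, ranked by dollars per GB of VRAM instead.
print([g.name for g in shortlist(GPUS, 24, key=lambda g: g.usd_per_hr / g.vram_gb)])
```

Swapping the `key` function is the whole design: the VRAM floor decides what is eligible, and the sort key expresses what the job actually cares about.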

Frequently asked questions

Is 12 GB of VRAM enough to run a large language model?

It is enough for small to mid-sized models, roughly up to 7B parameters, when you quantize the weights to 4-bit or 8-bit. Running larger models unquantized, or serving long contexts that inflate the key-value cache, generally pushes you above this tier toward 24 GB or more.

Why do two 12 GB cloud GPUs perform so differently?

Capacity is only one factor. Two cards can both offer 12 GB yet differ in memory bandwidth, the generation of their tensor cores, and which reduced precisions (FP16, BF16, FP8, INT8) they accelerate. A newer architecture with faster GDDR memory will usually outperform an older one of identical capacity, so check the underlying chip in the list above.
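
One way to act on this is a small lookup from architecture name to the reduced precisions its tensor cores accelerate. The table below is a simplified sketch covering only the NVIDIA architectures named in this guide and only the four precisions mentioned above; it reflects general knowledge of those generations, and the helper name is hypothetical, so confirm against vendor documentation before relying on it.

```python
# Simplified, assumed mapping: tensor-core precisions by NVIDIA architecture.
TENSOR_PRECISIONS = {
    "Volta": frozenset({"FP16"}),
    "Turing": frozenset({"FP16", "INT8"}),
    "Ampere": frozenset({"FP16", "BF16", "INT8"}),
    "Ada Lovelace": frozenset({"FP16", "BF16", "FP8", "INT8"}),
    "Hopper": frozenset({"FP16", "BF16", "FP8", "INT8"}),
    "Blackwell": frozenset({"FP16", "BF16", "FP8", "INT8"}),
}


def supports(architecture: str, precision: str) -> bool:
    """True if the architecture is known (in this table) to accelerate the precision."""
    return precision in TENSOR_PRECISIONS.get(architecture, frozenset())


# Two 16 GB cards from the list: T4 (Turing) and RTX 4060 Ti (Ada Lovelace).
print(supports("Turing", "BF16"), supports("Ada Lovelace", "FP8"))
```

An unknown architecture returns False rather than raising, which errs on the side of making you check the datasheet.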

Should I choose spot instances at this VRAM level?

Spot and interruptible instances are a strong fit here because the 12 GB tier is well supplied, which keeps eviction rates manageable, and the savings are significant. They suit checkpointed fine-tunes and stateless inference; for long unbroken jobs or production endpoints that cannot be interrupted, on-demand is safer.
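
Whether the discount survives evictions can be estimated with a first-order model: each eviction loses, on average, half a checkpoint interval of work plus a fixed restart overhead. Every number in the example (the 50% discount, the eviction rate, the checkpoint interval) is a made-up assumption for illustration, the function name is hypothetical, and the model ignores evictions that land during the redone work.

```python
def expected_spot_cost(job_hours: float, spot_rate: float,
                       evictions_per_hour: float,
                       checkpoint_interval_h: float,
                       restart_overhead_h: float) -> float:
    """Expected bill for a checkpointed job on interruptible capacity."""
    expected_evictions = evictions_per_hour * job_hours
    lost_per_eviction = checkpoint_interval_h / 2 + restart_overhead_h
    billed_hours = job_hours + expected_evictions * lost_per_eviction
    return billed_hours * spot_rate


on_demand = 10 * 0.28  # 10-hour job at an on-demand rate of $0.28/hr
spot = expected_spot_cost(10, 0.14, evictions_per_hour=0.1,
                          checkpoint_interval_h=0.5, restart_overhead_h=0.1)
print(f"on-demand ${on_demand:.2f} vs expected spot ${spot:.3f}")
```

Under these assumptions the rework adds only a few percent to the spot bill, which is why frequent checkpointing is the condition that makes interruptible instances pay off.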

How much usable memory do I really get from a 12 GB card?

Plan on somewhat less than the full 12 GB. The deep-learning framework, the CUDA context, and memory fragmentation each reserve space, often totaling a gigabyte or more before your model loads. Size your batches and context windows with that overhead in mind to avoid out-of-memory failures.
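
A simple budgeting helper makes that margin explicit. The function names, the 1 GB overhead default, and the 5% safety fraction are assumptions chosen for illustration, not measured values.

```python
def usable_vram_gb(total_gb: float, overhead_gb: float = 1.0,
                   safety_fraction: float = 0.05) -> float:
    """VRAM to plan around after fixed overhead and a fragmentation margin."""
    return (total_gb - overhead_gb) * (1 - safety_fraction)


def within_budget(required_gb: float, total_gb: float) -> bool:
    """True if the workload fits inside the conservative budget."""
    return required_gb <= usable_vram_gb(total_gb)


print(f"Plan around {usable_vram_gb(12):.2f} GB on a 12 GB card")
print(within_budget(10.0, 12), within_budget(11.5, 12))
```

On a live instance, PyTorch's `torch.cuda.mem_get_info()` reports the actual free and total bytes, which is a better input than any assumed overhead.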

GB200 Superchip vs B300 vs MI350X — top picks from this guide

Specifications
GPU: GB200 Superchip | B300 | MI350X
Manufacturer: NVIDIA | NVIDIA | AMD
Architecture: Blackwell | Blackwell Ultra | CDNA 4
VRAM: 384 GB HBM3e | 288 GB HBM3e | 288 GB HBM3e
Memory Bandwidth: 16,000 GB/s | 8,000 GB/s | 8,000 GB/s
FP16 (Tensor): 4,500 TFLOPS | 2,250 TFLOPS | 1,800 TFLOPS
FP32: 150 TFLOPS | 75 TFLOPS | 72 TFLOPS
TDP: 2,700 W | 1,400 W | 1,000 W
Release Year: 2024 | 2025 | 2025
Segment: Data center | Data center | Data center

Cloud Pricing
Cheapest On-Demand: (no rates shown)
Providers: 0 | 1 | 1
