Is the NVIDIA GeForce RTX 3080 faster than the A100 for fine-tuning?
Answer
Raw compute on the NVIDIA GeForce RTX 3080 peaks at 29.8 FP32 TFLOPS (GeForce Ampere runs non-tensor FP16 at the same rate), with 760 GB/s of GDDR6X bandwidth feeding the compute units. The Ampere architecture brings tensor cores optimised for BF16/FP16 and TF32 mixed precision — the formats that matter most for modern transformers; FP8 only arrives with the later Hopper and Ada generations. For fine-tuning, though, the comparison with the A100 is one-sided: the A100's tensor cores deliver 312 dense BF16/FP16 TFLOPS, its HBM2e supplies 1.6–2.0 TB/s of bandwidth, and its 40–80 GB of memory dwarfs the 3080's 10 GB, which cannot hold weights plus optimiser state for most modern models.
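To make the compute-versus-bandwidth balance concrete, here is a back-of-envelope roofline sketch. It only uses the public datasheet numbers above; the "ridge point" (peak FLOPs divided by bandwidth) is the arithmetic intensity above which a kernel stops being memory-bound.

```python
# Back-of-envelope roofline ridge points: the arithmetic intensity
# (FLOPs per byte moved) above which a kernel becomes compute-bound.
# Spec numbers are public datasheet values.

SPECS = {
    # name: (peak TFLOPS for the relevant format, memory bandwidth in GB/s)
    "RTX 3080 (FP32)": (29.8, 760.0),
    "A100 40GB (BF16 tensor, dense)": (312.0, 1555.0),
}

def ridge_point(tflops: float, gbps: float) -> float:
    """FLOPs per byte at which the compute and bandwidth limits meet."""
    return (tflops * 1e12) / (gbps * 1e9)

for name, (tflops, gbps) in SPECS.items():
    print(f"{name}: ridge ~ {ridge_point(tflops, gbps):.0f} FLOPs/byte")
```

Kernels below the ridge (small batches, decode-time attention) are limited by the 760 GB/s figure, not by TFLOPS, which is why batch size matters so much in the next paragraph.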
Real-world training throughput approaches those theoretical peaks only at large batch sizes; small batches leave the GPU memory-bound. For low-latency inference, tokens-per-second on large transformers such as Llama 70B depends heavily on quantisation strategy — INT8 (and INT4 weight-only quantisation) raise the effective compute ceiling, while FP16 decoding is bandwidth-bound. Note that a 70B model does not fit on a 10 GB card at all (roughly 35 GB even at 4-bit), so 7B–13B models are the practical target on the 3080.
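A rough way to see the bandwidth-bound decoding claim: each generated token streams approximately all model weights through memory once, so tokens/s is capped near bandwidth divided by model size. The sketch below uses that approximation with illustrative model sizes; it ignores KV-cache traffic and kernel overheads, so treat the result as a ceiling, not a prediction.

```python
# Hedged estimate for bandwidth-bound autoregressive decoding:
#   tokens/s <= memory_bandwidth / bytes_of_weights_read_per_token
# Assumes every token reads all weights once; ignores KV cache and overheads.

def decode_tokens_per_sec(params_billions: float,
                          bytes_per_param: float,
                          bandwidth_gbps: float) -> float:
    model_bytes = params_billions * 1e9 * bytes_per_param
    return (bandwidth_gbps * 1e9) / model_bytes

# RTX 3080 at 760 GB/s. A 7B model in FP16 (2 bytes/param) is ~14 GB --
# already over the card's 10 GB -- so INT4 (0.5 bytes/param) is realistic.
print(decode_tokens_per_sec(7, 0.5, 760))   # upper bound, tokens per second
```

The same formula shows why quantisation helps decode latency even when compute is idle: halving bytes per parameter roughly doubles the tokens/s ceiling.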
The NVIDIA GeForce RTX 3080 page has the complete datasheet and side-by-side comparisons.