How well does the NVIDIA GeForce RTX 3080 Ti scale across multiple GPUs?
Answer
Headline figures for the NVIDIA GeForce RTX 3080 Ti: 34.1 FP16 TFLOPS, 17 FP32 TFLOPS, 912 GB/s memory bandwidth, 12 GB VRAM.
In practical terms: training a 7B-parameter LLM in FP16 with reasonable batch sizes typically saturates compute before bandwidth, while real-time serving of the same model is usually bandwidth-bound and tracks the 912 GB/s figure. Note that 7B parameters in FP16 occupy about 14 GB for the weights alone, more than the card's 12 GB of VRAM, so a model of that size has to be split across several GPUs or quantized before it fits. Diffusion image generation sits between the two: its compute-heavy steps use the tensor cores well, while its attention blocks remain partly limited by bandwidth.
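The compute-bound versus bandwidth-bound distinction can be made concrete with a simple roofline estimate. The sketch below is a rough illustration only, not a benchmark. It uses the 34.1 FP16 TFLOPS, 912 GB/s and 12 GB figures quoted above. The 7B parameter count, the 2 bytes per FP16 weight, and the idea that single-stream decoding reads every weight once per token are simplifying assumptions, and all function names are hypothetical.

```python
# Figures quoted above for the RTX 3080 Ti.
PEAK_FP16_FLOPS = 34.1e12   # 34.1 FP16 TFLOPS
MEM_BANDWIDTH = 912e9       # 912 GB/s
VRAM_BYTES = 12e9           # 12 GB

# Illustrative assumptions, not taken from the spec sheet.
PARAMS = 7e9                # 7B-parameter model
BYTES_PER_PARAM = 2         # FP16 weights


def ridge_point(peak_flops, bandwidth):
    """Arithmetic intensity (FLOPs per byte) above which work is compute-bound."""
    return peak_flops / bandwidth


def attainable_flops(intensity, peak_flops, bandwidth):
    """Roofline bound: the lower of the compute peak and bandwidth * intensity."""
    return min(peak_flops, bandwidth * intensity)


def decode_tokens_per_second(params, bytes_per_param, bandwidth):
    """Upper bound for batch-1 decoding, assuming each weight is read once per token."""
    return bandwidth / (params * bytes_per_param)


def fits_in_vram(params, bytes_per_param, vram_bytes):
    """True if the weights alone fit in memory (ignores activations and KV cache)."""
    return params * bytes_per_param <= vram_bytes


if __name__ == "__main__":
    ridge = ridge_point(PEAK_FP16_FLOPS, MEM_BANDWIDTH)
    print(f"ridge point: {ridge:.1f} FLOPs/byte")

    # Batch-1 decoding does roughly 2 FLOPs per 2-byte weight, so about 1 FLOP/byte.
    low = attainable_flops(1.0, PEAK_FP16_FLOPS, MEM_BANDWIDTH)
    print(f"attainable at 1 FLOP/byte: {low / 1e12:.2f} TFLOPS")

    tps = decode_tokens_per_second(PARAMS, BYTES_PER_PARAM, MEM_BANDWIDTH)
    print(f"decode upper bound: {tps:.0f} tokens/s")

    print("weights fit in VRAM:", fits_in_vram(PARAMS, BYTES_PER_PARAM, VRAM_BYTES))
```

With these inputs the ridge point is about 37 FLOPs per byte. Large-batch training sits above that and is limited by compute, while batch-1 decoding at roughly 1 FLOP per byte sits far below it and is limited by the 912 GB/s figure, giving a ceiling of about 65 tokens per second. Real throughput will be lower than these bounds.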
See the NVIDIA GeForce RTX 3080 Ti page for the full spec sheet and comparisons to related GPUs.