Mixed Precision

Boost deep learning efficiency with mixed precision training! Achieve faster speeds, reduced memory usage, and energy savings without sacrificing accuracy.

Mixed precision is a technique used in deep learning to speed up model training and reduce memory consumption. It combines lower-precision numerical formats, such as 16-bit floating point (FP16), with higher-precision formats, such as 32-bit floating point (FP32), during computation. By strategically running parts of the model in lower precision, such as the matrix multiplications in the forward and backward passes, while keeping critical components like the weight updates in higher precision, mixed precision training can significantly accelerate performance on modern GPUs without a substantial loss in model accuracy.
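
The memory saving is easy to see directly: a tensor stored in FP16 takes half the space of the same tensor in FP32. The short PyTorch sketch below illustrates this; the tensor shape is arbitrary and chosen only for illustration.

```python
# Compare the memory footprint of the same tensor in FP32 vs. FP16.
import torch

x_fp32 = torch.randn(1024, 1024, dtype=torch.float32)  # 4 bytes per element
x_fp16 = x_fp32.to(torch.float16)                       # 2 bytes per element

print(x_fp32.element_size() * x_fp32.nelement())  # 4194304 bytes (~4 MB)
print(x_fp16.element_size() * x_fp16.nelement())  # 2097152 bytes (~2 MB)
```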

How Mixed Precision Works

The core idea behind mixed precision is to leverage the speed and memory efficiency of lower-precision data types. Modern hardware, especially NVIDIA GPUs with Tensor Cores, can perform operations on 16-bit numbers much faster than on 32-bit numbers. The process typically involves three key steps (a simplified code sketch follows the list):

  1. Casting to Lower Precision: Most of the model's operations, particularly the computationally intensive matrix multiplications and convolutions, are performed using half-precision (FP16) arithmetic. This reduces the memory footprint and speeds up calculations.
  2. Maintaining a Master Copy of Weights: To maintain model accuracy and stability, a master copy of the model's weights is kept in the standard 32-bit floating-point (FP32) format. This master copy is used to accumulate gradients and update the weights during the training process.
  3. Loss Scaling: To prevent numerical underflow—where small gradient values become zero when converted to FP16—a technique called loss scaling is used. It involves multiplying the loss by a scaling factor before backpropagation to keep the gradient values within a representable range for FP16. Before the weights are updated, the gradients are scaled back down.
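
The sketch below walks through these three steps manually in PyTorch so the mechanics are visible. The model size, data, and fixed scale factor of 1024 are illustrative placeholders, and it assumes a CUDA-capable GPU; real training loops use the framework utilities described next, which also adjust the scale factor dynamically.

```python
import copy
import torch

# Step 2: keep an FP32 master copy of the weights for stable updates.
master_model = torch.nn.Linear(512, 10).cuda()
optimizer = torch.optim.SGD(master_model.parameters(), lr=0.01)
scale = 1024.0  # Step 3: fixed loss-scaling factor (illustrative)

inputs = torch.randn(32, 512, device="cuda")
targets = torch.randint(0, 10, (32,), device="cuda")

# Step 1: run the forward pass with an FP16 copy of the model and FP16 inputs.
fp16_model = copy.deepcopy(master_model).half()
loss = torch.nn.functional.cross_entropy(fp16_model(inputs.half()), targets)

# Step 3: scale the loss before backpropagation so small gradients
# do not underflow to zero in FP16.
(loss * scale).backward()

# Copy the FP16 gradients back to the FP32 master weights, unscaled.
for master_p, fp16_p in zip(master_model.parameters(), fp16_model.parameters()):
    master_p.grad = fp16_p.grad.float() / scale

# Step 2 (continued): update the FP32 master copy.
optimizer.step()
optimizer.zero_grad()
```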

Deep learning frameworks like PyTorch and TensorFlow have built-in support for automatic mixed precision, making it easy to implement.
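
In PyTorch, for example, the automatic mixed precision (AMP) utilities torch.autocast and torch.amp.GradScaler handle the casting and loss scaling for you. The sketch below shows a typical AMP training loop; the model, optimizer, and dataloader are placeholders you would replace with your own.

```python
import torch

model = torch.nn.Linear(512, 10).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.amp.GradScaler("cuda")  # manages loss scaling automatically

for inputs, targets in dataloader:  # dataloader is assumed to exist
    optimizer.zero_grad()
    # autocast runs eligible ops (e.g. matmuls, convolutions) in FP16
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        outputs = model(inputs.cuda())
        loss = torch.nn.functional.cross_entropy(outputs, targets.cuda())
    scaler.scale(loss).backward()  # scale the loss, then backpropagate
    scaler.step(optimizer)         # unscale gradients, update FP32 weights
    scaler.update()                # adjust the scale factor for the next step
```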

Applications and Examples

Mixed precision is widely adopted in training large-scale machine learning (ML) models, where efficiency is paramount.
