Glossary

Quantization-Aware Training (QAT)

Optimize AI models for edge devices with Quantization-Aware Training (QAT) to maintain high accuracy and efficiency in resource-constrained environments.

Quantization-Aware Training (QAT) is a powerful technique used to optimize deep learning (DL) models, like Ultralytics YOLO models, for deployment on devices with limited computational resources, such as mobile phones or embedded systems. Standard models often use high-precision numbers (like 32-bit floating-point or FP32) for calculations, which demand significant processing power and memory. QAT aims to reduce this demand by preparing the model during the training phase to perform well even when using lower-precision numbers (e.g., 8-bit integers or INT8), thereby bridging the gap between high accuracy and efficient performance on edge devices. This optimization is crucial for enabling complex AI tasks directly on hardware like smartphones or IoT sensors.
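To make the FP32-to-INT8 idea concrete, here is a minimal sketch of one common mapping, an affine scheme with a shared scale and zero-point. The function names and the sample weights are illustrative assumptions, not part of any particular framework, and real toolkits differ in rounding and range selection.

```python
def quantize_int8(values):
    """Map floats to INT8 codes using one shared scale and zero-point."""
    # The represented range must include 0.0 so that zero is stored exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = ((hi - lo) / 255.0) or 1.0   # 255 steps between 256 INT8 levels
    zero_point = round(-128 - lo / scale)
    codes = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize(codes, scale, zero_point):
    """Recover approximate floats from INT8 codes."""
    return [(q - zero_point) * scale for q in codes]


weights = [-0.62, -0.10, 0.0, 0.27, 0.91]   # made-up FP32 weights
codes, scale, zp = quantize_int8(weights)
restored = dequantize(codes, scale, zp)

# Round-trip error is bounded by the step size of the INT8 grid.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Storage: 4 bytes per FP32 value versus 1 byte per INT8 code.
fp32_bytes = 4 * len(weights)
int8_bytes = 1 * len(weights)
```

The fourfold storage reduction is where the memory savings come from, and the round-trip error is the precision loss that QAT teaches the model to tolerate.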

How Quantization-Aware Training Works

Unlike methods that quantize a model after it has been fully trained, QAT integrates the simulation of quantization effects directly into the training process. It inserts operations called 'fake quantization' nodes into the model architecture during training. These nodes mimic the effect of lower precision (e.g., INT8) on model weights and activations during the forward pass, rounding values as they would be in a truly quantized model. During the backward pass (where the model learns via backpropagation), gradients are still calculated and updates applied using standard high-precision floating-point numbers; because rounding has a zero gradient almost everywhere, implementations typically pass gradients through the fake quantization nodes unchanged, an approach known as the straight-through estimator. This allows the model's parameters to adapt and become robust to the precision loss that will occur during actual quantized inference. By "seeing" the effects of quantization during training, the model minimizes the accuracy drop often associated with deploying models in low-precision formats, a key aspect discussed in model optimization strategies. Frameworks like TensorFlow (with TensorFlow Lite as the deployment target) and PyTorch provide tools to implement QAT.
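The forward and backward behavior described above can be sketched on a toy one-weight model. This is a hand-rolled illustration in plain Python, not the API of TensorFlow or PyTorch; the fixed step size, learning rate, and target slope of 0.37 are assumptions chosen for the example.

```python
def fake_quantize(x, scale, qmin=-128, qmax=127):
    """Quantize then immediately dequantize: a float snapped to the INT8 grid."""
    q = max(qmin, min(qmax, round(x / scale)))
    return q * scale


# Toy model y = w * x; the data follows y = 0.37 * x (assumed target).
data = [(x, 0.37 * x) for x in (-2.0, -1.0, 0.5, 1.0, 2.0)]
scale = 0.05   # assumed fixed step size of the INT8 grid
w = 0.0        # full-precision "shadow" weight that the optimizer updates
lr = 0.05

for epoch in range(200):
    for x, y in data:
        w_q = fake_quantize(w, scale)    # forward pass sees the rounded weight
        error = w_q * x - y
        grad_wq = 2.0 * error * x        # d(loss)/d(w_q) for squared error
        # Straight-through estimator: treat d(w_q)/d(w) as 1 inside the
        # representable range and as 0 where the value would be clamped.
        in_range = -128 * scale <= w <= 127 * scale
        grad_w = grad_wq if in_range else 0.0
        w -= lr * grad_w                 # the update stays in full precision

final_quantized_weight = fake_quantize(w, scale)
```

Since 0.37 is not on the grid, the shadow weight settles near the boundary between the two neighboring grid points, so the weight that would actually be deployed ends up within one step of the target.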

Distinction from Related Concepts

QAT vs. Model Quantization (Post-Training)

The primary difference lies in when quantization is applied. Model Quantization, often referring to Post-Training Quantization (PTQ), converts a pre-trained, full-precision model to a lower-precision format after training is complete. PTQ is generally simpler to implement because it doesn't require retraining, and it needs at most a small calibration dataset rather than the full training dataset. However, it can lead to a noticeable decrease in model accuracy, especially for complex models performing tasks like object detection or image segmentation. QAT, by contrast, simulates quantization during training, making the model inherently more robust to precision reduction. This often results in higher accuracy for the final quantized model compared to PTQ, although it requires more computational resources and access to training data. For models like YOLO-NAS, which incorporates quantization-friendly blocks, QAT can yield significant performance benefits with minimal accuracy loss.
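For contrast with QAT, the sketch below shows the calibration step that static PTQ typically relies on: a few forward-only passes record the value range, and the scale is derived from it with no weight updates. The `RangeObserver` class and the calibration values are hypothetical stand-ins for illustration, not a framework API.

```python
class RangeObserver:
    """Tracks the min/max of values seen during calibration (hypothetical helper)."""

    def __init__(self):
        self.lo = float("inf")
        self.hi = float("-inf")

    def observe(self, batch):
        self.lo = min(self.lo, min(batch))
        self.hi = max(self.hi, max(batch))

    def symmetric_scale(self, qmax=127):
        # Symmetric INT8: one scale, zero-point fixed at 0.
        return max(abs(self.lo), abs(self.hi)) / qmax


# Made-up activations from three small calibration batches.
calibration_batches = [[0.1, -0.4, 0.9], [1.27, -0.3, 0.2], [-1.0, 0.05, 0.6]]

observer = RangeObserver()
for batch in calibration_batches:   # forward passes only, no gradient updates
    observer.observe(batch)

scale = observer.symmetric_scale()
quantized = [max(-127, min(127, round(v / scale))) for v in calibration_batches[1]]
```

Nothing in this procedure lets the weights adapt to the rounding, which is the gap that QAT closes at the cost of a training run.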

QAT vs. Mixed Precision

While both techniques involve numerical precision, their goals differ. Mixed Precision training primarily aims to speed up the training process itself and reduce memory usage during training by using a combination of lower-precision (e.g., 16-bit float or FP16) and standard-precision (32-bit float) formats for computations and storage. QAT specifically focuses on optimizing the model for efficient inference using low-precision integer formats (like INT8) after model deployment. While mixed precision helps during training, QAT ensures the final model performs well under the constraints of quantized inference hardware, such as NPUs (Neural Processing Units) or TPUs.
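The numerical difference between the two formats can be seen in a small sketch that rounds the same values to FP16 and to an INT8 grid. It uses only Python's standard `struct` module; the sample values and the step size of 0.1 are assumptions for illustration.

```python
import struct


def to_fp16(x):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_int8_grid(x, scale):
    """Round a float onto an INT8 grid with one fixed, shared step size."""
    return max(-128, min(127, round(x / scale))) * scale


values = [0.001, 0.1, 1.0, 12.5]
scale = 0.1   # assumed step size, covering roughly [-12.8, 12.7]

fp16_values = [to_fp16(v) for v in values]
int8_values = [to_int8_grid(v, scale) for v in values]

# FP16 keeps a per-value exponent, so its relative error stays small
# across magnitudes; the INT8 grid has a uniform absolute step instead.
fp16_rel_errors = [abs(a - b) / b for a, b in zip(fp16_values, values)]
int8_abs_errors = [abs(a - b) for a, b in zip(int8_values, values)]
```

With this step size the smallest value collapses to zero on the INT8 grid while FP16 still represents it closely, which is one reason integer inference benefits from a model trained to expect that rounding.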

Real-World Applications of QAT

Quantization-aware training is essential for deploying sophisticated AI models in resource-constrained environments where efficiency is critical.

On-Device Computer Vision: Running complex computer vision models like Ultralytics YOLOv8 directly on smartphones for applications like real-time object detection in augmented reality apps or image classification within photo management tools. QAT allows these models to run efficiently without significant battery drain or latency.
Edge AI in Automotive and Robotics: Deploying models for tasks like pedestrian detection or lane keeping assist in autonomous vehicles or for object manipulation in robotics. QAT enables these models to run on specialized hardware like Google Edge TPUs or NVIDIA Jetson, ensuring low inference latency for critical real-time decisions. Low latency on edge hardware also matters for applications like security alarm systems or parking management.

Ultralytics supports exporting models to various formats like ONNX, TensorRT, and TFLite, which are compatible with QAT workflows, enabling efficient deployment across diverse hardware. You can manage and deploy your QAT-optimized models using platforms like Ultralytics HUB. Evaluating model performance using relevant metrics after QAT is essential to ensure accuracy requirements are met.
