
Deploying Quantized Ultralytics YOLOv8 Models on Edge Devices with DeGirum

Discover how to deploy quantized YOLOv8 models with DeGirum. Learn about the challenges, solutions, and deployment techniques for edge devices. Shape the future with us!

Welcome to the recap of another insightful talk from our YOLO VISION 2023 (YV23) event, held at the vibrant Google for Startups Campus in Madrid. This talk was delivered by Shashi Chilappagar, Chief Architect and Co-Founder at DeGirum. It delved into the fascinating world of quantization and deploying quantized models, exploring key challenges, solutions, and future possibilities.

Introduction to Quantization and Deploying Quantized Models

Shashi provided a comprehensive overview of quantization, highlighting its importance in optimizing Ultralytics YOLO models for deployment on edge devices. From discussing the basics to exploring approaches for improving quantization, attendees gained valuable insights into the intricacies of model porting and deployment.

Challenges in Quantizing YOLO Models

Quantization often poses challenges, particularly for YOLO models exported to TFLite. Our audience learned about the significant drop in accuracy that occurs when all outputs are quantized with the same scale and zero point, shedding light on the complexities of maintaining model accuracy during the quantization process.
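
For context, full-integer post-training quantization in TensorFlow Lite is typically configured along the following lines. This is a generic sketch rather than code from the talk; the SavedModel path, input size, and random calibration data are placeholder assumptions.

```python
import numpy as np
import tensorflow as tf

# Illustrative placeholder path; substitute your own exported SavedModel.
converter = tf.lite.TFLiteConverter.from_saved_model("yolov8n_saved_model")

def representative_dataset():
    # A small set of preprocessed images drives calibration of the
    # scale/zero-point values discussed above; random data is a stand-in here.
    for _ in range(100):
        yield [np.random.rand(1, 640, 640, 3).astype(np.float32)]

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("yolov8n_int8.tflite", "wb") as f:
    f.write(tflite_model)
```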

Improving Quantization of YOLO Models

Fortunately, solutions exist to address these challenges. The introduction of the DeGirum fork offers a quantization-friendly approach by separating the outputs and optimizing bounding box decoding. With these enhancements, quantized model accuracy sees a significant improvement over baseline levels.
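
The general idea behind a quantization-friendly output path can be sketched as follows. This illustration uses the TensorFlow Lite interpreter API with placeholder file names and is not the DeGirum fork's actual implementation.

```python
import numpy as np
import tensorflow as tf

# Minimal sketch: run a quantized TFLite model, then dequantize each raw
# output with its own scale/zero point and decode bounding boxes in float.
interpreter = tf.lite.Interpreter(model_path="yolov8n_int8.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()

raw_outputs = []
for out in interpreter.get_output_details():
    data = interpreter.get_tensor(out["index"]).astype(np.float32)
    scale, zero_point = out["quantization"]
    if scale:  # per-output dequantization instead of one shared scale
        data = (data - zero_point) * scale
    raw_outputs.append(data)

# Box decoding (grid offsets, strides, NMS) then runs in float32 on
# raw_outputs, avoiding the shared-scale accuracy loss described above.
```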

More Quantization-Friendly Model Architectures

Exploring new model architectures is key to minimizing quantization loss. Attendees discovered how replacing SiLU with the bounded ReLU6 activation leads to minimal quantization loss, offering promising results for maintaining accuracy in quantized models.
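
As a rough illustration of the idea (not the exact DeGirum approach), activations in a PyTorch model can be swapped by walking its module tree; the yolov8n.pt checkpoint used here is just an example.

```python
import torch.nn as nn
from ultralytics import YOLO

def replace_silu_with_relu6(module: nn.Module) -> None:
    """Recursively swap unbounded SiLU activations for bounded ReLU6."""
    for name, child in module.named_children():
        if isinstance(child, nn.SiLU):
            setattr(module, name, nn.ReLU6(inplace=True))
        else:
            replace_silu_with_relu6(child)

model = YOLO("yolov8n.pt")            # example checkpoint
replace_silu_with_relu6(model.model)  # model.model is the underlying nn.Module
# The modified network needs retraining or fine-tuning before quantization,
# since changing the activation alters the learned response ranges.
```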

Deploying Quantized Models

Deploying quantized models has never been easier, with just five lines of code needed to run any model on the DeGirum cloud platform. A live code demo showcased the simplicity of detecting objects with a quantized Ultralytics YOLOv5 model, highlighting the seamless integration of quantized models into real-world applications.
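
A sketch of what those few lines can look like with DeGirum's PySDK is shown below; the token and zoo model name are placeholders, and exact call signatures may differ from the current SDK release.

```python
import degirum as dg

# Placeholder token and zoo model name; actual names depend on your account.
zoo = dg.connect(dg.CLOUD, "https://cs.degirum.com", token="<your-token>")
model = zoo.load_model("yolo_v5s_coco--512x512_quant_n2x_orca1_1")
result = model("street_scene.jpg")  # run detection on an image file
print(result.results)               # list of detected objects
```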

To this effect, Ultralytics provides a variety of model deployment options, enabling end users to effectively deploy their applications on embedded and edge devices. Export formats include OpenVINO, TorchScript, TensorRT, CoreML, TFLite, and TFLite Edge TPU, offering versatility and compatibility.

This integration with third-party applications for deployment allows users to assess the performance of our models in real-world scenarios.
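
For reference, exporting an Ultralytics model to any of the formats listed above is a one-liner per format; yolov8n.pt is used here purely as an example checkpoint.

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # example checkpoint

# Each call produces an artifact for the corresponding runtime
# (the relevant toolchain must be installed for each format).
model.export(format="openvino")
model.export(format="torchscript")
model.export(format="engine")             # TensorRT
model.export(format="coreml")
model.export(format="tflite", int8=True)  # quantized TFLite
model.export(format="edgetpu")            # TFLite Edge TPU
```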

Using Different Models on Different Hardware

Attendees also gained insights into the versatility of deploying different models on various hardware platforms, showcasing how a single codebase can support multiple models across different accelerators. Examples of running different detection tasks on diverse hardware platforms demonstrated the flexibility and scalability of our approach.
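
As a rough illustration of the single-codebase idea, the same inference loop can target different accelerators simply by selecting a different model variant. The model names below are hypothetical, and the PySDK calls follow the sketch shown earlier.

```python
import degirum as dg

zoo = dg.connect(dg.CLOUD, "https://cs.degirum.com", token="<your-token>")

# Hypothetical zoo entries targeting different accelerators; real names
# depend on the models published in your zoo.
model_names = [
    "yolo_v5s_coco--512x512_quant_n2x_orca1_1",       # DeGirum ORCA
    "yolo_v5s_coco--512x512_quant_tflite_edgetpu_1",  # Edge TPU
    "yolo_v5s_coco--512x512_quant_openvino_cpu_1",    # CPU via OpenVINO
]

for name in model_names:
    model = zoo.load_model(name)
    result = model("street_scene.jpg")
    print(f"{name}: {len(result.results)} objects detected")
```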

Resources and Documentation

To empower attendees further, we introduced a comprehensive resources section, providing access to our cloud platform, examples, documentation, and more. Our goal is to ensure that everyone has the tools and support they need to succeed in deploying quantized models effectively.

Wrapping Up

As the field of quantization evolves, it's essential to stay informed and engaged. We're committed to providing ongoing support and resources to help you navigate this exciting journey. Check out the full talk here!

Join us as we continue to explore the latest trends and innovations in machine learning and artificial intelligence. Together, we're shaping the future of technology and driving positive change in the world.
