Discover how real-time inference with Ultralytics YOLO enables instant predictions for AI applications like autonomous driving and security systems.
Real-time inference refers to the process in which a trained machine learning (ML) model makes predictions or decisions immediately as new data arrives. Unlike batch inference, which processes data in groups collected over time, real-time inference prioritizes low latency and instant responses. This capability is essential for applications that require immediate feedback or action based on live data streams: it lets systems react dynamically to changing conditions, in line with the principles of real-time computing.
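To make the latency requirement concrete (the frame rates below are illustrative assumptions, not from the source): a system keeping up with a live video stream must, on average, finish each prediction before the next frame arrives, so its latency budget is the inverse of the frame rate.

```python
def per_frame_budget_ms(fps: float) -> float:
    """Maximum average latency per frame, in milliseconds, to keep up with a stream."""
    return 1000.0 / fps

# A 30 FPS camera leaves about 33.3 ms per frame; 60 FPS halves that to ~16.7 ms.
print(round(per_frame_budget_ms(30), 1))  # → 33.3
print(round(per_frame_budget_ms(60), 1))  # → 16.7
```

Batch inference has no such per-input deadline, which is why the two approaches are engineered so differently.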
In practice, real-time inference means deploying an ML model, such as an Ultralytics YOLO model for computer vision (CV), so that it can analyze individual data inputs (like video frames or sensor readings) and produce outputs with minimal delay. The key performance metric is inference latency: the time from receiving an input to generating a prediction. Achieving low latency usually combines several strategies, including optimizing the model itself and leveraging specialized hardware and software.
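Measuring inference latency is straightforward in principle: timestamp before and after the prediction call. The sketch below uses a stand-in `predict` function (an assumption for illustration; in a real deployment it would be the model's actual inference call, e.g. on an Ultralytics YOLO model):

```python
import time

def predict(frame):
    # Stand-in for a real model call; replace with your deployed model's inference.
    return {"boxes": []}

def timed_predict(frame):
    """Run one prediction and report its latency in milliseconds."""
    start = time.perf_counter()
    output = predict(frame)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return output, latency_ms

output, latency_ms = timed_predict(frame=None)
print(f"inference latency: {latency_ms:.3f} ms")
```

Averaging this measurement over many inputs, and tracking tail latency rather than just the mean, gives a more honest picture of real-time behavior.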
The primary difference between batch and real-time inference lies in how data is processed and in the associated latency requirements: batch inference runs predictions over groups of data collected over time, favoring throughput, while real-time inference handles each input the moment it arrives and is judged chiefly by per-prediction latency.
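The contrast can be sketched in a few lines (the `process` function is hypothetical; the point is *when* work happens, not what it computes):

```python
from typing import Iterable, Iterator, List

def process(item: int) -> int:
    # Hypothetical model call.
    return item * 2

def batch_inference(collected: List[int]) -> List[int]:
    # Batch: inputs accumulate first, then are processed together; each item's
    # effective latency includes the wait for the whole batch to fill.
    return [process(x) for x in collected]

def realtime_inference(stream: Iterable[int]) -> Iterator[int]:
    # Real-time: each input is processed as soon as it arrives.
    for item in stream:
        yield process(item)

# Same outputs either way; the difference is timing, not results.
assert batch_inference([1, 2, 3]) == list(realtime_inference([1, 2, 3]))
```

In production the streaming path is typically fed by a live source (camera, sensor, message queue) rather than an in-memory list.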
Real-time inference powers many modern Artificial Intelligence (AI) applications where instantaneous decision-making is crucial, such as autonomous driving, where a vehicle must react to its surroundings within milliseconds, and security systems that flag events on live camera feeds.
Making models run fast enough for real-time applications often requires significant optimization, from streamlining the model itself to exporting it to accelerated formats and running it on specialized hardware.
Models like Ultralytics YOLO11 are designed with both efficiency and accuracy in mind, making them well suited to real-time object detection. Platforms like Ultralytics HUB provide tools to train, optimize (for example, by exporting to ONNX or TensorRT formats), and deploy models, simplifying real-time inference across a range of deployment options.
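Concretely, exporting a model to ONNX with the Ultralytics Python API looks like the sketch below. The model name `yolo11n.pt` is an example, and the import guard is only there so the snippet stays harmless when the `ultralytics` package is not installed:

```python
onnx_path = None
try:
    from ultralytics import YOLO  # requires the `ultralytics` package

    model = YOLO("yolo11n.pt")               # small pretrained model (example)
    onnx_path = model.export(format="onnx")  # writes an .onnx file for accelerated runtimes
except ImportError:
    print("ultralytics is not installed; skipping export")
```

The exported ONNX file can then be served by an optimized runtime such as ONNX Runtime or converted further (e.g. to TensorRT) to reduce inference latency.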