Learn how bounding boxes support object detection in AI and machine learning systems, and explore their role in computer vision applications!
A bounding box is a rectangular frame used in computer vision (CV) to indicate the location and approximate extent of an object within an image or video frame. A box is typically defined either by the coordinates of its top-left and bottom-right corners or by its center point, width, and height. Either form gives a simple yet effective way to specify where an object is situated and how much space it occupies. Bounding boxes are fundamental to CV tasks such as object detection, object tracking, and image annotation, and they underpin many modern Artificial Intelligence (AI) and machine learning (ML) systems. They let machines understand not just which objects are present, but also where those objects are located in a visual scene.
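The two representations mentioned above carry the same information, so one can be converted into the other. The following is a minimal sketch of that conversion in plain Python. The function names `xyxy_to_xywh` and `xywh_to_xyxy` and the sample coordinates are illustrative assumptions, not part of any particular library, and the sketch assumes pixel coordinates with the origin at the top-left of the image.

```python
from typing import Tuple

# A box is four floats; the meaning of each value depends on the format.
Box = Tuple[float, float, float, float]


def xyxy_to_xywh(box: Box) -> Box:
    """Convert (x_min, y_min, x_max, y_max) to (x_center, y_center, width, height)."""
    x_min, y_min, x_max, y_max = box
    width = x_max - x_min
    height = y_max - y_min
    return (x_min + width / 2, y_min + height / 2, width, height)


def xywh_to_xyxy(box: Box) -> Box:
    """Convert (x_center, y_center, width, height) to (x_min, y_min, x_max, y_max)."""
    x_center, y_center, width, height = box
    half_w, half_h = width / 2, height / 2
    return (x_center - half_w, y_center - half_h, x_center + half_w, y_center + half_h)


# Hypothetical box: top-left corner at (50, 100), bottom-right at (250, 400).
corner_box = (50.0, 100.0, 250.0, 400.0)
center_box = xyxy_to_xywh(corner_box)
print(center_box)                # (150.0, 250.0, 200.0, 300.0)
print(xywh_to_xyxy(center_box))  # (50.0, 100.0, 250.0, 400.0)
```

Annotation tools and model outputs do not always agree on which format they use, or on whether coordinates are in pixels or normalized to the image size, so it is worth checking the convention before mixing data from different sources.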
Bounding boxes are crucial for both training and evaluating object detection models. In tasks tackled by models like Ultralytics YOLO, bounding boxes serve as the "ground truth" during training: they represent the correct location and size of objects in the training data and teach the model to locate objects precisely. This process often begins with careful data annotation, where humans or automated tools draw boxes around objects in images, frequently using annotation tools like CVAT or dataset-management platforms like Ultralytics HUB. During evaluation, the boxes a model predicts are compared against these ground-truth boxes. During inference, the trained model predicts bounding boxes around detected objects, along with class labels and confidence scores. This localization ability is vital for applications that need not only to identify objects but also to know their exact positions.
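To make the comparison between predicted and ground-truth boxes concrete, one widely used overlap measure is Intersection over Union (IoU): the area shared by two boxes divided by the area covered by either. The text above does not name a specific metric, so treat this as an illustrative sketch rather than the method any particular model uses. The function name `iou` and the sample boxes are assumptions, and boxes are taken to be in corner format `(x_min, y_min, x_max, y_max)`.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(box_a: Box, box_b: Box) -> float:
    """Return the Intersection over Union of two corner-format boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping region, clamped to zero if the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    # Guard against degenerate zero-area boxes.
    return intersection / union if union > 0 else 0.0


# Hypothetical annotation and prediction for the same object.
ground_truth = (0.0, 0.0, 10.0, 10.0)
prediction = (5.0, 5.0, 15.0, 15.0)
print(round(iou(ground_truth, prediction), 4))  # 0.1429
```

An IoU of 1.0 means the predicted box matches the annotation exactly, and 0.0 means the two do not overlap at all. Evaluation pipelines commonly count a prediction as correct when its IoU with a ground-truth box exceeds a chosen threshold.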
Bounding boxes are integral to numerous practical AI applications: