Glossary

Inference Engine

Learn how inference engines power AI by delivering real-time predictions, optimizing models, and supporting cross-platform deployment.

In artificial intelligence (AI) and machine learning (ML), an inference engine is the software or hardware component that executes a trained model to make predictions on new, unseen data. After a model has learned patterns during the training phase, the inference engine applies that model to real-world inputs. This process, known as inference, lets AI systems perform tasks such as object detection, image classification, or natural language processing (NLP) in practical applications. It is the operational core of a deployed AI model, turning learned knowledge into usable outputs efficiently.

How Inference Engines Work

An inference engine utilizes a pre-trained model, often developed using deep learning (DL) frameworks like PyTorch or TensorFlow, which encapsulates the knowledge needed for a specific task. When new data (e.g., an image, audio clip, or text sentence) is provided as input, the inference engine processes it through the model's computational structure (often a neural network). This generates an output, such as identifying objects with bounding boxes in an image, transcribing speech, or classifying sentiment. Ultralytics YOLO models, for instance, depend on efficient inference engines to achieve real-time object detection and segmentation across various platforms, from powerful cloud servers to resource-constrained edge devices. The performance of the inference engine directly impacts the application's speed and responsiveness, often measured by inference latency and throughput.
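As a minimal sketch of the flow described above, the following Python snippet pushes one input through a tiny hand-written network and then measures latency and throughput. The weights (W1, B1, W2, B2), the network shape, and the helper names infer and benchmark are hypothetical and exist only for illustration; a real engine would load a trained model and run optimized kernels instead.

```python
import math
import time

# Hypothetical "pre-trained" parameters for a tiny network:
# 2 inputs -> 2 hidden units (ReLU) -> 1 output (sigmoid).
W1 = [[0.5, -0.2], [0.1, 0.8]]  # one row of weights per hidden unit
B1 = [0.0, 0.1]
W2 = [1.0, -1.0]
B2 = 0.05


def relu(x):
    return x if x > 0.0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def infer(features):
    """Run one forward pass and return a score between 0 and 1."""
    hidden = [
        relu(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(W1, B1)
    ]
    logit = sum(w * h for w, h in zip(W2, hidden)) + B2
    return sigmoid(logit)


def benchmark(inputs):
    """Return (mean latency in ms per input, throughput in inputs per second)."""
    start = time.perf_counter()
    for features in inputs:
        infer(features)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(inputs), len(inputs) / elapsed


score = infer([1.0, 2.0])
latency_ms, throughput = benchmark([[1.0, 2.0]] * 1000)
print(f"score={score:.4f} latency={latency_ms:.4f} ms throughput={throughput:.0f}/s")
```

Latency here is the average time per single input and throughput is the number of inputs handled per second. The two can diverge once an engine batches inputs, which is why both are usually reported.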

Optimization and Key Features

A key role of modern inference engines is optimization. Running a large, trained deep learning model directly can be computationally expensive and slow. Inference engines employ various techniques to make models faster and more efficient, enabling deployment on diverse hardware. Common model optimization strategies include:

Model Quantization: Reducing the precision of model weights (e.g., from 32-bit floating-point to 8-bit integers) to decrease model size and speed up computation, often with minimal impact on accuracy.
Model Pruning: Removing redundant or unimportant connections (weights) within the neural network to create a smaller, faster model.
Graph Optimization: Fusing layers or rearranging operations in the model's computational graph to improve execution efficiency on specific hardware.
Hardware Acceleration: Leveraging specialized processors like GPUs, TPUs, or dedicated AI accelerators found on devices like Google Edge TPU or NVIDIA Jetson.
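Two of the strategies listed above, quantization and pruning, can be sketched in a few lines of plain Python. This is an illustrative toy under stated assumptions: it uses symmetric per-tensor 8-bit quantization and simple magnitude pruning on a flat list of weights, and the function names and sample weights are hypothetical. Production engines apply more elaborate schemes, often with calibration data.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer representation."""
    return [v * scale for v in quantized]


def prune_by_magnitude(weights, fraction):
    """Zero out the given fraction of weights with the smallest absolute value."""
    n_prune = int(len(weights) * fraction)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Hypothetical weights, chosen only for illustration.
weights = [0.82, -0.03, 0.41, -1.27, 0.004, 0.66]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

pruned = prune_by_magnitude(weights, 0.5)

print("int8 values:", q)
print("max round-trip error:", max_error)
print("pruned weights:", pruned)
```

The round-trip error stays within half of one quantization step (scale / 2), which is the sense in which quantization trades a small, bounded loss of precision for smaller weights. Pruning half of the weights here keeps only the three largest in magnitude.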

Many inference engines also support standardized model formats like ONNX (Open Neural Network Exchange), which allows models trained in one framework (like PyTorch) to be run using a different engine or platform. Popular inference engines include NVIDIA TensorRT, Intel's OpenVINO, and TensorFlow Lite. Ultralytics models support export to various formats compatible with these engines, detailed in the Model Deployment Options guide.
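The idea behind an exchange format can be illustrated with a toy stand-in: one side serializes a model as a framework-neutral description, and a separate runtime executes it knowing only that description. The JSON schema, the op names, and the functions export_model and run_model below are hypothetical and far simpler than ONNX; this sketch does not use or reproduce the ONNX format.

```python
import json


def export_model(weights, bias):
    """'Training side': describe the model as a neutral list of named operations."""
    graph = {
        "format_version": 1,  # hypothetical schema version
        "nodes": [
            {"op": "linear", "weights": weights, "bias": bias},
            {"op": "relu"},
        ],
    }
    return json.dumps(graph)


def _linear(x, node):
    return [
        sum(w * v for w, v in zip(row, x)) + b
        for row, b in zip(node["weights"], node["bias"])
    ]


def _relu(x, node):
    return [max(0.0, v) for v in x]


# 'Engine side': a runtime that understands only the serialized format.
OPS = {"linear": _linear, "relu": _relu}


def run_model(serialized, inputs):
    """Execute each node of the serialized graph in order."""
    graph = json.loads(serialized)
    x = inputs
    for node in graph["nodes"]:
        x = OPS[node["op"]](x, node)
    return x


blob = export_model([[1.0, -1.0], [0.5, 0.5]], [0.0, -2.0])
output = run_model(blob, [3.0, 1.0])
print(output)  # [2.0, 0.0]
```

Because run_model depends only on the serialized description, the code that produced the model could be swapped out without changing the runtime, which is the property a shared format is meant to provide.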

Inference Engine vs. Training Framework

It's important to distinguish inference engines from training frameworks.

Training Frameworks (e.g., PyTorch, TensorFlow, Keras): These are comprehensive libraries used for building, training, and validating machine learning models. They provide tools for defining network architectures, implementing backpropagation, managing datasets, and calculating loss functions. The focus is on flexibility and the learning process.
Inference Engines (e.g., TensorRT, OpenVINO, ONNX Runtime): These are specialized tools designed to run pre-trained models efficiently for prediction tasks (model deployment). Their primary focus is on optimizing for speed (low latency), low memory usage, and compatibility with target hardware. They often take models trained using frameworks and convert them into an optimized format.
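The split between the two roles can be sketched with a deliberately small example. The train function below stands in for a training framework (loss, gradients, parameter updates), while make_predictor stands in for an inference engine (frozen parameters, forward computation only). The function names, the linear model, and the data are hypothetical and chosen for illustration; neither function reflects the API of any framework or engine named above.

```python
def train(samples, epochs=2000, lr=0.05):
    """Training side: fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in samples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in samples) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return {"w": w, "b": b}


def make_predictor(params):
    """Inference side: freeze the learned parameters; no loss, no gradients."""
    w, b = params["w"], params["b"]
    return lambda x: w * x + b


# Hypothetical data generated from y = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

params = train(samples)           # done once, offline
predict = make_predictor(params)  # deployed and called many times
prediction = predict(10.0)
print(f"w={params['w']:.4f} b={params['b']:.4f} prediction={prediction:.4f}")
```

The predictor carries none of the training machinery, so it is the natural place to apply deployment-time optimizations without touching how the model was learned.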

Real-World Applications

Inference engines are critical for deploying AI in practical scenarios:

Autonomous Vehicles: Self-driving cars (like those developed by Waymo) rely heavily on efficient inference engines running on embedded hardware (like NVIDIA Jetson platforms) to process sensor data (cameras, LiDAR) in real-time. Engines optimize complex computer vision models like YOLO for tasks such as object detection (detecting cars, pedestrians, signs) and semantic segmentation (understanding road layout) with minimal delay, which is crucial for safety. Explore more about AI in automotive solutions.
Medical Image Analysis: Inference engines accelerate the analysis of medical scans (X-rays, CT, MRI) for tasks like detecting tumors (see Brain Tumor Dataset) or anomalies. Optimized models deployed via inference engines can run quickly on hospital servers or specialized medical devices, assisting radiologists (read about AI in Radiology) by providing faster diagnoses or second opinions. Check out AI in healthcare solutions.

In essence, inference engines bridge the gap between trained AI models and their practical application, so that sophisticated AI capabilities can be delivered efficiently across a wide range of devices and platforms. Platforms like Ultralytics HUB can help manage the models being deployed.
