Discover how Knowledge Distillation compresses AI models for faster inference, improved accuracy, and efficient edge-device deployment.
Knowledge Distillation is a technique in machine learning (ML) where a smaller, compact model (the "student") is trained to mimic the behavior of a larger, more complex model (the "teacher"). The primary goal is to transfer the "knowledge" learned by the teacher model to the student model, enabling the student to achieve comparable performance with significantly lower computational requirements, such as a smaller memory footprint and lower inference latency. This makes complex deep learning (DL) models practical for deployment in resource-constrained environments like mobile devices or edge computing platforms. The concept was popularized by Geoffrey Hinton and colleagues in their paper "Distilling the Knowledge in a Neural Network".
The process typically involves a pre-trained teacher model, which could be a single powerful model or an ensemble of models known for high accuracy. The student model, usually with fewer parameters or a shallower architecture (e.g., a smaller Convolutional Neural Network (CNN)), is then trained using the outputs of the teacher model as guidance. Instead of only using the hard labels (the ground truth) from the training data, the student often learns from the teacher's "soft targets"—the full probability distributions predicted by the teacher across all classes. These soft targets contain richer information about how the teacher model generalizes and represents similarities between classes. A special loss function, often called distillation loss, is used to minimize the difference between the student's predictions and the teacher's soft targets, sometimes combined with a standard loss calculated using the actual labels.
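The sketch below illustrates one common way this combined objective can be written, assuming a PyTorch setup; the temperature `T`, the weight `alpha`, and the function name `distillation_loss` are illustrative choices, not a fixed API.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Combine a soft-target loss (teacher guidance) with a hard-label loss.

    T (temperature) softens the probability distributions so that class
    similarities become visible; alpha balances the two terms. Both are
    illustrative hyperparameters.
    """
    # Soft targets: KL divergence between temperature-scaled distributions.
    # Multiplying by T*T keeps gradient magnitudes comparable across temperatures.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

    # Hard targets: standard cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    return alpha * soft_loss + (1.0 - alpha) * hard_loss


# Example usage with random tensors standing in for real model outputs.
student_logits = torch.randn(8, 10)      # student predictions (batch of 8, 10 classes)
teacher_logits = torch.randn(8, 10)      # teacher predictions for the same batch
labels = torch.randint(0, 10, (8,))      # ground-truth class indices
loss = distillation_loss(student_logits, teacher_logits, labels)
```

In practice, the teacher's logits are computed once (or on the fly with gradients disabled) and only the student's parameters are updated with this loss.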
Knowledge Distillation offers several key advantages:
Knowledge Distillation is widely used across various domains:
Knowledge Distillation is related to but distinct from other model optimization techniques:
Knowledge Distillation is a powerful tool for making state-of-the-art AI models more accessible and efficient, bridging the gap between large-scale research models and practical, real-world model deployment. Platforms like Ultralytics HUB facilitate the training and deployment of such models, including distilled variants of YOLOv8 or YOLO11.