
Label Smoothing

Enhance AI model accuracy and robustness with label smoothing—a proven technique to improve generalization and reduce overconfidence.

Label smoothing is a regularization technique used during the training of classification models in machine learning (ML) and deep learning (DL). Its primary goal is to prevent the model from becoming overconfident in its predictions. Instead of training the model using "hard" labels (where the correct class is assigned a probability of 1 and all other classes 0), label smoothing uses "soft" labels. This means the correct class is assigned a slightly lower probability (e.g., 0.9), and the small remaining probability is distributed evenly among the incorrect classes. This technique encourages the model to be less certain about its predictions, which can lead to better generalization and improved performance on unseen data. It was notably discussed in the paper Rethinking the Inception Architecture for Computer Vision (Szegedy et al., 2016).
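
In symbols, for K classes with true class c and smoothing factor ε, the soft target described above (with the leftover mass split evenly over the incorrect classes; note that some references instead spread ε over all K classes) can be written as:

```latex
y_i^{\text{smooth}} =
\begin{cases}
1 - \epsilon, & i = c \\
\dfrac{\epsilon}{K - 1}, & i \neq c
\end{cases}
% Example: K = 10 and \epsilon = 0.1 give 0.9 for the true class
% and 0.1/9 \approx 0.011 for each of the other nine classes.
```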

How Label Smoothing Works

In standard classification tasks, models are often trained using a loss function like cross-entropy, which penalizes the model based on how far its predicted probability distribution is from the target distribution (hard labels). With hard labels, the model is pushed to make the output probability for the correct class extremely close to 1 and others close to 0. This can lead to overfitting, where the model learns the training data too well, including its noise, and performs poorly on new data. Label smoothing modifies the target labels by assigning a small probability value (epsilon) to the incorrect classes and reducing the probability of the correct class by the total amount distributed. This prevents the model from producing excessively large logit values for the correct class, promoting a less confident, potentially more robust model.
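
As a minimal sketch of this idea (written in PyTorch purely for illustration; the same logic works in any framework), the following code builds smoothed targets from integer class labels, giving the true class probability 1 − ε and each other class ε / (K − 1), and then computes the cross-entropy against the model's predicted log-probabilities:

```python
import torch
import torch.nn.functional as F


def smooth_labels(targets: torch.Tensor, num_classes: int, epsilon: float = 0.1) -> torch.Tensor:
    """Turn integer class labels into smoothed targets.

    The true class receives 1 - epsilon; the remaining epsilon is spread
    evenly over the other num_classes - 1 classes.
    """
    off_value = epsilon / (num_classes - 1)
    soft = torch.full((targets.size(0), num_classes), off_value)
    soft.scatter_(1, targets.unsqueeze(1), 1.0 - epsilon)
    return soft


# Toy batch: 2 samples, 5 classes; random logits stand in for raw model outputs.
logits = torch.randn(2, 5)
targets = torch.tensor([3, 0])

soft_targets = smooth_labels(targets, num_classes=5, epsilon=0.1)
# Cross-entropy against the smoothed targets instead of hard one-hot labels.
loss = torch.sum(-soft_targets * F.log_softmax(logits, dim=1), dim=1).mean()

print(soft_targets)  # rows: [0.025, 0.025, 0.025, 0.9, 0.025] and [0.9, 0.025, 0.025, 0.025, 0.025]
print(loss)
```

Setting epsilon to 0 recovers the standard hard-label cross-entropy.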

Benefits of Label Smoothing

  • Improved Generalization: By discouraging overconfidence, models often generalize better to unseen data.
  • Better Model Calibration: The predicted probabilities tend to be a more accurate reflection of the true likelihood of correctness. You can learn more about model calibration in statistics.
  • Increased Robustness: Models can become more resilient to noisy labels or minor variations in the input data.
  • Reduced Overfitting: It acts as a regularizer, helping to mitigate overfitting, similar in spirit to techniques like Dropout or Data Augmentation, although it operates directly on the target labels.

Applications of Label Smoothing

Label smoothing is widely applicable, particularly in classification tasks across different domains:

  • Image Classification: When training deep neural networks like Ultralytics YOLO models on large datasets such as ImageNet, label smoothing can contribute to higher validation accuracy. This is particularly useful in fields like medical image analysis where calibrated probability estimates are important.
  • Natural Language Processing (NLP): In tasks like machine translation or training large language models (LLMs) like BERT or GPT, label smoothing helps improve the fluency and generalization of the models by preventing them from assigning absolute certainty to specific word predictions. Frameworks like PyTorch and TensorFlow often include options for label smoothing in their loss functions, as shown in the sketch after this list.
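
For example, here is a minimal sketch using PyTorch's built-in option, the label_smoothing argument of torch.nn.CrossEntropyLoss (available since PyTorch 1.10); Keras offers a comparable label_smoothing parameter on tf.keras.losses.CategoricalCrossentropy, which expects one-hot targets rather than integer labels:

```python
import torch
import torch.nn as nn

# Built-in label smoothing (epsilon = 0.1); PyTorch spreads epsilon uniformly
# over all classes, following the Inception-v3 formulation.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

logits = torch.randn(4, 10)           # raw outputs for a batch of 4 over 10 classes
targets = torch.tensor([1, 0, 9, 4])  # integer class labels

loss = criterion(logits, targets)
print(loss.item())
```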

Considerations

While generally beneficial, label smoothing might slightly slow down the model's convergence during training. The extent of its benefit can also depend on the dataset and model architecture. The smoothing factor (epsilon) itself is a hyperparameter that may require tuning for optimal results. It is often integrated into the training pipelines of modern frameworks and platforms like Ultralytics HUB.
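
Because the best smoothing factor is dataset- and model-dependent, a simple approach is to sweep a few candidate values and keep whichever gives the best validation metric. The sketch below only shows the structure of such a sweep; run_experiment is a hypothetical placeholder for your own training-and-validation routine:

```python
import torch.nn as nn


def run_experiment(epsilon: float) -> float:
    """Hypothetical placeholder: train with the given smoothing factor and
    return a validation metric. Replace the body with your real pipeline."""
    criterion = nn.CrossEntropyLoss(label_smoothing=epsilon)  # loss used during training
    # ... train the model with `criterion`, then evaluate on the validation set ...
    return 0.0  # placeholder metric so the sketch runs end to end


candidate_epsilons = [0.0, 0.05, 0.1, 0.2]
best_epsilon = max(candidate_epsilons, key=run_experiment)
print(f"Selected smoothing factor: {best_epsilon}")
```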
