Underfitting

Learn how to identify, prevent, and address underfitting in machine learning models with expert tips, strategies, and real-world examples.


In the realm of machine learning (ML), achieving optimal model performance requires finding a balance between simplicity and complexity. Underfitting is a common issue where a model is too simplistic to capture the underlying patterns present in the training data. This means the model fails to learn effectively, leading to poor performance not only on the data it was trained on but also on new, unseen data (test data or real-world inputs). An underfit model lacks the necessary capacity or training time to represent the relationships within the data accurately, resulting in high bias and an inability to generalize well.

What Causes Underfitting?

Several factors can contribute to an underfit model:

  • Insufficient Model Complexity: The chosen model might be too simple for the complexity of the data. For example, using a basic linear regression model for data with non-linear patterns, or using a neural network (NN) with too few layers or neurons (a minimal code sketch of this case follows the list).
  • Inadequate Feature Engineering: The input features provided to the model might not contain enough relevant information or might not represent the underlying patterns effectively.
  • Insufficient Training Data: The model may not have seen enough examples to learn the underlying patterns. This is particularly true for complex deep learning models. Having diverse and representative data is crucial, which can be explored through platforms like Ultralytics datasets.
  • Training Too Short: The model training process might be stopped prematurely, before it has had enough epochs to learn the patterns in the data.
  • Excessive Regularization: Techniques used to prevent overfitting, such as L1 or L2 regularization or high dropout rates, can sometimes overly constrain the model, preventing it from learning necessary patterns if applied too strongly.
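
The first cause above, insufficient model complexity, can be demonstrated in a few lines. The sketch below uses scikit-learn and synthetic data (both are illustrative assumptions, not part of any specific workflow): a plain linear regression cannot represent a quadratic relationship and shows a high training error, while a model with slightly more capacity fits the same data well.

```python
# Minimal sketch (synthetic data): a linear model underfits a quadratic pattern.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.3, size=200)  # quadratic relationship plus noise

# Too simple: a straight line cannot capture the curve (high bias, high training error).
linear = LinearRegression().fit(X, y)
print("Linear model train MSE:", mean_squared_error(y, linear.predict(X)))

# Slightly more capacity: degree-2 polynomial features capture the underlying pattern.
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)
print("Polynomial model train MSE:", mean_squared_error(y, poly.predict(X)))
```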

Identifying Underfitting

Underfitting is typically diagnosed by evaluating the model's performance during and after training:

  • High Training Error: The model performs poorly even on the data it was trained on. Key metrics like accuracy, precision, recall, or F1 score are low, and the loss function value remains high.
  • High Validation/Test Error: The model also performs poorly on unseen validation data or test data. The performance gap between training and validation error is usually small, but both errors are unacceptably high.
  • Learning Curves: Plotting the training and validation loss/metrics against training epochs can reveal underfitting. If both curves plateau at a high error level, the model is likely underfitting, as illustrated in the sketch below. You can monitor these curves with tools like TensorBoard or Weights & Biases, and understanding specific YOLO performance metrics is also helpful.
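
Learning curves can be inspected with any plotting tool. The snippet below is a small, self-contained sketch using matplotlib with invented loss histories and an illustrative threshold; in practice the values would come from your own training logs (for example, exported from TensorBoard).

```python
# Sketch of a learning-curve check with invented loss histories (not real training output).
import matplotlib.pyplot as plt

train_loss = [2.1, 1.9, 1.8, 1.75, 1.74, 1.74, 1.73, 1.73]  # plateaus at a high value
val_loss = [2.2, 2.0, 1.9, 1.85, 1.84, 1.84, 1.83, 1.83]  # tracks the training loss closely

plt.plot(train_loss, label="training loss")
plt.plot(val_loss, label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.title("Both curves plateau high: a typical underfitting signature")
plt.show()

# Crude heuristic: a small train/validation gap while both losses remain high.
gap = abs(train_loss[-1] - val_loss[-1])
if gap < 0.2 and train_loss[-1] > 1.0:  # thresholds are illustrative, not universal
    print("Likely underfitting: consider more capacity, longer training, or less regularization.")
```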

Addressing Underfitting

Several strategies can help overcome underfitting:

  • Increase Model Complexity: Use a more powerful model architecture with more parameters, layers, or neurons. For instance, switching from a simpler CNN to a more advanced architecture like Ultralytics YOLO11 for object detection tasks.
  • Improve Feature Engineering: Create more informative features from the existing data or incorporate new relevant data sources.
  • Increase Training Duration: Train the model for more epochs to allow it sufficient time to learn the data patterns. Check model training tips for guidance.
  • Reduce Regularization: Decrease the strength of regularization techniques, for example by lowering the regularization parameter lambda or reducing the dropout probability (the sketch after this list pairs this with increased model capacity).
  • Ensure Sufficient Data: Gather more training examples. If collecting more data is infeasible, techniques like data augmentation can artificially increase the diversity of the training data. Managing datasets can be streamlined using platforms like Ultralytics HUB.
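
Two of these remedies, adding capacity and relaxing regularization, are easy to express in code. The sketch below is a hypothetical PyTorch example; the layer sizes, dropout rates, and weight-decay values are arbitrary placeholders rather than recommended settings.

```python
# Hypothetical PyTorch sketch: an underfit-prone setup versus a higher-capacity,
# lightly regularized alternative. All sizes and hyperparameters are illustrative.
import torch.nn as nn
import torch.optim as optim

# Underfit-prone: tiny network, aggressive dropout, strong weight decay.
small_model = nn.Sequential(
    nn.Linear(10, 4), nn.ReLU(), nn.Dropout(0.7),
    nn.Linear(4, 1),
)
strong_reg_opt = optim.Adam(small_model.parameters(), lr=1e-3, weight_decay=1e-2)

# Remedies applied: more layers and neurons, lighter dropout, weaker weight decay.
larger_model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(0.1),
    nn.Linear(64, 1),
)
weak_reg_opt = optim.Adam(larger_model.parameters(), lr=1e-3, weight_decay=1e-5)
```

Training the larger, lightly regularized model for more epochs would also cover the remaining remedy of increasing training duration.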

Underfitting vs. Overfitting

Underfitting and overfitting are two sides of the same coin, representing failures in model generalization.

  • Underfitting: The model is too simple (high bias). It fails to capture the underlying trends in the data, resulting in poor performance on both training and test sets.
  • Overfitting: The model is too complex (high variance). It learns the training data too well, including noise and random fluctuations, leading to excellent performance on the training set but poor performance on unseen data.

The goal in ML is to find a sweet spot between underfitting and overfitting, often discussed in the context of the bias-variance tradeoff, where the model learns the true underlying patterns without memorizing the noise.
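
This sweet spot can be visualized by sweeping model complexity and comparing training and validation error. The sketch below uses scikit-learn on synthetic data (an assumption for illustration): degree 1 underfits, a very high degree tends to overfit, and an intermediate degree lands near the balance point.

```python
# Illustrative bias-variance sweep on synthetic data: low-degree models underfit,
# very high-degree models overfit, and an intermediate degree balances both errors.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(120, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, size=120)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    val_err = mean_squared_error(y_val, model.predict(X_val))
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  val MSE={val_err:.3f}")
```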

Real-World Examples of Underfitting

  1. Simple Image Classifier: Training a very basic Convolutional Neural Network (CNN) (e.g., with only one or two convolutional layers) on a complex image classification task like classifying thousands of object categories in ImageNet. The model would likely underfit because its limited capacity prevents it from learning the intricate features needed to distinguish between many classes effectively. Both training and validation accuracy would remain low.
  2. Basic Predictive Maintenance: Using a simple linear model to predict machine failure based only on operating temperature. If failures are actually driven by a complex interplay of factors such as vibration, age, pressure, and non-linear temperature effects, the linear model will underfit. It cannot capture the true complexity, leading to poor predictive modeling performance and missed failures. More expressive models or richer features would be needed; frameworks like PyTorch or TensorFlow provide the tools to build them, and a brief sketch on synthetic data follows below.
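
As a rough sketch of the second example, the snippet below generates synthetic maintenance data (the feature names and the failure rule are invented for illustration): a logistic regression that only sees temperature underfits, while a tree ensemble given all features captures the interactions.

```python
# Hypothetical predictive-maintenance sketch on synthetic data. The features and the
# failure rule below are invented; real sensor data would replace them.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 2000
temperature = rng.normal(70, 10, n)
vibration = rng.normal(0.5, 0.2, n)
age_years = rng.uniform(0, 15, n)
# Failures depend on a non-linear interplay of factors, not on temperature alone.
failure = ((temperature > 85) & (vibration > 0.6) | (age_years > 12)).astype(int)

X_all = np.column_stack([temperature, vibration, age_years])
X_temp = temperature.reshape(-1, 1)
Xa_tr, Xa_te, Xt_tr, Xt_te, y_tr, y_te = train_test_split(
    X_all, X_temp, failure, test_size=0.3, random_state=7
)

underfit = LogisticRegression().fit(Xt_tr, y_tr)  # temperature only: underfits
richer = RandomForestClassifier(random_state=7).fit(Xa_tr, y_tr)  # all features, non-linear

print("Temperature-only accuracy:", accuracy_score(y_te, underfit.predict(Xt_te)))
print("All-features accuracy:   ", accuracy_score(y_te, richer.predict(Xa_te)))
```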