AI Insights: Overfitting vs Underfitting: How ML Models Go Wrong
Posted On: July 13, 2025 | 2 min read
Building a machine learning model isn’t just about achieving high accuracy on training data—it’s about ensuring it performs well on new, unseen data. Two common pitfalls that prevent this are overfitting and underfitting. 🤖
What is Underfitting?:
Underfitting happens when the model is too simple to capture the underlying patterns in the data.
- Symptoms:
  - Low accuracy on both training and test sets.
  - Model ignores important features or relationships.
- Example: Using a straight line to approximate non-linear data.
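To make the straight-line example concrete, here is a minimal sketch using scikit-learn. The synthetic quadratic data, the noise level, and the variable names are assumptions chosen for illustration. Because this is a regression task, the score reported is R² rather than accuracy. A straight line should score poorly on both splits, while a degree-2 model fits the same data well.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, clearly non-linear data: y = x^2 plus a little noise (illustrative assumption).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Too simple: a straight line cannot follow the curve.
linear = LinearRegression().fit(X_train, y_train)

# Better matched: adding a squared term lets the model capture the pattern.
quadratic = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
quadratic.fit(X_train, y_train)

linear_train_r2 = linear.score(X_train, y_train)
linear_test_r2 = linear.score(X_test, y_test)
quad_train_r2 = quadratic.score(X_train, y_train)
quad_test_r2 = quadratic.score(X_test, y_test)

print("Linear    train R^2:", linear_train_r2, "test R^2:", linear_test_r2)
print("Quadratic train R^2:", quad_train_r2, "test R^2:", quad_test_r2)
```

The telltale sign of underfitting is that the linear model scores low on the training split too, not only on the test split.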
What is Overfitting?:
Overfitting occurs when the model memorizes the training data, including noise and outliers, instead of learning the general pattern.
- Symptoms:
  - High training accuracy but poor test accuracy.
  - Model fails to generalize to new data.
- Example: A decision tree that keeps branching until every training point is perfectly classified.
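The decision tree example can be sketched as follows. The synthetic dataset from make_classification and its flip_y label noise are assumptions picked to make memorization visible; real data will behave differently.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y=0.2 assigns a random class to about 20% of the samples,
# which acts as label noise (illustrative assumption).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# With default settings the tree keeps splitting until every leaf is pure,
# so it also fits the noisy labels.
deep_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = deep_tree.score(X_train, y_train)
test_acc = deep_tree.score(X_test, y_test)
print("Train Accuracy:", train_acc)
print("Test Accuracy:", test_acc)
```

The tree classifies every training point correctly because it has memorized the noise along with the signal, and that memorized noise does not carry over to the held-out data.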
Figure: Training vs Testing Accuracy curves showing Overfitting and Underfitting
How to Detect & Prevent?:
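- Detect: Compare training and test accuracy. A large gap (high training, low test) usually indicates overfitting; low accuracy on both usually indicates underfitting.
- Prevent overfitting:
  - Use regularization (like Lasso or Ridge).
  - Reduce model complexity.
  - Use more training data, and apply cross-validation to check how well the model generalizes.
- Prevent underfitting: Use a more expressive model or add informative features.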
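As a rough sketch of regularization and cross-validation working together, the snippet below compares a plain linear model with Ridge on a small synthetic regression problem. The dataset shape, the alpha value, and the names used are assumptions for illustration, not tuned recommendations, and the scores are R² values.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Few samples relative to the number of features, so an unregularized
# model has plenty of room to fit noise (illustrative assumption).
X, y = make_regression(
    n_samples=60, n_features=50, n_informative=5, noise=20.0, random_state=0
)

models = {
    "plain linear": LinearRegression(),
    "ridge": Ridge(alpha=10.0),  # alpha controls the penalty strength
}

results = {}
for name, model in models.items():
    # 5-fold cross-validation: each fold is scored on data the model did not see.
    cv_scores = cross_val_score(model, X, y, cv=5)
    model.fit(X, y)
    results[name] = {
        "train_r2": model.score(X, y),
        "cv_r2_mean": cv_scores.mean(),
        "n_folds": len(cv_scores),
    }
    print(name, "| train R^2:", results[name]["train_r2"],
          "| mean CV R^2:", results[name]["cv_r2_mean"])
```

The plain linear model fits the training data at least as closely as Ridge, since that is exactly what it optimizes for. The cross-validated score is the number to watch, and regularization often improves it in settings like this one.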
Example Code (Detecting Overfitting):
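from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the iris dataset and hold out 30% of it for testing.
# random_state makes the split reproducible; stratify keeps class proportions equal.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# max_depth=None lets the tree grow until its leaves are pure (likely to overfit).
clf = DecisionTreeClassifier(max_depth=None, random_state=42)
clf.fit(X_train, y_train)

# Compare accuracy on data the model has seen with accuracy on held-out data.
print("Train Accuracy:", clf.score(X_train, y_train))
print("Test Accuracy:", clf.score(X_test, y_test))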
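A large difference between train and test accuracy here often indicates overfitting. On a small, clean dataset like iris the gap may be modest, so treat the comparison as the pattern to look for rather than a dramatic result.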
Conclusion:
Overfitting and underfitting are common problems, but with careful model selection, regularization, and validation strategies, you can build models that strike the right balance and deliver reliable predictions. 🚀