Overfitting vs Underfitting with Pictures

Introduction

Overfitting and underfitting are among the most common reasons machine learning models fail to generalize. This CDPL picture-first guide shows what each one looks like, explains the bias-variance trade-off, and gives quick checks and fixes you can apply in scikit-learn.

Quick Definitions

Underfitting: the model is too simple to learn the pattern. It has high bias and performs poorly on both train and test data.
Overfitting: the model is too complex and memorizes noise. It has high variance, high train accuracy, but poor test accuracy.
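
To put numbers on these two definitions, the sketch below fits polynomials of three degrees to a hypothetical noisy sine wave and records train and test R² for each. The data, the degrees (1, 4, 15), and the 50/50 split are illustrative assumptions, not recommendations.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical toy data: a noisy sine wave (not from any real project)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + 0.3 * rng.normal(size=60)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

scores = {}
for degree in (1, 4, 15):
    model = make_pipeline(
        PolynomialFeatures(degree, include_bias=False),
        StandardScaler(),
        LinearRegression(),
    )
    model.fit(X_train, y_train)
    # Store (train R2, test R2) so the two can be compared side by side
    scores[degree] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"degree={degree:2d}  train R2={scores[degree][0]:.3f}  "
          f"test R2={scores[degree][1]:.3f}")
```

An underfit model typically scores low on both splits, while an overfit one tends to score close to 1.0 on train and noticeably lower on test. Exact numbers depend on the seed and the noise level.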

Visual Intuition: Lines and Curves

The same dataset can be modeled with different complexity. The pictures below are the fastest way to build intuition.

Underfitting (straight line on curved data): the model misses the curvature and shows large, systematic errors across the whole range.
Good fit: the curve follows the trend without chasing every point.
Overfitting (wiggly curve): the model bends through every point, including noise, and fails on new data.

Bias-Variance Trade-Off

Expected squared error on unseen data = bias² + variance + irreducible noise. Increasing complexity reduces bias but increases variance; simplifying reduces variance but increases bias. The goal is the level of complexity that minimizes total error on unseen data.
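
The decomposition can be estimated by simulation. The sketch below assumes a known "true" function (a sine wave), redraws noisy training sets many times, and measures how far the average prediction is from the truth (bias²) and how much predictions move between draws (variance). The function, noise level, and degrees are assumptions chosen for illustration.

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(0)
x_train = np.linspace(-3, 3, 30)
x_test = np.linspace(-2.5, 2.5, 50)
f = np.sin          # assumed "true" function for the simulation
noise_sd = 0.3      # assumed noise level


def bias_variance(degree, n_repeats=200):
    """Estimate bias^2 and variance of a polynomial fit, averaged over x_test."""
    preds = np.empty((n_repeats, x_test.size))
    for i in range(n_repeats):
        # Fresh noisy training targets on every repeat
        y_train = f(x_train) + noise_sd * rng.normal(size=x_train.size)
        coefs = P.polyfit(x_train, y_train, degree)
        preds[i] = P.polyval(x_test, coefs)
    bias_sq = np.mean((preds.mean(axis=0) - f(x_test)) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance


results = {d: bias_variance(d) for d in (1, 3, 9)}
for d, (b2, var) in results.items():
    print(f"degree={d}: bias^2={b2:.4f}  variance={var:.4f}  "
          f"noise={noise_sd ** 2:.4f}")
```

In this setup the straight line should show the largest bias² and the degree 9 fit the largest variance, which is the trade-off described above.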

How to Detect Overfitting and Underfitting

Fast checks

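Train vs validation gap: a train score far above the validation score suggests overfitting; poor scores on both suggest underfitting.
Learning curves: plot train and validation scores against the number of training examples.
Cross-validation: similar scores across folds suggest the model generalizes; widely varying scores point to high variance.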

# Learning curve example (scikit-learn)
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import learning_curve

X, y = make_regression(n_samples=1500, n_features=20, noise=15, random_state=42)
est = Ridge(alpha=1.0)
# shuffle=True is required for random_state to have any effect
train_sizes, train_scores, val_scores = learning_curve(
    est, X, y, cv=5, scoring="r2", train_sizes=np.linspace(0.1, 1.0, 6),
    shuffle=True, random_state=42
)
# Mean score per training size, averaged over the 5 folds
print(train_sizes, train_scores.mean(axis=1), val_scores.mean(axis=1))

Use learning curves to see if more data helps and if the gap is shrinking
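
The other two fast checks, the train-versus-validation gap and the stability of scores across folds, can be sketched with `cross_validate`. The helper name `gap_report` and the two estimators compared here are assumptions made for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=20, noise=15, random_state=42)


def gap_report(estimator, X, y, cv=5):
    """Return mean train R2, mean validation R2, and validation std across folds."""
    res = cross_validate(
        estimator, X, y, cv=cv, scoring="r2", return_train_score=True
    )
    return (
        res["train_score"].mean(),
        res["test_score"].mean(),
        res["test_score"].std(),
    )


for name, est in [
    ("ridge", Ridge(alpha=1.0)),
    ("deep tree", DecisionTreeRegressor(random_state=0)),  # no depth limit
]:
    train_r2, val_r2, val_std = gap_report(est, X, y)
    print(f"{name:10s} train={train_r2:.3f}  val={val_r2:.3f}  "
          f"gap={train_r2 - val_r2:.3f}  fold std={val_std:.3f}")
```

On this synthetic linear problem the unconstrained tree should show a much larger gap than Ridge, which is the overfitting signature from the list of fast checks.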

How to Fix Underfitting

Increase model capacity (polynomial features, deeper trees, kernels).
Reduce regularization strength (lower alpha for Ridge/Lasso, higher C in SVM, since C is the inverse of regularization strength).
Add better features (domain signals, interactions, nonlinear transforms).
Switch to a lower-bias algorithm (from linear to tree-based or kernel methods).
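
Two of these fixes, adding capacity and lowering the regularization strength, can be compared on a small example. The sketch below assumes a hypothetical quadratic target that a straight line cannot capture; the alpha values are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical data: y depends on x squared, plus noise
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = X.ravel() ** 2 + 0.5 * rng.normal(size=300)

candidates = {
    # Too simple: no feature can express the curvature
    "linear, alpha=1": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    # Enough capacity, but shrunk so hard that it still underfits
    "degree 2, alpha=1000": make_pipeline(
        PolynomialFeatures(2, include_bias=False),
        StandardScaler(),
        Ridge(alpha=1000.0),
    ),
    # Enough capacity and light regularization
    "degree 2, alpha=1": make_pipeline(
        PolynomialFeatures(2, include_bias=False),
        StandardScaler(),
        Ridge(alpha=1.0),
    ),
}

cv_r2 = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}
for name, score in cv_r2.items():
    print(f"{name:22s} mean CV R2 = {score:.3f}")
```

The cross-validated score should rise at each step: first when the squared feature is added, then again when alpha is lowered.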

How to Fix Overfitting

Regularize: L1/L2, dropout for deep nets, pruning for trees.
Simplify: reduce depth, fewer parameters, early stopping.
More and cleaner data: collect more examples, remove label noise, stratify splits.
Cross-validation and ensembling: use cross-validation to choose hyperparameters, and average several models (bagging, Random Forests) to reduce variance.

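# Regularization and polynomial features
import numpy as np
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LassoCV

# 1D toy data: degree 8 on the 20-feature X above would create millions of columns
rng = np.random.default_rng(42)
X1 = rng.uniform(-3, 3, size=(200, 1))
y1 = np.sin(X1).ravel() + 0.2 * rng.normal(size=X1.shape[0])

model = make_pipeline(
    PolynomialFeatures(degree=8, include_bias=False),
    StandardScaler(),  # put the polynomial terms on a comparable scale for Lasso
    LassoCV(cv=5, random_state=42, max_iter=50000)
)
model.fit(X1, y1)  # high-degree curve with regularization to tame variance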

Use regularization when you need expressive features but want control over variance

Pictures You Can Recreate Quickly

# Underfit vs good fit vs overfit on 1D curve
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Noisy sine wave sampled at 80 evenly spaced points
rng = np.random.default_rng(42)
X = np.linspace(-3, 3, 80).reshape(-1, 1)
y = np.sin(X).ravel() + 0.2 * rng.normal(size=X.shape[0])

m_under = LinearRegression()  # straight line: too simple
m_good = make_pipeline(PolynomialFeatures(5), Ridge(alpha=0.5))  # moderate degree
m_over = make_pipeline(PolynomialFeatures(16), Ridge(alpha=1e-6))  # high degree, almost no shrinkage

for m in [m_under, m_good, m_over]:
    m.fit(X, y)

# Dense grid so the fitted curves are drawn smoothly
xx = np.linspace(-3, 3, 400).reshape(-1, 1)
plt.scatter(X, y, s=18)
plt.plot(xx, m_under.predict(xx), label="Underfit")
plt.plot(xx, m_good.predict(xx), label="Good fit")
plt.plot(xx, m_over.predict(xx), label="Overfit")
plt.legend()
plt.show()

Three curves on the same data make the differences obvious for teams and reports

Checklist for Projects at CDPL

Always keep a validation set separate from training.
Plot learning curves before scaling up complexity.
Track train vs validation metrics in one dashboard.
Prefer simple models that meet the target metric; complexity must justify maintenance cost.
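
The first and third checklist items can be sketched as a three-way split with both metrics recorded together. The 60/20/20 ratios, the synthetic data, and the `metrics` dictionary are assumptions for illustration; a real project would log these values to whatever dashboard it already uses.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=20, noise=15, random_state=42)

# First hold out 20% as the final test set, then split the rest 75/25
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42
)

model = Ridge(alpha=1.0).fit(X_train, y_train)

# Keep train and validation metrics side by side so a growing gap is visible
metrics = {
    "train_r2": model.score(X_train, y_train),
    "val_r2": model.score(X_val, y_val),
}
print(len(X_train), len(X_val), len(X_test))  # 600 200 200
print(metrics)
```

The test set is left untouched here on purpose: it is meant to be scored once, after model selection on the validation set is finished.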

FAQ

Does more data always fix overfitting? Often, but not always. If labels are noisy or features leak target information, more data alone will not help.

Is regularization mandatory? When features are many or correlated, effectively yes. It stabilizes coefficient estimates and usually improves generalization.

Tree models do not need scaling; can they still overfit? Yes. Limit depth, set a minimum number of samples per split, and try ensembles such as Random Forests.
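
The tree answer can be checked with a quick comparison. The sketch below assumes synthetic regression data and arbitrary settings (`max_depth=5`, `min_samples_split=20`, 200 trees); suitable values depend on the dataset.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=600, n_features=10, noise=25, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    "unconstrained tree": DecisionTreeRegressor(random_state=0),
    "limited tree": DecisionTreeRegressor(
        max_depth=5, min_samples_split=20, random_state=0
    ),
    "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

tree_scores = {}
for name, m in models.items():
    m.fit(X_train, y_train)
    # Store (train R2, validation R2) for each model
    tree_scores[name] = (m.score(X_train, y_train), m.score(X_val, y_val))
    print(f"{name:20s} train R2={tree_scores[name][0]:.3f}  "
          f"val R2={tree_scores[name][1]:.3f}")
```

The unconstrained tree fits the training set almost perfectly, so its gap to the validation score is the thing to watch. Averaging many trees in a forest usually recovers a good part of that lost validation performance.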

Conclusion

Overfitting and underfitting are two sides of the same generalization problem. Use visuals, validation, and learning curves to diagnose quickly, then apply regularization, data improvements, and the right level of model complexity. With these habits, CDPL learners and partner teams can ship models that perform well on real-world data.
