🧠 AI with Python – 📉 Residuals vs Predicted Plot (Regression)
Posted on: March 17, 2026
Description:
When working with regression models, evaluating model performance goes beyond metrics like R² or Mean Squared Error. Even a model with strong numerical performance can still violate important assumptions.
One of the most useful diagnostic tools in regression analysis is the Residuals vs Predicted plot, which helps visualize how prediction errors behave across the range of predicted values.
Understanding the Problem
A regression model predicts continuous values, but the true test of its reliability lies in how the errors (residuals) behave.
Residuals are defined as:
Residual = Actual Value − Predicted Value
Ideally, residuals should be randomly scattered around zero. If patterns appear in the residuals, it may indicate that the model is missing important relationships in the data.
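The definition above can be sketched numerically. A minimal example with NumPy, using hypothetical actual and predicted values (not from the dataset used later):

```python
import numpy as np

# Hypothetical actual and predicted values, for illustration only
actual = np.array([3.0, 5.0, 7.0, 9.0])
predicted = np.array([2.8, 5.3, 6.9, 9.2])

# Residual = Actual Value - Predicted Value
residuals = actual - predicted
print(residuals)         # per-observation errors
print(residuals.mean())  # near zero for an unbiased model
```

Here the errors alternate in sign and the mean residual is close to zero, which is the behavior we hope to see scattered across a residual plot.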
What Is a Residual Plot?
A Residuals vs Predicted plot displays:
- Predicted values on the x-axis
- Residual errors on the y-axis
This visualization helps detect issues such as:
- Non-linear relationships
- Unequal error variance (heteroscedasticity)
- Outliers
- Model misspecification
1. Training a Regression Model
We first train a linear regression model on a tabular dataset (the full example at the end of this post uses the California Housing dataset).
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
The model learns a linear relationship between the features and the target variable.
2. Generating Predictions
After training, we generate predictions on unseen test data.
y_pred = model.predict(X_test)
These predictions will be compared with the actual values to compute residuals.
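Comparing predictions with actual values is usually paired with the summary metrics mentioned earlier, such as Mean Squared Error and R². A small sketch with hypothetical values (not the model's actual output):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical actual vs. predicted values, for illustration only
y_test = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

# Summary metrics complement, but do not replace, the residual plot
print(f"MSE: {mean_squared_error(y_test, y_pred):.3f}")
print(f"R²:  {r2_score(y_test, y_pred):.3f}")
```

A high R² here would not rule out a pattern in the residuals, which is exactly why the plot in the next steps matters.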
3. Computing Residuals
Residuals represent the error between actual and predicted values.
residuals = y_test - y_pred
These values form the basis for the residual analysis.
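Before plotting, a crude numeric check can already flag suspicious points. One common approach is to standardize the residuals and inspect any that fall beyond roughly two standard deviations; the values below are illustrative, not from the model:

```python
import numpy as np

# Hypothetical residuals, one unusually large (illustrative values)
residuals = np.array([0.4, -0.6, 0.2, -0.3, 5.0, 0.1])

# Standardized residuals: divide by the residual standard deviation.
# Points beyond roughly |2| are worth inspecting as potential outliers.
standardized = residuals / residuals.std()
outliers = np.where(np.abs(standardized) > 2)[0]
print(outliers)  # indices of suspicious observations
```

Such points will also stand out visually in the plot built in the next step.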
4. Visualizing Residuals vs Predictions
We now plot residuals against predicted values.
plt.scatter(y_pred, residuals)
plt.axhline(0, linestyle="--")
The horizontal line represents zero error, making it easier to observe deviations.
How to Interpret the Residual Plot
A good regression model typically produces:
- Random scatter around zero → model fits well
However, certain patterns reveal problems:
- Curved pattern → missing non-linear relationship
- Funnel shape → heteroscedasticity (changing variance)
- Clusters or structure → missing variables
- Extreme points → potential outliers
These insights help guide model improvements.
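The funnel pattern in particular can be illustrated, and crudely quantified, with synthetic data. A sketch that correlates the absolute residual with the predicted value; all values here are simulated, not taken from the housing model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic predicted values (illustrative)
y_pred = np.linspace(1, 10, 200)

# Homoscedastic residuals: constant noise scale
res_ok = rng.normal(0, 1.0, size=200)

# Heteroscedastic residuals: noise grows with the prediction (funnel shape)
res_funnel = rng.normal(0, 0.3 * y_pred)

# Crude check: correlate |residual| with the predicted value.
# A clearly positive correlation suggests a funnel pattern.
corr_ok = np.corrcoef(y_pred, np.abs(res_ok))[0, 1]
corr_funnel = np.corrcoef(y_pred, np.abs(res_funnel))[0, 1]
print(f"homoscedastic:   {corr_ok:.2f}")
print(f"heteroscedastic: {corr_funnel:.2f}")
```

Plotting either residual series against `y_pred` with the scatter code from step 4 makes the contrast visible: the second series fans out as predictions grow.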
Why Residual Analysis Matters
Residual diagnostics allow us to:
- Validate regression assumptions
- Identify model limitations
- Detect non-linear relationships
- Improve feature engineering
Without residual analysis, important model issues may go unnoticed.
Key Takeaways
- Residuals measure prediction errors in regression models.
- Random scatter around zero indicates a well-fitted model.
- Patterns in residuals signal model problems.
- Residual plots help detect heteroscedasticity and non-linearity.
- Residual plots are a fundamental diagnostic tool for regression analysis.
Conclusion
Residuals vs predicted plots provide a powerful visual diagnostic for regression models. By examining how prediction errors behave, we gain deeper insight into model assumptions and potential weaknesses. This makes residual analysis an essential part of building reliable regression models within the Advanced Visualization & Interpretability module of the AI with Python series.
Code Snippet:
# 📦 Import Required Libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# 🧩 Load the Dataset
data = fetch_california_housing()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
# ✂️ Split Data into Train and Test Sets
X_train, X_test, y_train, y_test = train_test_split(
X,
y,
test_size=0.3,
random_state=42
)
# 🤖 Train the Regression Model
model = LinearRegression()
model.fit(X_train, y_train)
# 📊 Generate Predictions
y_pred = model.predict(X_test)
# 📉 Compute Residuals
residuals = y_test - y_pred
# 📈 Plot Residuals vs Predicted Values
plt.figure(figsize=(6, 6))
plt.scatter(y_pred, residuals, alpha=0.6)
plt.axhline(0, linestyle="--")
plt.xlabel("Predicted Values")
plt.ylabel("Residuals")
plt.title("Residuals vs Predicted Plot")
plt.tight_layout()
plt.show()