AW Dev Rethought

🌟 "The best way to predict the future is to invent it." – Alan Kay

🧠 AI with Python – 📉 Residuals vs Predicted Plot (Regression)


Description:

When working with regression models, evaluating model performance goes beyond metrics like R² or Mean Squared Error. Even a model with strong numerical performance can still violate important assumptions.

One of the most useful diagnostic tools in regression analysis is the Residuals vs Predicted plot, which helps visualize how prediction errors behave across the range of predicted values.


Understanding the Problem

A regression model predicts continuous values, but the true test of its reliability lies in how the errors (residuals) behave.

Residuals are defined as:

Residual = Actual Value − Predicted Value

Ideally, residuals should be randomly scattered around zero. If patterns appear in the residuals, it may indicate that the model is missing important relationships in the data.
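As a minimal numeric illustration of this definition (the values below are made up for demonstration):

```python
import numpy as np

# Hypothetical actual and predicted values
actual = np.array([3.0, 5.0, 7.0, 9.0])
predicted = np.array([2.5, 5.5, 6.0, 9.5])

# Residual = Actual Value - Predicted Value
residuals = actual - predicted
print(residuals)  # residuals: 0.5, -0.5, 1.0, -0.5
```

Each residual tells us how far the model's prediction missed for that observation, with the sign indicating whether the model under- or over-predicted.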


What Is a Residual Plot?

A Residuals vs Predicted plot displays:

  • Predicted values on the x-axis
  • Residual errors on the y-axis

This visualization helps detect issues such as:

  • Non-linear relationships
  • Unequal error variance (heteroscedasticity)
  • Outliers
  • Model misspecification

1. Training a Regression Model

We first train a regression model using a structured dataset.

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)

The model learns a linear relationship between the features and the target variable.


2. Generating Predictions

After training, we generate predictions on unseen test data.

y_pred = model.predict(X_test)

These predictions will be compared with the actual values to compute residuals.


3. Computing Residuals

Residuals represent the error between actual and predicted values.

residuals = y_test - y_pred

These values form the basis for the residual analysis.


4. Visualizing Residuals vs Predictions

We now plot residuals against predicted values.

plt.scatter(y_pred, residuals)
plt.axhline(0, linestyle="--")

The horizontal line represents zero error, making it easier to observe deviations.


How to Interpret the Residual Plot

A good regression model typically produces a random scatter of residuals around zero, with no visible structure. This indicates the model fits well.

However, certain patterns reveal problems:

  • Curved pattern → missing non-linear relationship
  • Funnel shape → heteroscedasticity (changing variance)
  • Clusters or structure → missing variables
  • Extreme points → potential outliers

These insights help guide model improvements.
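To see how a curved pattern arises from a misspecified model, here is a small sketch: we fit a straight line to synthetic data that is truly quadratic, and the residuals bend accordingly (the data and variable names below are illustrative, not from the dataset used in this article):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 100)
y = x**2 + rng.normal(0, 0.02, size=x.size)  # truly quadratic relationship

# Fit a straight line (degree-1 polynomial): a misspecified model
slope, intercept = np.polyfit(x, y, 1)
y_pred = slope * x + intercept
residuals = y - y_pred

# The residuals curve: positive at the extremes, negative in the middle
print(residuals[0] > 0, residuals[50] < 0, residuals[-1] > 0)
```

Plotting these residuals against `y_pred` would show a U-shape rather than random scatter, which is the visual cue that a non-linear term is missing from the model.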


Why Residual Analysis Matters

Residual diagnostics allow us to:

  • Validate regression assumptions
  • Identify model limitations
  • Detect non-linear relationships
  • Improve feature engineering

Without residual analysis, important model issues may go unnoticed.
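One simple way to back up the visual check with a number is to correlate the absolute residuals with the predicted values: a clearly positive correlation suggests the error variance grows with the prediction. This is only a rough heuristic (a formal test such as Breusch-Pagan is more rigorous), and the simulated data below is illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
y_pred = np.linspace(1, 10, 200)
# Simulate heteroscedastic residuals: noise scale grows with the prediction
residuals = rng.normal(0, 0.1 * y_pred)

# Positive correlation between |residuals| and predictions hints at heteroscedasticity
corr = np.corrcoef(y_pred, np.abs(residuals))[0, 1]
print(f"corr(|residuals|, predictions) = {corr:.2f}")
```

For homoscedastic errors, this correlation would hover near zero.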


Key Takeaways

  1. Residuals measure prediction errors in regression models.
  2. Random scatter around zero indicates a well-fitted model.
  3. Patterns in residuals signal model problems.
  4. Residual plots help detect heteroscedasticity and non-linearity.
  5. Residual plots are a fundamental diagnostic tool for regression analysis.

Conclusion

Residuals vs predicted plots provide a powerful visual diagnostic for regression models. By examining how prediction errors behave, we gain deeper insight into model assumptions and potential weaknesses. This makes residual analysis an essential part of building reliable regression models within the Advanced Visualization & Interpretability module of the AI with Python series.


Code Snippet:

# 📦 Import Required Libraries
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression


# 🧩 Load the Dataset
data = fetch_california_housing()

X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target


# ✂️ Split Data into Train and Test Sets
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.3,
    random_state=42
)


# 🤖 Train the Regression Model
model = LinearRegression()
model.fit(X_train, y_train)


# 📊 Generate Predictions
y_pred = model.predict(X_test)


# 📉 Compute Residuals
residuals = y_test - y_pred


# 📈 Plot Residuals vs Predicted Values
plt.figure(figsize=(6, 6))

plt.scatter(y_pred, residuals, alpha=0.6)

plt.axhline(0, linestyle="--")

plt.xlabel("Predicted Values")
plt.ylabel("Residuals")
plt.title("Residuals vs Predicted Plot")

plt.tight_layout()
plt.show()
