Code is read far more often than it is written - Guido van Rossum

🧠 AI with Python – 🔍 SHAP Values for Model Explanation


Description:

As machine learning models grow more complex, understanding why they make certain predictions becomes just as important as prediction accuracy itself. In many real-world systems — healthcare, finance, customer analytics — black-box models are not acceptable.

In this project, we use SHAP (SHapley Additive exPlanations) to interpret a trained machine learning model and clearly explain how each feature contributes to its predictions.


Understanding the Problem

Modern models such as Random Forests and Gradient Boosting capture non-linear patterns extremely well. However, they do not naturally explain their decisions.

This creates several challenges:

  • stakeholders want reasoning, not just outputs
  • debugging incorrect predictions becomes difficult
  • trust and compliance requirements are unmet

Model interpretability is no longer optional — it is part of responsible ML system design.


What Is SHAP?

SHAP is an interpretability technique based on Shapley values from game theory.

The central idea is simple:

Each feature is treated as a player in a cooperative game and receives a contribution score reflecting how much it pushed the prediction away from a baseline (the model's average output).

SHAP provides:

  • global explanations — which features matter most overall
  • local explanations — why a specific prediction occurred
  • consistency — explanations are mathematically grounded
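
To make the idea concrete, here is a minimal, self-contained sketch (a hypothetical three-feature toy game with hand-picked coalition payoffs, not part of the project code) that brute-forces exact Shapley values:

# Toy illustration of exact Shapley values (hypothetical example)
from itertools import combinations
from math import factorial

players = ("age", "income", "tenure")        # hypothetical feature names
value = {                                    # v(S): payoff of each coalition
    (): 0.0,
    ("age",): 10.0, ("income",): 20.0, ("tenure",): 5.0,
    ("age", "income"): 40.0, ("age", "tenure"): 18.0, ("income", "tenure"): 28.0,
    ("age", "income", "tenure"): 50.0,
}

def coalition(members):
    # Canonical tuple key (players kept in their original order)
    return tuple(p for p in players if p in members)

def shapley(player):
    n = len(players)
    others = [p for p in players if p != player]
    total = 0.0
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            gain = value[coalition(set(subset) | {player})] - value[coalition(subset)]
            total += weight * gain
    return total

contributions = {p: shapley(p) for p in players}
print({p: round(v, 3) for p, v in contributions.items()})
print(round(sum(contributions.values()), 6))  # efficiency property: = v(all) - v(empty) = 50.0

The individual contributions always add up to the difference between the full coalition's payoff and the empty coalition's payoff. SHAP applies the same idea with "payoff" replaced by the model's prediction.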

1. Training a Model to Explain

We begin by training a tree-based classification model on tabular data.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import pandas as pd

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

model = RandomForestClassifier(
    n_estimators=200,
    random_state=42
)
model.fit(X_train, y_train)

Tree-based models integrate especially well with SHAP: for models such as random forests, shap.Explainer automatically selects the fast, tree-specific TreeExplainer.


2. Generating SHAP Values

Once the model is trained, we initialize the SHAP explainer.

import shap

explainer = shap.Explainer(model, X_train)
shap_values = explainer(X_test)

Since this is a binary classification problem, SHAP values are generated for each class, so the resulting Explanation object has shape (samples, features, classes).
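
As a quick sanity check, the contributions should add up to the model's output. The sketch below assumes the auto-selected TreeExplainer explains predicted probabilities for a scikit-learn random forest (its usual behaviour); if it explains a different output, the comparison target changes accordingly.

import numpy as np

print(shap_values.shape)  # (n_test_samples, n_features, 2)

# Additivity: base value + sum of feature contributions ≈ model output for class 1
i = 0
reconstructed = shap_values.base_values[i, 1] + shap_values.values[i, :, 1].sum()
predicted = model.predict_proba(X_test.iloc[[i]])[0, 1]
print(reconstructed, predicted)  # should agree closely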


3. Global Feature Importance

To understand which features influence predictions overall, we visualize SHAP values for the positive class.

shap.plots.beeswarm(
    shap_values[:, :, 1],
    max_display=15
)

This plot reveals:

  • the most influential features
  • whether higher or lower values increase predictions
  • how consistent each feature’s impact is across samples
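
The beeswarm can also be reduced to a plain ranking. A small sketch that computes mean absolute SHAP per feature for the positive class:

import numpy as np
import pandas as pd

# Mean |SHAP| per feature, sorted from most to least influential
importance = pd.Series(
    np.abs(shap_values.values[:, :, 1]).mean(axis=0),
    index=X_test.columns,
    name="mean_abs_shap",
).sort_values(ascending=False)

print(importance.head(10))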

4. Explaining a Single Prediction

SHAP also allows us to explain individual predictions in detail.

sample_index = 0

shap.plots.waterfall(
    shap_values[sample_index, :, 1],
    max_display=15
)

This visualization shows how each feature pushes the prediction towards or away from the positive class.
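
The same local explanation can also be read programmatically. A minimal sketch that lists the strongest contributions for this sample:

import numpy as np

# Features pushing this sample's class-1 prediction hardest, in either direction
local = shap_values.values[sample_index, :, 1]
top = np.argsort(np.abs(local))[::-1][:5]
for idx in top:
    print(f"{X_test.columns[idx]:<25} {local[idx]:+.4f}")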


Why SHAP Matters in Real-World ML

Interpretability helps answer critical questions:

  • Why did the model make this decision?
  • Is the model relying on sensible features?
  • Are there hidden biases or leakage?

SHAP makes ML systems auditable, debuggable, and trustworthy, which is essential for production deployment.


Key Takeaways

  1. SHAP explains complex ML models using principled, consistent logic.
  2. It supports both global and local model interpretability.
  3. Classification models require selecting a specific class for explanation.
  4. Visual explanations improve trust and debugging.
  5. Interpretability is a core requirement for real-world ML systems.

Conclusion

SHAP transforms black-box models into transparent decision systems. By clearly showing how features contribute to predictions, SHAP bridges the gap between high model performance and human understanding.

This project demonstrates how advanced visualization and interpretability elevate machine learning from experimentation to responsible, production-ready systems — making SHAP an essential tool in the AI with Python series.


Code Snippet:

# 📦 Import Required Libraries
import pandas as pd
import matplotlib.pyplot as plt
import shap

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier


# 🧩 Load Dataset
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target


# ✂️ Train/Test Split
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.3,
    random_state=42,
    stratify=y
)


# 🤖 Train Classification Model
model = RandomForestClassifier(
    n_estimators=200,
    random_state=42
)
model.fit(X_train, y_train)


# 🔎 Initialize SHAP Explainer (Modern & Stable)
explainer = shap.Explainer(model, X_train)

# Compute SHAP values
shap_values = explainer(X_test)

# NOTE:
# shap_values shape = (samples, features, classes)
# This is expected for binary classification


# 📊 Global Feature Importance (Beeswarm Plot)
# Explaining the positive class (class = 1)
shap.plots.beeswarm(
    shap_values[:, :, 1],
    max_display=15
)


# 📊 Alternative Global Importance (Bar Plot)
shap.plots.bar(
    shap_values[:, :, 1],
    max_display=15
)


# 🔍 Local Explanation (Single Prediction)
sample_index = 0

shap.plots.waterfall(
    shap_values[sample_index, :, 1],
    max_display=15
)


# 📌 Note: shap.plots.* display figures immediately by default.
# In scripts, pass show=False to the plot calls above if you want to
# customize a figure before calling plt.show() yourself.
plt.show()
