AW Dev Rethought

🌟 "The best way to predict the future is to invent it." – Alan Kay

🧠 AI with Python – 📈 Comparative ROC Curves


Description:

When building classification models, accuracy alone is often not enough to determine which model performs best. Different models may behave differently depending on the decision threshold used.

This is where ROC (Receiver Operating Characteristic) curves and AUC (Area Under the Curve) come in: they provide a deeper, threshold-independent way to evaluate model performance.

In this project, we compare multiple models using ROC curves plotted on the same graph to visually identify the strongest performer.


Understanding the Problem

Classification models output probabilities, but converting those probabilities into class labels requires a threshold (commonly 0.5).

Changing this threshold affects:

  • True Positive Rate (Sensitivity)
  • False Positive Rate

A single accuracy score cannot capture this trade-off.

ROC curves solve this by evaluating model performance across all possible thresholds.
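To make the trade-off concrete, here is a minimal sketch using hand-made labels and probabilities (these values are illustrative, not from the project's dataset):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predicted probabilities for the positive class
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_probs = np.array([0.1, 0.3, 0.45, 0.6, 0.4, 0.55, 0.8, 0.9])

for threshold in (0.3, 0.5, 0.7):
    y_pred = (y_probs >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    tpr = tp / (tp + fn)   # True Positive Rate (Sensitivity)
    fpr = fp / (fp + tn)   # False Positive Rate
    print(f"threshold={threshold}: TPR={tpr:.2f}, FPR={fpr:.2f}")
# threshold=0.3: TPR=1.00, FPR=0.75
# threshold=0.5: TPR=0.75, FPR=0.25
# threshold=0.7: TPR=0.50, FPR=0.00
```

Lowering the threshold catches more positives (higher TPR) at the cost of more false alarms (higher FPR); raising it does the reverse. No single threshold, and no single accuracy number, captures the whole picture.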


What Is a ROC Curve?

A ROC curve plots:

  • False Positive Rate (FPR) → x-axis
  • True Positive Rate (TPR) → y-axis

Each point on the curve represents a different classification threshold.

The AUC (Area Under the Curve) summarizes overall performance:

  • AUC = 1.0 → perfect model
  • AUC = 0.5 → random guessing
  • Higher AUC → better class separation
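These two extremes are easy to verify with tiny hand-made score arrays (not part of the project code):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 0, 1, 1, 1])

# Perfect separation: every positive scores higher than every negative
perfect_scores = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
perfect_auc = roc_auc_score(y_true, perfect_scores)
print(perfect_auc)  # 1.0

# A constant score carries no ranking information at all
uninformative_auc = roc_auc_score(y_true, np.full(6, 0.5))
print(uninformative_auc)  # 0.5
```

Note that AUC depends only on how the scores *rank* the samples, not on their absolute values; multiplying `perfect_scores` by any positive constant leaves the AUC unchanged.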

1. Training Multiple Models

We begin by training different classification models on the same dataset.

lr.fit(X_train, y_train)
dt.fit(X_train, y_train)
rf.fit(X_train, y_train)

Each model learns a different representation of the data.


2. Generating Prediction Probabilities

ROC curves require probability scores, not class labels.

y_probs = model.predict_proba(X_test)[:, 1]

These probabilities represent how confident the model is about the positive class.


3. Computing ROC Curves and AUC

We calculate FPR, TPR, and AUC for each model.

from sklearn.metrics import roc_curve, roc_auc_score

fpr, tpr, _ = roc_curve(y_test, y_probs)
auc_score = roc_auc_score(y_test, y_probs)

These values define the ROC curve for a model.


4. Plotting Comparative ROC Curves

All models are plotted on the same graph for comparison.

plt.plot(fpr, tpr, label=f"Model (AUC = {auc_score:.3f})")
plt.plot([0, 1], [0, 1], linestyle="--")

The diagonal line represents random guessing.


How to Interpret the Plot

  • Curve closer to top-left corner → better model
  • Higher AUC → stronger performance
  • Curves overlapping → similar models
  • Curve below diagonal → worse than random

Comparative plots help quickly identify the best-performing model.
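A useful corollary of the "below the diagonal" case: such a model is ranking the classes in reverse, so inverting its scores inverts its AUC. A minimal illustration with made-up scores:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1])
# A model that ranks backwards: positives receive the *lowest* scores
bad_scores = np.array([0.8, 0.6, 0.3, 0.1])

auc = roc_auc_score(y_true, bad_scores)
flipped = roc_auc_score(y_true, 1 - bad_scores)
print(auc, flipped)  # 0.0 1.0
```

In practice, a curve consistently below the diagonal usually signals a labeling or sign error in the pipeline rather than a genuinely "anti-predictive" model.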


Why Comparative ROC Matters

Using ROC curves across multiple models allows us to:

  • Compare models fairly across thresholds
  • Understand trade-offs between sensitivity and specificity
  • Select models based on real-world needs (not just accuracy)
  • Build more reliable classification systems
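One common way to act on the sensitivity/specificity trade-off is to pick an operating threshold from the ROC curve itself. The sketch below uses Youden's J statistic (TPR − FPR), a standard heuristic that is not part of the original snippet, on the same breast-cancer setup:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42, stratify=data.target
)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
y_probs = model.predict_proba(X_test)[:, 1]

fpr, tpr, thresholds = roc_curve(y_test, y_probs)

# Youden's J: the threshold that maximizes TPR - FPR
j = tpr - fpr
best_threshold = thresholds[np.argmax(j)]
print(f"Best threshold by Youden's J: {best_threshold:.3f}")
```

Whether Youden's J is the right criterion depends on the application; when false positives and false negatives carry different costs, a cost-weighted threshold is usually preferable.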

Key Takeaways

  1. ROC curves evaluate model performance across all thresholds.
  2. AUC summarizes how well a model separates classes.
  3. Comparative ROC plots help rank multiple models visually.
  4. Better models stay closer to the top-left corner.
  5. AUC is often a more reliable metric than accuracy, especially for imbalanced classification problems.

Conclusion

Comparative ROC curves provide a powerful and intuitive way to evaluate multiple classification models. By visualizing how models behave across thresholds, we gain deeper insight into their strengths and weaknesses.

This makes ROC analysis an essential tool in the Advanced Visualization & Interpretability module of the AI with Python series.


Code Snippet:

# 📦 Import Required Libraries
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve, roc_auc_score


# 🧩 Load the Dataset
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target


# ✂️ Split Data into Train and Test Sets
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.3,
    random_state=42,
    stratify=y
)


# 🤖 Train Multiple Classification Models
lr = LogisticRegression(max_iter=5000)
dt = DecisionTreeClassifier(random_state=42)
rf = RandomForestClassifier(n_estimators=200, random_state=42)

lr.fit(X_train, y_train)
dt.fit(X_train, y_train)
rf.fit(X_train, y_train)


# 📊 Compute ROC Curves and AUC Scores
models = {
    "Logistic Regression": lr,
    "Decision Tree": dt,
    "Random Forest": rf
}

roc_data = {}

for name, model in models.items():
    y_probs = model.predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, y_probs)
    auc_score = roc_auc_score(y_test, y_probs)

    roc_data[name] = {
        "fpr": fpr,
        "tpr": tpr,
        "auc": auc_score
    }


# 📈 Plot Comparative ROC Curves
plt.figure(figsize=(7, 6))

for name, values in roc_data.items():
    plt.plot(
        values["fpr"],
        values["tpr"],
        label=f"{name} (AUC = {values['auc']:.3f})"
    )

# Reference line for random guessing
plt.plot([0, 1], [0, 1], linestyle="--", label="Random Guess")

plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Comparative ROC Curves Across Multiple Models")
plt.legend()
plt.tight_layout()
plt.show()
