⚡️ Saturday ML Sparks – ROC Curve & AUC Comparison 📈🧠


Description:

Understanding the Problem

Most classification models output a probability score.

But choosing a fixed threshold like 0.5 may not always be ideal — especially in:

  • imbalanced datasets
  • medical predictions
  • fraud detection
  • risk assessment

ROC (Receiver Operating Characteristic) curves show the trade-off between True Positive Rate (TPR) and False Positive Rate (FPR) at every possible threshold; a short worked sketch follows below to make these rates concrete.

AUC (Area Under the ROC Curve) gives a single number summarizing performance:

  • AUC = 1.0 → perfect classifier
  • AUC = 0.5 → random guessing
  • Higher AUC = better ability to separate classes
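
To make these rates concrete before we touch the real dataset, here is a minimal sketch on a tiny hand-made set of labels and scores (the numbers are invented purely for illustration): it computes TPR and FPR at a few thresholds and then the AUC with scikit-learn.

import numpy as np
from sklearn.metrics import roc_auc_score

# Toy labels and scores, invented purely for illustration
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])

def tpr_fpr(y_true, scores, threshold):
    pred = (scores >= threshold).astype(int)
    tp = np.sum((pred == 1) & (y_true == 1))
    fp = np.sum((pred == 1) & (y_true == 0))
    fn = np.sum((pred == 0) & (y_true == 1))
    tn = np.sum((pred == 0) & (y_true == 0))
    return tp / (tp + fn), fp / (fp + tn)

for t in [0.3, 0.5, 0.7]:
    tpr, fpr = tpr_fpr(y_true, scores, t)
    print(f"threshold={t:.1f}  TPR={tpr:.2f}  FPR={fpr:.2f}")

# AUC summarizes this trade-off over all thresholds in one number
print("AUC:", roc_auc_score(y_true, scores))

Lowering the threshold pushes both TPR and FPR up; the ROC curve traces exactly this movement.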

1. Load Dataset & Split into Train/Test Sets

We’ll use the Breast Cancer dataset, a classic binary classification benchmark.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X, y = data.data, data.target

# stratify=y keeps the class ratio the same in the train and test splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

2. Train Two Models for Comparison

We’ll compare Logistic Regression vs Random Forest.

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# max_iter is raised so the solver converges on these unscaled features
log_reg = LogisticRegression(max_iter=5000)
rf = RandomForestClassifier(n_estimators=300, random_state=42)

log_reg.fit(X_train, y_train)
rf.fit(X_train, y_train)

3. Compute ROC Curves & AUC Scores

from sklearn.metrics import roc_curve, roc_auc_score

# Predicted probabilities for the positive class
log_proba = log_reg.predict_proba(X_test)[:, 1]
rf_proba = rf.predict_proba(X_test)[:, 1]

log_fpr, log_tpr, _ = roc_curve(y_test, log_proba)
rf_fpr, rf_tpr, _ = roc_curve(y_test, rf_proba)

log_auc = roc_auc_score(y_test, log_proba)
rf_auc = roc_auc_score(y_test, rf_proba)

print("Logistic Regression AUC:", log_auc)
print("Random Forest AUC:", rf_auc)

4. Plot ROC Curves

import matplotlib.pyplot as plt

plt.figure(figsize=(7,6))

plt.plot(log_fpr, log_tpr, label=f"Logistic Regression (AUC = {log_auc:.3f})")
plt.plot(rf_fpr, rf_tpr, label=f"Random Forest (AUC = {rf_auc:.3f})")

plt.plot([0,1], [0,1], "k--", label="Random Guess")

plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve Comparison")
plt.legend()
plt.grid(True)
plt.show()

This visualization immediately shows which model separates classes better.
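
If you also want to turn the curve back into a concrete operating point, roc_curve returns the thresholds alongside the curve points. Here is a small sketch continuing from the variables above; it uses the TPR minus FPR rule of thumb (sometimes called Youden's J), which is an addition of mine rather than part of the comparison itself.

import numpy as np
from sklearn.metrics import roc_curve

# Thresholds come back alongside the curve points
log_fpr, log_tpr, log_thresholds = roc_curve(y_test, log_proba)

# Pick the threshold where the curve sits farthest above the diagonal (max TPR - FPR)
best_idx = np.argmax(log_tpr - log_fpr)
print("Candidate threshold:", log_thresholds[best_idx])
print("TPR / FPR there:", log_tpr[best_idx], "/", log_fpr[best_idx])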


Key Takeaways

  1. ROC curves show performance across all thresholds, not just at 0.5.
  2. AUC is a powerful single-number metric that summarizes separation quality (see the sketch after this list for its ranking interpretation).
  3. A higher AUC means the model ranks positives above negatives more reliably, independent of any particular threshold.
  4. Logistic Regression often performs well on (near-)linearly separable problems, while Random Forests can capture non-linear patterns and may push AUC higher.
  5. ROC curves are essential for medical, fraud, risk, and imbalanced classification tasks.
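
Takeaway 2 has a useful ranking interpretation: AUC equals the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one. The sketch below checks that pairwise definition against roc_auc_score on synthetic scores (the score distributions are assumptions made only for this demonstration).

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic scores: positives tend to score a bit higher than negatives (illustrative only)
y = np.array([0] * 500 + [1] * 500)
scores = np.concatenate([rng.normal(0.4, 0.15, 500), rng.normal(0.6, 0.15, 500)])

# Fraction of (positive, negative) pairs ranked correctly, counting ties as half
pos, neg = scores[y == 1], scores[y == 0]
pairwise = np.mean(pos[:, None] > neg[None, :]) + 0.5 * np.mean(pos[:, None] == neg[None, :])

print("Pairwise estimate:", pairwise)
print("roc_auc_score    :", roc_auc_score(y, scores))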

Conclusion

ROC and AUC are fundamental to evaluating classification models beyond accuracy and precision/recall.

They help visualize threshold behavior, compare classifiers, and understand model robustness.

By comparing Logistic Regression and Random Forest, we see how different models behave across probability thresholds — a critical insight for real-world deployments.


Code Snippet:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve, roc_auc_score


# Load the Breast Cancer dataset (binary classification)
data = load_breast_cancer()
X, y = data.data, data.target

# Split train/test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y
)


# Initialize models
log_reg = LogisticRegression(max_iter=5000)
rf = RandomForestClassifier(n_estimators=300, random_state=42)

# Train
log_reg.fit(X_train, y_train)
rf.fit(X_train, y_train)


# Get predicted probabilities for the positive class
log_proba = log_reg.predict_proba(X_test)[:, 1]
rf_proba = rf.predict_proba(X_test)[:, 1]

# ROC curve points
log_fpr, log_tpr, _ = roc_curve(y_test, log_proba)
rf_fpr, rf_tpr, _ = roc_curve(y_test, rf_proba)

# AUC scores
log_auc = roc_auc_score(y_test, log_proba)
rf_auc = roc_auc_score(y_test, rf_proba)

print("Logistic Regression AUC:", log_auc)
print("Random Forest AUC:", rf_auc)


plt.figure(figsize=(7,6))

plt.plot(log_fpr, log_tpr, label=f"Logistic Regression (AUC = {log_auc:.3f})")
plt.plot(rf_fpr, rf_tpr, label=f"Random Forest (AUC = {rf_auc:.3f})")

# Random baseline
plt.plot([0,1], [0,1], "k--", label="Random Guess")

plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve Comparison")
plt.legend()
plt.grid(True)
plt.show()
