🧠 AI with Python - 📈 ROC Curve & AUC Score


Description:

Visualizing the ROC Curve with scikit-learn

Evaluating a classification model goes beyond accuracy and confusion matrices.

One of the most powerful tools is the ROC Curve (Receiver Operating Characteristic), which helps us understand how well a model distinguishes between classes across different thresholds.


What is the ROC Curve?

The ROC Curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at various classification thresholds.

  • TPR (Recall): Out of all actual positives, how many did the model correctly predict?
  • FPR: Out of all actual negatives, how many were incorrectly predicted as positive?

An ideal model will have a curve that hugs the top-left corner, showing high TPR with low FPR.
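As a quick illustration, both rates can be read off the four cells of a confusion matrix. The sketch below uses small, made-up label arrays purely as an example:

from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predictions at one fixed threshold
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

tpr = tp / (tp + fn)  # True Positive Rate (recall): 0.75 here
fpr = fp / (fp + tn)  # False Positive Rate: 0.25 here

print("TPR:", tpr, "FPR:", fpr)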


Generating the ROC Curve

Scikit-learn provides an easy way to compute the FPR, TPR, and thresholds using the roc_curve function.

from sklearn.metrics import roc_curve

fpr, tpr, thresholds = roc_curve(y_test, y_proba[:, 1])

Here:

  • y_test → True labels.
  • y_proba[:, 1] → Predicted probability of the positive class.
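To see how the threshold sweep builds the curve, you can print a few of the returned points (assuming the fpr, tpr, and thresholds arrays from the call above):

# Each decision threshold maps to one (FPR, TPR) point on the curve
for thr, f, t in list(zip(thresholds, fpr, tpr))[:5]:
    print(f"threshold={thr:.3f}  FPR={f:.3f}  TPR={t:.3f}")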

Plotting the ROC Curve

Once we have FPR and TPR, we can visualize the curve using matplotlib.

import matplotlib.pyplot as plt

plt.plot(fpr, tpr, label="ROC Curve")
plt.plot([0, 1], [0, 1], linestyle="--", color="gray", label="Random Guess")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend()
plt.show()

The diagonal line represents a random classifier. The closer the ROC curve is to the top-left, the better.
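If you prefer not to assemble the plot manually, recent scikit-learn releases (1.0 and later) include RocCurveDisplay, which draws an equivalent figure in one call. A minimal sketch, assuming the same y_test and y_proba as above:

from sklearn.metrics import RocCurveDisplay

# One-call equivalent of the manual plot (requires scikit-learn >= 1.0)
RocCurveDisplay.from_predictions(y_test, y_proba[:, 1])
plt.show()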


AUC – Summarizing the ROC Curve

The Area Under the Curve (AUC) condenses the ROC performance into a single number between 0 and 1.

  • 1.0 = Perfect classifier
  • 0.5 = Random guessing

In scikit-learn, the score comes from the roc_auc_score function:

from sklearn.metrics import roc_auc_score

auc = roc_auc_score(y_test, y_proba[:, 1])
print("AUC Score:", auc)

Sample Output

You should see a curve that rises above the diagonal line, with the AUC score printed in the console.

Example output:

AUC: 0.9956

Example of ROC Curve

[Figure: ROC Curve – Breast Cancer Classification]


Why It Matters

  • ROC Curve shows performance across thresholds, not just one fixed cut-off.
  • AUC provides a single, interpretable metric of separability.
  • Useful when dealing with imbalanced datasets, where accuracy alone can be misleading (see the short example below).
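As a small, made-up illustration of that last point: on a dataset with 95% negatives, a classifier that always predicts the majority class reaches 95% accuracy while having no separating power at all, which the AUC exposes immediately:

import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Hypothetical imbalanced labels: 95 negatives, 5 positives
y_true = np.array([0] * 95 + [1] * 5)

# A useless "model": the same score for every sample, always predicting 0
scores = np.full(100, 0.5)
preds = np.zeros(100, dtype=int)

print("Accuracy:", accuracy_score(y_true, preds))  # 0.95, looks impressive
print("AUC:", roc_auc_score(y_true, scores))       # 0.5, random guessing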

Code Snippet:

# Import required libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt


# Load the breast cancer dataset (binary classification)
data = load_breast_cancer()
X, y = data.data, data.target  # target: 0 = malignant, 1 = benign

# Hold out a stratified test set so both classes keep their proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)


# Fit a logistic regression classifier
model = LogisticRegression(max_iter=2000, random_state=42)
model.fit(X_train, y_train)

# Predicted probabilities for the positive class (label 1)
y_proba = model.predict_proba(X_test)[:, 1]


# Compute ROC curve points and the AUC score
fpr, tpr, thresholds = roc_curve(y_test, y_proba)
auc = roc_auc_score(y_test, y_proba)
print("AUC:", round(auc, 4))


# Plot the ROC curve against the random-guess diagonal
plt.figure(figsize=(7, 6))
plt.plot(fpr, tpr, label=f"Logistic Regression (AUC = {auc:.3f})", linewidth=2)
plt.plot([0, 1], [0, 1], 'k--', label="Random baseline")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve – Breast Cancer Classification")
plt.legend(loc="lower right")
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()
