
🧠 AI with Python – 🎯 Threshold Tuning


Description:

Most classification models don’t directly predict classes — they predict probabilities. The final decision of whether something belongs to class 0 or class 1 depends on a threshold.

By default, this threshold is set to 0.5. But in real-world scenarios, this default is often far from optimal.

In this project, we explore how threshold tuning can significantly improve model usefulness by aligning predictions with real-world requirements.


Understanding the Problem

A classification model outputs probabilities like:

  • 0.2 → likely class 0
  • 0.8 → likely class 1

To convert probabilities into predictions, we apply a rule: if the probability is ≥ 0.5, classify as positive.

However, this cutoff is just a convention, and it may not match the actual needs of the problem.
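
As a minimal, self-contained sketch of the default rule (the probability values are made up for illustration):

import numpy as np

# Hypothetical predicted probabilities for five samples
probs = np.array([0.2, 0.45, 0.5, 0.8, 0.95])

# Default rule: probability >= 0.5 means positive class (1)
labels = (probs >= 0.5).astype(int)
print(labels)  # [0 0 1 1 1]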


Why the Default Threshold Fails

Different problems have different priorities:

  • Fraud detection → missing fraud is costly → prefer high recall
  • Spam detection → false alerts are annoying → prefer high precision
  • Medical diagnosis → avoid missing cases → minimize false negatives

A single threshold (0.5) cannot satisfy all these scenarios.


Using Probabilities Instead of Labels

Instead of directly using predictions:

y_pred = model.predict(X_test)

we use probabilities:

y_probs = model.predict_proba(X_test)[:, 1]

This gives us full control over decision-making.
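
For a binary classifier, predict_proba returns one column per class, and column 1 holds the probability of the positive class. A quick sketch, reusing the fitted model from the full snippet at the end (the printed values shown are illustrative):

proba = model.predict_proba(X_test)  # shape: (n_samples, 2); each row sums to 1
print(proba[:2])                     # e.g. [[0.97 0.03], [0.10 0.90]]
y_probs = proba[:, 1]                # probability of the positive class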


Applying Custom Thresholds

We can define our own threshold:

y_pred = (y_probs >= threshold).astype(int)

Now, changing the threshold directly changes how the model behaves.
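
For example, lowering the threshold to 0.3 (an arbitrary illustrative value) can only add positive predictions, never remove them:

y_pred_default = (y_probs >= 0.5).astype(int)
y_pred_low = (y_probs >= 0.3).astype(int)

# Every sample flagged positive at 0.5 is also flagged at 0.3
print(y_pred_default.sum(), "<=", y_pred_low.sum())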


Understanding the Trade-Off

Adjusting the threshold affects key metrics:

  • Lower threshold:
    • more positives predicted
    • higher recall
    • lower precision
  • Higher threshold:
    • fewer positives predicted
    • higher precision
    • lower recall

This trade-off is central to classification problems.
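
A quick way to see this in code, reusing y_probs and y_test from the full snippet at the end (exact numbers depend on the data, but recall can only fall as the threshold rises, while precision typically rises):

from sklearn.metrics import precision_score, recall_score

for t in (0.2, 0.8):
    y_pred = (y_probs >= t).astype(int)
    print(f"threshold={t}: "
          f"precision={precision_score(y_test, y_pred):.2f}, "
          f"recall={recall_score(y_test, y_pred):.2f}")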


Evaluating Multiple Thresholds

We test multiple threshold values and compute metrics.

for threshold in thresholds:
    y_pred = (y_probs >= threshold).astype(int)

For each threshold, we measure:

  • Precision
  • Recall
  • F1 Score
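
A condensed version of that loop, printing the three metrics per threshold (the full snippet at the end stores them in a DataFrame instead):

from sklearn.metrics import precision_score, recall_score, f1_score

for threshold in np.arange(0.1, 0.91, 0.1):
    y_pred = (y_probs >= threshold).astype(int)
    print(f"{threshold:.1f}: "
          f"precision={precision_score(y_test, y_pred):.2f}, "
          f"recall={recall_score(y_test, y_pred):.2f}, "
          f"F1={f1_score(y_test, y_pred):.2f}")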

Visualizing the Impact

Plotting metrics across thresholds helps identify the best balance.

plt.plot(thresholds, precision, label="Precision")
plt.plot(thresholds, recall, label="Recall")
plt.plot(thresholds, f1, label="F1 Score")
plt.legend()

This makes it easier to select the right threshold.


Choosing the Best Threshold

The “best” threshold depends on the objective:

  • maximize F1 → balanced performance
  • maximize recall → detect more positives
  • maximize precision → reduce false positives

There is no universal best value — it depends on the problem.
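
For instance, with a hypothetical requirement of at least 0.95 recall, we could take the most precise threshold that satisfies it, using the results_df table built in the full snippet at the end:

# Keep thresholds meeting the recall constraint, then maximize precision
# (assumes at least one threshold satisfies the constraint)
candidates = results_df[results_df["recall"] >= 0.95]
best_row = candidates.loc[candidates["precision"].idxmax()]
print(best_row)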


Why This Matters in Real Systems

Threshold tuning is widely used in:

  • fraud detection systems
  • medical diagnosis
  • recommendation engines
  • anomaly detection

It lets the same trained model serve different operating requirements without retraining.


Key Takeaways

  1. Classification models output probabilities, not final decisions.
  2. The default threshold of 0.5 is not always optimal.
  3. Lower thresholds increase recall but reduce precision.
  4. Higher thresholds increase precision but reduce recall.
  5. Threshold tuning aligns ML models with real-world goals.

Conclusion

Threshold tuning is a simple yet powerful technique that transforms how classification models are used in practice. By adjusting the decision boundary, we can align model predictions with real-world priorities, making them far more useful and reliable.

This is an essential concept in the Advanced ML track of the AI with Python series — helping you move from model training to decision optimization.


Code Snippet:

# =========================================================
# 📦 Import Required Libraries
# =========================================================
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix


# =========================================================
# 🧩 Load Dataset
# =========================================================
data = load_breast_cancer()

X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target


# =========================================================
# ✂️ Split Data
# =========================================================
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.3,
    random_state=42,
    stratify=y
)


# =========================================================
# 🤖 Train Model
# =========================================================

model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)


# =========================================================
# 📊 Get Prediction Probabilities
# =========================================================

y_probs = model.predict_proba(X_test)[:, 1]


# =========================================================
# 🎯 Default Threshold (0.5)
# =========================================================

y_pred_default = (y_probs >= 0.5).astype(int)

print("=== Default Threshold = 0.5 ===")
print("Precision:", precision_score(y_test, y_pred_default))
print("Recall:", recall_score(y_test, y_pred_default))
print("F1 Score:", f1_score(y_test, y_pred_default))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_default))


# =========================================================
# 🔁 Evaluate Multiple Thresholds
# =========================================================

thresholds = np.round(np.arange(0.1, 0.91, 0.1), 2)  # 0.1, 0.2, ..., 0.9 (rounded to avoid float noise)

results = []

for threshold in thresholds:
    y_pred = (y_probs >= threshold).astype(int)

    results.append({
        "threshold": threshold,
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred)
    })

results_df = pd.DataFrame(results)

print("\n=== Threshold Tuning Results ===")
print(results_df)


# =========================================================
# 📈 Plot Metrics vs Threshold
# =========================================================

plt.figure(figsize=(8, 5))

plt.plot(results_df["threshold"], results_df["precision"], marker="o", label="Precision")
plt.plot(results_df["threshold"], results_df["recall"], marker="o", label="Recall")
plt.plot(results_df["threshold"], results_df["f1"], marker="o", label="F1 Score")

plt.xlabel("Threshold")
plt.ylabel("Score")
plt.title("Threshold Tuning for Classification")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()


# =========================================================
# ✅ Select Best Threshold (based on F1 Score)
# =========================================================

best_threshold = results_df.loc[results_df["f1"].idxmax(), "threshold"]

print("\nBest Threshold based on F1 Score:", best_threshold)


# =========================================================
# 🔍 Predictions using Best Threshold
# =========================================================

y_pred_best = (y_probs >= best_threshold).astype(int)

print("\n=== Using Best Threshold ===")
print("Precision:", precision_score(y_test, y_pred_best))
print("Recall:", recall_score(y_test, y_pred_best))
print("F1 Score:", f1_score(y_test, y_pred_best))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_best))
