
🧠 AI with Python – 🎯 Threshold Tuning


Description:

Most classification models don’t directly predict classes — they predict probabilities. The final decision of whether something belongs to class 0 or class 1 depends on a threshold.

By default, this threshold is set to 0.5. But in real-world scenarios, this default is often far from optimal.

In this project, we explore how threshold tuning can significantly improve model usefulness by aligning predictions with real-world requirements.


Understanding the Problem

A classification model outputs probabilities like:

  • 0.2 → likely class 0
  • 0.8 → likely class 1

To convert probabilities into predictions, we apply a rule: if the probability is ≥ 0.5, classify as positive.

However, this cutoff is just a convention, and it may not match the actual needs of the problem.
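
As a minimal, self-contained sketch of the default rule (the probability values are made up for illustration):

import numpy as np

# Hypothetical predicted probabilities for five samples
probs = np.array([0.2, 0.45, 0.5, 0.8, 0.95])

# Default rule: probability >= 0.5 means positive class (1)
labels = (probs >= 0.5).astype(int)
print(labels)  # [0 0 1 1 1]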


Why the Default Threshold Fails

Different problems have different priorities:

  • Fraud detection → missing fraud is costly → prefer high recall
  • Spam detection → false alerts are annoying → prefer high precision
  • Medical diagnosis → avoid missing cases → minimize false negatives

A single threshold (0.5) cannot satisfy all these scenarios.


Using Probabilities Instead of Labels

Instead of directly using predictions:

y_pred = model.predict(X_test)

we use probabilities:

y_probs = model.predict_proba(X_test)[:, 1]

This gives us full control over decision-making.
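
For a binary classifier, predict_proba returns one column per class, and column 1 holds the probability of the positive class. A quick sketch, reusing the fitted model from the full snippet at the end (the printed values shown are illustrative):

proba = model.predict_proba(X_test)  # shape: (n_samples, 2); each row sums to 1
print(proba[:2])                     # e.g. [[0.97 0.03], [0.10 0.90]]
y_probs = proba[:, 1]                # probability of the positive class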


Applying Custom Thresholds

We can define our own threshold:

y_pred = (y_probs >= threshold).astype(int)

Now, changing the threshold directly changes how the model behaves.
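
For example, lowering the threshold to 0.3 (an arbitrary illustrative value) can only add positive predictions, never remove them:

y_pred_default = (y_probs >= 0.5).astype(int)
y_pred_low = (y_probs >= 0.3).astype(int)

# Every sample flagged positive at 0.5 is also flagged at 0.3
print(y_pred_default.sum(), "<=", y_pred_low.sum())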


Understanding the Trade-Off

Adjusting the threshold affects key metrics:

  • Lower threshold:
    • more positives predicted
    • higher recall
    • lower precision
  • Higher threshold:
    • fewer positives predicted
    • higher precision
    • lower recall

This trade-off is central to classification problems.
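
A quick way to see this in code, reusing y_probs and y_test from the full snippet at the end (exact numbers depend on the data, but recall can only fall as the threshold rises, while precision typically rises):

from sklearn.metrics import precision_score, recall_score

for t in (0.2, 0.8):
    y_pred = (y_probs >= t).astype(int)
    print(f"threshold={t}: "
          f"precision={precision_score(y_test, y_pred):.2f}, "
          f"recall={recall_score(y_test, y_pred):.2f}")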


Evaluating Multiple Thresholds

We test multiple threshold values and compute metrics.

for threshold in thresholds:
    y_pred = (y_probs >= threshold).astype(int)

For each threshold, we measure:

  • Precision
  • Recall
  • F1 Score
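
A condensed version of that loop, printing the three metrics per threshold (the full snippet at the end stores them in a DataFrame instead):

from sklearn.metrics import precision_score, recall_score, f1_score

for threshold in np.arange(0.1, 0.91, 0.1):
    y_pred = (y_probs >= threshold).astype(int)
    print(f"{threshold:.1f}: "
          f"precision={precision_score(y_test, y_pred):.2f}, "
          f"recall={recall_score(y_test, y_pred):.2f}, "
          f"F1={f1_score(y_test, y_pred):.2f}")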

Visualizing the Impact

Plotting metrics across thresholds helps identify the best balance.

plt.plot(thresholds, precision, label="Precision")
plt.plot(thresholds, recall, label="Recall")
plt.plot(thresholds, f1, label="F1 Score")
plt.legend()

This makes it easier to select the right threshold.


Choosing the Best Threshold

The “best” threshold depends on the objective:

  • maximize F1 → balanced performance
  • maximize recall → detect more positives
  • maximize precision → reduce false positives

There is no universal best value — it depends on the problem.
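
For instance, with a hypothetical requirement of at least 0.95 recall, we could take the most precise threshold that satisfies it, using the results_df table built in the full snippet at the end:

# Keep thresholds meeting the recall constraint, then maximize precision
# (assumes at least one threshold satisfies the constraint)
candidates = results_df[results_df["recall"] >= 0.95]
best_row = candidates.loc[candidates["precision"].idxmax()]
print(best_row)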


Why This Matters in Real Systems

Threshold tuning is widely used in:

  • fraud detection systems
  • medical diagnosis
  • recommendation engines
  • anomaly detection

It lets the same trained model serve different operating requirements without retraining.


Key Takeaways

  1. Classification models output probabilities, not final decisions.
  2. The default threshold of 0.5 is not always optimal.
  3. Lower thresholds increase recall but reduce precision.
  4. Higher thresholds increase precision but reduce recall.
  5. Threshold tuning aligns ML models with real-world goals.

Conclusion

Threshold tuning is a simple yet powerful technique that transforms how classification models are used in practice. By adjusting the decision boundary, we can align model predictions with real-world priorities, making them far more useful and reliable.

This is an essential concept in the Advanced ML track of the AI with Python series — helping you move from model training to decision optimization.


Code Snippet:

# =========================================================
# 📦 Import Required Libraries
# =========================================================
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix


# =========================================================
# 🧩 Load Dataset
# =========================================================
data = load_breast_cancer()

X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target


# =========================================================
# ✂️ Split Data
# =========================================================
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.3,
    random_state=42,
    stratify=y
)


# =========================================================
# 🤖 Train Model
# =========================================================

model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)


# =========================================================
# 📊 Get Prediction Probabilities
# =========================================================

y_probs = model.predict_proba(X_test)[:, 1]


# =========================================================
# 🎯 Default Threshold (0.5)
# =========================================================

y_pred_default = (y_probs >= 0.5).astype(int)

print("=== Default Threshold = 0.5 ===")
print("Precision:", precision_score(y_test, y_pred_default))
print("Recall:", recall_score(y_test, y_pred_default))
print("F1 Score:", f1_score(y_test, y_pred_default))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_default))


# =========================================================
# 🔁 Evaluate Multiple Thresholds
# =========================================================

thresholds = np.round(np.arange(0.1, 0.91, 0.1), 2)  # 0.1, 0.2, ..., 0.9 (rounded to avoid float noise)

results = []

for threshold in thresholds:
    y_pred = (y_probs >= threshold).astype(int)

    results.append({
        "threshold": threshold,
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred)
    })

results_df = pd.DataFrame(results)

print("\n=== Threshold Tuning Results ===")
print(results_df)


# =========================================================
# 📈 Plot Metrics vs Threshold
# =========================================================

plt.figure(figsize=(8, 5))

plt.plot(results_df["threshold"], results_df["precision"], marker="o", label="Precision")
plt.plot(results_df["threshold"], results_df["recall"], marker="o", label="Recall")
plt.plot(results_df["threshold"], results_df["f1"], marker="o", label="F1 Score")

plt.xlabel("Threshold")
plt.ylabel("Score")
plt.title("Threshold Tuning for Classification")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()


# =========================================================
# ✅ Select Best Threshold (based on F1 Score)
# =========================================================

best_threshold = results_df.loc[results_df["f1"].idxmax(), "threshold"]

print("\nBest Threshold based on F1 Score:", best_threshold)


# =========================================================
# 🔍 Predictions using Best Threshold
# =========================================================

y_pred_best = (y_probs >= best_threshold).astype(int)

print("\n=== Using Best Threshold ===")
print("Precision:", precision_score(y_test, y_pred_best))
print("Recall:", recall_score(y_test, y_pred_best))
print("F1 Score:", f1_score(y_test, y_pred_best))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_best))
