⚡️ Saturday ML Spark – 📈 Model Monitoring & Performance Tracking
Posted on: June 13, 2026
Description:
Training a machine learning model is only the beginning of its lifecycle. Once deployed, a model starts interacting with real users, real data, and constantly changing environments.
A model that performs well today may gradually become less effective tomorrow. This is why model monitoring is a critical part of production machine learning systems.
In this project, we explore how to track model performance over time and detect potential issues before they impact business outcomes.
Why Monitoring Matters
Many machine learning projects focus heavily on model training but ignore what happens after deployment.
In reality, deployed models face:
- changing user behaviour
- evolving business conditions
- new data patterns
- seasonal trends
- unexpected anomalies
Without monitoring, a degrading model can keep serving poor predictions for weeks or months before anyone notices.
What Is Model Monitoring?
Model monitoring is the process of continuously tracking the behaviour and performance of machine learning systems after deployment.
The goal is to answer questions such as:
- Is model accuracy changing?
- Are prediction patterns shifting?
- Is confidence decreasing?
- Has user behaviour changed?
- Is retraining required?
Monitoring helps maintain trust in production ML systems.
1. Generate Predictions
The first step is generating predictions from the deployed model.
predictions = model.predict(X_test)
Along with predictions, we also collect predicted probabilities. Column 1 of the predict_proba output holds the probability of the positive class (label 1).
prediction_probs = model.predict_proba(X_test)[:, 1]
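These probabilities show how certain the model is about each prediction. For a binary classifier, a common way to turn the positive-class probability p into a confidence score is max(p, 1 - p), the probability assigned to the class that was actually predicted.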
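A minimal sketch of deriving a per-prediction confidence score, assuming confidence is defined as the probability of the predicted class (a common convention, not the only one). It retrains the same logistic regression setup as the code snippet so it runs on its own:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
model = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# Probability for every class: shape (n_samples, n_classes)
probs = model.predict_proba(X_test)

# Confidence = probability of the class the model predicted.
# For binary problems this equals max(p, 1 - p), so it lies in [0.5, 1].
confidence = probs.max(axis=1)

# The most probable class should match what predict() returns
predictions = model.predict(X_test)
print("Average confidence:", round(float(confidence.mean()), 4))
```

Averaging this confidence score per day gives the avg_confidence metric tracked later in the post.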
2. Create Monitoring Metrics
Monitoring systems typically track metrics over time.
Example metrics include:
- accuracy
- prediction volume
- confidence scores
monitoring_data = pd.DataFrame(...)
These metrics are usually aggregated per day or per hour.
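The pd.DataFrame(...) above is left abstract. One way to populate it is a daily `groupby` over a prediction log. This sketch assumes a hypothetical log with one row per prediction and columns `timestamp`, `y_true`, `y_pred`, and `confidence` (in practice ground-truth labels often arrive later than predictions). The log is randomly generated for illustration only:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 3000

# Hypothetical prediction log: one row per prediction (synthetic data)
log = pd.DataFrame({
    "timestamp": pd.Timestamp("2025-01-01")
    + pd.to_timedelta(rng.integers(0, 10 * 24 * 60, n), unit="min"),
    "y_true": rng.integers(0, 2, n),
    "confidence": rng.uniform(0.5, 1.0, n),
})
# Simulate a model that is correct roughly 90% of the time
correct = rng.random(n) < 0.9
log["y_pred"] = np.where(correct, log["y_true"], 1 - log["y_true"])
log["is_correct"] = log["y_true"] == log["y_pred"]

# Aggregate per calendar day into the columns used in this post
monitoring_data = (
    log.groupby(log["timestamp"].dt.floor("D"))
    .agg(
        accuracy=("is_correct", "mean"),
        prediction_volume=("y_pred", "count"),
        avg_confidence=("confidence", "mean"),
    )
    .reset_index()
    .rename(columns={"timestamp": "date"})
)
print(monitoring_data)
```

Swapping `floor("D")` for `floor("h")` gives hourly instead of daily metrics.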
3. Track Accuracy Trends
One of the simplest monitoring signals is accuracy.
plt.plot(
monitoring_data["date"],
monitoring_data["accuracy"]
)
A gradual decline may indicate:
- changing data distributions
- feature drift
- concept drift
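A plot is easy to eyeball but hard to alert on. This minimal sketch assumes a linear trend is a reasonable first approximation. It fits a least-squares slope to the simulated daily accuracy values from the code snippet and flags a sustained decline. The -0.005 per-day tolerance is a hypothetical choice:

```python
import numpy as np

# Daily accuracy values from the simulated monitoring data
accuracy = np.array([0.97, 0.96, 0.97, 0.95, 0.94,
                     0.93, 0.92, 0.91, 0.90, 0.89])
days = np.arange(len(accuracy))

# Least-squares line: accuracy ~ slope * day + intercept
slope, intercept = np.polyfit(days, accuracy, deg=1)

# Hypothetical tolerance: alert if accuracy falls faster than 0.5 points/day
MAX_DAILY_DECLINE = -0.005
declining = bool(slope < MAX_DAILY_DECLINE)
print(f"Slope per day: {slope:.4f}, declining: {declining}")
```

A trend check like this catches slow erosion that never crosses a fixed threshold on any single day.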
4. Monitor Prediction Volume
Monitoring prediction volume helps identify traffic changes.
plt.plot(
monitoring_data["date"],
monitoring_data["prediction_volume"]
)
Sudden spikes or drops can signal:
- upstream issues
- user behaviour changes
- data pipeline failures
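Spikes and drops can be flagged automatically by comparing each day with a trailing baseline. This minimal sketch assumes a hypothetical 30% tolerance around the median of the previous five days. The final value is an invented drop, added to the simulated series purely to show an alert firing:

```python
import pandas as pd

# Simulated daily volumes; the last value is a hypothetical sudden drop
volume = pd.Series([500, 520, 510, 540, 550, 560, 590, 600, 610, 150])

# Baseline = median of the previous 5 days (shift(1) excludes the current day)
baseline = volume.shift(1).rolling(window=5).median()

# Relative deviation from baseline; NaN for early days without enough history
deviation = (volume - baseline) / baseline

TOLERANCE = 0.30  # hypothetical: flag anything more than 30% off baseline
anomalies = volume[deviation.abs() > TOLERANCE]
print(anomalies)
```

A median baseline is less distorted by a single earlier spike than a mean, which is why it is used here.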
5. Monitor Confidence Scores
Prediction confidence is another valuable signal.
plt.plot(
monitoring_data["date"],
monitoring_data["avg_confidence"]
)
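Declining average confidence can indicate that the model is encountering unfamiliar data. It is not a guarantee, though. Models can also be confidently wrong on inputs unlike their training data, so confidence complements accuracy measured on labeled outcomes rather than replacing it.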
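The absolute confidence level depends on the model, so it is often more useful to compare recent confidence against a reference period. This minimal sketch assumes the first three days of the simulated series serve as the baseline and that a hypothetical 5% relative drop triggers a review:

```python
import numpy as np

# Daily average confidence from the simulated monitoring data
avg_confidence = np.array([0.95, 0.95, 0.94, 0.93, 0.92,
                           0.91, 0.90, 0.89, 0.88, 0.87])

baseline = avg_confidence[:3].mean()  # reference window (first 3 days)
recent = avg_confidence[-3:].mean()   # most recent 3 days
relative_drop = (baseline - recent) / baseline

MAX_RELATIVE_DROP = 0.05  # hypothetical review threshold
print(f"Baseline: {baseline:.3f}, recent: {recent:.3f}, drop: {relative_drop:.1%}")
if relative_drop > MAX_RELATIVE_DROP:
    print("Confidence has dropped noticeably; inspect recent inputs for drift.")
```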
6. Detect Performance Degradation
Monitoring systems frequently include simple alerting logic.
if latest_accuracy < 0.90:
    print("Alert")
When a threshold is breached, real-world systems typically route the alert to:
- email
- Slack notifications
- PagerDuty alerts
- monitoring dashboards
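The single `if` statement generalizes to a table of thresholds with a pluggable notification hook. In this minimal sketch, `THRESHOLDS`, `notify`, and `check_thresholds` are hypothetical names. `notify` only logs, standing in for whatever email, Slack, or PagerDuty integration a team actually uses:

```python
import logging

logging.basicConfig(level=logging.WARNING, format="%(levelname)s %(message)s")
logger = logging.getLogger("model_monitoring")

# Hypothetical minimum acceptable value for each metric
THRESHOLDS = {"accuracy": 0.90, "avg_confidence": 0.88}


def notify(message: str) -> None:
    """Placeholder for a real alert channel (email, Slack, PagerDuty, ...)."""
    logger.warning(message)


def check_thresholds(latest: dict, thresholds: dict = THRESHOLDS) -> list:
    """Return (and send) an alert message for every metric below its minimum."""
    alerts = []
    for metric, minimum in thresholds.items():
        value = latest.get(metric)
        if value is None:
            alerts.append(f"{metric}: no value reported")
        elif value < minimum:
            alerts.append(f"{metric} = {value:.3f} is below {minimum:.2f}")
    for alert in alerts:
        notify(alert)
    return alerts


# Latest values from the simulated monitoring data
alerts = check_thresholds({"accuracy": 0.89, "avg_confidence": 0.87})
```

Treating a missing metric as an alert is deliberate. A metric that silently stops reporting often points to a pipeline failure.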
Common Reasons Models Degrade
Performance degradation may occur because of:
Data Drift
Input data distribution changes over time.
Feature Drift
Feature values begin behaving differently than during training.
Concept Drift
The relationship between inputs and outputs changes.
Business Changes
New products, users, or workflows alter model assumptions.
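Data and feature drift can be checked without labels by comparing a feature's production distribution with its training distribution. One widely used score is the Population Stability Index (PSI). The NumPy sketch below assumes quantile bins derived from the training sample. The common rule of thumb (below 0.1 stable, 0.1 to 0.25 moderate shift, above 0.25 significant shift) is a heuristic, not a standard. The samples are synthetic:

```python
import numpy as np


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a reference sample (expected) and a new sample (actual)."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)

    # Interior cut points at the reference quantiles; duplicates removed
    cuts = np.unique(np.quantile(expected, np.linspace(0, 1, n_bins + 1)[1:-1]))
    n = len(cuts) + 1  # number of bins, including the two open-ended outer bins

    # Bin each value; values outside the reference range land in the outer bins
    expected_idx = np.searchsorted(cuts, expected, side="right")
    actual_idx = np.searchsorted(cuts, actual, side="right")
    expected_pct = np.bincount(expected_idx, minlength=n) / len(expected)
    actual_pct = np.bincount(actual_idx, minlength=n) / len(actual)

    # Avoid log(0) for empty bins
    expected_pct = np.clip(expected_pct, eps, None)
    actual_pct = np.clip(actual_pct, eps, None)

    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))


rng = np.random.default_rng(42)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5000)
stable_feature = rng.normal(loc=0.0, scale=1.0, size=5000)
shifted_feature = rng.normal(loc=1.0, scale=1.0, size=5000)  # synthetic mean shift

psi_stable = population_stability_index(train_feature, stable_feature)
psi_shifted = population_stability_index(train_feature, shifted_feature)
print(f"PSI (no shift): {psi_stable:.4f}")
print(f"PSI (shifted):  {psi_shifted:.4f}")
```

PSI catches changes in the inputs (data and feature drift). It cannot detect concept drift on its own, because that requires comparing predictions with true outcomes.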
Key Metrics to Track
Production ML teams often monitor:
- Accuracy
- Precision
- Recall
- F1 Score
- Prediction Volume
- Confidence Scores
- Latency
- Error Rate
Different applications prioritize different metrics.
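Most of these metrics are available directly in scikit-learn. The sketch below computes the classification metrics for the same breast cancer setup used in the code snippet. It also records a rough batch latency with `time.perf_counter`. Production latency would normally be measured at the serving layer, so treat the timing here as illustrative:

```python
import time

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
model = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# Time one batch of predictions (wall-clock, includes Python overhead)
start = time.perf_counter()
predictions = model.predict(X_test)
latency_ms = (time.perf_counter() - start) * 1000

metrics = {
    "accuracy": accuracy_score(y_test, predictions),
    "precision": precision_score(y_test, predictions),
    "recall": recall_score(y_test, predictions),
    "f1": f1_score(y_test, predictions),
    "batch_latency_ms": latency_ms,
}
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

By default these scores treat label 1 as the positive class, which in this dataset is "benign". For a screening use case, recall on the malignant class (pos_label=0) may be the metric that matters most.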
Where Model Monitoring Is Used
Monitoring is essential in:
- fraud detection systems
- recommendation engines
- healthcare ML
- financial risk models
- customer analytics platforms
Any production ML system requires visibility into performance.
Key Takeaways
- Model performance can change after deployment.
- Monitoring helps detect issues before users are affected.
- Accuracy alone is not sufficient for production systems.
- Confidence and prediction volume provide valuable signals.
- Monitoring is a core component of MLOps and production ML.
Conclusion
Deploying a model is not the end of the machine learning journey. Real-world ML systems require continuous monitoring to ensure predictions remain accurate, reliable, and aligned with changing business conditions.
This post extends the ML Systems track of Saturday ML Spark ⚡️, which focuses on the operational practices that keep machine learning systems healthy long after deployment.
Code Snippet:
# 📦 Import Required Libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# 🧩 Load Dataset
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
# ✂️ Split Data
X_train, X_test, y_train, y_test = train_test_split(
X,
y,
test_size=0.3,
random_state=42,
stratify=y
)
# =========================================================
# 🤖 Train Model
# =========================================================
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)
# =========================================================
# 📊 Generate Predictions
# =========================================================
predictions = model.predict(X_test)
prediction_probs = model.predict_proba(X_test)[:, 1]
accuracy = accuracy_score(y_test, predictions)
print("Current Model Accuracy:", round(accuracy, 4))
# =========================================================
# 📝 Simulated Monitoring Data
# =========================================================
monitoring_data = pd.DataFrame({
"date": pd.date_range(
start="2025-01-01",
periods=10,
freq="D"
),
"accuracy": [
0.97, 0.96, 0.97, 0.95, 0.94,
0.93, 0.92, 0.91, 0.90, 0.89
],
"prediction_volume": [
500, 520, 510, 540, 550,
560, 590, 600, 610, 620
],
"avg_confidence": [
0.95, 0.95, 0.94, 0.93, 0.92,
0.91, 0.90, 0.89, 0.88, 0.87
]
})
print("\nMonitoring Data:")
print(monitoring_data)
# =========================================================
# 📈 Accuracy Trend
# =========================================================
plt.figure(figsize=(8, 4))
plt.plot(
monitoring_data["date"],
monitoring_data["accuracy"],
marker="o"
)
plt.title("Model Accuracy Trend")
plt.xlabel("Date")
plt.ylabel("Accuracy")
plt.grid(True)
plt.tight_layout()
plt.show()
# =========================================================
# 📈 Prediction Volume Trend
# =========================================================
plt.figure(figsize=(8, 4))
plt.plot(
monitoring_data["date"],
monitoring_data["prediction_volume"],
marker="o"
)
plt.title("Prediction Volume Trend")
plt.xlabel("Date")
plt.ylabel("Prediction Count")
plt.grid(True)
plt.tight_layout()
plt.show()
# =========================================================
# 📈 Confidence Trend
# =========================================================
plt.figure(figsize=(8, 4))
plt.plot(
monitoring_data["date"],
monitoring_data["avg_confidence"],
marker="o"
)
plt.title("Average Prediction Confidence")
plt.xlabel("Date")
plt.ylabel("Confidence")
plt.grid(True)
plt.tight_layout()
plt.show()
# =========================================================
# 🚨 Simple Monitoring Alerts
# =========================================================
latest_accuracy = monitoring_data["accuracy"].iloc[-1]
latest_confidence = monitoring_data["avg_confidence"].iloc[-1]
print("\n=== Monitoring Alerts ===")
if latest_accuracy < 0.90:
    print("⚠️ ALERT: Accuracy has dropped below threshold")
else:
    print("✅ Accuracy healthy")
if latest_confidence < 0.88:
    print("⚠️ ALERT: Confidence has dropped below threshold")
else:
    print("✅ Confidence healthy")
# =========================================================
# 📊 Monitoring Summary
# =========================================================
summary = {
"Average Accuracy":
monitoring_data["accuracy"].mean(),
"Average Prediction Volume":
monitoring_data["prediction_volume"].mean(),
"Average Confidence":
monitoring_data["avg_confidence"].mean()
}
print("\nMonitoring Summary:")
for metric, value in summary.items():
    print(f"{metric}: {round(value, 4)}")
# =========================================================
# 💾 Save Monitoring Metrics
# =========================================================
monitoring_data.to_csv(
"model_monitoring_metrics.csv",
index=False
)
print("\nMonitoring metrics saved successfully.")
# =========================================================
# 📂 Read Saved Metrics
# =========================================================
saved_metrics = pd.read_csv(
"model_monitoring_metrics.csv"
)
print("\nSaved Metrics Preview:")
print(saved_metrics.head())
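The script above writes a fixed, simulated history. In a recurring job, each run would instead compute that day's metrics from real predictions and append them to the history file. In this minimal sketch, `append_daily_metrics` is a hypothetical helper, and it assumes ground-truth labels for the day are already available:

```python
import os
import tempfile

import numpy as np
import pandas as pd


def append_daily_metrics(path, date, y_true, y_pred, confidence):
    """Compute one day's metrics, append them to a CSV, and return the history."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    row = pd.DataFrame([{
        "date": pd.Timestamp(date),
        "accuracy": float((y_true == y_pred).mean()),
        "prediction_volume": int(len(y_pred)),
        "avg_confidence": float(np.mean(confidence)),
    }])
    # Write the header only when the file does not exist yet
    row.to_csv(path, mode="a", header=not os.path.exists(path), index=False)
    return pd.read_csv(path, parse_dates=["date"])


# Example run against a temporary file with toy values
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model_monitoring_metrics.csv")
    append_daily_metrics(path, "2025-01-11", [1, 0, 1, 1], [1, 0, 0, 1], [0.9, 0.8, 0.6, 0.95])
    history = append_daily_metrics(path, "2025-01-12", [0, 1], [0, 1], [0.7, 0.9])
    print(history)
```

Appending keeps each day's metrics after they are computed, so the trend plots and alerts can be rerun over the full history.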