AW Dev Rethought


⚡️ Saturday ML Spark – 📈 Model Monitoring & Performance Tracking


Description:

Training a machine learning model is only the beginning of its lifecycle. Once deployed, a model starts interacting with real users, real data, and constantly changing environments.

A model that performs well today may gradually become less effective tomorrow. This is why model monitoring is a critical part of production machine learning systems.

In this project, we explore how to track model performance over time and detect potential issues before they impact business outcomes.


Why Monitoring Matters

Many machine learning projects focus heavily on model training but ignore what happens after deployment.

In reality, deployed models face:

  • changing user behaviour
  • evolving business conditions
  • new data patterns
  • seasonal trends
  • unexpected anomalies

Without monitoring, performance issues may remain unnoticed for weeks or months.


What Is Model Monitoring?

Model monitoring is the process of continuously tracking the behaviour and performance of machine learning systems after deployment.

The goal is to answer questions such as:

  • Is model accuracy changing?
  • Are prediction patterns shifting?
  • Is confidence decreasing?
  • Has user behaviour changed?
  • Is retraining required?

Monitoring helps maintain trust in production ML systems.


1. Generate Predictions

The first step is generating predictions from the deployed model.

predictions = model.predict(X_test)

Along with predictions, we also collect confidence scores.

prediction_probs = model.predict_proba(X_test)[:, 1]

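For a binary classifier, column 1 holds the predicted probability of the positive class. Values near 0 or 1 indicate a confident prediction, while values near 0.5 indicate uncertainty.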
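A positive-class probability is not yet a confidence score, because a probability of 0.03 is a very confident "negative". One common way to get a single confidence number per prediction is to take the probability of whichever class was predicted. The sketch below uses made-up probabilities in place of `prediction_probs`, and the helper name `confidence_from_positive_prob` is hypothetical.

```python
def confidence_from_positive_prob(p):
    """Confidence of a binary prediction: probability of the predicted class."""
    return max(p, 1.0 - p)


# Made-up positive-class probabilities, standing in for prediction_probs
sample_probs = [0.98, 0.03, 0.55, 0.40]

confidences = [confidence_from_positive_prob(p) for p in sample_probs]
avg_confidence = sum(confidences) / len(confidences)

print("Per-prediction confidence:", [round(c, 2) for c in confidences])
print("Average confidence:", avg_confidence)
```

An average computed this way is the kind of value that could feed an `avg_confidence` column in a monitoring table.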


2. Create Monitoring Metrics

Monitoring systems typically track metrics over time.

Example metrics include:

  • accuracy
  • prediction volume
  • confidence scores

monitoring_data = pd.DataFrame(...)

These metrics are usually collected daily or hourly.
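In practice these metrics are aggregated from a log of individual predictions. The sketch below assumes a hypothetical log of `(date, predicted, actual, confidence)` tuples and rolls it up into one row per day in plain Python. The log format and the values are invented for illustration.

```python
from collections import defaultdict

# Hypothetical prediction log: (date, predicted label, true label, confidence)
prediction_log = [
    ("2025-01-01", 1, 1, 0.96),
    ("2025-01-01", 0, 0, 0.91),
    ("2025-01-01", 1, 0, 0.62),
    ("2025-01-02", 1, 1, 0.94),
    ("2025-01-02", 0, 0, 0.88),
]

# Group log rows by day
rows_by_day = defaultdict(list)
for day, predicted, actual, confidence in prediction_log:
    rows_by_day[day].append((predicted, actual, confidence))

# Compute one metrics row per day
daily_metrics = []
for day in sorted(rows_by_day):
    rows = rows_by_day[day]
    correct = sum(1 for predicted, actual, _ in rows if predicted == actual)
    daily_metrics.append({
        "date": day,
        "accuracy": correct / len(rows),
        "prediction_volume": len(rows),
        "avg_confidence": sum(c for _, _, c in rows) / len(rows),
    })

for row in daily_metrics:
    print(row)
```

The resulting list of dictionaries can be passed straight to `pd.DataFrame` to get a table with the same columns used in the plots.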


3. Track Accuracy Trends

One of the simplest monitoring signals is accuracy, although it can only be computed once ground-truth labels for the predictions become available.

plt.plot(
    monitoring_data["date"],
    monitoring_data["accuracy"]
)

A gradual decline may indicate:

  • changing data distributions
  • feature drift
  • concept drift
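A plot makes a decline visible to a person, but an automated check needs a number. One simple option is the least-squares slope of accuracy over recent days. The sketch below uses the same ten accuracy values as the simulated monitoring data in the full script. The helper `linear_slope` and the `SLOPE_THRESHOLD` value are assumptions for illustration, not a standard.

```python
def linear_slope(values):
    """Least-squares slope of values against their index (change per step)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


# Daily accuracy values, matching the simulated monitoring data
daily_accuracy = [0.97, 0.96, 0.97, 0.95, 0.94, 0.93, 0.92, 0.91, 0.90, 0.89]

slope = linear_slope(daily_accuracy)

# Assumed threshold: flag a loss of more than 0.5 percentage points per day
SLOPE_THRESHOLD = -0.005

if slope < SLOPE_THRESHOLD:
    print(f"Accuracy is trending down by about {abs(slope):.4f} per day")
else:
    print("No sustained downward trend detected")
```

A slope check catches a slow decline earlier than a fixed threshold does, since the series here is already trending down well before it crosses 0.90.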

4. Monitor Prediction Volume

Monitoring prediction volume helps identify traffic changes.

plt.plot(
    monitoring_data["date"],
    monitoring_data["prediction_volume"]
)

Sudden spikes or drops can signal:

  • upstream issues
  • user behaviour changes
  • data pipeline failures
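A spike or drop can be flagged automatically by comparing the latest count with a recent baseline. The sketch below uses a z-score against the previous week. The counts, the helper `volume_z_score`, and the cut-off `Z_THRESHOLD` are all invented for illustration and would need tuning for a real system.

```python
import math
from statistics import mean, stdev


def volume_z_score(history, latest):
    """How many standard deviations `latest` sits from the recent baseline."""
    baseline_mean = mean(history)
    baseline_std = stdev(history)
    if baseline_std == 0:
        # Flat baseline: any change at all is treated as infinitely unusual
        diff = latest - baseline_mean
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return (latest - baseline_mean) / baseline_std


# Hypothetical daily prediction counts for the previous week
recent_volume = [500, 520, 510, 540, 550, 560, 590]
todays_volume = 120  # e.g. an upstream pipeline stopped sending most requests

z = volume_z_score(recent_volume, todays_volume)
Z_THRESHOLD = 3.0  # assumed cut-off

if abs(z) > Z_THRESHOLD:
    print(f"Volume anomaly: z-score {z:.1f}")
else:
    print("Prediction volume within normal range")
```

Using a trailing window as the baseline lets the check adapt to slow, legitimate growth in traffic while still reacting to abrupt changes.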

5. Monitor Confidence Scores

Prediction confidence is another valuable signal.

plt.plot(
    monitoring_data["date"],
    monitoring_data["avg_confidence"]
)

A steady decline in average confidence often suggests the model is encountering data unlike what it saw during training.
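An average can hide a growing tail of uncertain predictions, so it can help to track the share of low-confidence predictions alongside it. This is a minimal sketch with made-up confidence scores. The helper `low_confidence_share` and its default `cutoff` of 0.6 are assumptions.

```python
def low_confidence_share(confidences, cutoff=0.6):
    """Fraction of predictions whose confidence falls below `cutoff`."""
    if not confidences:
        return 0.0
    return sum(1 for c in confidences if c < cutoff) / len(confidences)


# Made-up per-prediction confidence scores for two periods
last_week = [0.97, 0.95, 0.92, 0.58, 0.96, 0.94, 0.91, 0.93]
today = [0.96, 0.57, 0.55, 0.93, 0.52, 0.94, 0.59, 0.95]

print(low_confidence_share(last_week))  # 0.125
print(low_confidence_share(today))      # 0.5
```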


6. Detect Performance Degradation

Monitoring systems frequently include simple alerting logic.

if latest_accuracy < 0.90:
    print("Alert")

When a threshold is breached, real-world systems typically raise the alert through one or more channels:

  • emails
  • Slack notifications
  • PagerDuty alerts
  • monitoring dashboards
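The same check extends naturally to several metrics by keeping the thresholds in one place and routing every breach through a single notification function. In the sketch below, `check_thresholds` and `notify` are hypothetical helpers, and `notify` only prints. A real system would call its email, chat, or paging client at that point.

```python
def check_thresholds(metrics, thresholds):
    """Return one message per metric that fell below its minimum."""
    alerts = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is not None and value < minimum:
            alerts.append(
                f"{name} = {value:.2f} is below threshold {minimum:.2f}"
            )
    return alerts


def notify(message):
    # Placeholder: swap in an email, chat, or paging integration here
    print("ALERT:", message)


# Latest values and thresholds, matching the simulated data in the full script
latest_metrics = {"accuracy": 0.89, "avg_confidence": 0.87}
thresholds = {"accuracy": 0.90, "avg_confidence": 0.88}

alerts = check_thresholds(latest_metrics, thresholds)
for message in alerts:
    notify(message)
```

Keeping thresholds in a dictionary means they can later be loaded from configuration without touching the checking logic.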


Common Reasons Models Degrade

Performance degradation may occur because of:

Data Drift

Input data distribution changes over time.

Feature Drift

The distribution of an individual feature shifts away from what the model saw during training.

Concept Drift

The relationship between inputs and outputs changes, even if the inputs themselves look the same.

Business Changes

New products, users, or workflows alter model assumptions.
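Drift in a single input feature can be quantified without labels by comparing its training-time distribution with its production distribution. One widely used measure is the Population Stability Index (PSI). The sketch below is a plain-Python version with equal-width bins. The function name, the bin count, and the sample values are assumptions. It covers input drift only, since detecting concept drift requires ground-truth labels.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """PSI between a training-time sample and a production sample of one feature."""
    low, high = min(expected), max(expected)
    width = (high - low) / bins

    def bin_shares(values):
        counts = [0] * bins
        for value in values:
            index = int((value - low) / width) if width > 0 else 0
            index = min(max(index, 0), bins - 1)  # clamp values outside the training range
            counts[index] += 1
        # A small floor avoids log(0) when a bin is empty
        return [max(count / len(values), 1e-6) for count in counts]

    expected_shares = bin_shares(expected)
    actual_shares = bin_shares(actual)
    return sum(
        (a - e) * math.log(a / e)
        for e, a in zip(expected_shares, actual_shares)
    )


# Made-up feature values: production is shifted upward relative to training
training_feature = [i / 10 for i in range(100)]
production_feature = [value + 3 for value in training_feature]

psi_same = population_stability_index(training_feature, training_feature)
psi_shifted = population_stability_index(training_feature, production_feature)

print(f"PSI, same data: {psi_same:.3f}")
print(f"PSI, shifted data: {psi_shifted:.3f}")
```

As a common rule of thumb, a PSI above roughly 0.25 is treated as a significant shift worth investigating, though the right cut-off depends on the feature and the application.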


Key Metrics to Track

Production ML teams often monitor:

  • Accuracy
  • Precision
  • Recall
  • F1 Score
  • Prediction Volume
  • Confidence Scores
  • Latency
  • Error Rate

Different applications prioritize different metrics.
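The reason accuracy is not enough on its own shows up clearly on imbalanced data. The sketch below computes accuracy, precision, recall, and F1 from made-up fraud-style labels where only 2 of 20 cases are positive. The helper `classification_metrics` is hypothetical and written in plain Python to keep the arithmetic visible.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: 2 positives out of 20, and the model misses one of them
y_true = [1, 1] + [0] * 18
y_pred = [1, 0] + [0] * 18

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy comes out at 0.95 even though the model catches only half of the positive cases (recall 0.5), which is why a fraud or healthcare system would usually alert on recall as well.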


Where Model Monitoring Is Used

Monitoring is essential in:

  • fraud detection systems
  • recommendation engines
  • healthcare ML
  • financial risk models
  • customer analytics platforms

Any production ML system requires visibility into performance.


Key Takeaways

  1. Model performance can change after deployment.
  2. Monitoring helps detect issues before users are affected.
  3. Accuracy alone is not sufficient for production systems.
  4. Confidence and prediction volume provide valuable signals.
  5. Monitoring is a core component of MLOps and production ML.

Conclusion

Deploying a model is not the end of the machine learning journey. Real-world ML systems require continuous monitoring to ensure predictions remain accurate, reliable, and aligned with changing business conditions.

This project strengthens the ML Systems track in Saturday ML Spark ⚡️, focusing on operational practices that keep machine learning systems healthy long after deployment.


Code Snippet:

# 📦 Import Required Libraries
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score


# 🧩 Load Dataset
data = load_breast_cancer()

X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target


# ✂️ Split Data
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.3,
    random_state=42,
    stratify=y
)


# =========================================================
# 🤖 Train Model
# =========================================================

model = LogisticRegression(max_iter=5000)

model.fit(X_train, y_train)


# =========================================================
# 📊 Generate Predictions
# =========================================================

predictions = model.predict(X_test)

prediction_probs = model.predict_proba(X_test)[:, 1]

accuracy = accuracy_score(y_test, predictions)

print("Current Model Accuracy:", round(accuracy, 4))


# =========================================================
# 📝 Simulated Monitoring Data
# =========================================================

monitoring_data = pd.DataFrame({
    "date": pd.date_range(
        start="2025-01-01",
        periods=10,
        freq="D"
    ),

    "accuracy": [
        0.97, 0.96, 0.97, 0.95, 0.94,
        0.93, 0.92, 0.91, 0.90, 0.89
    ],

    "prediction_volume": [
        500, 520, 510, 540, 550,
        560, 590, 600, 610, 620
    ],

    "avg_confidence": [
        0.95, 0.95, 0.94, 0.93, 0.92,
        0.91, 0.90, 0.89, 0.88, 0.87
    ]
})

print("\nMonitoring Data:")
print(monitoring_data)


# =========================================================
# 📈 Accuracy Trend
# =========================================================

plt.figure(figsize=(8, 4))

plt.plot(
    monitoring_data["date"],
    monitoring_data["accuracy"],
    marker="o"
)

plt.title("Model Accuracy Trend")
plt.xlabel("Date")
plt.ylabel("Accuracy")

plt.grid(True)
plt.tight_layout()
plt.show()


# =========================================================
# 📈 Prediction Volume Trend
# =========================================================

plt.figure(figsize=(8, 4))

plt.plot(
    monitoring_data["date"],
    monitoring_data["prediction_volume"],
    marker="o"
)

plt.title("Prediction Volume Trend")
plt.xlabel("Date")
plt.ylabel("Prediction Count")

plt.grid(True)
plt.tight_layout()
plt.show()


# =========================================================
# 📈 Confidence Trend
# =========================================================

plt.figure(figsize=(8, 4))

plt.plot(
    monitoring_data["date"],
    monitoring_data["avg_confidence"],
    marker="o"
)

plt.title("Average Prediction Confidence")
plt.xlabel("Date")
plt.ylabel("Confidence")

plt.grid(True)
plt.tight_layout()
plt.show()


# =========================================================
# 🚨 Simple Monitoring Alerts
# =========================================================

latest_accuracy = monitoring_data["accuracy"].iloc[-1]

latest_confidence = monitoring_data["avg_confidence"].iloc[-1]

print("\n=== Monitoring Alerts ===")

if latest_accuracy < 0.90:
    print("⚠️ ALERT: Accuracy has dropped below threshold")
else:
    print("✅ Accuracy healthy")

if latest_confidence < 0.88:
    print("⚠️ ALERT: Average confidence has dropped below threshold")
else:
    print("✅ Confidence healthy")


# =========================================================
# 📊 Monitoring Summary
# =========================================================

summary = {
    "Average Accuracy":
        monitoring_data["accuracy"].mean(),

    "Average Prediction Volume":
        monitoring_data["prediction_volume"].mean(),

    "Average Confidence":
        monitoring_data["avg_confidence"].mean()
}

print("\nMonitoring Summary:")
for metric, value in summary.items():
    print(f"{metric}: {round(value, 4)}")


# =========================================================
# 💾 Save Monitoring Metrics
# =========================================================

monitoring_data.to_csv(
    "model_monitoring_metrics.csv",
    index=False
)

print("\nMonitoring metrics saved successfully.")


# =========================================================
# 📂 Read Saved Metrics
# =========================================================

saved_metrics = pd.read_csv(
    "model_monitoring_metrics.csv"
)

print("\nSaved Metrics Preview:")
print(saved_metrics.head())
