AW Dev Rethought


🧠 AI with Python – 📉 Detecting Concept Drift


Description:

One of the biggest challenges in machine learning is that the world does not stay the same.

Customer behaviour changes. Markets evolve. Fraudsters adapt. Business processes shift. As a result, a model that performs exceptionally well today may become less effective months later. This phenomenon is known as concept drift.

In this project, we explore what concept drift is, why it happens, and how we can detect it by monitoring model performance over time.


Understanding the Problem

Machine learning models learn relationships from historical data.

For example:

  • income → loan repayment
  • customer activity → churn risk
  • transaction patterns → fraud detection

These relationships are assumed to remain relatively stable after deployment.

However, in real-world systems, those relationships often change.

When that happens, model predictions become less reliable.


What Is Concept Drift?

Concept drift occurs when:

The relationship between input features and the target variable changes over time.

The model continues using patterns learned from historical data, but those patterns are no longer valid.

This leads to performance degradation.


A Simple Example

Imagine a loan approval model. During training:

Higher income → Lower default risk

The model learns this relationship.

Months later, an economic downturn occurs.

Now:

Higher income no longer implies lower default risk

The underlying relationship has changed.

Even though the data still looks similar, prediction quality declines.

This is concept drift.
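
To make the loan example concrete, here is a minimal simulation using only the standard library. Everything in it is an illustrative assumption: the income range, the fixed cut-off of 50 (standing in for a trained model), the noise rate, and the post-downturn default rate of 30%. The point is only that the income distribution stays the same in both periods while the rule linking income to default changes.

```python
import random

def make_loans(n, income_matters, rng):
    """Generate (income, defaulted) pairs from the same income distribution."""
    rows = []
    for _ in range(n):
        income = rng.uniform(20, 120)  # hypothetical income, in thousands
        if income_matters:
            # Training-era concept: low income -> default, with 10% label noise
            defaulted = income < 50
            if rng.random() < 0.10:
                defaulted = not defaulted
        else:
            # Post-downturn concept: default no longer depends on income
            defaulted = rng.random() < 0.30
        rows.append((income, defaulted))
    return rows

def predict_default(income):
    """Stand-in for a model that learned the training-era rule."""
    return income < 50

def accuracy(rows):
    correct = sum(predict_default(income) == defaulted for income, defaulted in rows)
    return correct / len(rows)

rng = random.Random(42)
before = make_loans(5000, income_matters=True, rng=rng)
after = make_loans(5000, income_matters=False, rng=rng)

mean_before = sum(income for income, _ in before) / len(before)
mean_after = sum(income for income, _ in after) / len(after)
acc_before = accuracy(before)
acc_after = accuracy(after)

print(f"Mean income before/after: {mean_before:.1f} / {mean_after:.1f}")
print(f"Accuracy before/after:    {acc_before:.3f} / {acc_after:.3f}")
```

The mean income barely moves between the two periods, yet accuracy falls sharply, which is why looking at input data alone would miss this kind of change.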


Why Concept Drift Matters

Concept drift can cause:

  • declining accuracy
  • incorrect business decisions
  • customer dissatisfaction
  • increased operational risk

If drift is not detected early, model performance may continue deteriorating unnoticed.


Monitoring Model Performance

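One of the simplest ways to detect concept drift is to track performance metrics over time. This requires ground-truth labels for recent predictions, which in some domains (loan defaults, for example) only arrive after a delay.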

Example:

accuracy = [
    0.95, 0.94, 0.93,
    0.91, 0.89, 0.85
]

A steady decline may indicate that the model is becoming less effective.
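
One rough way to put a number on "steady decline" is to fit a straight line through the readings and look at its slope. The sketch below does this with a hand-written least-squares fit; the helper name `trend_slope` is made up for this example, and the readings are the ones listed above.

```python
# Accuracy readings from the example above, one per monitoring period
accuracy = [0.95, 0.94, 0.93, 0.91, 0.89, 0.85]

def trend_slope(values):
    """Least-squares slope of values against their index (change per period)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    return sxy / sxx

slope = trend_slope(accuracy)

# Count how many periods were worse than the one before
declining_steps = sum(
    later < earlier for earlier, later in zip(accuracy, accuracy[1:])
)

print(f"Average change per period: {slope:.4f}")   # -0.0191
print(f"Periods with a decline: {declining_steps} of {len(accuracy) - 1}")  # 5 of 5
```

A negative slope combined with a decline in every period is a stronger signal than a single bad reading, which could simply be noise.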


Visualising Performance Trends

Plotting accuracy over time often reveals patterns.

plt.plot(
    performance_data["date"],
    performance_data["accuracy"]
)

Monitoring dashboards commonly display:

  • accuracy trends
  • precision trends
  • recall trends
  • confidence scores

These visualisations help teams spot degradation quickly.
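
As a small sketch of where those dashboard numbers come from, the function below computes accuracy, precision and recall for one monitoring period from binary labels. The function name and the sample labels are hypothetical; in practice a library such as scikit-learn would normally be used for this.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when a class is never predicted/seen
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical labels and predictions for a single monitoring period
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Computing this once per period and appending the result to a table gives the time series that the trend plots are drawn from.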


Measuring Performance Drop

A common approach is comparing current performance against a baseline.

baseline_accuracy = 0.95
latest_accuracy = 0.80

Calculate the drop:

accuracy_drop = (
    baseline_accuracy -
    latest_accuracy
)

A significant decline may indicate drift.
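
Whether a decline counts as significant also depends on how many predictions each accuracy figure is based on. The sketch below applies a one-sided two-proportion z-test to the same 0.95 and 0.80 values. The sample sizes are assumptions made for illustration, and the normal approximation behind the test is rough for small samples.

```python
import math

def accuracy_drop_z_test(baseline_acc, n_baseline, latest_acc, n_latest):
    """One-sided two-proportion z-test: is latest accuracy lower than baseline?"""
    pooled = (baseline_acc * n_baseline + latest_acc * n_latest) / (
        n_baseline + n_latest
    )
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_baseline + 1 / n_latest))
    z = (baseline_acc - latest_acc) / std_err
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) if nothing changed
    return z, p_value

# Same accuracy drop, evaluated on assumed sample sizes of 1000 and 20
z_large, p_large = accuracy_drop_z_test(0.95, 1000, 0.80, 1000)
z_small, p_small = accuracy_drop_z_test(0.95, 20, 0.80, 20)

print(f"n=1000 per period: z={z_large:.2f}, p={p_large:.2g}")
print(f"n=20 per period:   z={z_small:.2f}, p={p_small:.2g}")
```

With 1000 predictions per period the drop is far outside what chance would explain, while with only 20 the same drop does not clear the conventional 0.05 level.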


Simple Drift Detection

Threshold-based monitoring is often used as an initial safeguard.

if accuracy_drop > 0.10:
    print("Concept Drift Detected")

Although simple, this approach can provide valuable early warnings.
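
A single noisy reading can trip a bare threshold. One possible refinement, sketched below, is to raise the alert only after the drop has exceeded the threshold for several periods in a row. The function name, the `patience` parameter and the sample history are all hypothetical.

```python
def detect_drift(accuracies, baseline, threshold=0.10, patience=2):
    """Return the index at which drift is flagged, or None if it never is.

    Drift is flagged once the drop from `baseline` has exceeded `threshold`
    for `patience` consecutive periods.
    """
    breaches = 0
    for i, acc in enumerate(accuracies):
        if baseline - acc > threshold:
            breaches += 1
            if breaches >= patience:
                return i
        else:
            breaches = 0  # a healthy period resets the count
    return None

# Hypothetical history: one isolated dip at index 2, then a sustained decline
history = [0.95, 0.94, 0.84, 0.93, 0.84, 0.83, 0.80]

flagged_at = detect_drift(history, baseline=0.95)
print("Drift flagged at index:", flagged_at)  # 5
```

The isolated dip at index 2 is ignored, and the alert fires at index 5, the second consecutive period below the threshold.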


Types of Concept Drift

Sudden Drift

The underlying relationship changes abruptly, so performance drops sharply within a short period.

Examples:

  • policy changes
  • regulatory updates
  • market crashes

Gradual Drift

The relationship shifts slowly, so performance declines a little at a time.

Examples:

  • changing customer preferences
  • evolving user behaviour

Recurring Drift

Patterns disappear and later return.

Examples:

  • seasonal demand
  • holiday purchasing behaviour
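
To show how these three patterns might look in a monitoring series, the sketch below generates one synthetic accuracy curve per drift type. All values are made up for illustration, and the function names are hypothetical.

```python
def sudden(n=12, start=0.95, end=0.80, change_at=6):
    """Accuracy stays flat, then drops in a single step."""
    return [start if t < change_at else end for t in range(n)]

def gradual(n=12, start=0.95, end=0.80):
    """Accuracy declines by the same small amount every period."""
    step = (start - end) / (n - 1)
    return [start - step * t for t in range(n)]

def recurring(n=12, high=0.95, low=0.85, period=6):
    """Accuracy dips in the second half of each cycle, then recovers."""
    return [low if (t % period) >= period // 2 else high for t in range(n)]

def largest_step_drop(series):
    """Biggest fall between two consecutive periods."""
    return max(earlier - later for earlier, later in zip(series, series[1:]))

for name, series in [
    ("sudden", sudden()),
    ("gradual", gradual()),
    ("recurring", recurring()),
]:
    total_drop = series[0] - series[-1]
    print(
        f"{name:10s} largest step: {largest_step_drop(series):.3f}  "
        f"start-to-end drop: {total_drop:.3f}"
    )
```

In the sudden series the whole decline happens in one step, in the gradual series no single step is large, and in the recurring series accuracy returns to its earlier level at the start of each new cycle.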

How Production Systems Detect Drift

Modern ML systems use more sophisticated methods such as:

  • rolling performance windows
  • Population Stability Index (PSI)
  • statistical hypothesis testing
  • drift detection libraries
  • shadow model evaluation

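These approaches can flag drift earlier and with fewer false alarms than a single fixed threshold. Some of them, such as PSI, measure shifts in the input or score distribution rather than the feature-target relationship itself, so they act as indirect signals of concept drift.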
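
Of the methods listed above, the Population Stability Index is simple enough to sketch in a few lines. It compares how a reference sample and a current sample are spread across a set of bins. The bin edges, the `eps` floor for empty bins and the sample data below are assumptions for illustration. Note that PSI measures a shift in a distribution (of a feature or of model scores), so it is an indirect signal rather than a direct test for concept drift.

```python
import math
from bisect import bisect_right

def bin_shares(values, edges):
    """Fraction of values in each bin; `edges` are the interior boundaries."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]

def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between a reference and a current sample."""
    total = 0.0
    for e, a in zip(bin_shares(expected, edges), bin_shares(actual, edges)):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) for empty bins
        total += (a - e) * math.log(a / e)
    return total

edges = [0.2, 0.4, 0.6, 0.8]
reference = [i / 100 for i in range(100)]      # spread evenly over [0, 1)
shifted = [0.5 + i / 200 for i in range(100)]  # concentrated in [0.5, 1)

psi_same = psi(reference, reference, edges)
psi_shifted = psi(reference, shifted, edges)

print(f"PSI, identical samples: {psi_same:.3f}")
print(f"PSI, shifted sample:    {psi_shifted:.3f}")
```

A commonly quoted rule of thumb treats values above roughly 0.25 as a substantial shift, although the right cut-off depends on the binning and the application.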


Where Concept Drift Appears

Concept drift is common in:

  • fraud detection systems
  • recommendation engines
  • loan approval models
  • customer churn prediction
  • advertising platforms

Any environment that changes over time can experience drift.


Key Takeaways

  1. Concept drift occurs when feature-target relationships change over time.
  2. A model can degrade even if the input data looks the same as it did during training.
  3. Monitoring performance trends helps identify drift early.
  4. Accuracy decline is often a useful warning signal.
  5. Detecting concept drift is a critical responsibility in production ML systems.

Conclusion

Machine learning models operate in dynamic environments where patterns and relationships constantly evolve. Concept drift is one of the primary reasons models degrade after deployment. By continuously monitoring performance and tracking changes over time, teams can identify drift early and take corrective action before it impacts users or business outcomes.

This post adds to the ML Systems track in the AI with Python series, which focuses on maintaining reliable and trustworthy machine learning systems in production.


Code Snippet:

# 📦 Import Required Libraries
import pandas as pd
import matplotlib.pyplot as plt


# =========================================================
# 📝 Simulate Model Performance Over Time
# =========================================================

performance_data = pd.DataFrame({
    "date": pd.date_range(
        start="2025-01-01",
        periods=12,
        freq="ME"   # Month End (pandas >= 2.2; use "M" on older versions)
    ),

    "accuracy": [
        0.95, 0.95, 0.94, 0.94,
        0.93, 0.92, 0.91, 0.89,
        0.87, 0.85, 0.83, 0.80
    ]
})


# =========================================================
# 🔍 View Performance Data
# =========================================================

print("Performance Data:\n")
print(performance_data)


# =========================================================
# 📈 Visualize Accuracy Trend
# =========================================================

plt.figure(figsize=(8, 4))

plt.plot(
    performance_data["date"],
    performance_data["accuracy"],
    marker="o"
)

plt.title("Model Accuracy Over Time")
plt.xlabel("Date")
plt.ylabel("Accuracy")

plt.grid(True)
plt.tight_layout()
plt.show()


# =========================================================
# 📊 Measure Performance Drop
# =========================================================

baseline_accuracy = performance_data["accuracy"].iloc[0]

latest_accuracy = performance_data["accuracy"].iloc[-1]

accuracy_drop = (
    baseline_accuracy -
    latest_accuracy
)

print(
    "\nAccuracy Drop:",
    round(accuracy_drop, 4)
)


# =========================================================
# 🚨 Detect Concept Drift
# =========================================================

DRIFT_THRESHOLD = 0.10

print("\n=== Drift Detection ===")

if accuracy_drop > DRIFT_THRESHOLD:
    print("⚠️ Concept Drift Detected")
else:
    print("✅ No Significant Drift")


# =========================================================
# 📊 Calculate Rolling Average Accuracy
# =========================================================

# Note: the first two rows are NaN because the 3-period window is not yet full
performance_data["rolling_accuracy"] = (
    performance_data["accuracy"]
    .rolling(window=3)
    .mean()
)

print("\nRolling Accuracy:")
print(
    performance_data[
        ["date", "rolling_accuracy"]
    ]
)


# =========================================================
# 📈 Plot Rolling Accuracy
# =========================================================

plt.figure(figsize=(8, 4))

plt.plot(
    performance_data["date"],
    performance_data["accuracy"],
    marker="o",
    label="Actual Accuracy"
)

plt.plot(
    performance_data["date"],
    performance_data["rolling_accuracy"],
    marker="s",
    label="Rolling Average"
)

plt.title("Concept Drift Monitoring")
plt.xlabel("Date")
plt.ylabel("Accuracy")

plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()


# =========================================================
# 📉 Calculate Degradation Percentage
# =========================================================

degradation_percent = (
    (baseline_accuracy - latest_accuracy)
    / baseline_accuracy
) * 100

print(
    "\nPerformance Degradation (%):",
    round(degradation_percent, 2)
)


# =========================================================
# 💾 Save Monitoring Results
# =========================================================

performance_data.to_csv(
    "concept_drift_monitoring.csv",
    index=False
)

print(
    "\nConcept drift monitoring data saved successfully."
)


# =========================================================
# 📂 Read Saved Results
# =========================================================

saved_data = pd.read_csv(
    "concept_drift_monitoring.csv"
)

print("\nSaved Monitoring Data Preview:\n")
print(saved_data.head())
