🧠 AI with Python – 📈 Predictions vs Actuals Over Time


Description:

When working with regression models or forecasting tasks, one of the most effective ways to evaluate model performance is by visualizing actual vs predicted values over time.

This approach helps identify trends, drift, lagging predictions, systematic errors, and overall model alignment.

In this project, we use scikit-learn to fit a regression model, then Pandas and Matplotlib to build a clean, time-based line plot comparing its predictions against the true values.


Understanding the Problem

Regression models generate continuous predictions, but summary metrics such as MSE or R² compress performance into a single number averaged over the whole test set.

To see how a model behaves across the sequence of observations, we need to visualize:

  • where predictions follow the real trend
  • when they fall behind or overshoot
  • how errors evolve over time
  • how stable the model is across data points
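To make the limitation of a single score concrete, here is a minimal sketch using synthetic data (the series and both "models" are invented for illustration, not taken from the Diabetes example). Two sets of predictions have the same MSE, yet one is consistently too high while the other swings above and below the truth.

```python
import numpy as np

# Synthetic "true" series: 50 evenly spaced values (illustrative only).
actual = np.linspace(0.0, 10.0, 50)

# Model A is always 1 unit too high (constant bias).
pred_biased = actual + 1.0

# Model B alternates between 1 unit too high and 1 unit too low.
signs = np.where(np.arange(actual.size) % 2 == 0, 1.0, -1.0)
pred_alternating = actual + signs


def mse(y_true, y_hat):
    """Mean squared error between two arrays."""
    return float(np.mean((y_true - y_hat) ** 2))


mse_biased = mse(actual, pred_biased)
mse_alternating = mse(actual, pred_alternating)

# The mean signed error (actual - predicted) separates the two cases.
bias_biased = float(np.mean(actual - pred_biased))
bias_alternating = float(np.mean(actual - pred_alternating))

print("MSE:", mse_biased, mse_alternating)
print("Mean signed error:", bias_biased, bias_alternating)
```

Plotted over time, the first model would sit as a parallel line above the actual values, while the second would zigzag around them. The MSE alone cannot tell these apart.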

Time-based plots are especially useful in:

  • energy forecasting
  • sales forecasting
  • medical progression
  • sensor readings
  • any temporal ML application

1. Load and Prepare the Dataset

We use scikit-learn's Diabetes dataset, which contains 442 samples with 10 numerical medical features and a continuous target measuring disease progression one year after baseline.

from sklearn.datasets import load_diabetes
import pandas as pd

diabetes = load_diabetes()
X = diabetes.data
y = diabetes.target

# Optional: wrap the data in a DataFrame for inspection (the model below uses X and y directly)
df = pd.DataFrame(X, columns=diabetes.feature_names)
df["target"] = y

2. Train/Test Split

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

3. Train the Regression Model

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
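Before plotting, it can be useful to record the summary metrics so the visual impression has a number to sit beside. The sketch below repeats the load, split, and fit steps so that it runs on its own; `mean_squared_error` and `r2_score` come from `sklearn.metrics`, which the walkthrough above does not otherwise import.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Same data, split, and model as in the steps above.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Single-number summaries of test-set performance.
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"MSE: {mse:.1f}  R^2: {r2:.3f}")
```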

4. Create a Time Index for Visualization

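For demonstration, we treat each test sample as a daily observation. The Diabetes dataset has no time dimension, and train_test_split shuffles rows by default, so these dates are placeholders and the ordering carries no real meaning.

In real use cases, you would use the actual timestamp column and split the data chronologically rather than at random.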

time_index = pd.date_range(
    start="2024-01-01", periods=len(y_test), freq="D"
)

plot_df = pd.DataFrame({
    "date": time_index,
    "actual": y_test,
    "predicted": y_pred
})
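For data that does carry real timestamps, a chronological split keeps the test set in time order. The following is a minimal sketch on a synthetic DataFrame; the column names `timestamp`, `feature`, and `target` are hypothetical, and the 80/20 cut simply mirrors the `test_size=0.2` used above.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset: 100 daily readings with one feature and one target.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=100, freq="D"),
    "feature": rng.normal(size=100),
})
data["target"] = 2.0 * data["feature"] + rng.normal(scale=0.1, size=100)

# Sort by time, then hold out the most recent 20% as the test set.
data = data.sort_values("timestamp").reset_index(drop=True)
split_at = int(len(data) * 0.8)
train = data.iloc[:split_at]
test = data.iloc[split_at:]

# Every training row precedes every test row, so the test timestamps
# can be used directly as the x-axis of the plot.
print(train["timestamp"].max(), "<", test["timestamp"].min())
```

Splitting this way also prevents the model from being trained on observations that come after the ones it is evaluated on.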

5. Visualize Actual vs Predicted Values

import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))

plt.plot(plot_df["date"], plot_df["actual"], label="Actual", marker="o")
plt.plot(plot_df["date"], plot_df["predicted"], label="Predicted", marker="x")

plt.title("Model Predictions vs Actual Values Over Time")
plt.xlabel("Date")
plt.ylabel("Disease Progression Score")
plt.legend()
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

This overlaid plot shows how closely the model tracks the real values and where it diverges.
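When the data is a genuinely ordered series, one kind of divergence is lag: predictions that follow the actual values a few steps late. As a rough sketch, the delay can be estimated by shifting the actual series and checking which shift correlates best with the predictions. The data here is synthetic, and `best_lag` is a hypothetical helper written for this illustration, not a library function.

```python
import numpy as np
import pandas as pd

# Synthetic ordered series: a smooth daily signal (illustrative only).
dates = pd.date_range("2024-01-01", periods=120, freq="D")
actual = pd.Series(np.sin(np.arange(120) / 6.0), index=dates)

# Hypothetical lagging model: it reproduces the signal 3 days late.
predicted = actual.shift(3)


def best_lag(actual, predicted, max_lag=10):
    """Return the shift of `actual` (in steps) that correlates best with `predicted`."""
    scores = {
        lag: actual.shift(lag).corr(predicted)  # NaN pairs are ignored by corr
        for lag in range(0, max_lag + 1)
    }
    return max(scores, key=scores.get)


estimated_lag = best_lag(actual, predicted)
print("Estimated lag (days):", estimated_lag)
```

This applies only when the row order is meaningful. On the shuffled Diabetes test set used above, any estimated lag would be noise.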


6. Optional: Plot Prediction Error Over Time

plot_df["error"] = plot_df["actual"] - plot_df["predicted"]

plt.figure(figsize=(12, 5))
plt.plot(plot_df["date"], plot_df["error"], color="red")
plt.axhline(0, color="black", linestyle="--")
plt.title("Prediction Error Over Time")
plt.xlabel("Date")
plt.ylabel("Error")
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

Because the error is computed as actual minus predicted, points above the zero line mark under-prediction and points below it mark over-prediction.
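Raw errors can be noisy, so a rolling mean is one way to make a slow drift or persistent bias easier to see. The sketch below uses synthetic errors with an upward drift built in; the 14-day window is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily errors: random noise plus a slow upward drift.
dates = pd.date_range("2024-01-01", periods=90, freq="D")
rng = np.random.default_rng(1)
error = pd.Series(
    rng.normal(scale=1.0, size=90) + np.linspace(0.0, 5.0, 90),
    index=dates,
)

# Rolling mean over 14 days; the first 13 values are NaN because
# the window is not yet full.
rolling_mean = error.rolling(window=14).mean()

print(rolling_mean.dropna().iloc[[0, -1]])
```

Plotting `rolling_mean` on top of the raw error line would show the smoothed curve moving away from zero as the drift grows.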


Key Takeaways

  1. Line plots of predictions vs actuals reveal trend alignment that metrics alone cannot capture.
  2. Time-based visualization helps identify model drift, lag, or bias.
  3. Adding an error-over-time plot provides deeper insight into model consistency.
  4. This approach is especially useful for forecasting and other temporal ML tasks.
  5. Combining numerical metrics with visual diagnostics leads to more robust model evaluation.

Conclusion

Visualizing model predictions alongside actual values provides an intuitive and powerful way to diagnose model performance.

It shows not just how accurate your model is, but how it behaves across time-ordered data points, which makes it easier to see where refinement is needed.

This technique is foundational for real-world machine learning, especially in forecasting and continuous prediction systems.


Code Snippet:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression


diabetes = load_diabetes()
X = diabetes.data
y = diabetes.target

df = pd.DataFrame(X, columns=diabetes.feature_names)
df["target"] = y


X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)


time_index = pd.date_range(
    start="2024-01-01", periods=len(y_test), freq="D"
)

plot_df = pd.DataFrame({
    "date": time_index,
    "actual": y_test,
    "predicted": y_pred
})


plt.figure(figsize=(12, 6))

plt.plot(plot_df["date"], plot_df["actual"], label="Actual", marker="o")
plt.plot(plot_df["date"], plot_df["predicted"], label="Predicted", marker="x")

plt.title("Model Predictions vs Actual Values Over Time")
plt.xlabel("Date")
plt.ylabel("Disease Progression Score")
plt.legend()
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()


plot_df["error"] = plot_df["actual"] - plot_df["predicted"]

plt.figure(figsize=(12, 5))
plt.plot(plot_df["date"], plot_df["error"], color="red")
plt.axhline(0, color="black", linestyle="--")
plt.title("Prediction Error Over Time")
plt.xlabel("Date")
plt.ylabel("Error")
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
