🧠 AI with Python – 📈 Predictions vs Actuals Over Time
Posted on: December 4, 2025
Description:
When working with regression models or forecasting tasks, one of the most effective ways to evaluate model performance is by visualizing actual vs predicted values over time.
This approach helps identify trends, drift, lagging predictions, systematic errors, and overall model alignment.
In this project, we use Matplotlib and Pandas to build a clean, time-based line plot comparing model predictions against true values.
Understanding the Problem
Regression models generate continuous predictions, but summary metrics like MSE or R² compress performance into a single number.
To truly understand how a model performs dynamically, we need to visualize:
- where predictions follow the real trend
- when they fall behind or overshoot
- how errors evolve over time
- how stable the model is across data points
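To see why a single score can hide this kind of behavior, here is a minimal sketch using synthetic numbers (not the Diabetes data used later). It builds two hypothetical prediction series with the same MSE: one with a constant bias and one that alternates above and below the truth. The names `pred_biased` and `pred_alternating` are made up for illustration.

```python
import numpy as np

# Synthetic ground truth: a simple upward trend (illustrative values only).
actual = np.linspace(100.0, 200.0, 50)

# Hypothetical model A: always over-predicts by 5 units.
pred_biased = actual + 5.0

# Hypothetical model B: alternates between +5 and -5 around the truth.
signs = np.where(np.arange(actual.size) % 2 == 0, 1.0, -1.0)
pred_alternating = actual + 5.0 * signs


def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))


mse_biased = mse(actual, pred_biased)
mse_alternating = mse(actual, pred_alternating)

# Mean signed error (actual - predicted) exposes the difference.
bias_biased = float(np.mean(actual - pred_biased))
bias_alternating = float(np.mean(actual - pred_alternating))

print(f"MSE  A: {mse_biased:.2f}  B: {mse_alternating:.2f}")
print(f"Bias A: {bias_biased:.2f}  B: {bias_alternating:.2f}")
```

Both series score the same on MSE, yet plotted over time the first sits consistently above the truth while the second zigzags around it.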
Time-based plots are especially useful in:
- energy forecasting
- sales forecasting
- medical progression
- sensor readings
- any temporal ML application
1. Load and Prepare the Dataset
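We use scikit-learn's Diabetes dataset, which contains 442 samples with 10 numerical medical features (age, sex, BMI, blood pressure, and six blood serum measurements) and a continuous target value representing disease progression one year after baseline.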
from sklearn.datasets import load_diabetes
import pandas as pd
diabetes = load_diabetes()
X = diabetes.data
y = diabetes.target
df = pd.DataFrame(X, columns=diabetes.feature_names)
df["target"] = y
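As a quick sanity check, it can help to inspect what was loaded before modeling. The sketch below uses `load_diabetes(as_frame=True)`, which is an alternative way to get the same data as a DataFrame; it is not the loading path used in the rest of this walkthrough.

```python
from sklearn.datasets import load_diabetes

# as_frame=True returns pandas objects instead of NumPy arrays.
diabetes = load_diabetes(as_frame=True)
df = diabetes.frame  # 10 feature columns plus a "target" column

print(df.shape)
print(df.columns.tolist())
print(df["target"].describe())
```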
2. Train/Test Split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
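Note that `train_test_split` shuffles rows by default, so the test set above is a random sample rather than the most recent observations. For data that really is time-ordered, one common option is to pass `shuffle=False` so the test set is the final block of rows. The sketch below shows this on a toy array (the names `X_toy` and `y_toy` are illustrative, not part of the project code).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data where the row order stands in for time (illustrative only).
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10)

# shuffle=False keeps the original order: first 80% train, last 20% test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, shuffle=False
)

print(y_tr)  # [0 1 2 3 4 5 6 7]
print(y_te)  # [8 9]
```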
3. Train the Regression Model
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
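Before plotting, it is often useful to record the summary metrics so the visual diagnostics can be read alongside them. The following is a self-contained sketch that repeats the steps above and computes MSE, MAE, and R² with `sklearn.metrics`; the choice of these three metrics is an assumption, not something the project prescribes.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Same data, split, and model as in the steps above.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Single-number summaries of test-set performance.
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"MSE: {mse:.1f}")
print(f"MAE: {mae:.1f}")
print(f"R^2: {r2:.3f}")
```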
4. Create a Time Index for Visualization
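The Diabetes dataset has no time column, and train_test_split shuffles rows by default, so the dates below are synthetic and the ordering of points is arbitrary.
For demonstration, we treat each test sample as a daily observation.
In real use cases, you would use the actual timestamp column and keep the rows in chronological order.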
time_index = pd.date_range(
start="2024-01-01", periods=len(y_test), freq="D"
)
plot_df = pd.DataFrame({
"date": time_index,
"actual": y_test,
"predicted": y_pred
})
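With real data, the time axis would come from the dataset itself. Here is a minimal sketch, assuming a hypothetical `timestamp` column stored as strings and rows that arrive out of order; the values are made up for illustration.

```python
import pandas as pd

# Hypothetical raw results with an unsorted, string-typed timestamp column.
raw = pd.DataFrame({
    "timestamp": ["2024-01-03", "2024-01-01", "2024-01-02"],
    "actual": [12.0, 10.0, 11.0],
    "predicted": [11.5, 10.4, 10.8],
})

# Parse to datetime, then sort so the line plot runs left to right in time.
raw["timestamp"] = pd.to_datetime(raw["timestamp"])
plot_df = raw.sort_values("timestamp").reset_index(drop=True)

print(plot_df)
```

Sorting before plotting matters because a line plot connects points in row order, not in date order.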
5. Visualize Actual vs Predicted Values
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 6))
plt.plot(plot_df["date"], plot_df["actual"], label="Actual", marker="o")
plt.plot(plot_df["date"], plot_df["predicted"], label="Predicted", marker="x")
plt.title("Model Predictions vs Actual Values Over Time")
plt.xlabel("Date")
plt.ylabel("Disease Progression Score")
plt.legend()
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
This overlaid plot shows how closely the model tracks the real values and where it diverges.
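If you plan to reuse this plot across models, one option is to wrap it in a small function. The sketch below is one possible shape for such a helper; the function name `plot_actual_vs_predicted`, its parameters, and the shaded gap drawn with `fill_between` are illustrative choices, and the demo data is synthetic.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_actual_vs_predicted(df, date_col="date", actual_col="actual",
                             predicted_col="predicted"):
    """Draw actual and predicted lines and shade the gap between them."""
    fig, ax = plt.subplots(figsize=(12, 6))
    ax.plot(df[date_col], df[actual_col], label="Actual", marker="o")
    ax.plot(df[date_col], df[predicted_col], label="Predicted", marker="x")
    # Shading makes the size of each miss easier to see at a glance.
    ax.fill_between(df[date_col], df[actual_col], df[predicted_col],
                    alpha=0.2, label="Gap")
    ax.set_title("Model Predictions vs Actual Values Over Time")
    ax.set_xlabel("Date")
    ax.set_ylabel("Value")
    ax.legend()
    ax.grid(True)
    fig.autofmt_xdate(rotation=45)
    fig.tight_layout()
    return ax


# Synthetic demo data (illustrative values only).
demo = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=10, freq="D"),
    "actual": np.linspace(100.0, 145.0, 10),
})
demo["predicted"] = demo["actual"] + np.tile([3.0, -3.0], 5)

ax = plot_actual_vs_predicted(demo)
```

Returning the `Axes` object lets the caller add annotations or save the figure afterwards.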
6. Optional: Plot Prediction Error Over Time
plot_df["error"] = plot_df["actual"] - plot_df["predicted"]
plt.figure(figsize=(12, 5))
plt.plot(plot_df["date"], plot_df["error"], color="red")
plt.axhline(0, color="black", linestyle="--")
plt.title("Prediction Error Over Time")
plt.xlabel("Date")
plt.ylabel("Error")
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
Because the error is computed as actual minus predicted, points above the zero line are under-predictions and points below it are over-predictions.
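Raw errors can be noisy, so a rolling mean of the error is one simple way to make a sustained bias stand out. The sketch below uses a synthetic error series (zero for 15 days, then a constant +4) and a 7-day window; both the data and the window size are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2024-01-01", periods=30, freq="D")

# Synthetic error (actual - predicted): unbiased at first, then the model
# starts under-predicting by 4 units.
error = pd.Series(np.r_[np.zeros(15), np.full(15, 4.0)], index=dates)

# 7-day rolling mean; the first 6 values are NaN until the window fills.
rolling_bias = error.rolling(window=7).mean()

print(rolling_bias.iloc[[6, 14, 20, 29]])
```

A rolling mean that moves away from zero and stays there suggests systematic bias or drift rather than random noise.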
Key Takeaways
- Line plots of predictions vs actuals reveal trend alignment that metrics alone cannot capture.
- Time-based visualization helps identify model drift, lag, or bias.
- Adding an error-over-time plot provides deeper insight into model consistency.
- This approach is especially useful for forecasting, optimization, and temporal ML tasks.
- Combining numerical metrics with visual diagnostics leads to more robust model evaluation.
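Lag in particular can be checked numerically as well as visually. The following is a rough sketch, assuming a synthetic random-walk series and a hypothetical model whose predictions simply trail the truth by three steps; the helper `best_lag` is made up for this example and searches for the shift that minimizes MSE.

```python
import numpy as np

rng = np.random.default_rng(0)
actual = np.cumsum(rng.normal(size=200))  # synthetic random walk

# Hypothetical lagging model: predicted[t] equals actual[t - 3].
true_lag = 3
predicted = np.roll(actual, true_lag)
predicted[:true_lag] = actual[0]  # fill the wrapped-around start


def best_lag(actual, predicted, max_lag=10):
    """Return the lag (in steps) that minimizes MSE, plus all MSEs by lag."""
    errors = {}
    for lag in range(max_lag + 1):
        # Compare predicted[t] against actual[t - lag].
        a = actual[: len(actual) - lag]
        p = predicted[lag:]
        errors[lag] = float(np.mean((a - p) ** 2))
    return min(errors, key=errors.get), errors


lag, errors = best_lag(actual, predicted)
print(f"Estimated lag: {lag} steps")
```

A best-fitting lag greater than zero suggests the model is echoing recent values rather than anticipating changes.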
Conclusion
Visualizing model predictions alongside actual values provides an intuitive and powerful way to diagnose model performance.
It highlights not just how accurate your model is, but how it behaves across time-ordered data points, which makes it easier to see where the model needs refinement.
This technique is foundational for real-world machine learning, especially in forecasting and continuous prediction systems.
Code Snippet:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
diabetes = load_diabetes()
X = diabetes.data
y = diabetes.target
df = pd.DataFrame(X, columns=diabetes.feature_names)
df["target"] = y
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
time_index = pd.date_range(
start="2024-01-01", periods=len(y_test), freq="D"
)
plot_df = pd.DataFrame({
"date": time_index,
"actual": y_test,
"predicted": y_pred
})
plt.figure(figsize=(12, 6))
plt.plot(plot_df["date"], plot_df["actual"], label="Actual", marker="o")
plt.plot(plot_df["date"], plot_df["predicted"], label="Predicted", marker="x")
plt.title("Model Predictions vs Actual Values Over Time")
plt.xlabel("Date")
plt.ylabel("Disease Progression Score")
plt.legend()
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
plot_df["error"] = plot_df["actual"] - plot_df["predicted"]
plt.figure(figsize=(12, 5))
plt.plot(plot_df["date"], plot_df["error"], color="red")
plt.axhline(0, color="black", linestyle="--")
plt.title("Prediction Error Over Time")
plt.xlabel("Date")
plt.ylabel("Error")
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()