🧠 AI with Python – 📦 Online Sales Demand Forecasting
Posted on: January 22, 2026
Description:
Demand forecasting is a critical capability for any online business.
Accurate demand predictions help prevent stockouts, reduce excess inventory, and improve supply chain efficiency — all of which directly impact revenue and customer satisfaction.
In this project, we build a machine learning model to forecast online sales demand using historical sales data, pricing, and promotional signals.
Understanding the Problem
Online sales demand fluctuates due to several factors:
- pricing changes
- discounts and promotions
- day-of-week effects
- seasonal patterns
The goal is to learn how these factors influence units sold, making this a regression problem rather than classification.
Unlike pure time-series forecasting, we approach this as a feature-driven ML problem, which is common in real-world e-commerce systems.
1. Loading the Sales Dataset
We begin by loading historical online sales data stored in CSV format.
import pandas as pd
df = pd.read_csv("online_sales.csv")
df.head()
Each row represents daily sales information, including price, discount, promotion flags, and units sold.
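The post does not ship the CSV itself. To run the pipeline end to end, a minimal sketch like the one below can generate a synthetic online_sales.csv. Only "date" and "units_sold" are required by the code in this post. The other column names (price, discount, promotion) and the demand formula are assumptions chosen for illustration, not the real dataset.

```python
import numpy as np
import pandas as pd


def make_synthetic_sales(n_days: int = 365, seed: int = 0) -> pd.DataFrame:
    """Generate a hypothetical daily sales table for experimentation.

    Column names other than "date" and "units_sold" are assumptions.
    """
    rng = np.random.default_rng(seed)
    dates = pd.date_range("2025-01-01", periods=n_days, freq="D")
    price = rng.uniform(18.0, 25.0, n_days).round(2)
    discount = rng.choice([0.0, 0.1, 0.2], size=n_days, p=[0.7, 0.2, 0.1])
    promotion = rng.binomial(1, 0.15, n_days)  # 1 = promotion running that day
    weekend = (dates.weekday >= 5).astype(int)
    # Made-up demand curve: falls with effective price, rises with promotions and weekends
    effective_price = price * (1 - discount)
    mean_units = 200 - 5 * effective_price + 40 * promotion + 25 * weekend
    units_sold = rng.poisson(np.clip(mean_units, 1, None))
    return pd.DataFrame({
        "date": dates.strftime("%Y-%m-%d"),
        "price": price,
        "discount": discount,
        "promotion": promotion,
        "units_sold": units_sold,
    })


if __name__ == "__main__":
    make_synthetic_sales().to_csv("online_sales.csv", index=False)
```

Because the generating formula is known, synthetic data like this is also a quick way to check that the model recovers effects you put in on purpose.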
2. Creating Time-Based Features
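Calendar features help the model capture recurring demand patterns, such as weekday and month-of-year effects.
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values("date")  # keep rows in chronological order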
df["day"] = df["date"].dt.day
df["month"] = df["date"].dt.month
df["weekday"] = df["date"].dt.weekday
These features allow the model to learn weekly and monthly demand trends.
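To see exactly what each calendar feature encodes, the sketch below applies the same pandas accessors to three known dates. Note that dt.weekday counts from Monday = 0 to Sunday = 6. The is_weekend flag is a hypothetical extra feature, not part of the original pipeline.

```python
import pandas as pd

# Three known dates: 2026-01-19 is a Monday, 2026-01-25 a Sunday
df = pd.DataFrame({"date": ["2026-01-19", "2026-01-22", "2026-01-25"]})
df["date"] = pd.to_datetime(df["date"])
df["day"] = df["date"].dt.day          # day of month, 1-31
df["month"] = df["date"].dt.month      # 1 = January ... 12 = December
df["weekday"] = df["date"].dt.weekday  # 0 = Monday ... 6 = Sunday
# Hypothetical extra flag: weekends often behave differently from weekdays
df["is_weekend"] = (df["weekday"] >= 5).astype(int)
print(df)
```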
3. Preparing Features and Target
We separate input features from the demand target.
X = df.drop(["date", "units_sold"], axis=1)
y = df["units_sold"]
The target variable represents daily demand.
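scikit-learn estimators such as RandomForestRegressor expect numeric inputs, so any text column left in X will fail at fit time. A small helper along these lines (split_features_target is a hypothetical name) separates features from the target and fails early with a clear message if a non-numeric column slipped through.

```python
import pandas as pd


def split_features_target(df: pd.DataFrame, target: str = "units_sold"):
    """Separate model inputs from the demand target and check they are numeric."""
    X = df.drop(columns=["date", target])
    y = df[target]
    # Booleans are accepted too: scikit-learn converts them to 0/1
    non_numeric = X.select_dtypes(exclude=["number", "bool"]).columns.tolist()
    if non_numeric:
        raise ValueError(f"Non-numeric feature columns need encoding first: {non_numeric}")
    return X, y
```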
4. Train/Test Split
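We hold out the most recent days as a test set to evaluate generalization.
from sklearn.model_selection import train_test_split
# shuffle=False keeps rows in date order: the last 30% of days become the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.3,
    shuffle=False
)
Because the rows are not shuffled, the model trains on earlier days and is evaluated on later ones, which simulates predicting demand on unseen future data. This assumes the rows are sorted by date; a random split would mix future days into training and tends to overstate accuracy.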
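A chronological hold-out guarantees that every test day comes after every training day. The sketch below checks this on a hypothetical 10-day series and also shows scikit-learn's TimeSeriesSplit, which applies the same idea to cross-validation with expanding training windows.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit, train_test_split

# Hypothetical 10-day series, already sorted by date
dates = pd.date_range("2026-01-01", periods=10, freq="D")
X = pd.DataFrame({"day_index": np.arange(10)}, index=dates)
y = pd.Series(np.arange(10) * 10, index=dates)

# shuffle=False keeps the original order, so the latest days form the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=False)
print("last training day:", X_train.index.max().date())
print("first test day:   ", X_test.index.min().date())

# Each fold trains on all earlier days and tests on the block that follows
for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(X):
    print(f"train rows 0-{train_idx[-1]}, test rows {test_idx[0]}-{test_idx[-1]}")
```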
5. Training a Demand Forecasting Model
Demand patterns are often non-linear, especially during promotions.
We use a Random Forest Regressor to capture these effects.
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(
    n_estimators=300,
    max_depth=12,
    random_state=42
)
model.fit(X_train, y_train)
Random Forest models handle complex feature interactions well.
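One way to check which signals the forest relies on is its feature_importances_ attribute. The sketch below fits the same model configuration on hypothetical data where a promotion's lift depends on price (an interaction a linear model would miss) and weekday has no effect at all. These impurity-based importances can favour features with many distinct values; sklearn.inspection.permutation_importance is a common alternative.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical data: demand drops with price and jumps on promotion days
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "price": rng.uniform(10, 30, n),
    "promotion": rng.integers(0, 2, n),
    "weekday": rng.integers(0, 7, n),  # deliberately unrelated to demand
})
# Interaction: promotions lift demand more when the price is low
y = 150 - 3 * X["price"] + X["promotion"] * (80 - 2 * X["price"]) + rng.normal(0, 5, n)

model = RandomForestRegressor(n_estimators=300, max_depth=12, random_state=42)
model.fit(X, y)

# Impurity-based importances sum to 1 across features
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```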
6. Evaluating Forecast Accuracy
We evaluate predictions using regression metrics.
from sklearn.metrics import mean_absolute_error, r2_score
y_pred = model.predict(X_test)
print("MAE:", mean_absolute_error(y_test, y_pred))
print("R²:", r2_score(y_test, y_pred))
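- MAE is the average absolute error, measured in units sold
- R² is the share of variance in demand the model explains (1.0 is a perfect fit; 0 means no better than always predicting the mean)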
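Both metrics are simple enough to compute by hand. The worked example below uses four made-up days of demand and confirms that the manual formulas match scikit-learn's mean_absolute_error and r2_score.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

y_true = np.array([10.0, 20.0, 30.0, 40.0])  # made-up actual units sold
y_pred = np.array([12.0, 18.0, 33.0, 37.0])  # made-up forecasts

# MAE: mean of |actual - predicted|, in units sold
mae = np.mean(np.abs(y_true - y_pred))          # (2 + 2 + 3 + 3) / 4 = 2.5

# R²: 1 - (residual sum of squares / total sum of squares around the mean)
ss_res = np.sum((y_true - y_pred) ** 2)         # 4 + 4 + 9 + 9 = 26
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # 225 + 25 + 25 + 225 = 500
r2 = 1 - ss_res / ss_tot                        # 1 - 26/500 = 0.948

print(mae, mean_absolute_error(y_true, y_pred))
print(r2, r2_score(y_true, y_pred))
```

Since ss_res can exceed ss_tot, R² can be negative on a test set when the model does worse than predicting the mean.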
7. Visualizing Predictions vs Actual Demand
A visual comparison helps validate forecast quality.
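import matplotlib.pyplot as plt
plt.scatter(y_test, y_pred, alpha=0.6)
# Dashed y = x reference line: perfect forecasts fall exactly on it
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
plt.plot(lims, lims, "r--", label="Perfect forecast")
plt.xlabel("Actual Demand")
plt.ylabel("Predicted Demand")
plt.title("Online Sales Demand: Actual vs Predicted")
plt.legend()
plt.grid(True)
plt.show()
The more tightly the points cluster around the dashed diagonal (y = x), the more accurate the forecasts.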
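A scatter plot hides when errors happen. Plotting actual and predicted demand against the calendar can reveal, for example, systematic under-forecasting on promotion days. The sketch below is a minimal, hypothetical helper (plot_forecast_over_time is not part of the original post); the Agg backend line only lets it run without a display and can be removed in a notebook or GUI session.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to display figures interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_forecast_over_time(dates, actual, predicted):
    """Plot actual and predicted daily demand in calendar order."""
    dates = np.asarray(dates)
    order = np.argsort(dates)  # sort by date so the lines are drawn left to right
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(dates[order], np.asarray(actual)[order], label="Actual")
    ax.plot(dates[order], np.asarray(predicted)[order], label="Predicted", linestyle="--")
    ax.set_xlabel("Date")
    ax.set_ylabel("Units sold")
    ax.set_title("Online Sales Demand over Time")
    ax.legend()
    ax.grid(True)
    return fig, ax
```

With the pipeline above, train_test_split keeps the pandas index, so the test dates can be recovered with df.loc[y_test.index, "date"] and passed in alongside y_test and y_pred; call fig.savefig("forecast.png") to save the chart.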
Key Takeaways
- Online sales demand forecasting is a core business ML problem.
- Pricing and promotions strongly influence demand patterns.
- Feature-based ML models complement traditional time-series methods.
- Random Forest models capture non-linear demand relationships effectively.
- Accurate demand forecasts improve inventory and supply chain decisions.
Conclusion
Online sales demand forecasting demonstrates how machine learning drives real-world business decisions.
By combining historical sales data with pricing and promotion signals, ML models can provide reliable demand estimates that support smarter inventory planning and revenue optimization.
This project represents a realistic end-to-end demand forecasting workflow, making it a strong addition to the AI with Python – Real-World Mini Projects (Advanced) series.
Code Snippet:
import pandas as pd
df = pd.read_csv("online_sales.csv")
df.head()
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values("date")  # chronological order for the time-based split
df["day"] = df["date"].dt.day
df["month"] = df["date"].dt.month
df["weekday"] = df["date"].dt.weekday
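# Drop days with no recorded demand; the target itself should not be imputed
df = df.dropna(subset=["units_sold"])
X = df.drop(["date", "units_sold"], axis=1)
y = df["units_sold"]
from sklearn.model_selection import train_test_split
# Chronological split: assumes rows are in date order, so the test set is the latest 30% of days
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.3,
    shuffle=False
)
from sklearn.impute import SimpleImputer
# Fill missing feature values with medians learned from the training period only,
# so no information from the test period leaks into training
imputer = SimpleImputer(strategy="median").set_output(transform="pandas")
X_train = imputer.fit_transform(X_train)
X_test = imputer.transform(X_test)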
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(
    n_estimators=300,
    max_depth=12,
    random_state=42
)
model.fit(X_train, y_train)
from sklearn.metrics import mean_absolute_error, r2_score
y_pred = model.predict(X_test)
print("MAE:", mean_absolute_error(y_test, y_pred))
print("R² Score:", r2_score(y_test, y_pred))
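import matplotlib.pyplot as plt
plt.scatter(y_test, y_pred, alpha=0.6)
# Dashed y = x reference line: perfect forecasts fall exactly on it
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
plt.plot(lims, lims, "r--", label="Perfect forecast")
plt.xlabel("Actual Demand")
plt.ylabel("Predicted Demand")
plt.title("Online Sales Demand: Actual vs Predicted")
plt.legend()
plt.grid(True)
plt.show()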
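Once trained, the model can score planned days under different pricing and promotion scenarios, which is where the forecasts feed inventory decisions. The sketch below trains on synthetic data whose feature columns mirror the pipeline (the values, the demand formula and the what_if_demand helper are all hypothetical) and compares a baseline Friday with the same day under a 20% discount plus promotion.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def what_if_demand(model, base_row: pd.DataFrame, scenarios: dict) -> pd.Series:
    """Predict demand for one day under several hypothetical feature overrides."""
    rows = []
    for name, overrides in scenarios.items():
        row = base_row.copy()
        for col, value in overrides.items():
            row[col] = value
        row.index = [name]
        rows.append(row)
    grid = pd.concat(rows)  # columns must match the features used in training
    return pd.Series(model.predict(grid), index=grid.index)


# Hypothetical training data with the same feature layout as the pipeline
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "price": rng.uniform(15, 25, n),
    "discount": rng.choice([0.0, 0.1, 0.2], n),
    "promotion": rng.integers(0, 2, n),
    "day": rng.integers(1, 29, n),
    "month": rng.integers(1, 13, n),
    "weekday": rng.integers(0, 7, n),
})
y = 200 - 5 * X["price"] * (1 - X["discount"]) + 40 * X["promotion"] + rng.normal(0, 5, n)
model = RandomForestRegressor(n_estimators=300, max_depth=12, random_state=42).fit(X, y)

# A planned Friday, 13 March: no promotion vs. a 20% discount with promotion
base = pd.DataFrame([{"price": 20.0, "discount": 0.0, "promotion": 0,
                      "day": 13, "month": 3, "weekday": 4}])
preds = what_if_demand(model, base, {
    "baseline": {},
    "promo_20pct": {"promotion": 1, "discount": 0.2},
})
print(preds)
```

Tree ensembles cannot predict beyond the range of demand seen in training, so scenarios far outside historical prices or discounts should be treated with caution.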