AW Dev Rethought

Code is read far more often than it is written - Guido van Rossum

🧠 AI with Python – 📦 Online Sales Demand Forecasting


Description:

Demand forecasting is a critical capability for any online business.

Accurate demand predictions help prevent stockouts, reduce excess inventory, and improve supply chain efficiency, all of which directly affect revenue and customer satisfaction.

In this project, we build a machine learning model to forecast online sales demand using historical sales data, pricing, and promotional signals.


Understanding the Problem

Online sales demand fluctuates due to several factors:

  • pricing changes
  • discounts and promotions
  • day-of-week effects
  • seasonal patterns

The goal is to learn how these factors influence units sold. Because the target is a numeric quantity rather than a category, this is a regression problem, not a classification problem.

Unlike pure time-series forecasting, we approach this as a feature-driven ML problem, which is common in real-world e-commerce systems.


1. Loading the Sales Dataset

We begin by loading historical online sales data stored in CSV format.

import pandas as pd

df = pd.read_csv("online_sales.csv")
df.head()

Each row represents daily sales information, including price, discount, promotion flags, and units sold.
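The dataset itself is not included here, so the following is a minimal sketch of a synthetic stand-in with the same general shape. Only "date" and "units_sold" appear in the code above; the column names "price", "discount", and "promotion", and the demand rule used to generate sales, are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_days = 365

dates = pd.date_range("2023-01-01", periods=n_days, freq="D")
price = rng.uniform(20.0, 40.0, size=n_days).round(2)
promotion = rng.integers(0, 2, size=n_days)  # 1 = promotion running (assumed flag)
# Discount (in percent) applies only on promotion days
discount = np.where(promotion == 1, rng.uniform(5, 30, size=n_days), 0.0).round(1)
weekend = (dates.dayofweek >= 5).astype(int)

# Hypothetical demand rule: lower prices, discounts, and weekends lift sales
expected = 120 - 2.0 * price + 1.5 * discount + 15 * weekend
units_sold = rng.poisson(np.clip(expected, 1, None))

sample = pd.DataFrame({
    "date": dates,
    "price": price,
    "discount": discount,
    "promotion": promotion,
    "units_sold": units_sold,
})
print(sample.head())
```

Calling sample.to_csv("online_sales.csv", index=False) would write a file that the loading step can read, which is handy for trying the workflow before real data is available.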


2. Creating Time-Based Features

Time-based features help the model capture demand patterns.

df["date"] = pd.to_datetime(df["date"])
df["day"] = df["date"].dt.day
df["month"] = df["date"].dt.month
df["weekday"] = df["date"].dt.weekday

These features allow the model to learn weekly and monthly demand trends.
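To make the derived values concrete, here is a small sketch on three hand-picked dates. The "is_weekend" flag is a hypothetical extra feature, not part of the original pipeline; it is shown only because weekday numbering (Monday = 0, Sunday = 6) makes it easy to derive.

```python
import pandas as pd

demo = pd.DataFrame({"date": ["2024-03-01", "2024-03-02", "2024-03-04"]})
demo["date"] = pd.to_datetime(demo["date"])

demo["day"] = demo["date"].dt.day          # day of month, 1-31
demo["month"] = demo["date"].dt.month      # 1-12
demo["weekday"] = demo["date"].dt.weekday  # Monday = 0, Sunday = 6

# Hypothetical extra feature: 1 for Saturday/Sunday, 0 otherwise
demo["is_weekend"] = (demo["weekday"] >= 5).astype(int)

print(demo)
```

Here 2024-03-01 is a Friday (weekday 4), 2024-03-02 a Saturday (weekday 5), and 2024-03-04 a Monday (weekday 0), so only the middle row is flagged as a weekend.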


3. Preparing Features and Target

We separate input features from the demand target.

X = df.drop(["date", "units_sold"], axis=1)
y = df["units_sold"]

The target variable represents daily demand.


4. Train/Test Split

We split historical data into training and testing sets to evaluate generalization.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.3,
    random_state=42
)

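Evaluating on held-out rows estimates how well the model generalizes to data it has not seen. Note that train_test_split shuffles rows by default, so test days can fall before training days; this measures generalization across the historical period rather than strictly into the future.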
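When the aim is to mimic forecasting future demand, one common alternative is to split by date so that every test row comes after every training row. The helper below is a sketch of that idea; the function name chronological_split and its parameters are illustrative, not part of the original project.

```python
import pandas as pd

def chronological_split(frame, date_col="date", test_size=0.3):
    """Return (train, test) where all test rows are later than all train rows."""
    ordered = frame.sort_values(date_col).reset_index(drop=True)
    n_test = int(round(len(ordered) * test_size))
    cutoff = len(ordered) - n_test
    return ordered.iloc[:cutoff], ordered.iloc[cutoff:]

# Tiny illustrative frame: ten consecutive days
demo = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=10, freq="D"),
    "units_sold": range(10),
})

train, test = chronological_split(demo, test_size=0.3)
print(len(train), len(test))
print(train["date"].max(), "<", test["date"].min())
```

With this kind of split, features and target would be separated after splitting, for example X_train = train.drop(["date", "units_sold"], axis=1).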


5. Training a Demand Forecasting Model

Demand patterns are often non-linear, especially during promotions.

We use a Random Forest Regressor to capture these effects.

from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(
    n_estimators=300,
    max_depth=12,
    random_state=42
)

model.fit(X_train, y_train)

Random Forest models handle complex feature interactions well.
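One way to see which inputs a fitted forest relies on is its feature_importances_ attribute. The sketch below uses synthetic data with assumed column names ("price", "discount", and an irrelevant "noise" column) so it runs on its own; with the real dataset, the same lines would apply to model and X_train.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500

# Synthetic features: two that drive demand and one that does not
X_demo = pd.DataFrame({
    "price": rng.uniform(20, 40, n),
    "discount": rng.uniform(0, 30, n),
    "noise": rng.normal(size=n),
})
y_demo = (
    100
    - 2.0 * X_demo["price"]
    + 1.5 * X_demo["discount"]
    + rng.normal(scale=2, size=n)
)

forest = RandomForestRegressor(n_estimators=100, max_depth=12, random_state=42)
forest.fit(X_demo, y_demo)

# Impurity-based importances sum to 1; higher means the trees split on it more usefully
importances = pd.Series(
    forest.feature_importances_, index=X_demo.columns
).sort_values(ascending=False)
print(importances)
```

Impurity-based importances are a quick diagnostic rather than a causal statement, and they can be biased toward high-cardinality features, so they are best read as a rough ranking.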


6. Evaluating Forecast Accuracy

We evaluate predictions using regression metrics.

from sklearn.metrics import mean_absolute_error, r2_score

y_pred = model.predict(X_test)

print("MAE:", mean_absolute_error(y_test, y_pred))
print("R²:", r2_score(y_test, y_pred))

  • MAE shows the average absolute error, measured in units sold
  • R² measures how much of the variance in demand the model explains
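
To show what the two metrics compute, the sketch below works them out by hand on four made-up values and checks the results against scikit-learn. The numbers are illustrative only and are not results from the sales model.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

y_true = np.array([10.0, 12.0, 8.0, 15.0])  # made-up actual demand
y_hat = np.array([11.0, 10.0, 9.0, 14.0])   # made-up predictions

# MAE: mean of the absolute errors
mae_manual = np.mean(np.abs(y_true - y_hat))

# R²: 1 - (residual sum of squares / total sum of squares around the mean)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print("MAE (manual): ", mae_manual)
print("MAE (sklearn):", mean_absolute_error(y_true, y_hat))
print("R² (manual):  ", r2_manual)
print("R² (sklearn): ", r2_score(y_true, y_hat))
```

The absolute errors here are 1, 2, 1, and 1, giving an MAE of 1.25 units. An R² of 1 would mean perfect predictions, while 0 means the model does no better than always predicting the mean.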

7. Visualizing Predictions vs Actual Demand

A visual comparison helps validate forecast quality.

import matplotlib.pyplot as plt

plt.scatter(y_test, y_pred)
plt.xlabel("Actual Demand")
plt.ylabel("Predicted Demand")
plt.title("Online Sales Demand: Actual vs Predicted")
plt.grid(True)
plt.show()

Points clustered tightly along the diagonal, where predicted demand equals actual demand, indicate stronger predictive performance.
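Drawing the diagonal explicitly makes the plot easier to read. The helper below is a sketch of one way to do that; the function name plot_actual_vs_predicted and the sample values passed to it are illustrative assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_actual_vs_predicted(actual, predicted, ax=None):
    """Scatter actual vs predicted values with a dashed y = x reference line."""
    if ax is None:
        _, ax = plt.subplots()
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)

    # Use a shared range so the reference line spans all points
    lo = min(actual.min(), predicted.min())
    hi = max(actual.max(), predicted.max())

    ax.scatter(actual, predicted, alpha=0.6)
    ax.plot([lo, hi], [lo, hi], linestyle="--", color="gray",
            label="Perfect forecast")
    ax.set_xlabel("Actual Demand")
    ax.set_ylabel("Predicted Demand")
    ax.set_title("Online Sales Demand: Actual vs Predicted")
    ax.grid(True)
    ax.legend()
    return ax

# Made-up values for illustration; with the real model, pass y_test and y_pred
ax = plot_actual_vs_predicted([10, 12, 8, 15], [11, 10, 9, 14])
```

Calling plt.show() afterwards displays the figure. Points above the dashed line are over-forecasts and points below it are under-forecasts.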


Key Takeaways

  1. Online sales demand forecasting is a core business ML problem.
  2. Pricing and promotions strongly influence demand patterns.
  3. Feature-based ML models complement traditional time-series methods.
  4. Random Forest models capture non-linear demand relationships effectively.
  5. Accurate demand forecasts improve inventory and supply chain decisions.

Conclusion

Online sales demand forecasting demonstrates how machine learning drives real-world business decisions.

By combining historical sales data with pricing and promotion signals, ML models can provide reliable demand estimates that support smarter inventory planning and revenue optimization.

This project represents a realistic end-to-end demand forecasting workflow, making it a strong addition to the AI with Python – Real-World Mini Projects (Advanced) series.


Code Snippet:

import pandas as pd

df = pd.read_csv("online_sales.csv")
df.head()


df["date"] = pd.to_datetime(df["date"])
df["day"] = df["date"].dt.day
df["month"] = df["date"].dt.month
df["weekday"] = df["date"].dt.weekday


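from sklearn.impute import SimpleImputer

# Median imputation only works on numeric data, so restrict it to numeric
# columns; passing the datetime "date" column would raise an error.
num_cols = df.select_dtypes(include="number").columns
imputer = SimpleImputer(strategy="median")
df[num_cols] = imputer.fit_transform(df[num_cols])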


X = df.drop(["date", "units_sold"], axis=1)
y = df["units_sold"]


from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.3,
    random_state=42
)


from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(
    n_estimators=300,
    max_depth=12,
    random_state=42
)

model.fit(X_train, y_train)


from sklearn.metrics import mean_absolute_error, r2_score

y_pred = model.predict(X_test)

print("MAE:", mean_absolute_error(y_test, y_pred))
print("R² Score:", r2_score(y_test, y_pred))


import matplotlib.pyplot as plt

plt.scatter(y_test, y_pred)
plt.xlabel("Actual Demand")
plt.ylabel("Predicted Demand")
plt.title("Online Sales Demand: Actual vs Predicted")
plt.grid(True)
plt.show()
