🧠 AI with Python – 🍷 Wine Quality Prediction (RandomForest + SHAP)
Posted on: January 29, 2026
Description:
Predicting wine quality is a classic real-world machine learning problem that combines tabular data modeling with model interpretability.
Wine quality depends on multiple physicochemical properties, and understanding how these factors influence quality scores is just as important as making accurate predictions.
In this project, we build a Random Forest regression model to predict wine quality and then use SHAP (SHapley Additive exPlanations) to explain why the model makes certain predictions.
Understanding the Problem
The wine quality dataset contains laboratory measurements such as acidity, sugar, sulphates, and alcohol content.
The target variable, quality, is a numeric score assigned by human tasters.
This makes the task a regression problem, with additional challenges:
- features interact non-linearly
- multiple features contribute simultaneously
- predictions must be explainable, not just accurate
1. Loading the Wine Quality Dataset
We begin by loading the wine quality dataset in CSV format.
import pandas as pd
df = pd.read_csv("winequality-red.csv", sep=";")
df.head()
Each row represents a wine sample with measured chemical properties and a corresponding quality score.
2. Exploring the Target Variable
Before modeling, it’s useful to understand how quality scores are distributed.
print(df["quality"].value_counts().sort_index())
Wine quality typically ranges between 3 and 8, with most samples clustered around mid-range values.
3. Preparing Features and Target
We separate input features from the target variable.
X = df.drop("quality", axis=1)
y = df["quality"]
All features are numeric, making them well-suited for tree-based models.
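A quick sanity check along these lines can confirm that no column needs encoding before training. The tiny frame below is a made-up stand-in for the wine dataframe, since the full CSV is not bundled here:

```python
import pandas as pd

# Stand-in for the wine dataframe: a few rows with the same kind of columns
df = pd.DataFrame({
    "fixed acidity": [7.4, 7.8],
    "alcohol": [9.4, 9.8],
    "quality": [5, 5],
})

# select_dtypes keeps only numeric columns; if the counts match,
# every feature can go straight into a tree-based model
numeric_cols = df.select_dtypes(include="number").columns
print(len(numeric_cols) == df.shape[1])  # → True
```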
4. Train/Test Split
We split the dataset to evaluate how well the model generalizes to unseen samples.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.3,
    random_state=42
)
5. Training a Random Forest Regressor
Wine quality depends on complex, non-linear interactions between features.
Random Forest models are well-suited for capturing such patterns.
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(
    n_estimators=300,
    max_depth=12,
    random_state=42
)
model.fit(X_train, y_train)
Random Forests also provide strong performance without extensive feature engineering.
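Before reaching for SHAP, it is worth noting that Random Forests already expose impurity-based importances via `feature_importances_`. A minimal sketch on synthetic data (the shapes, coefficients, and noise level here are invented for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 3))
# Target depends strongly on feature 0, weakly on feature 1, not at all on feature 2
y_demo = 3.0 * X_demo[:, 0] + 0.5 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

rf = RandomForestRegressor(n_estimators=100, random_state=42).fit(X_demo, y_demo)

# Impurity-based importances are normalized to sum to 1;
# feature 0 should dominate the ranking
print(rf.feature_importances_.round(3))
print(rf.feature_importances_.argmax())  # → 0
```

These built-in importances are computed from impurity reductions on the training data and can be biased toward high-cardinality features, which is one motivation for the SHAP analysis later in this post.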
6. Evaluating Model Performance
We evaluate predictions using regression metrics.
from sklearn.metrics import mean_absolute_error, r2_score
y_pred = model.predict(X_test)
print("MAE:", mean_absolute_error(y_test, y_pred))
print("R²:", r2_score(y_test, y_pred))
- MAE shows the average error in predicted quality score
- R² indicates how much variance in quality is explained by the model
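To make both metrics concrete, here is a tiny hand-checkable example with made-up quality scores:

```python
from sklearn.metrics import mean_absolute_error, r2_score

y_true = [3, 4, 5]       # actual quality scores (invented)
y_hat = [3.5, 4, 4.5]    # hypothetical predictions

# MAE: mean of |error| = (0.5 + 0 + 0.5) / 3 ≈ 0.333
print(round(mean_absolute_error(y_true, y_hat), 3))  # → 0.333

# R²: 1 - SS_res / SS_tot = 1 - 0.5 / 2.0 = 0.75
print(r2_score(y_true, y_hat))  # → 0.75
```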
7. Explaining Predictions with SHAP
High-performing models are not enough — we must also understand their decisions.
SHAP provides a unified framework to explain model predictions.
import shap
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
SHAP assigns each feature a contribution value for every prediction.
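The defining property of these contributions is additivity: for each sample, the expected model output plus the per-feature contributions sums to the actual prediction. For a toy linear model, exact Shapley values have a closed form (the weight times the feature's deviation from its mean), which makes the property easy to verify by hand. The model and numbers below are invented for illustration:

```python
import numpy as np

# Toy linear model f(x) = 2*x1 + 3*x2, with a small "background" dataset
w = np.array([2.0, 3.0])
X_bg = np.array([[1.0, 0.0],
                 [3.0, 2.0],
                 [2.0, 4.0]])

x = np.array([4.0, 1.0])           # the sample to explain
base_value = (X_bg @ w).mean()     # expected model output over the background

# For a linear model with independent features, the exact Shapley value
# of feature i is w_i * (x_i - mean_i)
phi = w * (x - X_bg.mean(axis=0))

# Additivity: base value + contributions recovers the prediction f(x)
print(base_value + phi.sum() == x @ w)  # → True
```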
8. Global Feature Importance with SHAP
To understand which features influence wine quality the most, we use a SHAP summary plot.
shap.summary_plot(shap_values, X_test)
This visualization reveals:
- which features matter most overall
- whether higher or lower values increase predicted quality
Alcohol content and sulphates often emerge as strong predictors.
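The ranking that the summary plot visualizes can also be computed numerically as the mean absolute SHAP value per feature. The array below is a made-up stand-in for `shap_values` with three hypothetical features:

```python
import numpy as np

# Hypothetical SHAP values: rows are samples, columns are features
shap_vals = np.array([[ 0.40, -0.05,  0.10],
                      [-0.30,  0.02,  0.15],
                      [ 0.50, -0.01, -0.20]])
feature_names = ["alcohol", "pH", "sulphates"]

# Global importance: mean absolute contribution per feature,
# sorted from most to least influential
importance = np.abs(shap_vals).mean(axis=0)
ranking = [feature_names[i] for i in np.argsort(importance)[::-1]]
print(ranking)  # → ['alcohol', 'sulphates', 'pH']
```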
Key Takeaways
- Wine quality prediction is a strong real-world regression use case.
- Random Forest models handle non-linear feature interactions effectively.
- MAE and R² are essential metrics for evaluating regression performance.
- SHAP explains both global and individual model decisions.
- Combining strong performance with interpretability brings models much closer to production readiness.
Conclusion
Wine quality prediction demonstrates how machine learning can deliver both accurate predictions and transparent explanations.
By pairing a powerful ensemble model with SHAP-based interpretability, we gain insight into not just what the model predicts, but why it predicts it.
This project highlights an end-to-end workflow for explainable machine learning on tabular data, making it a valuable addition to the AI with Python – Real-World Mini Projects (Advanced) series.
Code Snippet:
# 📦 Import Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
import shap
# 🧩 Load the Wine Quality Dataset
df = pd.read_csv("winequality-red.csv", sep=";")
print("Dataset Shape:", df.shape)
print("\nQuality Distribution:")
print(df["quality"].value_counts().sort_index())
# 🔍 Separate Features and Target
X = df.drop("quality", axis=1)
y = df["quality"]
# ✂️ Train/Test Split
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.3,
    random_state=42
)
# 🌲 Train Random Forest Regressor
model = RandomForestRegressor(
    n_estimators=300,
    max_depth=12,
    random_state=42,
    n_jobs=-1
)
model.fit(X_train, y_train)
# 📊 Evaluate Model Performance
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("\nModel Performance:")
print("MAE:", round(mae, 3))
print("R² Score:", round(r2, 3))
# 📈 Actual vs Predicted Visualization
plt.figure(figsize=(6, 5))
plt.scatter(y_test, y_pred, alpha=0.7)
plt.xlabel("Actual Wine Quality")
plt.ylabel("Predicted Wine Quality")
plt.title("Wine Quality: Actual vs Predicted")
plt.grid(True)
plt.show()
# 🔎 SHAP Explainability
shap.initjs()
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
# 📊 SHAP Summary Plot (Global Feature Importance)
shap.summary_plot(shap_values, X_test)
# 🔍 SHAP Explanation for a Single Prediction (Optional)
sample_index = 0
print("\nExplaining prediction for sample index:", sample_index)
print("Actual Quality:", y_test.iloc[sample_index])
print("Predicted Quality:", round(y_pred[sample_index], 2))
shap.force_plot(
    explainer.expected_value,
    shap_values[sample_index],
    X_test.iloc[sample_index],
    matplotlib=True
)