AW Dev Rethought


🧠 AI with Python – ⚔️ Blending vs Stacking


Description:

Ensemble learning is one of the most powerful ideas in machine learning. Instead of relying on a single model, ensemble techniques combine multiple models to create stronger predictive systems.

Two advanced ensemble approaches are Blending and Stacking.

While both aim to combine multiple models intelligently, they differ in how predictions are generated and how the final meta-model is trained.

In this project, we explore the practical difference between blending and stacking using a real classification workflow.


Understanding the Problem

Different machine learning models learn different patterns.

For example:

  • RandomForest → captures non-linear relationships
  • SVM → strong margin-based separation
  • Logistic Regression → stable linear decision boundaries

Instead of selecting only one model, ensemble learning combines them to improve performance.
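As a rough illustration of that point, the sketch below fits three models on the breast cancer dataset that ships with scikit-learn and counts the test rows on which they disagree. The dataset, the split, the scaled logistic regression pipeline, and the variable names are assumptions made for this example only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Three model families with different inductive biases
models = {
    "rf": RandomForestClassifier(random_state=42),
    "svc": SVC(random_state=42),
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000)),
}

predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
    print(name, "accuracy:", (predictions[name] == y_test).mean())

# Test rows where the three models do not all agree
disagree = (predictions["rf"] != predictions["svc"]) | (
    predictions["rf"] != predictions["logreg"]
)
print("rows with disagreement:", int(disagree.sum()))
```

Rows where the models disagree are the ones where a combining step has something to decide.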


What Is Blending?

Blending is an ensemble strategy where:

  1. Base models are trained on the training split
  2. Predictions are generated on a separate validation (holdout) set that the base models never saw
  3. A meta-model learns from those validation predictions

The validation predictions become the input for the blender model.


1. Train Base Models

We first train independent base learners.

rf_model.fit(X_train, y_train)
svc_model.fit(X_train, y_train)

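Each fitted model can then output class probabilities through predict_proba. Note that SVC only exposes predict_proba when it is created with probability=True.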


2. Generate Validation Predictions

rf_val_probs = rf_model.predict_proba(X_val)[:, 1]
svc_val_probs = svc_model.predict_proba(X_val)[:, 1]

The [:, 1] index keeps the probability of the positive class. These probabilities are used as features for blending.
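To make the shape of those features concrete, here is a minimal sketch with made-up probabilities (the numbers are illustrative, not model output). Each validation row becomes one row of the blending matrix, and each base model contributes one column.

```python
import numpy as np

# Toy validation probabilities from two hypothetical base models
rf_val_probs = np.array([0.9, 0.2, 0.7, 0.4])
svc_val_probs = np.array([0.8, 0.1, 0.6, 0.5])

# One row per validation sample, one column per base model
blend_X_val = np.column_stack([rf_val_probs, svc_val_probs])

print(blend_X_val.shape)  # (4, 2)
```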


3. Train the Blender

blender.fit(blend_X_val, y_val)

Here blend_X_val is a two-column table holding rf_val_probs and svc_val_probs. The blender learns how to weight the outputs of the base models.
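One way to see what the blender has learned is to inspect its coefficients. The sketch below assumes a synthetic dataset from make_classification and a hypothetical helper called meta_features; neither comes from the original project.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data, used only to keep the example small and fast
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

base_models = {
    "rf": RandomForestClassifier(random_state=0),
    "svc": SVC(probability=True, random_state=0),
}
for model in base_models.values():
    model.fit(X_train, y_train)


def meta_features(models, X):
    """Stack each model's positive-class probability into one column."""
    return np.column_stack([m.predict_proba(X)[:, 1] for m in models.values()])


# The blender is trained only on holdout predictions
blender = LogisticRegression(max_iter=5000)
blender.fit(meta_features(base_models, X_val), y_val)

# One learned weight per base model
for name, weight in zip(base_models, blender.coef_[0]):
    print(f"{name}: weight {weight:.3f}")
```

A larger weight suggests the blender leans more on that model's probability, although correlated base models can make individual weights hard to interpret.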


What Is Stacking?

Stacking is a more advanced ensemble strategy.

Instead of using a simple validation holdout, stacking uses:

  • cross-validated predictions
  • internally generated meta-features
  • a meta-model trained on out-of-fold predictions

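This often improves generalization, because the meta-model only sees predictions made for rows the predicting base model was not trained on.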
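The idea of out-of-fold meta-features can be sketched by hand with cross_val_predict. This is an illustration of the concept on assumed synthetic data, with assumed variable names, rather than a reproduction of any library's internals.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

# Synthetic data, used only for illustration
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

rf = RandomForestClassifier(random_state=0)
svc = SVC(probability=True, random_state=0)

# Each row's probability comes from a model that did not train on that row
oof_rf = cross_val_predict(rf, X, y, cv=5, method="predict_proba")[:, 1]
oof_svc = cross_val_predict(svc, X, y, cv=5, method="predict_proba")[:, 1]

# Out-of-fold predictions become the meta-features
meta_X = np.column_stack([oof_rf, oof_svc])

# The meta-model gets one training row per original sample
meta_model = LogisticRegression(max_iter=5000)
meta_model.fit(meta_X, y)

print(meta_X.shape)
```

Unlike blending, no rows are set aside: every sample receives an out-of-fold prediction and therefore contributes to the meta-model.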


1. Define Base Models

base_models = [
    ("rf", RandomForestClassifier()),
    ("svc", SVC(probability=True))
]

2. Define Meta-Model

meta_model = LogisticRegression()

3. Build the Stacking Ensemble

stacking_model = StackingClassifier(
    estimators=base_models,
    final_estimator=meta_model
)

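StackingClassifier manages this flow automatically. By default it uses 5-fold cross-validation to produce the out-of-fold predictions that train the meta-model, then refits each base model on the full training data for use at prediction time.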
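The cross-validation behaviour can also be set explicitly through the cv and stack_method parameters. The sketch below assumes synthetic data and spells out values that match the defaults for these two base models; it also uses transform to look at the meta-features the final estimator receives.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Synthetic data, used only for illustration
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

base_models = [
    ("rf", RandomForestClassifier(random_state=0)),
    ("svc", SVC(probability=True, random_state=0)),
]

stacking_model = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=5000),
    cv=5,                          # folds used for the out-of-fold predictions
    stack_method="predict_proba",  # feed probabilities to the meta-model
)
stacking_model.fit(X, y)

# Base-model outputs in the form the meta-model sees at prediction time
meta_X = stacking_model.transform(X)
print(meta_X.shape)

# The fitted meta-model holds one coefficient per meta-feature
print(stacking_model.final_estimator_.coef_)
```

For a binary problem, each base model contributes a single probability column, so two base models yield two meta-features.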


Why Stacking Usually Performs Better

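Stacking often generalizes better because:

  • every training row contributes to both the base models and the meta-model
  • the meta-model is trained on out-of-fold predictions, so it never sees a base model scoring rows that model was fitted on
  • the meta-features cover the whole training set rather than a single holdout slice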

However, it is computationally more expensive.
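A back-of-the-envelope count shows where the extra cost comes from. It assumes k-fold stacking refits every base model once per fold plus once on the full training data, which is how scikit-learn's StackingClassifier behaves; the helper names are hypothetical.

```python
def blending_fits(n_base_models: int) -> int:
    """Each base model is fitted once, plus one meta-model fit."""
    return n_base_models + 1


def stacking_fits(n_base_models: int, n_folds: int) -> int:
    """Each base model is fitted once per fold and once on all data,
    plus one meta-model fit."""
    return n_base_models * (n_folds + 1) + 1


print(blending_fits(2))      # 3
print(stacking_fits(2, 5))   # 13
```

With two base models and five folds, stacking performs 13 fits against 3 for blending. SVC with probability=True adds its own internal cross-validation on top of this, so the real gap in training time can be larger.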


Blending vs Stacking

🔹 Blending

  • simpler implementation
  • faster training
  • uses validation holdout
  • holdout rows are never used to train the base models, and the meta-model learns only from those rows

🔹 Stacking

  • more advanced
  • uses cross-validation internally
  • often better generalization
  • higher computational cost
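The data-usage difference can be checked by counting rows. The sketch below reuses the split settings from this project's code on the breast cancer dataset; the printed labels are only descriptive.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Hold out the final test set
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Carve a validation holdout out of the remainder for blending
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=42, stratify=y_temp
)

print("base-model rows (blending):", len(X_train))
print("meta-model rows (blending):", len(X_val))
print("rows available to stacking without a holdout:", len(X_temp))
```

In blending, the base models and the meta-model each see only their own slice. Stacking needs no holdout, so both levels can learn from every non-test row it is fitted on.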

Where These Techniques Are Used

Blending and stacking are heavily used in:

  • Kaggle competitions
  • fraud detection systems
  • recommendation engines
  • financial prediction
  • high-performance ML systems

They are common in advanced ensemble pipelines.


Key Takeaways


  1. Both blending and stacking combine multiple models.
  2. Blending uses validation predictions for meta-learning.
  3. Stacking uses cross-validated predictions internally.
  4. Stacking usually generalizes better but is more complex.
  5. Ensemble learning can significantly improve predictive performance.

Conclusion

Blending and stacking are powerful ensemble learning techniques that help combine the strengths of multiple machine learning models. While blending offers simplicity and speed, stacking provides stronger generalization through more advanced training strategies.

This strengthens the Advanced ML track in the AI with Python series and helps you move from single-model systems toward the ensemble architectures used in real-world ML workflows.


Code Snippet:

# 📦 Import Required Libraries
import pandas as pd

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.svm import SVC


# 🧩 Load Dataset
data = load_breast_cancer()

X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target


# =========================================================
# ✂️ Split Data
# =========================================================

# Final test set
X_temp, X_test, y_temp, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42,
    stratify=y
)

# Separate validation set for blending
# (0.25 of the remaining 80% gives roughly a 60/20/20 train/validation/test split)
X_train, X_val, y_train, y_val = train_test_split(
    X_temp,
    y_temp,
    test_size=0.25,
    random_state=42,
    stratify=y_temp
)


# =========================================================
# 🧠 PART 1 – BLENDING
# =========================================================

# ---------------------------------------------------------
# 🤖 Train Base Models
# ---------------------------------------------------------

rf_model = RandomForestClassifier(random_state=42)

svc_model = SVC(
    probability=True,
    random_state=42
)

rf_model.fit(X_train, y_train)
svc_model.fit(X_train, y_train)


# ---------------------------------------------------------
# 📊 Generate Validation Predictions
# ---------------------------------------------------------

rf_val_probs = rf_model.predict_proba(X_val)[:, 1]
svc_val_probs = svc_model.predict_proba(X_val)[:, 1]


# ---------------------------------------------------------
# 🧩 Create Blending Dataset
# ---------------------------------------------------------

blend_X_val = pd.DataFrame({
    "rf": rf_val_probs,
    "svc": svc_val_probs
})


# ---------------------------------------------------------
# 🧠 Train Blender (Meta-Model)
# ---------------------------------------------------------

blender = LogisticRegression(max_iter=5000)

blender.fit(blend_X_val, y_val)


# ---------------------------------------------------------
# 🚀 Evaluate Blending
# ---------------------------------------------------------

rf_test_probs = rf_model.predict_proba(X_test)[:, 1]
svc_test_probs = svc_model.predict_proba(X_test)[:, 1]

blend_X_test = pd.DataFrame({
    "rf": rf_test_probs,
    "svc": svc_test_probs
})

blend_pred = blender.predict(blend_X_test)

print("=== Blending Results ===")
print("Accuracy:", accuracy_score(y_test, blend_pred))

print("\nClassification Report:\n")
print(classification_report(y_test, blend_pred))


# =========================================================
# 🧠 PART 2 – STACKING
# =========================================================

# ---------------------------------------------------------
# 🤖 Define Base Models
# ---------------------------------------------------------

base_models = [
    ("rf", RandomForestClassifier(random_state=42)),

    ("svc", SVC(
        probability=True,
        random_state=42
    ))
]


# ---------------------------------------------------------
# 🧠 Define Meta-Model
# ---------------------------------------------------------

meta_model = LogisticRegression(max_iter=5000)


# ---------------------------------------------------------
# 🚀 Build Stacking Model
# ---------------------------------------------------------

stacking_model = StackingClassifier(
    estimators=base_models,
    final_estimator=meta_model
)


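# ---------------------------------------------------------
# 🤖 Train Stacking Model
# ---------------------------------------------------------

# Stacking needs no separate validation holdout, so it can train on
# all non-test data (X_train plus X_val), which is X_temp.
stacking_model.fit(X_temp, y_temp)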


# ---------------------------------------------------------
# 📊 Evaluate Stacking
# ---------------------------------------------------------

stack_pred = stacking_model.predict(X_test)

print("\n=== Stacking Results ===")
print("Accuracy:", accuracy_score(y_test, stack_pred))

print("\nClassification Report:\n")
print(classification_report(y_test, stack_pred))
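One possible variation, not part of the original project: SVC is sensitive to feature scale, and the breast cancer features span very different ranges. The sketch below wraps SVC in a pipeline with StandardScaler before stacking. The pipeline and the single train/test split are assumptions made for this example, and any change in accuracy should be checked on your own data rather than taken for granted.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

base_models = [
    ("rf", RandomForestClassifier(random_state=42)),
    # The scaler is fitted inside each fold, so no test data leaks into it
    ("svc", make_pipeline(
        StandardScaler(),
        SVC(probability=True, random_state=42),
    )),
]

stacking_model = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=5000),
)
stacking_model.fit(X_train, y_train)

scaled_stack_acc = accuracy_score(y_test, stacking_model.predict(X_test))
print("Stacking accuracy with scaled SVC:", scaled_stack_acc)
```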
