AW Dev Rethought

"Truth can only be found in one place: the code." - Robert C. Martin

⚡️ Saturday ML Spark – 🔗 Creating Interaction Features


Description:

In machine learning, better models don’t always come from more complex algorithms. Sometimes, the biggest improvements come from better features.

One powerful yet simple technique in feature engineering is creating interaction features — combining existing variables to capture hidden relationships.

In this project, we explore how interaction features can improve model performance on tabular data.


Understanding the Problem

Most basic models, such as linear regression, assume that each feature influences the target independently and additively.

However, in real-world data:

  • features often interact with each other
  • relationships are rarely purely linear
  • combined effects can be stronger than individual effects

For example:

  • income alone → moderate signal
  • education alone → moderate signal
  • income × education → strong predictive signal
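The income × education intuition can be made concrete with a small sketch on synthetic data (the variable names, ranges, and noise level here are illustrative, not from a real dataset). Even though each raw feature carries some signal on its own, adding the product term lets a linear model fit the combined effect far more closely:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)
income = rng.uniform(1, 10, 500)
education = rng.uniform(1, 10, 500)

# The target depends on the *product* of the two features, plus noise
y = income * education + rng.normal(0, 1, 500)

X_plain = np.column_stack([income, education])
X_inter = np.column_stack([income, education, income * education])

r2_plain = r2_score(y, LinearRegression().fit(X_plain, y).predict(X_plain))
r2_inter = r2_score(y, LinearRegression().fit(X_inter, y).predict(X_inter))

print(f"R2 without interaction: {r2_plain:.3f}")
print(f"R2 with interaction:    {r2_inter:.3f}")
```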

What Are Interaction Features?

Interaction features are created by combining two or more features, usually through multiplication.

X["A_B"] = X["A"] * X["B"]

This allows the model to learn relationships that depend on multiple variables together.


Baseline Model

We first train a model without interaction features.

model = LinearRegression()
model.fit(X_train, y_train)

This serves as a reference for comparison.


Creating Interaction Features

We generate new features by combining existing ones. Since the full snippet below uses the California Housing dataset (the Boston dataset is deprecated), the interaction columns are built from its features:

X_train_interact = X_train.copy()
X_train_interact["MedInc_HouseAge"] = X_train["MedInc"] * X_train["HouseAge"]
X_train_interact["AveRooms_Population"] = X_train["AveRooms"] * X_train["Population"]

Each new column captures the joint effect of two variables.


Training with Interaction Features

model.fit(X_train_interact, y_train)

After adding interaction features, the model can capture more complex patterns.


Why Interaction Features Matter

Interaction features help in:

  • capturing non-linear relationships
  • improving performance of simple models
  • uncovering hidden patterns in data
  • enhancing predictive power without changing algorithms

They are especially useful when using:

  • Linear Regression
  • Logistic Regression
  • simpler ML models

When to Use Interaction Features

  • when relationships between features are expected
  • when model performance is limited
  • when using simpler models
  • when domain knowledge suggests feature combinations
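One way to act on these criteria is to test a candidate interaction with cross-validation before committing to it. A sketch on the California Housing data used in the snippet below (the 5-fold setup and the chosen feature pair are illustrative):

```python
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

data = fetch_california_housing()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# Candidate interaction suggested by domain intuition
X_inter = X.copy()
X_inter["MedInc_HouseAge"] = X["MedInc"] * X["HouseAge"]

# Compare 5-fold cross-validated R2 with and without the new column
cv = KFold(n_splits=5, shuffle=True, random_state=42)
base = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2").mean()
inter = cross_val_score(LinearRegression(), X_inter, y, cv=cv, scoring="r2").mean()

print(f"baseline mean CV R2:    {base:.4f}")
print(f"interaction mean CV R2: {inter:.4f}")
```

Keep an interaction only if it improves the cross-validated score; otherwise it just adds noise to the feature set.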

Key Takeaways

  1. Interaction features combine multiple variables into new features.
  2. They help capture relationships that individual features cannot.
  3. Simple models benefit significantly from interaction features.
  4. Feature engineering can improve performance without complex models.
  5. Interaction features are a practical and powerful technique for tabular data.

Conclusion

Interaction features are a simple yet highly effective way to improve machine learning models. By combining existing features, we can reveal hidden patterns and enhance model performance without increasing algorithm complexity.

This marks an important step in the Feature Engineering track of Saturday ML Spark ⚡️, helping you move from just using models to designing better data representations.


Code Snippet:

# 📦 Import Required Libraries
import pandas as pd

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score


# 🧩 Load Dataset (Boston is deprecated → using California Housing)
data = fetch_california_housing()

X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target


# ✂️ Split Data
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.3,
    random_state=42
)


# =========================================================
# 🚨 Baseline Model (No Interaction Features)
# =========================================================

baseline_model = LinearRegression()
baseline_model.fit(X_train, y_train)

baseline_pred = baseline_model.predict(X_test)

print("Baseline R2 Score:", r2_score(y_test, baseline_pred))


# =========================================================
# 🔗 Create Interaction Features
# =========================================================

X_train_interact = X_train.copy()
X_test_interact = X_test.copy()

# Example interaction features
X_train_interact["MedInc_HouseAge"] = X_train["MedInc"] * X_train["HouseAge"]
X_test_interact["MedInc_HouseAge"] = X_test["MedInc"] * X_test["HouseAge"]

X_train_interact["AveRooms_Population"] = X_train["AveRooms"] * X_train["Population"]
X_test_interact["AveRooms_Population"] = X_test["AveRooms"] * X_test["Population"]


# =========================================================
# 🤖 Train Model with Interaction Features
# =========================================================

interaction_model = LinearRegression()
interaction_model.fit(X_train_interact, y_train)

interaction_pred = interaction_model.predict(X_test_interact)

print("With Interaction Features R2 Score:", r2_score(y_test, interaction_pred))
