🧠 AI with Python – 📊 Correlation Matrix with Seaborn


Description:

Understanding relationships between numerical features is crucial for building effective machine learning models.

A correlation matrix provides a simple yet powerful way to measure how features move together — revealing redundancy, the direction of relationships, and potential drivers for prediction.

In this project, we visualize these correlations with a clean, annotated heatmap using Seaborn, making it easy to analyze patterns at a glance.


Understanding the Problem

Datasets often contain many numerical variables, and some of these may be strongly correlated.

Highly correlated features can:

  • cause multicollinearity
  • reduce model interpretability
  • unnecessarily increase model complexity
  • affect model stability

A correlation matrix helps you spot these relationships quickly.

Visualizing it as a heatmap makes insights clear and intuitive.
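Before turning to the real dataset, here is a tiny illustrative example (a toy DataFrame, not the Wine data used below) showing what perfect positive and negative correlations look like:

```python
import pandas as pd

# Toy data: y rises in lockstep with x, z falls as x rises
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],   # perfectly proportional to x
    "z": [10, 8, 6, 4, 2],   # perfectly inverse to x
})

print(df.corr())
# x and y correlate at +1.0; x and z correlate at -1.0
```

With real data, values land somewhere between these extremes, and the matrix tells you how close each pair comes.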


1. Load and Explore the Dataset

We use the Wine dataset, which contains chemical measurements for various wine samples.

It includes 13 numerical features — ideal for correlation analysis.

from sklearn.datasets import load_wine
import pandas as pd

data = load_wine()
df = pd.DataFrame(data.data, columns=data.feature_names)

df.head()

2. Compute the Correlation Matrix

By default, df.corr() computes the Pearson correlation coefficient, which measures linear relationships.

corr_matrix = df.corr()
corr_matrix
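The corr method also accepts a method argument. If you suspect monotonic but non-linear relationships, Spearman rank correlation is a common alternative (an optional variation, not required for this project):

```python
import pandas as pd
from sklearn.datasets import load_wine

data = load_wine()
df = pd.DataFrame(data.data, columns=data.feature_names)

# Spearman is rank-based, so it captures monotonic
# relationships that Pearson may understate
spearman_corr = df.corr(method="spearman")
print(spearman_corr.round(2))
```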

Correlation values range between –1 and +1:

  • +1.0 → perfect positive correlation
  • –1.0 → perfect negative correlation
  • 0.0 → no linear correlation
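Rather than scanning the full 13×13 table by eye, you can also list the most strongly correlated pairs programmatically. The 0.7 cutoff below is an arbitrary illustration, not a rule:

```python
import pandas as pd
from sklearn.datasets import load_wine

data = load_wine()
df = pd.DataFrame(data.data, columns=data.feature_names)
corr_matrix = df.corr()

# Flatten the matrix into (feature_a, feature_b) -> correlation pairs
pairs = corr_matrix.unstack()

# Keep each pair once and drop self-correlations
pairs = pairs[pairs.index.get_level_values(0) < pairs.index.get_level_values(1)]

# Show pairs whose absolute correlation exceeds the (arbitrary) 0.7 cutoff
strong = pairs[pairs.abs() > 0.7].sort_values(key=abs, ascending=False)
print(strong)
```

In the Wine data this surfaces pairs such as flavanoids and total_phenols, which are strongly positively correlated.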

3. Visualize the Correlation Matrix with Seaborn

We generate a heatmap with annotations for easy interpretation.

import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 8))

sns.heatmap(
    corr_matrix,
    annot=True,         # show correlation values
    fmt=".2f",
    cmap="coolwarm",
    linewidths=0.5,
    square=True
)

plt.title("Correlation Matrix – Wine Dataset", fontsize=14)
plt.xticks(rotation=45, ha="right")
plt.yticks(rotation=0)
plt.tight_layout()
plt.show()
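Because the matrix is symmetric, a common variation (optional, not required here) masks the upper triangle so each correlation appears only once:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine

data = load_wine()
df = pd.DataFrame(data.data, columns=data.feature_names)
corr_matrix = df.corr()

# Boolean mask hiding the redundant upper triangle and the diagonal
mask = np.triu(np.ones_like(corr_matrix, dtype=bool))

plt.figure(figsize=(12, 8))
sns.heatmap(
    corr_matrix,
    mask=mask,          # cells where mask is True are left blank
    annot=True,
    fmt=".2f",
    cmap="coolwarm",
    linewidths=0.5,
    square=True
)
plt.title("Correlation Matrix – Lower Triangle Only", fontsize=14)
plt.tight_layout()
plt.show()
```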

This heatmap reveals:

  • which features correlate strongly
  • which pairs might be redundant
  • positive vs negative relationships
  • potential candidates for dimensionality reduction (PCA)

Key Takeaways

  1. Correlation matrices help identify redundancy — important for feature selection.
  2. Seaborn heatmaps make correlation values intuitive with color gradients and annotations.
  3. Strong correlations indicate features that may influence the target similarly.
  4. Negative correlations can reveal inverse relationships worth exploring.
  5. This visualization is essential during exploratory data analysis (EDA).
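As a practical follow-up for feature selection, you can drop one member of each highly correlated pair. The greedy approach and the 0.8 cutoff below are illustrative choices, not part of the original walkthrough:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_wine

data = load_wine()
df = pd.DataFrame(data.data, columns=data.feature_names)

corr = df.corr().abs()

# Examine the upper triangle only, so each pair is checked once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Greedily flag any column correlated above 0.8 with an earlier one
to_drop = [col for col in upper.columns if (upper[col] > 0.8).any()]

reduced = df.drop(columns=to_drop)
print(f"Dropped {len(to_drop)} feature(s): {to_drop}")
```

Which feature of a pair to keep is a judgment call; domain knowledge or relevance to the prediction target usually decides.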

Conclusion

Correlation heatmaps are a simple yet powerful way to understand the structure of your dataset.

By identifying relationships between features, you can make better decisions about feature engineering, dimensionality reduction, and model selection.

This technique forms the foundation of effective exploratory data analysis and ensures your machine learning workflow starts with a clear understanding of feature behavior.


Code Snippet:

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine


data = load_wine()
df = pd.DataFrame(data.data, columns=data.feature_names)

df.head()


corr_matrix = df.corr()
corr_matrix


plt.figure(figsize=(12, 8))

sns.heatmap(
    corr_matrix,
    annot=True,        # show correlation values
    fmt=".2f",         # format to 2 decimal places
    cmap="coolwarm",  
    linewidths=0.5,
    square=True
)

plt.title("Correlation Matrix – Wine Dataset", fontsize=14)
plt.xticks(rotation=45, ha="right")
plt.yticks(rotation=0)
plt.tight_layout()
plt.show()
