AW Dev Rethought

Code is read far more often than it is written - Guido van Rossum

🧠 AI with Python – 🧩 Customer Segmentation using KMeans


Description:

Understanding customers is at the heart of every successful business.

Instead of treating all customers the same, organizations segment them into groups based on behavior, spending patterns, and engagement levels.

In this project, we apply unsupervised machine learning to perform customer segmentation using KMeans clustering, a widely used technique in marketing, retail, and product analytics.


Understanding the Problem

Customer data often lacks explicit labels.

We don’t know in advance which customer belongs to which group. Instead, we want the algorithm to discover patterns on its own.

This makes customer segmentation an unsupervised learning problem, where the objective is to group customers such that:

  • customers within a group are similar
  • customers across groups are different

1. Loading Customer Data

We begin with a dataset containing customer income and spending behavior.

import pandas as pd

df = pd.read_csv("customers.csv")
df.head()

Each row represents a customer with attributes like income, spending score, order value, and purchase frequency.
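
If customers.csv is not at hand, a small in-memory stand-in can be used to follow along. The sketch below is only an illustration: the rows are made up, and the column names avg_order_value and purchase_frequency are assumptions (only customer_id, annual_income, and spending_score appear in the code of this post). It also checks for missing values and non-numeric columns, since KMeans cannot work with either.

```python
import io

import pandas as pd

# Made-up rows standing in for customers.csv.
# avg_order_value and purchase_frequency are hypothetical column names.
csv_text = """customer_id,annual_income,spending_score,avg_order_value,purchase_frequency
1,32000,25,40.5,3
2,85000,78,120.0,12
3,54000,50,75.25,6
4,91000,30,200.0,2
"""

df = pd.read_csv(io.StringIO(csv_text))

# Count missing values per column; KMeans raises an error on NaN input.
missing_per_column = df.isna().sum()

# List columns that are not numeric; these would need encoding or removal.
non_numeric = df.select_dtypes(exclude="number").columns.tolist()

print(df.head())
print(missing_per_column)
print("Non-numeric columns:", non_numeric)
```

With a real file, the same two checks can be run right after pd.read_csv("customers.csv").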


2. Inspecting Feature Ranges

Before clustering, it’s important to understand feature scales.

print(df.describe())

Features such as income and spending score exist on very different numeric ranges, which affects distance-based algorithms.
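
To see why the ranges matter, here is a minimal sketch using three hypothetical customers described only by annual income and spending score. On raw values, the Euclidean distance is driven almost entirely by income, so a large difference in spending score barely registers.

```python
import numpy as np

# Hypothetical customers: [annual_income, spending_score]
a = np.array([40_000.0, 20.0])
b = np.array([41_000.0, 90.0])  # similar income, very different spending
c = np.array([45_000.0, 20.0])  # different income, identical spending

# Euclidean distances on the raw, unscaled values
dist_ab = np.linalg.norm(a - b)
dist_ac = np.linalg.norm(a - c)

print(f"distance a-b: {dist_ab:.1f}")
print(f"distance a-c: {dist_ac:.1f}")

# The 70-point spending gap between a and b adds only a few units
# on top of the 1,000-unit income gap.
spending_contribution = dist_ab - 1_000.0
print(f"extra distance from spending gap: {spending_contribution:.1f}")
```

Customer a ends up much "closer" to b than to c, even though a and b behave very differently as spenders.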


3. Feature Scaling

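KMeans assigns each point to the nearest centroid by Euclidean distance, so features with larger numeric ranges would otherwise dominate. Scaling is effectively required whenever feature ranges differ.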

from sklearn.preprocessing import StandardScaler

X = df.drop("customer_id", axis=1)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

Scaling ensures that no single feature dominates the clustering process.
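
As a quick sanity check, the sketch below applies StandardScaler to a small made-up array (the values are illustrative, not from the dataset) and confirms that each column ends up with mean 0 and standard deviation 1. It also shows that inverse_transform recovers the original units, which is useful later when reading cluster centers.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values: [annual_income, spending_score]
X = np.array([
    [30_000.0, 15.0],
    [45_000.0, 60.0],
    [60_000.0, 40.0],
    [90_000.0, 85.0],
])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# After standardization every column has mean 0 and unit standard deviation
col_means = X_scaled.mean(axis=0)
col_stds = X_scaled.std(axis=0)

# The fitted scaler can map scaled values back to the original units
X_restored = scaler.inverse_transform(X_scaled)

print("means:", col_means.round(6))
print("stds: ", col_stds.round(6))
```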


4. Applying KMeans Clustering

We apply KMeans to identify natural customer segments.

from sklearn.cluster import KMeans

# n_init is set explicitly because its default differs across scikit-learn versions
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
df["segment"] = kmeans.fit_predict(X_scaled)

Each customer is now assigned to a segment based on similarity.
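
The choice of four clusters is taken as given in this post. One common way to sanity-check a value of k, sketched here on synthetic data from make_blobs rather than the customer dataset, is to compare inertia (within-cluster sum of squares) and the silhouette score across several candidates. The range of k values and the synthetic data are assumptions made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for scaled customer features
X, _ = make_blobs(n_samples=400, centers=4, cluster_std=0.8, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

results = {}
for k in range(2, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = model.fit_predict(X_scaled)
    # inertia_: sum of squared distances to the nearest centroid (lower is tighter)
    # silhouette: between -1 and 1, higher means better-separated clusters
    results[k] = (model.inertia_, silhouette_score(X_scaled, labels))

for k, (inertia, silhouette) in results.items():
    print(f"k={k}  inertia={inertia:8.2f}  silhouette={silhouette:.3f}")

# Candidate with the highest silhouette score
best_k = max(results, key=lambda k: results[k][1])
print("highest silhouette at k =", best_k)
```

Inertia keeps falling as k grows, so it is usually read by looking for the point where the improvement flattens out, while the silhouette score gives a single number to compare. Neither replaces a check that the resulting segments make business sense.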


5. Visualizing Customer Segments

Visualization helps translate clusters into business insight.

import matplotlib.pyplot as plt

plt.scatter(
    df["annual_income"],
    df["spending_score"],
    c=df["segment"]
)
plt.xlabel("Annual Income")
plt.ylabel("Spending Score")
plt.title("Customer Segmentation using KMeans")
plt.show()

This plot shows how the segments separate along income and spending. Because the clustering used all scaled features, some groups may overlap in this two-dimensional view.


Interpreting the Segments

Typical customer segments might include:

  • low-income, low-spending customers
  • moderate-income, regular customers
  • high-income, selective buyers
  • high-value, frequent purchasers

These insights help tailor marketing campaigns, pricing strategies, and customer experiences.
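
Segment labels such as these come from profiling each cluster after fitting. A minimal sketch of that step follows, using synthetic data with two obvious groups instead of the real dataset; the group parameters and the choice of two clusters are assumptions made only to keep the example small. It summarizes each segment by its average feature values and size, and converts the cluster centers back to the original units.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
features = ["annual_income", "spending_score"]

# Synthetic customers: one low-income/low-spending and one high-income/high-spending group
low = pd.DataFrame({
    "annual_income": rng.normal(30_000, 3_000, 50),
    "spending_score": rng.normal(25, 5, 50),
})
high = pd.DataFrame({
    "annual_income": rng.normal(90_000, 5_000, 50),
    "spending_score": rng.normal(75, 5, 50),
})
df = pd.concat([low, high], ignore_index=True)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(df[features].to_numpy())

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
df["segment"] = kmeans.fit_predict(X_scaled)

# Average feature values and customer count per segment, in original units
profile = df.groupby("segment")[features].mean()
profile["customers"] = df.groupby("segment").size()

# Cluster centers mapped back from scaled space to original units
centers = pd.DataFrame(
    scaler.inverse_transform(kmeans.cluster_centers_),
    columns=features,
)

print(profile.round(1))
print(centers.round(1))
```

Reading the profile table row by row is what turns a numeric label like 0 or 1 into a description such as "high-income, high-spending".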


Key Takeaways

  1. Customer segmentation is a core unsupervised ML use case.
  2. KMeans groups customers based on similarity, not predefined labels.
  3. Feature scaling is essential for distance-based clustering algorithms.
  4. Visualizing clusters makes results interpretable and actionable.
  5. Segmentation enables personalization and data-driven business decisions.

Conclusion

Customer segmentation using KMeans demonstrates how machine learning uncovers hidden structure in customer data.

By grouping customers based on behavioral patterns, businesses can move beyond generic strategies and deliver targeted, personalized experiences.

This project showcases a practical end-to-end unsupervised learning workflow, making it a strong addition to the AI with Python – Real-World Mini Projects (Advanced) series.


Code Snippet:

import pandas as pd

df = pd.read_csv("customers.csv")
print(df.head())


print(df.info())
print(df.describe())


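from sklearn.preprocessing import StandardScaler

# Exclude the identifier column so it does not influence distances
X = df.drop(columns=["customer_id"])
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)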


from sklearn.cluster import KMeans

# n_init is set explicitly because its default differs across scikit-learn versions
kmeans = KMeans(
    n_clusters=4,
    n_init=10,
    random_state=42
)


df["segment"] = kmeans.fit_predict(X_scaled)


import matplotlib.pyplot as plt

# Plot by column name so the identifier column is never used as an axis
plt.scatter(
    df["annual_income"],
    df["spending_score"],
    c=df["segment"]
)
plt.xlabel("Annual Income")
plt.ylabel("Spending Score")
plt.title("Customer Segmentation using KMeans")
plt.show()
