⚡️ Saturday ML Sparks – Clustering with KMeans 🔷🧠

Posted on: December 13, 2025

Description:

Unsupervised learning is all about uncovering hidden patterns in data when we don’t have labels.

Among all unsupervised algorithms, KMeans is one of the simplest, fastest, and most widely used techniques for tasks like:

customer segmentation
anomaly detection
grouping similar behaviors
pattern discovery in large datasets

In this Saturday ML Spark, we explore how KMeans divides data into clusters and how to visualize the results.

Understanding the Problem

Unlike supervised learning, where each sample has a known target label, unsupervised learning operates without predefined classes.

KMeans attempts to:

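Group data into k clusters, where k is chosen in advance
Assign each point to the nearest cluster centroid
Move each centroid to the mean of the points assigned to it, repeating the assign and update steps until the assignments stop changing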

The result is a grouping of the data based on distance to the centroids, which is useful when labels aren’t available.
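The loop described above (assign points, then refine centroids) is usually called Lloyd's algorithm. Below is a minimal NumPy sketch of it, assuming random initial centroids drawn from the data (scikit-learn defaults to the smarter k-means++ initialization) and Euclidean distance. The function name lloyd_kmeans is hypothetical; this is an illustration of the idea, not scikit-learn's implementation.

```python
import numpy as np


def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical helper: plain Lloyd iterations with random initialization."""
    rng = np.random.default_rng(seed)
    # Initialize with k distinct data points picked at random (not k-means++)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point
        sq_dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points;
        # an empty cluster keeps its previous centroid
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        # Stop once the centroids no longer move
        if converged:
            break
    # Final assignment against the final centroids
    sq_dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return sq_dists.argmin(axis=1), centroids


# Tiny usage example: two well-separated pairs of points
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
toy_labels, toy_centroids = lloyd_kmeans(X_toy, k=2)
print(toy_labels, toy_centroids)
```

Each pass can only lower (or keep) the total squared distance from points to their centroids, which is why the loop settles; it may settle in a local optimum, which is why libraries rerun it from several starting points.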

1. Generate Synthetic Unlabeled Data

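We create a two-feature dataset of 300 points drawn around three centers. The y_true array records which center generated each point; KMeans never sees it.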

from sklearn.datasets import make_blobs
import numpy as np

X, y_true = make_blobs(
    n_samples=300,
    centers=3,
    cluster_std=0.60,
    random_state=42
)
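A quick sanity check on what make_blobs returns with these parameters: X is a (300, 2) feature matrix (make_blobs defaults to two features), and y_true holds the index of the blob that generated each point. It is kept only for comparison after clustering.

```python
import numpy as np
from sklearn.datasets import make_blobs

X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=42)

# 300 samples, 2 features each (n_features defaults to 2)
print(X.shape)  # (300, 2)

# make_blobs splits the 300 samples evenly across the 3 centers
print(np.bincount(y_true))  # [100 100 100]
```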

2. Apply KMeans Clustering

Initialize KMeans and fit it to the data.

from sklearn.cluster import KMeans

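# n_init=10 runs ten initializations and keeps the best; it is set explicitly
# because the default changed across scikit-learn versions
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)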
kmeans.fit(X)

KMeans learns:

the cluster assignments
the centroid positions
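Behind both of these, scikit-learn's KMeans looks for centroids that (locally) minimize the within-cluster sum of squared distances, which the fitted model exposes as inertia_. The sketch below recomputes that quantity by hand from labels_ and cluster_centers_ to show what the objective measures; n_init=10 is an explicit choice here so the result does not depend on the library version's default.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=42)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# For every sample, look up the centroid of the cluster it was assigned to
assigned = kmeans.cluster_centers_[kmeans.labels_]

# Within-cluster sum of squared distances, computed by hand
manual_inertia = ((X - assigned) ** 2).sum()

print(manual_inertia, kmeans.inertia_)
```

Lower inertia means tighter clusters, but it always shrinks as k grows, so it cannot be used on its own to pick k.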

3. Retrieve Cluster Labels and Centroids

After training, we extract the results.

labels = kmeans.labels_
centroids = kmeans.cluster_centers_

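labels holds one integer per sample (0 to k-1) naming the cluster it was assigned to, and centroids is a (k, 2) array with one cluster center per row. The cluster numbers are arbitrary: cluster 0 is not guaranteed to match blob 0 in y_true.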
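Because cluster IDs are arbitrary, comparing labels to y_true with == would be misleading. One option, using scikit-learn's adjusted_rand_score, is to score how well the two partitions agree regardless of how the groups are numbered (1.0 means identical groupings). This check is only possible here because the synthetic data comes with ground truth.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=42)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
labels = kmeans.labels_

# Agreement between the KMeans partition and the generating blobs
ari = adjusted_rand_score(y_true, labels)

# Renaming the clusters (0->1, 1->2, 2->0) leaves the score unchanged
ari_renamed = adjusted_rand_score(y_true, (labels + 1) % 3)

print(f"ARI: {ari:.3f}, after renaming: {ari_renamed:.3f}")
```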

4. Visualize the Clusters

Plot the grouped points and the learned centroids.

import matplotlib.pyplot as plt

plt.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis")
plt.scatter(
    centroids[:, 0], centroids[:, 1],
    c="red", s=200, marker="X"
)
plt.title("KMeans Clustering Result")
plt.show()

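Because this dataset has only two features, a scatter plot shows the grouping directly; with more features, the points would have to be projected down to two dimensions before plotting.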

5. Predict Clusters for New Points

A fitted KMeans model can also assign new, unseen points to clusters: predict() returns, for each point, the index of the nearest learned centroid.

new_points = np.array([[0, 2], [3, 4], [-1, -2]])
preds = kmeans.predict(new_points)
print(preds)

This is how a segmentation learned once, for example on historical customer data, can be applied to new records as they arrive.
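predict() applies the same nearest-centroid rule used during training. The sketch below reproduces it by hand with NumPy distances and cross-checks them against kmeans.transform(), which returns each point's Euclidean distance to every cluster center. The new points are the same ones used above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=42)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

new_points = np.array([[0, 2], [3, 4], [-1, -2]])
preds = kmeans.predict(new_points)

# Euclidean distance from each new point (rows) to each centroid (columns)
dists = np.linalg.norm(
    new_points[:, None, :] - kmeans.cluster_centers_[None, :, :], axis=2
)
# Nearest centroid per point: the same rule predict() uses
manual_preds = dists.argmin(axis=1)

print(preds, manual_preds)
print(np.allclose(dists, kmeans.transform(new_points)))
```

Note that predict() always returns some cluster, even for a point far from all centroids; the distances from transform() are one way to flag such outliers.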

Key Takeaways

KMeans is a foundational unsupervised algorithm used widely in data science.
It discovers natural groupings in data without needing labels.
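Centroids are the mean position of each cluster's points and summarize what each group looks like.
Visualization helps interpret clustering results, especially for low-dimensional data.
KMeans is efficient and scales well to large datasets, which makes it a common choice for segmentation in business and analytics; it works best when clusters are roughly spherical and similar in size.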

Conclusion

Clustering is an essential part of unsupervised learning, and KMeans provides a simple yet powerful way to uncover structure in unlabeled datasets. Whether you’re analyzing customers, patterns, behaviors, or high-dimensional data, it is often a good first baseline, as long as the clusters are reasonably compact and k can be chosen sensibly.

With just a few lines of code, KMeans can reveal structure that would otherwise remain hidden, which makes it a practical tool for exploration and discovery in machine learning.

Code Snippet:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans


X, y_true = make_blobs(
    n_samples=300,
    centers=3,
    cluster_std=0.60,
    random_state=42
)


kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans.fit(X)


labels = kmeans.labels_
centroids = kmeans.cluster_centers_


plt.figure(figsize=(8, 5))
plt.scatter(X[:, 0], X[:, 1], c=labels, s=40, cmap="viridis")
plt.scatter(centroids[:, 0], centroids[:, 1], c="red", s=200, marker="X")
plt.title("KMeans Clustering Result")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.grid(True)
plt.show()


new_points = np.array([[0, 2], [3, 4], [-1, -2]])
preds = kmeans.predict(new_points)
print("Cluster predictions for new points:", preds)
