⚡️ Saturday ML Sparks – 🎬 Movie Recommendation with Cosine Similarity
Posted on: January 17, 2026
Description:
Movie recommendation systems are everywhere — from streaming platforms to e-commerce sites.
At their core, many recommendation engines start with a simple idea: recommend items that are similar to what a user already likes.
In this Saturday ML Spark, we build a content-based movie recommendation system using cosine similarity, one of the most widely used similarity measures in machine learning.
Understanding the Problem
Unlike collaborative filtering, which relies on user behavior, content-based recommendation focuses on item attributes such as:
- genres
- tags
- descriptions
- keywords
The challenge is to represent these attributes numerically and then measure how similar two items are.
Cosine similarity helps answer a simple question:
How similar are two movies based on their content?
1. Preparing a Movie Dataset
We start with a small dataset of movies and their descriptions.
import pandas as pd
movies = {
"title": [
"Inception",
"The Dark Knight",
"Interstellar",
"The Matrix",
"Avengers: Endgame"
],
"description": [
"dream manipulation sci-fi thriller",
"dark gritty superhero action crime",
"space exploration time sci-fi drama",
"virtual reality dystopian sci-fi action",
"superheroes action time travel battle"
]
}
df = pd.DataFrame(movies)
In real-world systems, these descriptions could come from genres, plot summaries, or user-generated tags.
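In practice you rarely start with a clean description column, so a common first step is to concatenate whatever metadata fields are available into one text field per movie. The sketch below assumes hypothetical "genres" and "tags" columns purely for illustration:
# Hypothetical example: building a single "description" field from separate metadata columns
raw = pd.DataFrame({
    "title": ["Inception", "Interstellar"],
    "genres": ["sci-fi thriller", "sci-fi drama"],
    "tags": ["dream manipulation", "space exploration time"]
})
# Concatenate the metadata columns into one free-text description per movie
raw["description"] = raw["genres"] + " " + raw["tags"]
print(raw[["title", "description"]])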
2. Converting Text to Numerical Features
Machine learning models cannot work directly with raw text.
We convert movie descriptions into numerical vectors using TF-IDF.
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(df["description"])
TF-IDF assigns higher weights to words that are informative and less common across the dataset.
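To see what the vectorizer actually learned, you can inspect its vocabulary and the shape of the resulting matrix (standard scikit-learn attributes; note that the default tokenizer splits "sci-fi" into the separate tokens "sci" and "fi"):
# Inspect the vocabulary learned from the five descriptions
print(vectorizer.get_feature_names_out())
# Shape: (number of movies, number of distinct terms)
print(tfidf_matrix.shape)
In older scikit-learn releases the same method is called get_feature_names().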
3. Measuring Similarity with Cosine Similarity
Once movies are represented as vectors, we compute similarity between them.
from sklearn.metrics.pairwise import cosine_similarity
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
Cosine similarity measures the cosine of the angle between two vectors, so the score depends only on the direction of the vectors, not their magnitude, as the quick check after this list confirms:
- 1.0 → the vectors point in the same direction (essentially identical content profiles)
- 0.0 → the vectors are orthogonal (no terms in common)
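As a sanity check, a single entry of this matrix can be recomputed by hand from the TF-IDF vectors with NumPy; the result should match the value returned by scikit-learn (a minimal sketch, using the first and third movies):
import numpy as np
# Recompute the similarity between "Inception" (row 0) and "Interstellar" (row 2)
a = tfidf_matrix[0].toarray().ravel()
b = tfidf_matrix[2].toarray().ravel()
manual = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(manual, cosine_sim[0, 2])  # the two values should agree up to floating-point rounding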
4. Building the Recommendation Logic
We now create a function that returns movies most similar to a given title.
def recommend(movie_title, top_n=3):
    # Locate the row index of the requested movie
    idx = df.index[df["title"] == movie_title][0]
    # Pair every movie index with its similarity to the requested movie
    sim_scores = list(enumerate(cosine_sim[idx]))
    # Sort from most to least similar
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # Skip the first entry (the movie itself) and return the next top_n titles
    return [
        df["title"][i]
        for i, score in sim_scores[1 : top_n + 1]
    ]
This function:
- finds the selected movie
- ranks all other movies by similarity
- returns the top recommendations
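A small refinement worth sketching (not part of the function above) is to return the similarity scores alongside the titles and to handle titles that are not in the dataset:
def recommend_with_scores(movie_title, top_n=3):
    # Guard against titles that are not present in the dataset
    matches = df.index[df["title"] == movie_title]
    if len(matches) == 0:
        return []
    idx = matches[0]
    # Rank all movies by similarity to the query movie
    sim_scores = sorted(enumerate(cosine_sim[idx]), key=lambda x: x[1], reverse=True)
    # Skip the movie itself and keep the top_n (title, score) pairs
    return [(df["title"][i], round(score, 3)) for i, score in sim_scores[1 : top_n + 1]]

print(recommend_with_scores("Interstellar", top_n=2))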
5. Generating Movie Recommendations
We can now test the recommender system.
print(recommend("Inception"))
With this tiny dataset, the closest matches are the other sci-fi titles (Interstellar and The Matrix), which share the most keywords with Inception; the remaining movies have little or no term overlap with it.
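To spot-check behavior across the whole catalogue, you can loop over every title (a simple usage sketch):
# Print the top-2 recommendations for every movie in the dataset
for title in df["title"]:
    print(f"{title} -> {recommend(title, top_n=2)}")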
Why Cosine Similarity Works Well
Cosine similarity is especially effective for recommendation systems because:
- it ignores document length
- it focuses on relative word importance
- it scales efficiently to large datasets
- it works well with sparse text vectors
This makes it a natural choice for content-based recommenders; the short check below illustrates the length-invariance point.
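A minimal illustration with made-up vectors (not taken from the movie data): scaling one vector changes its length but leaves the cosine similarity unchanged, because only the angle matters.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

u = np.array([[1.0, 2.0, 0.0]])
v = np.array([[2.0, 1.0, 1.0]])
# Scaling u by 10 changes its length but not its direction, so the score is identical
print(cosine_similarity(u, v))
print(cosine_similarity(10 * u, v))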
Key Takeaways
- Content-based recommendation relies on item similarity, not user behavior.
- TF-IDF converts text metadata into meaningful numeric vectors.
- Cosine similarity measures how close items are in feature space.
- Simple recommendation systems can be built with minimal data.
- This approach forms the foundation of many real-world recommender engines.
Conclusion
Movie recommendation systems don’t always need complex deep learning models to be effective.
By combining TF-IDF vectorization with cosine similarity, we can build a practical and interpretable recommender system that captures meaningful relationships between items.
This Saturday ML Spark demonstrates how a simple mathematical concept can power real-world applications — making it a great starting point for understanding recommendation systems before moving on to more advanced approaches.
Code Snippet:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
movies = {
"title": [
"Inception",
"The Dark Knight",
"Interstellar",
"The Matrix",
"Avengers: Endgame"
],
"description": [
"dream manipulation sci-fi thriller",
"dark gritty superhero action crime",
"space exploration time sci-fi drama",
"virtual reality dystopian sci-fi action",
"superheroes action time travel battle"
]
}
df = pd.DataFrame(movies)
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(df["description"])
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
def recommend(movie_title, top_n=3):
    idx = df.index[df["title"] == movie_title][0]
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    recommendations = [
        df["title"][i]
        for i, score in sim_scores[1 : top_n + 1]
    ]
    return recommendations
print("Recommended movies for Inception:")
print(recommend("Inception"))