⚡️ Saturday ML Sparks – 💬 Sentiment Analysis (ML Approach)
Posted on: January 24, 2026
Description:
Sentiment analysis is one of the most common real-world applications of Natural Language Processing (NLP).
From product reviews to customer feedback and social media posts, understanding sentiment helps businesses gauge user opinion at scale.
In this project, we build a sentiment analysis system using a classical machine learning approach — without deep learning or transformers — relying instead on TF-IDF vectorization and Logistic Regression.
Understanding the Problem
Text data is inherently unstructured, which makes it difficult for machine learning models to interpret directly.
To classify sentiment, we must first convert text into a numerical representation that captures meaning and importance.
The challenge lies in:
- representing text numerically
- identifying sentiment-driving words
- building a model that generalizes to unseen text
This makes sentiment analysis a perfect example of applied machine learning for NLP.
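To make the first challenge concrete, here is a minimal sketch of turning raw sentences into numbers using a simple word-count representation. CountVectorizer is used here only for illustration; the project itself uses TF-IDF, introduced in step 3.
from sklearn.feature_extraction.text import CountVectorizer

# Two toy sentences, purely for illustration
toy_docs = ["good product", "bad product"]

# Each unique word becomes a column; each sentence becomes a row of counts
cv = CountVectorizer()
counts = cv.fit_transform(toy_docs)
print(cv.get_feature_names_out())
print(counts.toarray())
Once text is in this matrix form, any standard classifier can work with it.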
1. Preparing a Text Dataset
We start with a small labeled dataset of text samples and their sentiments.
import pandas as pd
data = {
    "text": [
        "I absolutely loved this product",
        "Worst experience ever",
        "Very happy with the service",
        "The product quality is terrible",
        "Amazing experience and great support",
        "I will never buy this again"
    ],
    "sentiment": ["positive", "negative", "positive", "negative", "positive", "negative"]
}
df = pd.DataFrame(data)
Each text sample is associated with a sentiment label, making this a supervised learning task.
2. Train/Test Split
We split the dataset to evaluate how well the model performs on unseen text.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    df["text"],
    df["sentiment"],
    test_size=0.3,
    stratify=df["sentiment"],
    random_state=42
)
Stratification keeps the positive and negative classes in the same proportions in both the training and test sets, which matters with a dataset this small.
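As a quick sanity check on the split, we can print the class counts in each set. This is a small sketch relying on the pandas Series returned by train_test_split above:
# Both sets should contain positive and negative examples
print(y_train.value_counts())
print(y_test.value_counts())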
3. Converting Text to TF-IDF Features
Machine learning models require numeric input.
We use TF-IDF (Term Frequency–Inverse Document Frequency) to transform text into numerical vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer(stop_words="english")
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)
TF-IDF assigns higher weight to words that occur frequently within a document but rarely across the corpus, so distinctive, sentiment-carrying words are emphasised over common ones.
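To see what the vectorizer learned, we can inspect its vocabulary and the per-document weights. This is a small sketch using the objects defined above and scikit-learn's get_feature_names_out:
import pandas as pd

# Columns are the words kept after stop-word removal; rows are training documents
feature_names = vectorizer.get_feature_names_out()
tfidf_view = pd.DataFrame(X_train_tfidf.toarray(), columns=feature_names)
print(tfidf_view.round(2))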
4. Training a Sentiment Classifier
We use Logistic Regression, a widely used baseline model for text classification.
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(max_iter=1000)
model.fit(X_train_tfidf, y_train)
Despite its simplicity, Logistic Regression performs remarkably well for many NLP classification tasks.
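One benefit of this choice is interpretability: the learned coefficients show which words pull predictions towards each class. A minimal sketch, assuming the binary setup above where coef_ has a single row aligned with the TF-IDF vocabulary:
import numpy as np

feature_names = vectorizer.get_feature_names_out()
coefs = model.coef_[0]  # one row of weights for the binary problem

# Positive coefficients favour model.classes_[1] ("positive" here, since classes are sorted),
# negative coefficients favour model.classes_[0] ("negative")
order = np.argsort(coefs)
print("Negative-leaning words:", feature_names[order[:3]])
print("Positive-leaning words:", feature_names[order[-3:]])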
5. Evaluating Model Performance
We evaluate the classifier using standard metrics.
from sklearn.metrics import classification_report, confusion_matrix
y_pred = model.predict(X_test_tfidf)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
These metrics show how well the model separates positive from negative sentiment. With only a couple of held-out samples in this toy dataset, the scores are illustrative rather than meaningful; a realistic evaluation needs a much larger test set.
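One detail worth knowing: the rows and columns of the confusion matrix follow the sorted class labels. Printing model.classes_, or passing labels= explicitly, makes the output unambiguous. A small sketch using the fitted model from above:
# Rows are true labels, columns are predicted labels, in this order
print(model.classes_)
print(confusion_matrix(y_test, y_pred, labels=model.classes_))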
6. Testing on New Text
Finally, we test the model on unseen sentences.
new_text = [
    "I really enjoyed using this",
    "This is a complete waste of money"
]
new_tfidf = vectorizer.transform(new_text)
predictions = model.predict(new_tfidf)
for text, sentiment in zip(new_text, predictions):
    print(f"{text} → {sentiment}")
This demonstrates how the trained model generalizes beyond the training data.
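For use cases where a hard label is not enough, Logistic Regression also exposes class probabilities, which can be used to flag low-confidence predictions. A minimal sketch with the vectorizer and model defined above:
# Probability columns follow the order of model.classes_
probabilities = model.predict_proba(new_tfidf)
for text, probs in zip(new_text, probabilities):
    print(text, dict(zip(model.classes_, probs.round(3))))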
Key Takeaways
- Sentiment analysis is a core real-world NLP application.
- Text must be converted into numerical features before modeling.
- TF-IDF effectively captures word importance in documents.
- Logistic Regression is a strong baseline for sentiment classification.
- Classical ML approaches remain relevant and widely used in production NLP systems.
Conclusion
Sentiment analysis does not always require complex deep learning models.
By combining TF-IDF vectorization with Logistic Regression, we can build an efficient and interpretable sentiment classifier suitable for many real-world use cases.
This Saturday ML Spark highlights how traditional machine learning techniques continue to play a vital role in NLP — forming a strong foundation before moving on to more advanced models and architectures.
Code Snippet:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

# 1. Small labeled dataset of text samples and their sentiments
data = {
    "text": [
        "I absolutely loved this product",
        "Worst experience ever",
        "Very happy with the service",
        "The product quality is terrible",
        "Amazing experience and great support",
        "I will never buy this again"
    ],
    "sentiment": ["positive", "negative", "positive", "negative", "positive", "negative"]
}
df = pd.DataFrame(data)

# 2. Stratified train/test split
X_train, X_test, y_train, y_test = train_test_split(
    df["text"],
    df["sentiment"],
    test_size=0.3,
    random_state=42,
    stratify=df["sentiment"]
)

# 3. Convert text to TF-IDF features
vectorizer = TfidfVectorizer(stop_words="english")
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

# 4. Train the Logistic Regression classifier
model = LogisticRegression(max_iter=1000)
model.fit(X_train_tfidf, y_train)

# 5. Evaluate on the held-out test set
y_pred = model.predict(X_test_tfidf)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

# 6. Predict sentiment for new, unseen text
new_text = [
    "I really enjoyed using this",
    "This is a complete waste of money"
]
new_tfidf = vectorizer.transform(new_text)
predictions = model.predict(new_tfidf)
for text, sentiment in zip(new_text, predictions):
    print(f"{text} → {sentiment}")