⚡️ Saturday ML Sparks – Hyperparameter Tuning with GridSearchCV 🎛🧠


Description:

Tuning hyperparameters is one of the most important steps in improving your ML model.

Instead of manually trying values, GridSearchCV systematically searches combinations of parameters to find the best-performing model.

Today’s ML Spark makes hyperparameter tuning simple, structured, and beginner-friendly.


Understanding the Problem

Most ML models depend heavily on hyperparameters (a short code sketch follows the list below):

  • Random Forest → number of trees, depth
  • SVM → kernel choice, C value, gamma
  • Logistic Regression → regularization strength
  • KNN → number of neighbors
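
Concretely, each bullet above maps to constructor arguments in scikit-learn. A minimal sketch (the values are purely illustrative, not recommendations):

from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Illustrative settings only; these are exactly the knobs a grid search would tune
rf = RandomForestClassifier(n_estimators=200, max_depth=10)
svm = SVC(kernel="rbf", C=1.0, gamma="scale")
lr = LogisticRegression(C=0.5)  # C is the inverse regularization strength
knn = KNeighborsClassifier(n_neighbors=7)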

Using the wrong hyperparameters leads to poor model performance.

GridSearchCV evaluates multiple parameter combinations using cross-validation, ensuring consistent and fair comparisons.

You get:

  • best hyperparameters
  • best cross-validated score
  • the tuned model ready to use
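
To see what "searching combinations of parameters" means in practice, here is a minimal sketch using scikit-learn's ParameterGrid on a tiny, made-up grid:

from sklearn.model_selection import ParameterGrid

# ParameterGrid expands a dict of lists into every combination GridSearchCV would try
toy_grid = {"n_estimators": [100, 200], "max_depth": [None, 5]}

for params in ParameterGrid(toy_grid):
    print(params)
# 2 x 2 = 4 combinations in total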

1. Load the Dataset

We’ll use the Breast Cancer dataset (binary classification).

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
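
As an optional sanity check, you can confirm that the stratified split keeps the class balance roughly identical in both sets:

import numpy as np

# Proportion of each class in train and test; stratify=y keeps these close
print("Train class balance:", np.bincount(y_train) / len(y_train))
print("Test class balance:", np.bincount(y_test) / len(y_test))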

2. Choose a Model to Tune

We’ll tune a RandomForestClassifier.

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(random_state=42)
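
If you are unsure which hyperparameters a model exposes, get_params() lists them all; any of these names can appear as a key in the grid defined in the next step:

# Every key returned here is a valid entry for the hyperparameter grid
for name, value in model.get_params().items():
    print(name, "=", value)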

3. Define the Hyperparameter Grid

GridSearchCV will try every combination; with this grid, that is 3 × 3 × 2 × 2 = 36 parameter settings.

param_grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2]
}
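
A quick way to gauge the cost of the search is to count the combinations; with 5-fold CV, each one is fitted five times (plus one final refit of the best model):

from sklearn.model_selection import ParameterGrid

n_combinations = len(ParameterGrid(param_grid))  # 3 * 3 * 2 * 2 = 36
print("Parameter combinations:", n_combinations)
print("Model fits during the search:", n_combinations * 5)  # 36 * 5 = 180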

4. Run GridSearchCV

We use 5-fold CV for stable evaluation.

from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(
    estimator=model,
    param_grid=param_grid,
    scoring="accuracy",
    cv=5,
    n_jobs=-1
)

grid.fit(X_train, y_train)

5. Check the Best Params + Best Score

print("Best Hyperparameters:", grid.best_params_)
print("Best CV Score:", grid.best_score_)
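
Beyond the single best result, grid.cv_results_ stores the cross-validated score of every combination. A small sketch, assuming pandas is installed, to compare them:

import pandas as pd

# One row per parameter combination, ranked by mean CV accuracy
results = pd.DataFrame(grid.cv_results_)
cols = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
print(results[cols].sort_values("rank_test_score").head())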

6. Evaluate the Tuned Model

from sklearn.metrics import accuracy_score, classification_report

best_model = grid.best_estimator_

y_pred = best_model.predict(X_test)

print("Test Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
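
For a per-class breakdown beyond the report, a confusion matrix is a natural companion:

from sklearn.metrics import confusion_matrix

# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_test, y_pred))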

Key Takeaways

  1. GridSearchCV automates hyperparameter tuning, saving hours of guesswork.
  2. It uses cross-validation to ensure each hyperparameter set is evaluated fairly.
  3. The output includes the best parameters and the best-performing model.
  4. Can be used with any estimator (Logistic Regression, SVM, Random Forest, XGBoost, etc.); see the short SVM sketch after this list.
  5. Useful for both beginners and professionals to improve model performance systematically.
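
As a rough sketch of takeaway 4, the same pattern applies to an SVM. Wrapping it in a Pipeline with feature scaling (which SVMs generally need) lets the grid reference the SVM's parameters via the step-name prefix; the parameter values below are typical starting points, not tuned choices:

from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Scale features, then fit an SVM; grid keys use the "svm__" step prefix
svm_pipeline = Pipeline([("scaler", StandardScaler()), ("svm", SVC())])
svm_grid = {
    "svm__C": [0.1, 1, 10],
    "svm__kernel": ["linear", "rbf"],
    "svm__gamma": ["scale", "auto"],
}

svm_search = GridSearchCV(svm_pipeline, svm_grid, scoring="accuracy", cv=5, n_jobs=-1)
svm_search.fit(X_train, y_train)
print("Best SVM params:", svm_search.best_params_)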

Conclusion

Hyperparameter tuning is essential for modern ML workflows.

GridSearchCV provides a structured, reliable, and automated way to improve models using systematic search and cross-validation.

This technique is foundational for performance optimization across all machine learning projects.


Code Snippet:

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report


# Load the Breast Cancer dataset (binary classification)
data = load_breast_cancer()
X, y = data.data, data.target

# Stratified 80/20 train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)


# Base model to tune
model = RandomForestClassifier(random_state=42)

# Hyperparameter grid: every combination will be evaluated
param_grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2]
}


# 5-fold cross-validated grid search, run in parallel on all CPU cores
grid = GridSearchCV(
    estimator=model,
    param_grid=param_grid,
    scoring="accuracy",
    cv=5,
    n_jobs=-1
)

grid.fit(X_train, y_train)


# Best hyperparameters and their cross-validated accuracy
print("Best Hyperparameters:", grid.best_params_)
print("Best Cross-Validated Score:", grid.best_score_)


# Evaluate the refit best model on the held-out test set
best_model = grid.best_estimator_

y_pred = best_model.predict(X_test)

print("Test Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
