🧠 AI with Python - 🔄 Cross-Validation with cross_val_score


Description:

Evaluating a model on a single train/test split can be misleading. The score depends on which samples happen to land in the test set, so it may not reflect how well the model generalizes.

That's where Cross-Validation comes in. It splits the data into k folds, trains on k-1 of them, tests on the remaining fold, and repeats until every fold has served as the test set once. Averaging the k scores gives a more reliable performance estimate.
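To make the fold mechanics concrete, here is a minimal sketch of the loop written out by hand. The splitter settings (shuffle=True, random_state=0) and the variable names are illustrative assumptions, not part of the original walkthrough.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

X, y = load_iris(return_X_y=True)

# Assumed settings: 5 stratified folds, shuffled with a fixed seed for reproducibility
splitter = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
for train_idx, test_idx in splitter.split(X, y):
    # Fit a fresh model on the training folds only
    clf = LogisticRegression(max_iter=200)
    clf.fit(X[train_idx], y[train_idx])
    # Accuracy on the single held-out fold
    fold_scores.append(clf.score(X[test_idx], y[test_idx]))

fold_scores = np.array(fold_scores)
print("Per-fold accuracy:", fold_scores)
print("Mean accuracy:", fold_scores.mean())
```

Each sample ends up in the test fold exactly once, so no prediction is ever scored by a model that saw that sample during training.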


Loading the Dataset

We’ll use the Iris dataset, a classic for classification tasks.

Only features (X) and labels (y) are needed here.

from sklearn.datasets import load_iris

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

Applying Cross-Validation

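We evaluate a Logistic Regression classifier with cross_val_score using 5-fold cross-validation. There is no need to call fit first: cross_val_score clones the model and refits it on the training portion of each split.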

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Initialize model
model = LogisticRegression(max_iter=200)

# Apply 5-fold cross-validation
scores = cross_val_score(model, X, y, cv=5)
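When cv is an integer and the estimator is a classifier, scikit-learn uses stratified folds, so each fold keeps roughly the same class balance as the full dataset. The sketch below spells that out by passing a StratifiedKFold splitter explicitly; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200)

# Integer shorthand: stratified 5-fold for a classifier
scores_int = cross_val_score(model, X, y, cv=5)

# The same splitting strategy, written out explicitly
scores_explicit = cross_val_score(model, X, y, cv=StratifiedKFold(n_splits=5))

print("cv=5:            ", scores_int)
print("StratifiedKFold: ", scores_explicit)
print("Same scores:", np.allclose(scores_int, scores_explicit))
```

Passing a splitter object is also how you would enable shuffling or fix a random seed, which the integer shorthand does not expose.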

Interpreting Results

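The output is an array with one score per fold. For a classifier, the default score is accuracy.

We usually report the mean accuracy to summarize overall model performance. The standard deviation across folds shows how much the score varies from split to split.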

print("Cross-validation scores:", scores)
print("Mean accuracy:", scores.mean())
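A common way to summarize the folds is the mean together with the spread. The following is a small sketch of that report; the output formatting is just one possible choice.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=200), X, y, cv=5)

# Mean summarizes performance; standard deviation shows fold-to-fold variation
mean_acc = scores.mean()
std_acc = scores.std()
print(f"Accuracy: {mean_acc:.3f} +/- {std_acc:.3f} over {len(scores)} folds")
```

A large spread relative to the mean suggests the estimate is sensitive to how the data is split, which is worth knowing before comparing models.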

Why Cross-Validation Matters

  • Less dependent on one split: every sample is used for testing exactly once.
  • More reliable: averaging over several folds reduces the luck of a single split.
  • Standard practice: especially for model comparison and hyperparameter tuning.
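As a rough illustration of model comparison, the sketch below scores two classifiers on identical folds. The choice of KNeighborsClassifier as the second candidate, the splitter settings, and the dictionary names are assumptions made for this example only.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# A fixed seed makes the splitter produce the same folds for every candidate
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Hypothetical candidates chosen for illustration
candidates = {
    "logistic_regression": LogisticRegression(max_iter=200),
    "knn_k5": KNeighborsClassifier(n_neighbors=5),
}

results = {}
for name, estimator in candidates.items():
    fold_scores = cross_val_score(estimator, X, y, cv=cv)
    results[name] = fold_scores.mean()
    print(f"{name}: mean accuracy {fold_scores.mean():.3f} (std {fold_scores.std():.3f})")
```

Reusing one splitter means any difference in the means comes from the models rather than from different train/test partitions.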

Code Snippet:

# Import dataset, model, and cross-validation utilities
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Initialize the Logistic Regression model
model = LogisticRegression(max_iter=200)

# Perform 5-fold cross-validation
scores = cross_val_score(model, X, y, cv=5)
print("Cross-validation scores:", scores)

# Print the mean accuracy across folds
print("Mean accuracy:", scores.mean())
