🧠 AI with Python – 🔄 Retraining Strategies (Batch vs Online Learning)

Posted on: June 16, 2026

Description:

A machine learning model is not a one-time asset that can be trained and forgotten. Once deployed, the world around the model continues to change.

Customer behaviour evolves. Business processes change. Market conditions shift. New data patterns emerge. As a result, model performance gradually degrades unless the model is updated. This is where retraining strategies become important.

In this project, we explore two common approaches used in production ML systems:

Batch Retraining
Online Learning

Both aim to keep models accurate, but they do so in very different ways.

Why Retraining Is Necessary

Machine learning models learn patterns from historical data.

Over time, those patterns may become outdated because of:

concept drift
feature drift
changing user behaviour
new products or services
evolving business environments

Without retraining, even highly accurate models can become ineffective.
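
One rough way to notice that patterns have shifted is to compare a feature's distribution in the training data against recent data. The sketch below applies SciPy's two-sample Kolmogorov-Smirnov test to synthetic values. The arrays, the size of the shift, and the 0.01 threshold are illustrative assumptions, not part of this project.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Hypothetical feature values: seen at training time vs. observed recently
train_feature = rng.normal(loc=0.0, scale=1.0, size=1000)
recent_feature = rng.normal(loc=1.0, scale=1.0, size=1000)  # mean has shifted

result = ks_2samp(train_feature, recent_feature)

# A small p-value suggests the two samples come from different distributions
drift_suspected = result.pvalue < 0.01
print("KS statistic:", round(float(result.statistic), 4))
print("Drift suspected:", drift_suspected)
```

A check like this only flags a change in the inputs. Whether the change actually hurts the model still has to be confirmed against labelled data.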

What Is Batch Retraining?

Batch retraining involves rebuilding a model periodically using a larger, updated dataset.

The workflow usually looks like:

Collect New Data
        ↓
Combine Historical Data
        ↓
Retrain Model
        ↓
Deploy New Version

The old model is replaced by a newly trained model.
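
The workflow above can be sketched as a small function. This is a minimal illustration on synthetic data rather than the project's code: the name `retrain_batch`, the holdout comparison before "deployment", and the way the data is split are all assumptions.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score


def retrain_batch(current_model, X_hist, y_hist, X_new, y_new, X_val, y_val):
    """Combine data, retrain a fresh copy, and keep whichever model scores better."""
    # Combine historical and newly collected data
    X_all = np.vstack([X_hist, X_new])
    y_all = np.concatenate([y_hist, y_new])

    # Retrain an unfitted copy that shares the same hyperparameters
    candidate = clone(current_model).fit(X_all, y_all)

    # "Deploy" the candidate only if it is at least as accurate on the holdout
    current_acc = accuracy_score(y_val, current_model.predict(X_val))
    candidate_acc = accuracy_score(y_val, candidate.predict(X_val))
    chosen = candidate if candidate_acc >= current_acc else current_model
    return chosen, current_acc, candidate_acc


# Synthetic stand-in for historical, new, and validation data
X, y = make_classification(n_samples=900, n_features=10, random_state=42)
X_hist, y_hist = X[:300], y[:300]
X_new, y_new = X[300:600], y[300:600]
X_val, y_val = X[600:], y[600:]

model_v1 = SGDClassifier(random_state=42).fit(X_hist, y_hist)
model_v2, acc_v1, acc_candidate = retrain_batch(
    model_v1, X_hist, y_hist, X_new, y_new, X_val, y_val
)
print("Current model accuracy:  ", round(acc_v1, 4))
print("Candidate model accuracy:", round(acc_candidate, 4))
```

Comparing the candidate against the current model on data neither has trained on is one simple guard against replacing a good model with a worse one.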

Training the Initial Model

We first train a model using available data.

batch_model.fit(
    X_initial,
    y_initial
)

This represents the initial production model.

Periodic Retraining

After collecting new data, we retrain the model.

batch_model.fit(
    X,
    y
)

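Here X and y hold the complete updated dataset (the historical records plus the newly collected ones), so the model is rebuilt from all available data.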
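
In scikit-learn, calling `fit` a second time does not continue from the earlier coefficients. By default (`warm_start=False`) the estimator is re-initialized and trained from scratch. The sketch below checks this on synthetic data, which is assumed purely for illustration: a model fitted on the initial data and then refitted on everything should end up with the same weights as a brand-new model fitted once on everything.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_initial, y_initial = X[:300], y[:300]

# Model A: fitted on the initial data, then refitted on the full dataset
model_a = SGDClassifier(random_state=42)
model_a.fit(X_initial, y_initial)
model_a.fit(X, y)

# Model B: fitted once on the full dataset
model_b = SGDClassifier(random_state=42)
model_b.fit(X, y)

same_weights = bool(np.allclose(model_a.coef_, model_b.coef_))
print("Refit equals fresh fit:", same_weights)
```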

Advantages of Batch Retraining

Batch retraining provides:

stable training process
access to full historical context
often higher accuracy
easier validation and testing

It is widely used in traditional ML pipelines.

Limitations of Batch Retraining

However, batch retraining:

requires more compute resources
may take longer to execute
updates only at scheduled intervals

A model may remain outdated between retraining cycles.
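
One way to limit how long a model stays stale is to trigger retraining from monitored performance as well as from the calendar. The helper below is a hypothetical sketch: the name `should_retrain`, the window of recent accuracy scores, and the 0.05 tolerance are assumptions chosen for illustration.

```python
def should_retrain(recent_accuracies, baseline_accuracy, tolerance=0.05):
    """Return True when the mean recent accuracy falls too far below the baseline."""
    if not recent_accuracies:
        return False  # nothing observed yet, so no evidence of degradation
    mean_recent = sum(recent_accuracies) / len(recent_accuracies)
    return mean_recent < baseline_accuracy - tolerance


# Accuracy measured at deployment time (illustrative number)
baseline = 0.95

print(should_retrain([0.94, 0.95, 0.93], baseline))  # small dip: False
print(should_retrain([0.88, 0.86, 0.87], baseline))  # clear drop: True
```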

What Is Online Learning?

Online learning updates the model continuously as new data arrives.

Instead of retraining from scratch, the model learns incrementally.

The workflow becomes:

New Data Arrives
        ↓
Update Model
        ↓
Continue Serving

The model evolves continuously.

Initial Online Training

online_model.partial_fit(
    X_initial,
    y_initial,
    classes=[0, 1]
)

The model starts with an initial training phase.
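
scikit-learn requires the `classes` argument on the first `partial_fit` call because an individual batch may not contain every label. The sketch below uses synthetic data assumed for illustration. It starts with a batch that contains only class 0 and still ends up with a model that knows about both classes.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# First batch: only class 0 is present
X_first = rng.normal(loc=-2.0, size=(50, 3))
y_first = np.zeros(50, dtype=int)

# Later batch: both classes are present
X_later = np.vstack([
    rng.normal(loc=-2.0, size=(50, 3)),
    rng.normal(loc=2.0, size=(50, 3)),
])
y_later = np.array([0] * 50 + [1] * 50)

model = SGDClassifier(random_state=42)

# Declare the full label set up front; later calls can omit `classes`
model.partial_fit(X_first, y_first, classes=np.array([0, 1]))
model.partial_fit(X_later, y_later)

print("Known classes:", model.classes_)
```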

Incremental Updates

As new data arrives:

online_model.partial_fit(
    X_batch,
    y_batch
)

The model learns without rebuilding itself entirely.
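
A common way to evaluate an incrementally updated model is test-then-train (prequential) evaluation. Each new batch first scores the current model and only afterwards updates it, so every score is measured on data the model has not yet seen. The loop below is a sketch on synthetic data. The batch size, the `StandardScaler` step, and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
classes = np.unique(y)
batch_size = 50

scaler = StandardScaler()
model = SGDClassifier(random_state=42)
batch_scores = []

for start in range(0, len(X), batch_size):
    X_batch = X[start:start + batch_size]
    y_batch = y[start:start + batch_size]

    # Test: score the current model on the batch before learning from it
    # (skipped for the very first batch, when nothing has been fitted yet)
    if start > 0:
        predictions = model.predict(scaler.transform(X_batch))
        batch_scores.append(accuracy_score(y_batch, predictions))

    # Train: update the running feature statistics, then the model
    scaler.partial_fit(X_batch)
    model.partial_fit(scaler.transform(X_batch), y_batch, classes=classes)

print("Batches scored:", len(batch_scores))
print("Mean prequential accuracy:", round(float(np.mean(batch_scores)), 4))
```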

Advantages of Online Learning

Online learning offers:

continuous adaptation
lower retraining costs
support for streaming data
faster reaction to changing environments

It is especially useful when data changes rapidly.

Limitations of Online Learning

Online learning can also introduce challenges:

greater sensitivity to noisy data
harder debugging
more complex monitoring
risk of learning undesirable patterns

Careful monitoring becomes essential.
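
One possible safeguard against noisy updates is to apply each update to a copy of the model and keep it only if accuracy on a trusted holdout set does not drop too far. The sketch below is an illustration under stated assumptions: `guarded_update`, the 0.02 tolerance, and the deliberately corrupted batch are hypothetical and not part of this project.

```python
import copy

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score


def guarded_update(model, X_batch, y_batch, X_holdout, y_holdout, tolerance=0.02):
    """Update a copy of the model; keep it only if holdout accuracy holds up."""
    before = accuracy_score(y_holdout, model.predict(X_holdout))
    candidate = copy.deepcopy(model)
    candidate.partial_fit(X_batch, y_batch)
    after = accuracy_score(y_holdout, candidate.predict(X_holdout))
    accepted = bool(after >= before - tolerance)
    return (candidate if accepted else model), accepted


X, y = make_classification(n_samples=800, n_features=10, random_state=42)
X_train, y_train = X[:400], y[:400]
X_holdout, y_holdout = X[400:600], y[400:600]
X_stream, y_stream = X[600:], y[600:]

model = SGDClassifier(random_state=42)
model.partial_fit(X_train, y_train, classes=np.unique(y))
baseline = accuracy_score(y_holdout, model.predict(X_holdout))

# A clean batch, followed by a batch whose labels are all flipped (simulated noise)
clean = (X_stream[:100], y_stream[:100])
noisy = (X_stream[100:], 1 - y_stream[100:])

for name, (X_batch, y_batch) in [("clean", clean), ("noisy", noisy)]:
    model, accepted = guarded_update(model, X_batch, y_batch, X_holdout, y_holdout)
    print(f"{name} batch accepted:", accepted)
```

The check is made per update, so many small accepted drops can still add up. Comparing against a fixed baseline from time to time would catch that.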

Batch vs Online Learning

Batch Retraining

Best when:

data changes slowly
training resources are available
model stability is important

Examples:

monthly forecasting
customer segmentation
demand prediction

Online Learning

Best when:

data changes rapidly
streaming data exists
real-time adaptation is required

Examples:

recommendation systems
fraud detection
ad-click prediction
personalization systems

How Companies Use Retraining

Many production systems use a hybrid approach.

Example:

online updates throughout the day
full batch retraining weekly

This combines adaptability with stability.
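
A hybrid schedule can be sketched as a loop over simulated days. The serving model receives a `partial_fit` update with each day's data, and every seventh day a fresh model is refitted on everything collected so far. The synthetic data, the 14-day horizon, and the weekly cadence below are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=1400, n_features=10, random_state=42)
classes = np.unique(y)
records_per_day = 100

model = SGDClassifier(random_state=42)
X_seen, y_seen = [], []   # accumulated history for batch retraining
full_retrains = 0

for day in range(1, 15):
    start = (day - 1) * records_per_day
    X_day = X[start:start + records_per_day]
    y_day = y[start:start + records_per_day]
    X_seen.append(X_day)
    y_seen.append(y_day)

    if day % 7 == 0:
        # Weekly: rebuild from scratch on all data collected so far
        model = SGDClassifier(random_state=42)
        model.fit(np.vstack(X_seen), np.concatenate(y_seen))
        full_retrains += 1
    else:
        # Daily: cheap incremental update with the new records only
        model.partial_fit(X_day, y_day, classes=classes)

print("Full batch retrains:", full_retrains)
```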

Why Retraining Matters in MLOps

Retraining is a core part of machine learning operations because it helps:

maintain model accuracy
combat concept drift
respond to changing environments
extend model lifespan

Without retraining, model performance tends to decline as live data drifts away from the data the model was trained on.

Key Takeaways

Machine learning models require updates after deployment.
Batch retraining rebuilds models periodically using accumulated data.
Online learning updates models continuously as new data arrives.
Each strategy has different trade-offs in cost, speed, and adaptability.
Retraining is a critical component of production ML systems.

Conclusion

Machine learning models operate in environments that constantly change. Choosing the right retraining strategy is essential for maintaining performance over time. Batch retraining offers stability and comprehensive learning, while online learning provides rapid adaptation to new information. Understanding when to use each approach is a key skill in building reliable production ML systems.

Code Snippet:

# 📦 Import Required Libraries
import pandas as pd

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

from sklearn.linear_model import SGDClassifier


# =========================================================
# 🧩 Load Dataset
# =========================================================

data = load_breast_cancer()

X = pd.DataFrame(
    data.data,
    columns=data.feature_names
)

y = data.target


# =========================================================
# ✂️ Split Initial and Future Data
# =========================================================

# Initial training data
# Future data simulates newly arriving records

X_initial, X_future, y_initial, y_future = train_test_split(
    X,
    y,
    test_size=0.30,
    random_state=42,
    stratify=y
)


# =========================================================
# 🧠 PART 1 – BATCH RETRAINING
# =========================================================

print("=== Batch Retraining ===\n")

# ---------------------------------------------------------
# Initial Model Training
# ---------------------------------------------------------

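# SGDClassifier is sensitive to feature scaling; the raw features are
# used here for simplicity, so accuracy may vary noticeably.
batch_model = SGDClassifier(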
    random_state=42
)

batch_model.fit(
    X_initial,
    y_initial
)

# ---------------------------------------------------------
# Initial Evaluation
# ---------------------------------------------------------

initial_predictions = batch_model.predict(
    X_future
)

initial_accuracy = accuracy_score(
    y_future,
    initial_predictions
)

print(
    "Initial Accuracy:",
    round(initial_accuracy, 4)
)


# ---------------------------------------------------------
# Simulate Batch Retraining
# ---------------------------------------------------------

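# New data has arrived.
# Retrain using the full available dataset.
# Note: X includes X_future, so the accuracy below is measured on
# records the retrained model has already seen and is likely optimistic.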

batch_model.fit(
    X,
    y
)

retrained_predictions = batch_model.predict(
    X_future
)

retrained_accuracy = accuracy_score(
    y_future,
    retrained_predictions
)

print(
    "Batch Retrained Accuracy:",
    round(retrained_accuracy, 4)
)


# =========================================================
# 🧠 PART 2 – ONLINE LEARNING
# =========================================================

print("\n=== Online Learning ===\n")

# ---------------------------------------------------------
# Create Online Model
# ---------------------------------------------------------

online_model = SGDClassifier(
    random_state=42
)

# ---------------------------------------------------------
# Initial Training
# ---------------------------------------------------------

online_model.partial_fit(
    X_initial,
    y_initial,
    classes=[0, 1]
)

# ---------------------------------------------------------
# Initial Accuracy
# ---------------------------------------------------------

online_initial_predictions = online_model.predict(
    X_future
)

online_initial_accuracy = accuracy_score(
    y_future,
    online_initial_predictions
)

print(
    "Initial Online Accuracy:",
    round(online_initial_accuracy, 4)
)


# ---------------------------------------------------------
# Simulate Streaming Updates
# ---------------------------------------------------------

batch_size = 20

for start in range(
    0,
    len(X_future),
    batch_size
):

    end = start + batch_size

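    X_batch = X_future.iloc[start:end]
    # y_future is a NumPy array (from data.target), so slice it directly;
    # calling .iloc on it would raise an AttributeError
    y_batch = y_future[start:end]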

    online_model.partial_fit(
        X_batch,
        y_batch
    )


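# ---------------------------------------------------------
# Evaluate Updated Online Model
# ---------------------------------------------------------
# Note: the model was just updated on X_future, so this score is
# measured on already-seen records and is likely optimistic.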

online_predictions = online_model.predict(
    X_future
)

online_accuracy = accuracy_score(
    y_future,
    online_predictions
)

print(
    "Online Learning Accuracy:",
    round(online_accuracy, 4)
)


# =========================================================
# 📊 Comparison Summary
# =========================================================

summary = pd.DataFrame({
    "Strategy": [
        "Batch Retraining",
        "Online Learning"
    ],

    "Accuracy": [
        retrained_accuracy,
        online_accuracy
    ]
})

print("\n=== Strategy Comparison ===")
print(summary)


# =========================================================
# 💾 Save Results
# =========================================================

summary.to_csv(
    "retraining_strategy_comparison.csv",
    index=False
)

print(
    "\nResults saved to retraining_strategy_comparison.csv"
)


# =========================================================
# 📂 Load Results Back
# =========================================================

saved_results = pd.read_csv(
    "retraining_strategy_comparison.csv"
)

print("\nSaved Results:\n")
print(saved_results)
