⚡️ Saturday ML Spark – 💾 Save & Load Models with joblib
Posted on: February 21, 2026
Description:
Training a machine learning model can take time and computational resources. In real-world systems, we don’t retrain a model every time we need predictions — instead, we save the trained model and reuse it.
In this project, we explore how to persist trained models using joblib, a lightweight and efficient tool for serializing Python objects.
Understanding the Problem
When you train a model:
- The model learns parameters
- Those parameters exist only in memory
- Once the program ends, they are lost
To use a model in production — APIs, dashboards, batch systems — we need a way to store and reload it.
That’s where model persistence comes in.
Why joblib?
While Python’s built-in pickle can serialize objects, joblib is optimized for:
- Large NumPy arrays
- scikit-learn models
- Faster serialization
- Efficient disk storage
It’s widely used in production ML workflows.
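As a quick illustration of these strengths, here is a minimal sketch (the filenames are arbitrary) using joblib's `compress` option on a plain NumPy array:

```python
import os

import joblib
import numpy as np

# joblib shines on objects that wrap large NumPy arrays.
arr = np.zeros((1000, 100))

joblib.dump(arr, "array.joblib")                 # uncompressed dump
joblib.dump(arr, "array_c3.joblib", compress=3)  # compression level 3 (0-9)

print(os.path.getsize("array.joblib"), os.path.getsize("array_c3.joblib"))
```

Higher `compress` levels trade CPU time for smaller files; the file extension itself is purely conventional.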
1. Train a Machine Learning Model
We begin by training a model as usual.
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(
    n_estimators=200,
    random_state=42
)
model.fit(X_train, y_train)
At this point, the model exists only in memory.
2. Save the Trained Model
We serialize the model into a file.
import joblib
joblib.dump(model, "random_forest_model.pkl")
This creates a .pkl file containing:
- The model's hyperparameters (e.g. n_estimators, random_state)
- The fitted state (for a random forest, the learned decision trees)
- Everything needed to reproduce predictions later
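A common extension of this step (a sketch, not part of the original snippet; the bundle keys are hypothetical) is to save the model together with metadata, so the file records how it was produced:

```python
import joblib
import sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

# Bundle the estimator with context needed to use it safely later.
bundle = {
    "model": model,
    "sklearn_version": sklearn.__version__,  # helps detect version mismatches on load
    "n_features": X.shape[1],
}
joblib.dump(bundle, "model_bundle.pkl")
```

Anything joblib can serialize (here a plain dict) can go in the file, so the loading side can check the recorded version and feature count before trusting the model.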
3. Load the Saved Model
Later — even in a different script — we can reload it.
loaded_model = joblib.load("random_forest_model.pkl")
No retraining required. Two caveats: only load files from sources you trust (unpickling can execute arbitrary code), and prefer loading with the same scikit-learn version used for training, since pickled models are not guaranteed compatible across versions.
4. Use the Loaded Model for Predictions
preds = loaded_model.predict(X_test)
The predictions will match those from the original trained model.
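To convince yourself of this, you can round-trip a model and compare outputs directly (a self-contained sketch using the same iris data as the full snippet below; the filename is arbitrary):

```python
import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

joblib.dump(model, "rf_roundtrip.pkl")
loaded = joblib.load("rf_roundtrip.pkl")

# The reloaded estimator reproduces the original predictions exactly.
print(np.array_equal(model.predict(X), loaded.predict(X)))
```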
Why Model Persistence Matters
Saving models enables:
- Deployment in APIs (FastAPI, Flask, etc.)
- Sharing models across teams
- Reproducible ML workflows
- Faster inference pipelines
Model persistence is the bridge between experimentation and real-world systems.
Key Takeaways
- joblib efficiently saves scikit-learn models.
- Saved models can be reused without retraining.
- .pkl files store model state and parameters.
- Critical for deployment and production systems.
- A foundational ML engineering skill.
Conclusion
Saving and loading models with joblib is a simple yet essential technique in practical machine learning. It ensures that trained models can be reused, deployed, and shared efficiently — making it a core component of production-ready ML systems.
This completes another topic in Saturday ML Spark ⚡️ – Advanced & Practical.
Code Snippet:
import joblib
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.3,
    random_state=42,
    stratify=y
)
model = RandomForestClassifier(
    n_estimators=200,
    random_state=42
)
model.fit(X_train, y_train)
joblib.dump(model, "random_forest_model.pkl")
loaded_model = joblib.load("random_forest_model.pkl")
preds = loaded_model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, preds))