AW Dev Rethought

"Truth can only be found in one place: the code." - Robert C. Martin

🧠 AI with Python – 🚀 XGBoost for Tabular Data


Description:

When working with structured or tabular data, choosing the right model can make a huge difference in performance. While basic models work well for simple problems, real-world datasets often require more powerful techniques.

One of the most effective algorithms for such cases is XGBoost.

In this project, we explore how XGBoost works and why it is one of the most widely used models for tabular machine learning tasks.


Understanding the Problem

Tabular datasets often contain:

  • non-linear relationships
  • feature interactions
  • noise and inconsistencies

Traditional models like Logistic Regression or Decision Trees may struggle to capture these complexities effectively.

We need a model that can:

  • learn complex patterns
  • reduce errors iteratively
  • generalize well to unseen data

What Is XGBoost?

XGBoost stands for Extreme Gradient Boosting.

It is an advanced implementation of gradient boosting that builds models sequentially, where each new model focuses on correcting the mistakes of the previous ones.

This results in:

  • improved accuracy
  • better generalisation
  • strong performance on structured data
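The sequential idea can be sketched with plain decision trees, each fit to the residual errors of the ensemble so far. This is a toy illustration of gradient boosting on synthetic data, not XGBoost's actual implementation (which adds regularisation, second-order gradients, and many optimisations):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data: noisy sine wave
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

learning_rate = 0.1
prediction = np.zeros_like(y)   # start from a constant (zero) prediction
trees = []

for _ in range(50):
    residuals = y - prediction            # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)                # the new tree learns the residuals
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

mse_before = np.mean(y ** 2)              # error of the initial zero model
mse_after = np.mean((y - prediction) ** 2)
print(f"MSE before boosting: {mse_before:.3f}, after: {mse_after:.3f}")
```

Each round shrinks the remaining error a little, which is exactly the "correct the mistakes of the previous ones" behaviour described above.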

1. Training the Model

We initialise and train the XGBoost classifier.

from xgboost import XGBClassifier

xgb_model = XGBClassifier(
    n_estimators=200,
    learning_rate=0.1,
    max_depth=4
)

xgb_model.fit(X_train, y_train)

Each parameter controls a different aspect of learning: n_estimators sets how many trees are boosted in sequence, learning_rate scales each tree's contribution, and max_depth caps the complexity of individual trees.


2. Making Predictions

y_pred = xgb_model.predict(X_test)
y_probs = xgb_model.predict_proba(X_test)[:, 1]

We use both class predictions and probabilities for evaluation.


3. Evaluating Performance

from sklearn.metrics import accuracy_score, roc_auc_score

print("Accuracy:", accuracy_score(y_test, y_pred))
print("ROC-AUC:", roc_auc_score(y_test, y_probs))

Metrics like ROC-AUC help measure how well the model separates classes.


Why XGBoost Works So Well

XGBoost stands out because it:

  • builds models sequentially to reduce errors
  • captures complex feature interactions
  • uses regularisation to prevent overfitting
  • is optimised for speed and efficiency
  • scales well to large datasets

Where XGBoost Is Used

XGBoost is widely applied in:

  • financial risk modelling
  • fraud detection
  • recommendation systems
  • customer churn prediction
  • competitive machine learning (Kaggle)

Key Takeaways

  1. XGBoost is a powerful algorithm for tabular data.
  2. It improves predictions through sequential learning.
  3. It handles complex patterns better than basic models.
  4. Regularisation helps control overfitting.
  5. It is a must-know tool for advanced ML practitioners.

Conclusion

XGBoost is one of the most practical and high-performing algorithms in machine learning. Its ability to combine speed, flexibility, and accuracy makes it a top choice for real-world tabular problems.

This marks the beginning of the Advanced ML track in the AI with Python series — moving beyond basics into high-performance modeling.


Code Snippet:

# 📦 Import Required Libraries
import pandas as pd

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, roc_auc_score

from xgboost import XGBClassifier


# 🧩 Load Dataset
data = load_breast_cancer()

X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target


# ✂️ Split Data
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.3,
    random_state=42,
    stratify=y
)


# =========================================================
# 🚀 Initialise XGBoost Model
# =========================================================

xgb_model = XGBClassifier(
    n_estimators=200,
    learning_rate=0.1,
    max_depth=4,
    subsample=0.8,
    colsample_bytree=0.8,
    random_state=42,
    eval_metric="logloss"
)


# =========================================================
# 🤖 Train Model
# =========================================================

xgb_model.fit(X_train, y_train)


# =========================================================
# 📊 Generate Predictions
# =========================================================

y_pred = xgb_model.predict(X_test)
y_probs = xgb_model.predict_proba(X_test)[:, 1]


# =========================================================
# ✅ Evaluate Model
# =========================================================

print("Accuracy:", accuracy_score(y_test, y_pred))

print("\nClassification Report:\n")
print(classification_report(y_test, y_pred))

print("ROC-AUC:", roc_auc_score(y_test, y_probs))


# =========================================================
# 🔍 Predict on New Data
# =========================================================

sample = X_test.iloc[:5]
predictions = xgb_model.predict(sample)

print("\nSample Predictions:", predictions)
