📊 Python Data Workflows – 🌟 Feature Engineering Basics 🐍

Posted on: May 1, 2026

Description:

After cleaning a dataset, the next step is often feature engineering.

Feature engineering means creating new columns from existing data so the dataset becomes more useful for analysis. Sometimes the raw columns are not enough. You may need derived values, categories, date parts, or transformed fields to understand the data better.

Why Feature Engineering Matters

Raw data usually tells only part of the story.

For example, a dataset may contain sales, quantity, and order_date. These columns are useful, but we can make them more powerful by creating new features like:

revenue per item
order month
order weekday
sales category

These new columns make analysis easier and more meaningful.

Creating New Numeric Features

One simple transformation is creating a new metric from existing columns.

df["revenue_per_item"] = df["sales"] / df["quantity"]

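Instead of only looking at total sales, this shows the average value of each item sold in an order. Rows where quantity is 0 produce inf (or NaN when sales is also 0), so check for them before relying on this column.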
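A minimal sketch of guarding that division against zero quantities. The sample values and the safe_quantity name are made up for illustration; only the sales and quantity column names come from this post.

```python
import pandas as pd

# Hypothetical sample data; the second row has a quantity of 0.
df = pd.DataFrame({"sales": [1200.0, 300.0, 450.0], "quantity": [4, 0, 3]})

# Replace zero quantities with NaN so the division yields NaN rather than inf.
safe_quantity = df["quantity"].where(df["quantity"] != 0)
df["revenue_per_item"] = df["sales"] / safe_quantity

print(df)
```

Whether a zero-quantity row should become missing, be dropped, or be flagged for review depends on the dataset, so treat this as one option rather than the rule.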

Extracting Date Features

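Dates are very useful, but the column must first be stored as a datetime type. Once it is, the .dt accessor exposes parts such as the month number and the weekday name.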

df["order_month"] = df["order_date"].dt.month
df["order_weekday"] = df["order_date"].dt.day_name()

With these features, we can analyse monthly trends or weekday patterns.
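As a small illustration, here is the full path from text to date parts, assuming the dates arrive as strings. The sample values are invented, and the last one is deliberately invalid to show what errors="coerce" does.

```python
import pandas as pd

# Hypothetical sample data: dates stored as text, one of them invalid.
df = pd.DataFrame({"order_date": ["2026-01-05", "2026-02-14", "not a date"]})

# Convert to datetime; values that cannot be parsed become NaT.
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")

# The .dt accessor only works on datetime columns.
df["order_month"] = df["order_date"].dt.month
df["order_weekday"] = df["order_date"].dt.day_name()

print(df)
```

Rows with an unparseable date end up with missing values in every derived column, so it can be worth counting them with df["order_date"].isna().sum() before analysing trends.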

Creating Business Labels

Feature engineering is not only about numbers. Sometimes it is useful to convert values into readable labels.

df["sales_category"] = df["sales"].apply(sales_category)

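This helps group records into simple buckets like High, Medium, and Low. Here sales_category is a small helper function that maps a sales value to one of those labels; it is defined in the code snippet at the end of this post.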
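One possible alternative to a row-by-row function is pd.cut, which assigns the buckets in a single vectorised call. This sketch is not the approach used in this post. It assumes the same thresholds as the sales_category function in the code snippet (1000 and 2000), and the sample values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, including one missing value.
df = pd.DataFrame({"sales": [2500.0, 1000.0, 999.0, np.nan]})

# right=False makes each bin include its left edge:
# [-inf, 1000) -> Low, [1000, 2000) -> Medium, [2000, inf) -> High
df["sales_category"] = pd.cut(
    df["sales"],
    bins=[-np.inf, 1000, 2000, np.inf],
    labels=["Low", "Medium", "High"],
    right=False,
)

print(df)
```

A missing sales value stays missing here instead of falling into a bucket, and the result is an ordered categorical column rather than plain text.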

Encoding Categories

Some workflows need text values converted into numbers.

df["category_code"] = df["category"].astype("category").cat.codes

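This assigns each distinct label an integer code, which is useful when preparing data for analysis, dashboards, or machine learning models. The codes are identifiers rather than measurements, so their numeric order should not be read as a ranking.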
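To make the encoding easier to inspect, the sketch below also builds a lookup from code back to label. The sample labels and the code_to_label name are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data with one missing category.
df = pd.DataFrame(
    {"category": ["Furniture", "Office", None, "Furniture", "Technology"]}
)

# Convert once and keep the categorical so the label order can be read back.
cat = df["category"].astype("category")
df["category_code"] = cat.cat.codes

# Map each integer code to the label it stands for.
code_to_label = dict(enumerate(cat.cat.categories))

print(df)
print(code_to_label)
```

Missing values receive the code -1, which is worth filtering or handling explicitly before the column is passed to a model.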

Key Takeaways

Feature engineering adds more meaning to raw data
New columns can be created from numbers, dates, and text
Date features help with time-based analysis
Encoded categories are useful for analysis and ML workflows

Code Snippet:

import pandas as pd

df = pd.read_csv("sample_data.csv")

print("✅ Data Loaded")
print(df.head())


# Convert to proper dtypes; values that cannot be parsed become NaT or NaN
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
df["sales"] = pd.to_numeric(df["sales"], errors="coerce")
df["quantity"] = pd.to_numeric(df["quantity"], errors="coerce")

print("✅ Data types fixed")


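# Average sales value per item; rows with quantity 0 become NaN instead of inf
df["revenue_per_item"] = df["sales"] / df["quantity"].where(df["quantity"] != 0)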

print(df[["sales", "quantity", "revenue_per_item"]].head())


df["order_year"] = df["order_date"].dt.year
df["order_month"] = df["order_date"].dt.month
df["order_day"] = df["order_date"].dt.day
df["order_weekday"] = df["order_date"].dt.day_name()

print(df[["order_date", "order_year", "order_month", "order_weekday"]].head())


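def sales_category(sales):
    # Missing values fail every comparison, so handle them before the thresholds
    if pd.isna(sales):
        return None
    if sales >= 2000:
        return "High"
    elif sales >= 1000:
        return "Medium"
    else:
        return "Low"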


df["sales_category"] = df["sales"].apply(sales_category)

print(df[["sales", "sales_category"]].head())


# Codes follow the sorted order of the category labels; missing values get -1
df["category_code"] = df["category"].astype("category").cat.codes

print(df[["category", "category_code"]].head())


df.to_csv("feature_engineered_data.csv", index=False)

print("💾 Feature engineered dataset saved")

