📊 Python Data Workflows – 🌟 Feature Engineering Basics 🐍
Posted on: May 1, 2026
Description:
After cleaning a dataset, the next step is often feature engineering.
Feature engineering means creating new columns from existing data so the dataset becomes more useful for analysis. Sometimes the raw columns are not enough. You may need derived values, categories, date parts, or transformed fields to understand the data better.
Why Feature Engineering Matters
Raw data usually tells only part of the story.
For example, a dataset may contain sales, quantity, and order_date. These columns are useful, but we can make them more powerful by creating new features like:
- revenue per item
- order month
- order weekday
- sales category
These new columns make analysis easier and more meaningful.
Creating New Numeric Features
One simple transformation is creating a new metric from existing columns.
df["revenue_per_item"] = df["sales"] / df["quantity"]
Instead of only looking at total sales, this gives us the average value of each item sold in an order.
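One thing to watch for: a plain division returns inf when quantity is 0. Here is a minimal sketch of one way to guard against that, assuming a small made-up DataFrame (the rows are illustrative and do not come from sample_data.csv):

```python
import pandas as pd

# Hypothetical rows for illustration only
df = pd.DataFrame({"sales": [100.0, 250.0, 80.0], "quantity": [4, 0, 2]})

# Keep quantity only where it is non-zero; zeros become NaN,
# so the division yields NaN instead of inf for those rows
safe_quantity = df["quantity"].where(df["quantity"] != 0)
df["revenue_per_item"] = df["sales"] / safe_quantity

print(df)
```

Whether NaN is the right result for a zero-quantity order depends on your data, so treat this as one option rather than a rule.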
Extracting Date Features
Dates are very useful, but the column must first be converted to a datetime type (for example with pd.to_datetime) before the .dt accessor will work.
df["order_month"] = df["order_date"].dt.month
df["order_weekday"] = df["order_date"].dt.day_name()
With these features, we can analyse monthly trends or weekday patterns.
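As a rough sketch of how these date features feed into a monthly trend, the example below converts text dates, extracts the month and weekday, and sums sales per month. The rows are made up for illustration, and the "not a date" value is there only to show what errors="coerce" does:

```python
import pandas as pd

# Hypothetical rows for illustration only
df = pd.DataFrame({
    "order_date": ["2026-01-05", "2026-01-17", "2026-02-02", "not a date"],
    "sales": [100.0, 50.0, 200.0, 30.0],
})

# Unparseable strings become NaT instead of raising an error
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
df["order_month"] = df["order_date"].dt.month
df["order_weekday"] = df["order_date"].dt.day_name()

# Total sales per month; rows with a missing month are left out by groupby
monthly_sales = df.groupby("order_month")["sales"].sum()
print(monthly_sales)
```

Note that the row with the bad date silently drops out of the monthly totals, so it is worth checking how many dates failed to parse before trusting the trend.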
Creating Business Labels
Feature engineering is not only about numbers. Sometimes it is useful to convert values into readable labels.
df["sales_category"] = df["sales"].apply(sales_category)
Here sales_category is a small helper function (defined in the Code Snippet below) that groups records into simple buckets like High, Medium, and Low.
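An alternative worth knowing is pd.cut, which does the bucketing without a custom function. The sketch below assumes the same thresholds as the helper in the Code Snippet (1000 and 2000) and uses made-up values. One difference: the apply-based helper labels a missing sales value as Low, because comparisons with NaN are False, while pd.cut leaves it missing.

```python
import numpy as np
import pandas as pd

# Hypothetical values for illustration only
df = pd.DataFrame({"sales": [2500.0, 1000.0, 999.0, np.nan]})

# right=False makes bins left-closed:
# [-inf, 1000) Low, [1000, 2000) Medium, [2000, inf) High
df["sales_category"] = pd.cut(
    df["sales"],
    bins=[-np.inf, 1000, 2000, np.inf],
    labels=["Low", "Medium", "High"],
    right=False,
)

print(df)
```

Which behaviour you want for missing values is a judgment call; the point is to make the choice deliberately.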
Encoding Categories
Some workflows need text values converted into numbers.
df["category_code"] = df["category"].astype("category").cat.codes
This assigns each distinct category an integer code, which is useful when preparing data for analysis, dashboards, or machine learning models.
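Two details are easy to miss here: the codes follow the sorted order of the categories, and missing values get the code -1. A small sketch with made-up category names shows both, and also builds a lookup so the codes can be translated back to labels (code_to_label is a name chosen for this example):

```python
import pandas as pd

# Hypothetical values for illustration only
df = pd.DataFrame({"category": ["Toys", "Books", None, "Toys"]})

cat = df["category"].astype("category")
df["category_code"] = cat.cat.codes

# Map each code back to its label; -1 (missing) has no entry
code_to_label = dict(enumerate(cat.cat.categories))

print(df)
print(code_to_label)
```

Because the numbers are just identifiers, a model that treats them as magnitudes may read an ordering into them that is not really there.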
Key Takeaways
- Feature engineering adds more meaning to raw data
- New columns can be created from numbers, dates, and text
- Date features help with time-based analysis
- Encoded categories are useful for analysis and ML workflows
Code Snippet:
import pandas as pd

# Load the dataset
df = pd.read_csv("sample_data.csv")
print("✅ Data Loaded")
print(df.head())

# Fix data types; values that cannot be parsed become NaT / NaN
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
df["sales"] = pd.to_numeric(df["sales"], errors="coerce")
df["quantity"] = pd.to_numeric(df["quantity"], errors="coerce")
print("✅ Data types fixed")

# Numeric feature: average value per item sold
df["revenue_per_item"] = df["sales"] / df["quantity"]
print(df[["sales", "quantity", "revenue_per_item"]].head())

# Date features
df["order_year"] = df["order_date"].dt.year
df["order_month"] = df["order_date"].dt.month
df["order_day"] = df["order_date"].dt.day
df["order_weekday"] = df["order_date"].dt.day_name()
print(df[["order_date", "order_year", "order_month", "order_weekday"]].head())

# Business label: bucket each sale by fixed thresholds
def sales_category(sales):
    if sales >= 2000:
        return "High"
    elif sales >= 1000:
        return "Medium"
    else:
        return "Low"

df["sales_category"] = df["sales"].apply(sales_category)
print(df[["sales", "sales_category"]].head())

# Encode the text category as integer codes
df["category_code"] = df["category"].astype("category").cat.codes
print(df[["category", "category_code"]].head())

# Save the result
df.to_csv("feature_engineered_data.csv", index=False)
print("💾 Feature engineered dataset saved")