AW Dev Rethought

🕵️ Debugging is like being the detective in a crime movie where you are also the murderer - Filipe Fortes

🧠 Python DeepCuts — 💡 Threading vs Multiprocessing Deep Dive


Description:

Python offers multiple ways to run tasks concurrently, and the two most common approaches are threading and multiprocessing. At first glance they seem similar: both let a program work on several tasks at once. Internally, however, they differ fundamentally in terms of:

  • memory usage
  • process isolation
  • communication
  • CPU utilisation

This DeepCut explores those differences and when each approach should be used.


🧩 Threads Share Memory

Threads run inside the same process.

shared_data = []

def worker():
    shared_data.append("thread")

All threads:

  • share memory
  • access the same variables
  • use the same resources

This makes communication easy but introduces synchronisation challenges.
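
The synchronisation challenge is easiest to see with a shared counter. The sketch below uses `threading.Lock` so that each read-modify-write of `counter` happens as one unit. Without the lock, `counter += 1` is not guaranteed to be atomic and updates can be lost. The names `increment_many` and `run_threads` and the thread and iteration counts are illustrative choices, not part of the original.

```python
import threading

counter = 0                      # shared by every thread in the process
counter_lock = threading.Lock()  # guards every update to `counter`

def increment_many(times: int) -> None:
    """Hypothetical worker: add 1 to the shared counter `times` times."""
    global counter
    for _ in range(times):
        # Holding the lock makes the read-modify-write of `counter`
        # a single step from the point of view of other threads.
        with counter_lock:
            counter += 1

def run_threads(num_threads: int = 4, times: int = 100_000) -> int:
    """Start the threads, wait for all of them, and return the final count."""
    global counter
    counter = 0
    threads = [
        threading.Thread(target=increment_many, args=(times,))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

if __name__ == "__main__":
    print(run_threads())  # 400000
```

Because the lock serialises the updates, the result is always `num_threads * times`.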


🧠 Processes Have Separate Memory

Processes are completely isolated.

data = []

def worker():
    data.append("process")

Each process gets:

  • its own memory space
  • its own Python interpreter
  • its own resources

Changes made inside a process are not visible to other processes unless explicitly shared.
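
To see the isolation directly, the sketch below runs the same hypothetical `worker` function once in a thread and once in a process, then inspects the parent's list. `run_in` is an illustrative helper, not a library function. The `if __name__ == "__main__":` guard matters because, with the "spawn" start method (the default on Windows and macOS), each child process re-imports the module.

```python
import multiprocessing
import threading

data = []

def worker() -> None:
    # Appends to whichever copy of `data` this code can see.
    data.append("changed")

def run_in(kind: str) -> list:
    """Run `worker` in a thread or a process, then return the parent's `data`."""
    data.clear()
    if kind == "thread":
        runner = threading.Thread(target=worker)
    else:
        runner = multiprocessing.Process(target=worker)
    runner.start()
    runner.join()
    return list(data)

if __name__ == "__main__":
    print(run_in("thread"))   # ['changed']
    print(run_in("process"))  # []
```

The thread modifies the parent's list; the process modifies its own copy, which disappears when the process exits.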


🔄 The GIL Changes Everything

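In CPython, the standard interpreter, the Global Interpreter Lock (GIL) allows only one thread per process to execute Python bytecode at a time. (Python 3.13 added an experimental free-threaded build without the GIL, but the default build still has it.)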

For CPU-heavy tasks:

def cpu_task():
    total = 0
    for i in range(10_000_000):
        total += i

Multiple threads still compete for the same GIL.

As a result:

  • CPU-bound threading rarely improves performance
  • threads take turns executing Python code
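
A rough way to observe this is to time the same CPU-bound work run sequentially and in threads. This is only a sketch: `timed`, `run_sequential`, and `run_threaded` are illustrative helpers, the loop size is reduced from the original so it finishes quickly, and real timings depend on the machine and the Python build. On a standard CPython build the threaded version usually takes about as long as the sequential one, sometimes longer.

```python
import threading
import time

def cpu_task(n: int = 2_000_000) -> int:
    """Pure-Python loop: the running thread holds the GIL while it executes."""
    total = 0
    for i in range(n):
        total += i
    return total

def timed(label: str, fn) -> float:
    """Call `fn`, print how long it took, and return the elapsed seconds."""
    start = time.perf_counter()
    fn()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s")
    return elapsed

def run_sequential(jobs: int = 4) -> None:
    for _ in range(jobs):
        cpu_task()

def run_threaded(jobs: int = 4) -> None:
    threads = [threading.Thread(target=cpu_task) for _ in range(jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

if __name__ == "__main__":
    timed("sequential", run_sequential)
    timed("4 threads", run_threaded)
```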

🧬 Multiprocessing Bypasses the GIL

Each process runs its own Python interpreter.

import multiprocessing

processes = [
    multiprocessing.Process(target=cpu_task)
    for _ in range(2)
]

Because every process has:

  • its own interpreter
  • its own GIL

Python can execute work across multiple CPU cores simultaneously.

This is why multiprocessing is preferred for computation-heavy workloads.
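
When you need the results back, a process pool is often more convenient than creating `multiprocessing.Process` objects by hand. The sketch below swaps in `concurrent.futures.ProcessPoolExecutor` from the standard library, a different API built on the same idea. `parallel_sums` is a hypothetical helper, and `cpu_task` is modified here to take `n` and return its total.

```python
from concurrent.futures import ProcessPoolExecutor

def cpu_task(n: int) -> int:
    """CPU-bound work: sum 0..n-1 with a pure-Python loop."""
    total = 0
    for i in range(n):
        total += i
    return total

def parallel_sums(sizes: list[int], workers: int = 2) -> list[int]:
    # Each worker is a separate process with its own interpreter and GIL,
    # so the loops can run on different CPU cores at the same time.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(cpu_task, sizes))

if __name__ == "__main__":
    print(parallel_sums([10, 100, 1_000]))  # [45, 4950, 499500]
```

`pool.map` returns results in the same order as the inputs, even if the workers finish in a different order.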


🔍 Threads vs Processes at the OS Level

Each process is a separate entity managed by the operating system, with its own address space.

import os

print(os.getpid())

Each process receives a unique Process ID (PID).

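Threads belong to an existing process: they share its PID, memory, and resources. In CPython each thread is still a real OS thread, with its own native thread ID available from threading.get_native_id().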

This difference affects:

  • memory consumption
  • scheduling
  • isolation
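
The sketch below makes the distinction visible by recording the PID and the native OS thread ID (`threading.get_native_id()`, available since Python 3.8) from the main program, a thread, and a child process. `current_ids`, `process_target`, and `compare_ids` are illustrative names. The thread should report the same PID as the main program but a different thread ID, while the child process reports a different PID.

```python
import multiprocessing
import os
import threading

def current_ids() -> tuple[int, int]:
    """Return (PID, native OS thread ID) for the code calling this."""
    return os.getpid(), threading.get_native_id()

def process_target(queue) -> None:
    # Runs in the child process; send its IDs back to the parent.
    queue.put(current_ids())

def compare_ids() -> dict[str, tuple[int, int]]:
    found = {"main": current_ids()}

    # A thread runs inside the main process and can write to `found` directly.
    def thread_target() -> None:
        found["thread"] = current_ids()

    t = threading.Thread(target=thread_target)
    t.start()
    t.join()

    # A process has separate memory, so it reports back through a Queue.
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=process_target, args=(queue,))
    p.start()
    found["process"] = queue.get()
    p.join()
    return found

if __name__ == "__main__":
    for name, (pid, tid) in compare_ids().items():
        print(f"{name:>7}: PID={pid} native thread ID={tid}")
```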

⚠️ Inter-Process Communication (IPC)

Because processes do not share memory, communication must be explicit.

import multiprocessing

queue = multiprocessing.Queue()

Common IPC mechanisms include:

  • Queue
  • Pipe
  • Shared Memory
  • Manager Objects

These allow processes to exchange data safely.
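
Two of the other mechanisms can be sketched with the standard `multiprocessing` module: a `Pipe` for a two-way conversation between two processes, and a `Value` for a single number in shared memory. The worker and helper names are hypothetical. Note that the shared counter still needs its lock: shared memory removes the need to copy data, not the need to synchronise.

```python
import multiprocessing

def pipe_worker(conn) -> None:
    # Receive a number from the parent, send back its square, then close.
    value = conn.recv()
    conn.send(value * value)
    conn.close()

def value_worker(counter, times: int) -> None:
    for _ in range(times):
        # get_lock() returns the lock that guards this synchronized Value.
        with counter.get_lock():
            counter.value += 1

def square_via_pipe(x: int) -> int:
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=pipe_worker, args=(child_conn,))
    p.start()
    parent_conn.send(x)
    result = parent_conn.recv()
    p.join()
    return result

def count_via_shared_value(processes: int = 2, times: int = 1_000) -> int:
    counter = multiprocessing.Value("i", 0)  # a C int in shared memory
    workers = [
        multiprocessing.Process(target=value_worker, args=(counter, times))
        for _ in range(processes)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value

if __name__ == "__main__":
    print(square_via_pipe(7))        # 49
    print(count_via_shared_value())  # 2000
```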


🧠 When to Use Which?

Use Threading

  • API calls
  • Database operations
  • File I/O
  • Network requests
  • Waiting-heavy workloads

Use Multiprocessing

  • Data processing
  • Image processing
  • Scientific computing
  • Machine learning workloads
  • CPU-intensive calculations

The workload type usually determines the right choice.
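
For the waiting-heavy side of the list, the sketch below simulates ten slow I/O calls with `time.sleep` (a stand-in assumption for real network or database calls; `fake_request` and `fetch_all` are hypothetical names) and runs them on a `concurrent.futures.ThreadPoolExecutor`. A sleeping or blocked thread releases the GIL, so the waits overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(request_id: int) -> int:
    # Stand-in for an API or database call: time.sleep releases the GIL,
    # so other threads can run while this one waits.
    time.sleep(0.2)
    return request_id

def fetch_all(ids: list[int], workers: int = 10) -> tuple[list[int], float]:
    """Run fake_request for every id on a thread pool; return results and seconds taken."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fake_request, ids))
    return results, time.perf_counter() - start

if __name__ == "__main__":
    results, elapsed = fetch_all(list(range(10)))
    print(results)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    print(f"{elapsed:.2f}s")
```

Run one after another, the ten calls would take about 2 seconds; with ten threads the total is typically close to the length of a single call.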


✅ Key Points

  • Threads share memory and resources
  • Processes have isolated memory spaces
  • The GIL limits CPU-bound threading
  • Multiprocessing enables true parallel execution
  • Processes communicate through IPC mechanisms
  • Threading is best for I/O-bound tasks
  • Multiprocessing is best for CPU-bound tasks

Understanding this distinction is essential when designing scalable Python applications.


Code Snippet:

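import threading
import multiprocessing
import os

# Thread memory sharing: every thread sees this same list
shared_data = []

def thread_worker():
    shared_data.append("thread")

# Process isolation: the child appends to its own copy of data
data = []

def process_worker():
    data.append("process")
    print("Inside process:", data)
    print("PID:", os.getpid())

# CPU-bound task: each process runs it with its own interpreter and GIL
def cpu_task():
    total = 0
    for i in range(10_000_000):
        total += i

# IPC example: the child sends a message back through a Queue
def queue_worker(queue):
    queue.put("hello")

# The main guard is required with the "spawn" start method (the default
# on Windows and macOS): each child re-imports this module, and without
# the guard it would try to start new processes itself.
if __name__ == "__main__":
    t = threading.Thread(target=thread_worker)
    t.start()
    t.join()
    print(shared_data)  # ['thread']

    p = multiprocessing.Process(target=process_worker)
    p.start()
    p.join()
    print("Main process:", data)  # [] because the child changed only its copy

    # Run the CPU-bound task in two processes at once
    processes = [
        multiprocessing.Process(target=cpu_task)
        for _ in range(2)
    ]
    for proc in processes:
        proc.start()
    for proc in processes:
        proc.join()

    queue = multiprocessing.Queue()
    p = multiprocessing.Process(
        target=queue_worker,
        args=(queue,)
    )
    p.start()
    print(queue.get())  # hello
    p.join()

    print("Main PID:", os.getpid())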
