🧠 Python DeepCuts — 💡 Garbage Collection & Reference Counting


Description:

Python hides memory management from developers — but behind the scenes, CPython uses a two-part garbage collection system:

  1. Reference Counting (primary mechanism)
  2. Cyclic Garbage Collector (fallback for reference cycles)

Most of the time, memory is freed instantly.

But in some cases — especially when cycles or destructors are involved — objects linger longer than expected and may cause memory leaks.

This DeepCut walks through how CPython frees memory, how cycles are detected, how to inspect reference counts, and what patterns cause real-world leaks.


🧩 Reference Counting — CPython’s First Line of Defense

Every Python object carries an internal counter tracking how many references point to it.

When that counter hits zero, the object is freed immediately.

import sys

x = [1, 2, 3]
print("Initial refcount:", sys.getrefcount(x))

y = x
print("After alias:", sys.getrefcount(x))

del y
print("After del y:", sys.getrefcount(x))

Because x and y refer to the same object, aliasing increases the count, and del decreases it again.

This is why Python appears to “magically” free objects as soon as they go out of scope.
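
The immediate-free behaviour can be observed directly. The following is a minimal sketch, assuming CPython; the Resource class and the events list are hypothetical names introduced only for this illustration. It uses weakref.finalize to record the moment an object is reclaimed.

```python
import weakref

class Resource:
    """Hypothetical placeholder class used only for this illustration."""

events = []

def use_resource():
    r = Resource()
    # Register a callback that runs when r is reclaimed.
    # finalize() does not keep r alive.
    weakref.finalize(r, events.append, "freed")
    events.append("inside function")
    # On return, the local name r disappears and the refcount drops to zero.

use_resource()
events.append("after function")

print(events)  # On CPython: ['inside function', 'freed', 'after function']
```

The "freed" entry lands between the other two because CPython reclaims the object the moment the function's local reference goes away, not at some later collection pass. Other Python implementations may order these events differently.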


🔍 Why sys.getrefcount() Is Always +1

When you call:

sys.getrefcount(obj)

the argument itself creates a temporary reference.

That’s why objects usually show a count one higher than expected:

import sys

a = []
print(sys.getrefcount(a))   # Usually 2: the name a plus the temporary function argument

Understanding this avoids confusion when debugging reference behavior.
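
Because the absolute number includes that temporary reference (and can vary between CPython versions), it is often more useful to compare counts than to read them literally. A small sketch, with hypothetical variable names:

```python
import sys

obj = object()
holders = []

baseline = sys.getrefcount(obj)

holders.append(obj)              # one extra strong reference
holders.append(obj)              # and another
with_two_more = sys.getrefcount(obj)

holders.clear()                  # drop both again
back_to_baseline = sys.getrefcount(obj)

print(with_two_more - baseline)      # 2
print(back_to_baseline - baseline)   # 0
```

The temporary reference is present in every call, so it cancels out when two readings are subtracted.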


🧠 The Limitation: Reference Cycles

Reference counting alone fails when objects reference each other in a cycle, whether directly or through a chain of other objects.

Example:

a = {}
b = {}

a["ref"] = b
b["ref"] = a

Even after deleting both names:

del a
del b

the two dictionaries remain in memory, because each still holds a reference to the other and neither count reaches zero.

Reference counting alone can never reclaim this memory.
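
To see a cycle outlive its names, a weak reference can act as a probe. This is a sketch assuming CPython with default gc debug flags; the Box class is hypothetical and stands in for the dictionaries above, because plain dict objects cannot be weakly referenced.

```python
import gc
import weakref

class Box:
    """Hypothetical container; plain dicts cannot be weakly referenced."""

gc.disable()                     # keep the automatic collector out of the way
try:
    a, b = Box(), Box()
    a.ref = b
    b.ref = a
    probe = weakref.ref(a)       # observes the object without keeping it alive

    del a, b
    alive_after_del = probe() is not None      # the cycle still holds both objects

    gc.collect()
    alive_after_collect = probe() is not None  # the cyclic collector reclaimed them
finally:
    gc.enable()

print(alive_after_del, alive_after_collect)    # True False
```

Disabling the automatic collector for the duration of the experiment keeps a background collection from freeing the cycle before the first check.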


🔄 The Cyclic Garbage Collector

To solve the cycle issue, Python’s cyclic GC periodically scans for groups of objects that:

  • reference each other
  • but are no longer reachable from the program

You can trigger a collection manually with gc.collect():

import gc

gc.set_debug(gc.DEBUG_SAVEALL)  # keep collected objects in gc.garbage for inspection

x = []
x.append(x)

del x
gc.collect()

print("Unreachable objects:", gc.garbage)

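The GC finds these unreachable cycles. Because DEBUG_SAVEALL is set here, they are kept in gc.garbage for inspection; without that flag, the collector frees them and reclaims the memory.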
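
Without any debug flags, the same experiment can be run using only the return value of gc.collect(), which reports how many unreachable objects were found. A minimal sketch, assuming default collector settings:

```python
import gc

gc.collect()                     # start from a clean slate

loop = []
loop.append(loop)                # the list contains itself
del loop                         # unreachable now, but its refcount is still 1

found = gc.collect()             # number of unreachable objects found
print("Unreachable objects found:", found)

# The collector runs automatically based on allocation thresholds.
print("Thresholds:", gc.get_threshold())
print("Enabled:", gc.isenabled())
```

gc.get_threshold() returns the three generation thresholds that decide when automatic collections are triggered; the exact values differ between CPython versions.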


⚠️ Memory Leaks in Python — Yes, They Exist

While Python prevents many memory issues, leaks can still happen when:

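  1. A cycle includes an object with __del__

Destructors complicate GC because Python cannot easily determine a safe deletion order. Before Python 3.4, such cycles were never freed and were parked in gc.garbage. Since Python 3.4 (PEP 442) they are collected, but only when the cyclic collector runs, and the order in which finalizers run is not defined.

class Node:
    def __init__(self):
        self.ref = self
    def __del__(self):
        pass   # a finalizer inside a cycle complicates collection

  2. Objects are stored in long-lived globals or registries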

Example:

LEAKS = []
LEAKS.append(Node())

Even after every local name for the object is deleted, the reference held by the global list keeps it alive.

  3. Caches and registries without cleanup

This is why frameworks often use weak references.
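
For the first case in the list above, it helps to check what a given interpreter actually does with a finalizer inside a cycle. The sketch below assumes Python 3.4 or later; TrackedNode and finalized are hypothetical names used only here.

```python
import gc

finalized = []

class TrackedNode:
    """Hypothetical class: a self-referencing object with a finalizer."""

    def __init__(self, name):
        self.name = name
        self.ref = self                  # self-referencing cycle

    def __del__(self):
        finalized.append(self.name)      # record that the finalizer ran

gc.set_debug(0)                          # make sure DEBUG_SAVEALL is off
gc.collect()

n = TrackedNode("n1")
del n                                    # the cycle keeps the refcount at 1

ran_before_collect = list(finalized)     # snapshot before the collector runs
gc.collect()                             # runs __del__, then frees the object

print(ran_before_collect, finalized)
print(any(isinstance(o, TrackedNode) for o in gc.garbage))
```

On Python 3.4 and later the finalizer runs during gc.collect() and nothing is left in gc.garbage. The object is not leaked, but its cleanup is deferred until a collection happens, which matters if __del__ releases a scarce resource.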


🧹 Fixing Leaks Using weakref

A weak reference lets you refer to an object without increasing its reference count.

Once the last strong reference is gone, the object is freed and the weak reference is cleared automatically.

import gc
import weakref

registry = weakref.WeakValueDictionary()

class User:
    pass

u = User()
registry["user"] = u
print("Before GC:", dict(registry))

del u
gc.collect()

print("After GC:", dict(registry))  # empty — no leaks

This pattern is widely used in:

  • Object caches
  • ORM registries
  • GUI frameworks
  • Object pools
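
Underneath WeakValueDictionary sits the basic weakref.ref primitive. A minimal sketch of how it behaves on CPython follows; the Session class and the log list are hypothetical.

```python
import weakref

class Session:
    """Hypothetical class; instances of user-defined classes support weak references."""

log = []

s = Session()
# The optional callback runs once the referent has been reclaimed.
ref = weakref.ref(s, lambda r: log.append("session reclaimed"))

same_object = ref() is s         # calling the weak reference returns the live object
del s                            # the last strong reference is gone
dead = ref() is None             # the weak reference now returns None

print(same_object, dead, log)    # On CPython: True True ['session reclaimed']
```

Code that holds a weak reference should always check for None after calling it, since the referent can disappear at any point once no strong references remain.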

✅ Key Points

  • CPython frees memory via reference counting.
  • Cycles require the cyclic garbage collector.
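  • sys.getrefcount() includes the temporary reference created by the call itself.
  • Objects with __del__ inside cycles were uncollectable before Python 3.4 and are still finalized late, in an undefined order.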
  • Global registries and caches can easily leak memory.
  • Weak references help prevent unintentional retention.

Understanding Python’s GC system is essential for writing efficient, leak-free applications — especially long-running services, ETL jobs, APIs, and ML pipelines.


Code Snippet:

import sys
import gc
import weakref

# 1 — Reference counting basics
x = [1, 2, 3]
print("Initial refcount:", sys.getrefcount(x))

y = x
print("After alias:", sys.getrefcount(x))

del y
print("After del y:", sys.getrefcount(x))

# 2 — getrefcount +1 effect
a = []
print(sys.getrefcount(a))

# 3 — Reference cycles
a = {}
b = {}
a["ref"] = b
b["ref"] = a

print("Refcount for a:", sys.getrefcount(a))
print("Refcount for b:", sys.getrefcount(b))

del a, b

# 4 — Detecting cycles via gc
gc.set_debug(gc.DEBUG_SAVEALL)

x = []
x.append(x)
del x

gc.collect()
print("Unreachable objects:", gc.garbage)

# Reset debugging so the later sections behave normally
gc.set_debug(0)
gc.garbage.clear()

# 5 — Memory leak example
LEAKS = []

class Node:
    def __init__(self):
        self.ref = self
    def __del__(self):
        pass

n = Node()
LEAKS.append(n)
del n

gc.collect()
print("Leak count:", len(LEAKS))

# 6 — Using WeakRef to avoid leaks
registry = weakref.WeakValueDictionary()

class User:
    pass

u = User()
registry["user"] = u
print("Before GC:", dict(registry))

del u
gc.collect()

print("After GC:", dict(registry))
