I've been diving deep into Python's memory management lately, and honestly, it's been a bit of a rabbit hole. When I first heard terms like "reference counting" and "weak references," I nodded along like I understood. Spoiler: I didn't. So I decided to actually learn this stuff properly, and here's what I figured out.
How Python Memory Management Actually Works
Before we get to the cool weakref stuff, we need to understand how Python handles memory in the first place. Python uses reference counting as its primary memory management strategy, with a garbage collector as backup for tricky situations.
Reference Counting Basics
Every object in Python has a reference count—basically a number tracking how many things are pointing at it. When that number hits zero, Python immediately frees the memory.
```python
import sys

# Create an object
data = [1, 2, 3]
print(sys.getrefcount(data))  # 2 (includes the temp ref from getrefcount)

# Add another reference
also_data = data
print(sys.getrefcount(data))  # 3

# Remove one
del also_data
print(sys.getrefcount(data))  # 2

# Remove the last real reference
del data
# Object is immediately deallocated - memory freed!
```

The sys.getrefcount() function always returns one more than you'd expect because passing the object to the function temporarily creates another reference.
This system is elegant for most cases. Object goes out of scope? Reference count drops. Hits zero? Memory freed. No waiting, no periodic cleanup passes.
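A tiny experiment makes the "no waiting" part concrete. This relies on CPython's reference counting; implementations like PyPy may run `__del__` later (the Tracked class here is just an illustration):

```python
freed = []

class Tracked:
    """Records into `freed` the moment the object is deallocated."""
    def __del__(self):
        freed.append("gone")

t = Tracked()
t2 = t        # second reference to the same object

del t         # refcount drops 2 -> 1, object stays alive
print(freed)  # []

del t2        # refcount hits 0 - freed on the spot, no GC pass needed
print(freed)  # ["gone"]
```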
The Problem: Circular References
But here's where things get tricky. What if two objects reference each other?
```python
class Node:
    def __init__(self, name):
        self.name = name
        self.neighbor = None

a = Node("A")
b = Node("B")

# Create a cycle
a.neighbor = b
b.neighbor = a

del a
del b
# Uh oh... both objects still exist!
# Each has refcount of 1 (from the other)
```

After del a and del b, both objects still have a reference count of 1—they're keeping each other alive! The variables are gone, but the objects are zombies, consuming memory forever.
This is where Python's garbage collector comes in. It periodically scans for these cycles and cleans them up. But there's a better approach: don't create the problem in the first place.
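You can actually watch the collector do this. A quick sketch reusing the Node class from above; gc.collect() returns the number of unreachable objects it found:

```python
import gc

class Node:
    def __init__(self, name):
        self.name = name
        self.neighbor = None

a = Node("A")
b = Node("B")
a.neighbor = b  # create the cycle again
b.neighbor = a

del a, b
unreachable = gc.collect()  # force a full collection pass
print(unreachable)          # >= 2: the Nodes (and their dicts) were reclaimed
```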
Enter weakref
The weakref module provides weak references—references that don't increase an object's reference count. The object can be garbage collected even if weak references to it exist.
```python
import weakref

class BigData:
    def __init__(self, data):
        self.data = data
        print(f"Created BigData with {len(data)} items")

    def __del__(self):
        print("BigData is being deleted!")

# Strong reference keeps object alive
obj = BigData([1] * 1000000)

# Create a weak reference
weak = weakref.ref(obj)

# Access through weak reference
print(weak())  # <BigData object>

# Delete strong reference
del obj
# Prints: BigData is being deleted!

# Weak reference now returns None
print(weak())  # None
```

The weak reference lets you access the object while it exists, but doesn't prevent it from being collected. When the object is gone, calling the weak reference returns None.
Adding Callbacks
You can get notified when the object dies:
```python
import weakref

class Connection:
    def __init__(self, host):
        self.host = host
        print(f"Connected to {host}")

def connection_lost(weak_ref):
    print("Connection object was garbage collected!")

conn = Connection("localhost")
weak_conn = weakref.ref(conn, connection_lost)

del conn
# Prints: Connection object was garbage collected!
```

This is useful for cleanup tasks—maybe you want to log when resources are released, or update some registry.
WeakValueDictionary: Caches That Don't Leak
This is where weakref really shines. Imagine you're building a cache:
```python
# The naive approach - MEMORY LEAK!
class UserCache:
    _cache = {}

    @classmethod
    def get_user(cls, user_id):
        if user_id not in cls._cache:
            cls._cache[user_id] = load_user_from_db(user_id)
        return cls._cache[user_id]
```

The problem? Users stay in _cache forever, even if no one else is using them. Your cache grows and grows until you restart the app.
Here's the fix:
```python
import weakref

class User:
    def __init__(self, user_id, name):
        self.user_id = user_id
        self.name = name

    def __repr__(self):
        return f"User({self.user_id}, {self.name!r})"

class UserCache:
    _cache = weakref.WeakValueDictionary()

    @classmethod
    def get_user(cls, user_id):
        if user_id in cls._cache:
            print(f"Cache hit for {user_id}")
            return cls._cache[user_id]
        # Simulate database lookup
        print(f"Loading user {user_id} from database...")
        user = User(user_id, f"User{user_id}")
        cls._cache[user_id] = user
        return user

# Usage
user1 = UserCache.get_user(1)        # Loads from database
user1_again = UserCache.get_user(1)  # Cache hit!
print(f"Cache size: {len(UserCache._cache)}")  # 1

# When no one holds user1 anymore...
del user1
del user1_again
# The entry disappears from cache automatically!
# (After garbage collection runs)
import gc
gc.collect()
print(f"Cache size: {len(UserCache._cache)}")  # 0
```

WeakValueDictionary holds weak references to its values. When a value is garbage collected, its key-value pair automatically disappears from the dictionary. No memory leak!
When to Use WeakValueDictionary
- Object caches: Cache expensive computations without preventing GC
- Object pools: Reuse objects while they exist
- ID → Object mappings: Look up objects by ID without keeping them alive
WeakKeyDictionary: Attaching Metadata
What if you want to store extra data about objects, without modifying those objects? And without keeping them alive forever?
```python
import weakref

class Document:
    def __init__(self, title, content):
        self.title = title
        self.content = content

# Store metadata without modifying Document class
metadata = weakref.WeakKeyDictionary()

doc1 = Document("README", "# Hello")
doc2 = Document("CHANGELOG", "## v1.0")

# Attach metadata
metadata[doc1] = {"views": 42, "last_accessed": "2026-03-22"}
metadata[doc2] = {"views": 7, "last_accessed": "2026-03-21"}

print(f"doc1 views: {metadata[doc1]['views']}")  # 42

# Delete a document
del doc2

# Its metadata is automatically cleaned up
import gc
gc.collect()
print(list(metadata.keys()))  # [<Document object>] - only doc1
```

WeakKeyDictionary is the inverse of WeakValueDictionary—it holds weak references to its keys. When a key is garbage collected, the entry is removed.
When to Use WeakKeyDictionary
- Object metadata: Store extra info without modifying classes
- Memoization by object: Cache results keyed by mutable objects
- Tracking object state: Monitor objects without keeping them alive
weakref.finalize: Clean Cleanup
Sometimes __del__ methods get complicated—especially with circular references. weakref.finalize provides a cleaner alternative for running cleanup code when objects are collected.
```python
import weakref
import os
import tempfile

class TempFileManager:
    def __init__(self):
        # Create a temp file
        fd, self.path = tempfile.mkstemp()
        os.close(fd)
        with open(self.path, 'w') as f:
            f.write("temporary data")
        print(f"Created temp file: {self.path}")

        # Register cleanup - will run when object is collected
        # Note: the cleanup function must not reference 'self'!
        self._finalizer = weakref.finalize(
            self,
            self._cleanup,
            self.path  # Pass the path value as an argument, not self
        )

    @staticmethod
    def _cleanup(path):
        """Static method - can't reference self."""
        if os.path.exists(path):
            os.unlink(path)
            print(f"Cleaned up temp file: {path}")

    def __enter__(self):
        return self

    def __exit__(self, *args):
        # Explicitly clean up
        self._finalizer()

# Automatic cleanup on garbage collection
tmp = TempFileManager()
print(f"File exists: {os.path.exists(tmp.path)}")  # True
del tmp
import gc
gc.collect()
# Prints: Cleaned up temp file: /tmp/xxxxx
```

Why finalize over __del__?
- No circular reference issues: finalize doesn't keep a strong reference to the object
- Guaranteed to run: finalizers still fire at interpreter exit, even if exceptions occurred
- Can be called manually: _finalizer() triggers cleanup early
- Deterministic: check _finalizer.alive to see if cleanup already happened
```python
import weakref

class Resource:
    def __init__(self, name):
        self.name = name
        self._finalizer = weakref.finalize(
            self,
            print,
            f"Resource {name} cleaned up"
        )

    @property
    def alive(self):
        return self._finalizer.alive

    def close(self):
        """Explicit cleanup."""
        self._finalizer()

r = Resource("database")
print(f"Is alive: {r.alive}")  # True
r.close()                      # Prints: Resource database cleaned up
print(f"Is alive: {r.alive}")  # False

# Won't run again - already cleaned up
del r
```

Avoiding Memory Leaks: Practical Patterns
Let me share some patterns I've learned for avoiding memory leaks.
Pattern 1: Observer Pattern with WeakSet
```python
import weakref

class EventEmitter:
    def __init__(self):
        self._listeners = weakref.WeakSet()

    def subscribe(self, listener):
        self._listeners.add(listener)

    def emit(self, event):
        # Iterate over a copy - listeners might disappear mid-iteration
        for listener in list(self._listeners):
            listener.on_event(event)

class Logger:
    def on_event(self, event):
        print(f"LOG: {event}")

class Analytics:
    def on_event(self, event):
        print(f"TRACK: {event}")

emitter = EventEmitter()
logger = Logger()
analytics = Analytics()

emitter.subscribe(logger)
emitter.subscribe(analytics)

emitter.emit("user_login")
# LOG: user_login
# TRACK: user_login

# Delete logger - it automatically unsubscribes
del logger
emitter.emit("user_logout")
# TRACK: user_logout (only analytics receives it)
```

With WeakSet, you don't need explicit unsubscribe logic. When a listener is garbage collected, it's automatically removed from the set.
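One caveat with this pattern: because the set holds only weak references, a listener that nothing else references disappears immediately, at least in CPython. A quick sketch of the failure mode:

```python
import weakref

class EventEmitter:
    def __init__(self):
        self._listeners = weakref.WeakSet()

    def subscribe(self, listener):
        self._listeners.add(listener)

class Logger:
    def on_event(self, event):
        print(f"LOG: {event}")

emitter = EventEmitter()
emitter.subscribe(Logger())     # oops: no other reference to this Logger
print(len(emitter._listeners))  # 0 - it was collected right away

kept = Logger()                 # keep a strong reference somewhere
emitter.subscribe(kept)
print(len(emitter._listeners))  # 1
```

A related gotcha: a plain weakref.ref to a bound method dies immediately too; weakref.WeakMethod exists for that case.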
Pattern 2: Back-References Without Cycles
```python
import weakref

class Parent:
    def __init__(self, name):
        self.name = name
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        child._parent = weakref.ref(self)

class Child:
    def __init__(self, name):
        self.name = name
        self._parent = None

    @property
    def parent(self):
        if self._parent is None:
            return None
        return self._parent()  # Dereference the weak ref

    def __repr__(self):
        parent_name = self.parent.name if self.parent else "None"
        return f"Child({self.name!r}, parent={parent_name})"

p = Parent("mom")
c = Child("kid")
p.add_child(c)

print(c.parent.name)  # mom
print(c)              # Child('kid', parent=mom)

# No circular reference - child holds weak ref to parent
del p  # Parent can be collected even though child exists
print(c.parent)  # None
```

Pattern 3: Self-Cleaning Cache with Size Limit
```python
import weakref
from collections import OrderedDict

class LRUCache:
    """
    Cache that:
    1. Uses weak references (auto-cleanup when objects collected)
    2. Has a max size (LRU eviction)
    """
    def __init__(self, maxsize=100):
        self._cache = weakref.WeakValueDictionary()
        self._order = OrderedDict()  # Tracks access order
        self.maxsize = maxsize

    def get(self, key):
        if key in self._cache:
            # Move to end (most recently used); the key may already
            # have been evicted from _order while the object lives on
            if key in self._order:
                self._order.move_to_end(key)
            return self._cache[key]
        return None

    def put(self, key, value):
        # Add to cache
        self._cache[key] = value
        self._order[key] = None
        self._order.move_to_end(key)
        # Evict oldest if over size
        while len(self._order) > self.maxsize:
            oldest = next(iter(self._order))
            del self._order[oldest]
            # Don't delete from _cache - the weak ref handles it

    def __len__(self):
        return len(self._cache)

class Item:
    """Bare object() instances can't be weakly referenced, so use a tiny class."""
    pass

cache = LRUCache(maxsize=3)

# Create some objects
objs = [Item() for _ in range(5)]
for i, obj in enumerate(objs):
    cache.put(f"key{i}", obj)

print(len(cache))  # 5 (weak refs don't evict automatically)

# Drop the first two objects
del objs[:2]
import gc
gc.collect()
print(len(cache))  # 3 (weak refs auto-removed)
```

The gc Module: Your Debugging Friend
The gc module lets you peek under the hood and control Python's garbage collector.
Basic gc Operations
```python
import gc

# Force a collection
collected = gc.collect()
print(f"Collected {collected} unreachable objects")

# Check GC status
print(f"GC enabled: {gc.isenabled()}")

# Temporarily disable (for performance-critical sections)
gc.disable()
# ... performance critical code ...
gc.enable()
```

Understanding Generations
Python's GC uses a generational strategy. Objects start in generation 0 and get promoted as they survive collections:
```python
import gc

# (gen0_threshold, gen1_threshold, gen2_threshold)
print(gc.get_threshold())  # (700, 10, 10) by default
# Meaning:
# - Collect gen 0 after 700 allocations minus deallocations
# - Collect gen 1 after 10 gen-0 collections
# - Collect gen 2 after 10 gen-1 collections

# Current counts (varies as your program runs)
print(gc.get_count())  # e.g. (123, 5, 2)
# 123 new allocations, 5 gen-0 collections since last gen-1, etc.
```

Finding Memory Leaks
```python
import gc

def find_objects_of_type(type_name):
    """Find all objects of a given type."""
    return [
        obj for obj in gc.get_objects()
        if type(obj).__name__ == type_name
    ]

class LeakyClass:
    instances = []  # Class variable - potential leak!

    def __init__(self, data):
        self.data = data
        LeakyClass.instances.append(self)  # Leak!

# Create and "delete" objects
for i in range(100):
    obj = LeakyClass(f"data{i}")
    del obj

gc.collect()

# Find the leaks
leaky = find_objects_of_type("LeakyClass")
print(f"Found {len(leaky)} LeakyClass objects!")  # 100 - they leaked!
```

Reference Analysis
```python
import gc

class Mystery:
    pass

obj = Mystery()

# Who references this object?
referrers = gc.get_referrers(obj)
print(f"Referenced by {len(referrers)} objects")

# What does this object reference?
obj.data = [1, 2, 3]
referents = gc.get_referents(obj)
print(f"References {len(referents)} objects")  # Its attributes and its class
```

Debugging gc.garbage
Objects that can't be collected used to end up in gc.garbage, usually due to __del__ methods in cycles. Since Python 3.4 (PEP 442), such cycles are collectable, so gc.garbage is normally empty. With the gc.DEBUG_SAVEALL flag set, though, every object the collector reclaims is diverted there so you can inspect it:
```python
import gc

gc.set_debug(gc.DEBUG_SAVEALL)  # Divert all collected objects into gc.garbage

class Trouble:
    def __init__(self, partner=None):
        self.partner = partner

    def __del__(self):
        # Having __del__ used to make cycles uncollectable
        # Modern Python (3.4+) handles this fine
        pass

# Create a cycle
a = Trouble()
b = Trouble(a)
a.partner = b

del a, b
gc.collect()

# With DEBUG_SAVEALL, the collected cycle shows up here for inspection
if gc.garbage:
    print(f"Saved objects: {gc.garbage}")

gc.set_debug(0)  # Reset debug flags
gc.garbage.clear()
```

Quick Reference Table
| Tool | Use Case |
|---|---|
| weakref.ref() | Single weak reference to an object |
| WeakValueDictionary | Cache objects by key, auto-cleanup values |
| WeakKeyDictionary | Store metadata keyed by objects |
| WeakSet | Collection of objects (observers, listeners) |
| weakref.finalize() | Cleanup callback when object dies |
| gc.collect() | Force garbage collection |
| gc.get_referrers() | Debug: who references this object? |
| gc.get_objects() | Debug: all tracked objects |
What I Learned
Memory management in Python is mostly automatic—that's the beauty of it. But understanding what happens behind the scenes helps you:
- Avoid leaks by breaking circular references with weak refs
- Build efficient caches that don't balloon forever
- Implement patterns (like observers) without manual cleanup
- Debug memory issues when things go wrong
The key insight for me was this: weak references let objects die when they should. Regular references are a promise: "I need you to stay alive." Weak references are more like: "I'd like to use you if you're around, but don't stick around on my account."
Start simple—understand reference counting first. Then reach for weakref when you hit caching or observer patterns. And keep gc in your toolbox for when you need to debug memory issues.
That's it! I'm still learning, but this foundation has already helped me write better Python code. Hope it helps you too.