I stumbled onto weakref while debugging a memory leak in a caching system. The cache was supposed to improve performance, but instead it was hoarding objects forever, slowly eating RAM until the process crashed.
The fix? One import and about five lines of code. But understanding why it worked took me down a rabbit hole of Python memory management. Here's what I learned.
The Problem With Regular References
In Python, when you assign an object to a variable, you create a reference to it. The object stays alive as long as at least one reference exists:
class ExpensiveObject:
def __init__(self, name):
self.name = name
print(f"Created {name}")
def __del__(self):
print(f"Destroyed {self.name}")
# Create an object
obj = ExpensiveObject("heavy-data")
# Output: Created heavy-data
# Object stays alive because 'obj' references it
del obj
# Output: Destroyed heavy-dataSo far so good. But what happens when you cache that object?
cache = {}
def get_data(key):
if key not in cache:
cache[key] = ExpensiveObject(key)
return cache[key]
obj = get_data("user-123")
# Output: Created user-123
del obj # We're done with it locally...
# But nothing prints! The object is still alive in the cache.The cache holds a reference, so the object lives forever. This is often not what you want. If the object is only useful while something else is using it, the cache is just leaking memory.
Enter weakref.ref()
A weak reference refers to an object without keeping it alive. If the only remaining references are weak, the garbage collector can reclaim the object:
import weakref
class ExpensiveObject:
def __init__(self, name):
self.name = name
print(f"Created {name}")
def __del__(self):
print(f"Destroyed {self.name}")
# Create an object and a weak reference to it
obj = ExpensiveObject("important")
# Output: Created important
weak = weakref.ref(obj)
# Access the object through the weak reference
print(weak()) # <ExpensiveObject object>
print(weak().name) # "important"
# The strong reference goes away
del obj
# Output: Destroyed important
# Now the weak reference returns None
print(weak()) # NoneThe first time I saw this, I was confused. Why would you want a reference that might become None? It felt fragile.
What helped me understand: think of it like a lookup table that says "if this thing still exists, here it is." You're not claiming ownership—you're just keeping a way to find it.
WeakValueDictionary: Caches That Clean Themselves
This is where weakref really shines. A WeakValueDictionary holds weak references to its values. When nothing else references a value, it gets garbage collected and automatically removed from the dict:
import weakref
class UserSession:
def __init__(self, user_id):
self.user_id = user_id
print(f"Session created for {user_id}")
def __del__(self):
print(f"Session cleaned up for {self.user_id}")
# A cache that won't hold onto sessions forever
session_cache = weakref.WeakValueDictionary()
def get_session(user_id):
if user_id not in session_cache:
session_cache[user_id] = UserSession(user_id)
return session_cache[user_id]
# Get a session
session = get_session("user-42")
# Output: Session created for user-42
print(len(session_cache)) # 1
# User logs out, we drop our reference
del session
# Output: Session cleaned up for user-42
print(len(session_cache)) # 0 - automatically removed!This blew my mind. The cache cleans itself. No manual invalidation, no TTL management, no "have we already cached too much?" checks. If nothing needs the object, it disappears.
WeakKeyDictionary: When You Want to Annotate Objects
WeakKeyDictionary is the flip side—it holds weak references to its keys. This is useful when you want to attach metadata to objects without preventing their cleanup:
import weakref
# Track extra metadata for objects without modifying them
metadata = weakref.WeakKeyDictionary()
class Request:
def __init__(self, path):
self.path = path
def process_request(req):
# Attach timing info to the request
metadata[req] = {"started_at": time.time()}
# ... do work ...
# When the request is garbage collected,
# its metadata entry disappears tooI found this useful when I needed to track information about objects I didn't control (from a library) without subclassing or monkey-patching.
WeakSet: Observer Pattern Done Right
The observer pattern often creates accidental strong references. If your subject holds a list of observers, those observers can never be garbage collected while the subject exists:
# The naive approach - observers live forever
class Subject:
def __init__(self):
self.observers = [] # Strong references!
def attach(self, observer):
self.observers.append(observer)
def notify(self):
for obs in self.observers:
obs.update()With WeakSet, observers can come and go freely:
import weakref
class Subject:
def __init__(self):
self.observers = weakref.WeakSet()
def attach(self, observer):
self.observers.add(observer)
def notify(self):
# Dead observers automatically removed
for obs in self.observers:
obs.update()
class Observer:
def __init__(self, name):
self.name = name
def update(self):
print(f"{self.name} notified!")
def __del__(self):
print(f"{self.name} garbage collected")
subject = Subject()
obs1 = Observer("Observer-1")
obs2 = Observer("Observer-2")
subject.attach(obs1)
subject.attach(obs2)
print(f"Observers: {len(subject.observers)}") # 2
del obs1
# Output: Observer-1 garbage collected
print(f"Observers: {len(subject.observers)}") # 1No need to explicitly unsubscribe. When the observer is no longer needed elsewhere, it gets cleaned up and removed from the set automatically.
Breaking Circular References
This was a tricky concept for me. Consider two objects that reference each other:
class Parent:
def __init__(self):
self.child = None
class Child:
def __init__(self, parent):
self.parent = parent # Strong reference back to parent
parent = Parent()
child = Child(parent)
parent.child = child
# Now parent -> child -> parent
# Even after del parent, del child, they might stick aroundPython's cycle detector usually handles this, but it adds overhead and can delay cleanup. Using a weak reference for the "back" link is cleaner:
import weakref
class Child:
def __init__(self, parent):
self.parent = weakref.ref(parent) # Weak reference back
def get_parent(self):
# Might return None if parent was collected
return self.parent()When Does Garbage Collection Actually Happen?
This tripped me up. I expected objects to be collected immediately when the last strong reference disappeared. That's not guaranteed:
import weakref
import gc
class Thing:
def __del__(self):
print("Collected!")
thing = Thing()
weak = weakref.ref(thing)
del thing
# "Collected!" might not print immediately in all cases
# Force collection if you need it now
gc.collect()In CPython (the standard Python), reference counting usually triggers immediate cleanup. But:
- Circular references wait for the cycle collector
- Other Python implementations (PyPy, Jython) have different GC timing
__del__methods can have their own timing quirks
What I learned: Don't rely on when cleanup happens—just trust that it will happen.
Objects That Can't Be Weakly Referenced
Not everything supports weak references:
import weakref
# These work
weakref.ref([1, 2, 3]) # Error! list doesn't support weakref
weakref.ref((1, 2, 3)) # Error! tuple doesn't support weakref
weakref.ref("hello") # Error! str doesn't support weakref
# These do work
class MyClass: pass
weakref.ref(MyClass()) # Works
weakref.ref(lambda: None) # WorksBuilt-in types like list, dict, tuple, and str don't support weak references directly. For those, you'd need to subclass or wrap them.
Callbacks: Do Something When Objects Die
You can register a callback that runs when the referenced object is garbage collected:
import weakref
def on_finalize(ref):
print(f"Object was garbage collected!")
class Resource:
pass
obj = Resource()
weak = weakref.ref(obj, on_finalize)
del obj
# Output: Object was garbage collected!This is useful for cleanup logic—closing files, releasing locks, logging for debugging.
Memory Implications: Real Numbers
Here's a quick comparison of memory behavior:
import weakref
import sys
class Data:
def __init__(self):
self.payload = "x" * 10000 # 10KB of data
# Strong reference dict
strong_cache = {}
for i in range(1000):
strong_cache[i] = Data()
print(f"Strong cache entries: {len(strong_cache)}") # 1000
# Memory: ~10MB held indefinitely
# Weak reference dict
weak_cache = weakref.WeakValueDictionary()
refs = []
for i in range(1000):
obj = Data()
weak_cache[i] = obj
if i < 100: # Only keep strong refs to first 100
refs.append(obj)
import gc; gc.collect()
print(f"Weak cache entries: {len(weak_cache)}") # ~100
# Memory: ~1MB (only objects with strong refs remain)What I Learned
-
Weak references are for "if it exists" relationships. The cache doesn't own the data—it just knows where to find it if someone else is using it.
-
WeakValueDictionaryis a self-cleaning cache. Entries disappear when nothing else needs them. No TTL logic, no max-size eviction, no manual invalidation. -
WeakSetis perfect for observers. Subscribers come and go without memory leaks or explicit unsubscribe calls. -
Circular references are cleaner with weak back-references. Child-to-parent links often don't need to keep the parent alive.
-
Not everything supports weakref. Built-in types like
listandstrdon't. Custom classes do by default. -
Don't depend on GC timing. Objects will be collected eventually. Don't write code that requires it to happen at a specific moment.
The memory leak I was debugging? It was a cache of parsed config objects. The fix was changing dict() to WeakValueDictionary(). The configs stayed cached while in use, then disappeared when their owners went away.
Sometimes the best fix is just asking Python to handle it.
Have you used weakref in production? I'm curious what patterns others have found useful.