When I started writing Python, I did everything with lists and dicts. Need to count things? Loop through and increment a dict. Need to group items? Initialize empty lists everywhere. Need a queue? Use a list and call pop(0).
It all worked, but it was clunky. Then a senior engineer reviewed my code and asked, "Why aren't you using Counter?"
That question sent me down a rabbit hole into Python's collections module. Turns out there's a whole toolkit of specialized containers designed for exactly the patterns I was implementing by hand. Here's everything I've learned about them.
Counter: Stop Writing Counting Loops
I used to write counting code like this:
```python
# The old way
word_counts = {}
for word in words:
    if word in word_counts:
        word_counts[word] += 1
    else:
        word_counts[word] = 1
```

Then I discovered Counter:

```python
from collections import Counter

words = ["apple", "banana", "apple", "cherry", "banana", "apple"]
counts = Counter(words)
print(counts)  # Counter({'apple': 3, 'banana': 2, 'cherry': 1})
```

That's it. One line.
The Things That Blew My Mind
Missing keys return 0, not KeyError:
```python
counts = Counter(["a", "b", "a"])
print(counts["a"])  # 2
print(counts["z"])  # 0, not KeyError!
```

This is huge. No more if key in dict checks everywhere.
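Because missing keys read as zero, incremental counting needs no membership check either. A small sketch with a made-up word list:

```python
from collections import Counter

counts = Counter()
for word in ["spam", "eggs", "spam", "spam"]:
    counts[word] += 1  # works on first sight of a word: missing keys read as 0

print(counts["spam"])  # 3
print(counts["ham"])   # 0, still no KeyError
```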
Most common elements, instantly:
```python
text = "the quick brown fox jumps over the lazy dog"
letter_counts = Counter(text.replace(" ", ""))
print(letter_counts.most_common(3))  # [('o', 4), ('e', 3), ('t', 2)]
```

I used to sort dicts manually with lambda functions. Never again.
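A related detail: calling most_common() with no argument returns every element, sorted from most to least common:

```python
from collections import Counter

letter_counts = Counter("banana")
print(letter_counts.most_common())  # [('a', 3), ('n', 2), ('b', 1)]
```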
Counter arithmetic:
This one I didn't expect. You can add and subtract Counters:
```python
inventory = Counter(apples=10, oranges=5, bananas=7)
sold = Counter(apples=3, oranges=2)
remaining = inventory - sold
print(remaining)  # Counter({'apples': 7, 'bananas': 7, 'oranges': 3})
```

And it gets better—you can find common elements (intersection) or combined totals (union):
```python
basket1 = Counter(["apple", "apple", "orange"])
basket2 = Counter(["apple", "banana", "banana"])

# What's common (minimum counts)
print(basket1 & basket2)  # Counter({'apple': 1})

# Combined (maximum counts)
print(basket1 | basket2)  # Counter({'apple': 2, 'banana': 2, 'orange': 1})
```

Real-World Example: Log Analysis
I use Counter constantly for quick data analysis:
```python
from collections import Counter

def analyze_access_log(log_lines):
    """Analyze web server access logs."""
    status_codes = Counter()
    endpoints = Counter()

    for line in log_lines:
        parts = line.split()
        status = parts[8]    # HTTP status code
        endpoint = parts[6]  # Request path
        status_codes[status] += 1
        endpoints[endpoint] += 1

    print("Top 5 endpoints:")
    for endpoint, count in endpoints.most_common(5):
        print(f"  {endpoint}: {count}")

    print("\nStatus code breakdown:")
    for status, count in sorted(status_codes.items()):
        print(f"  {status}: {count}")
```

defaultdict: No More "Initialize If Missing"
Before defaultdict, grouping items looked like this:
```python
# The old way
groups = {}
for name, category in items:
    if category not in groups:
        groups[category] = []
    groups[category].append(name)
```

With defaultdict:

```python
from collections import defaultdict

groups = defaultdict(list)
for name, category in items:
    groups[category].append(name)  # Just append. The list is auto-created.
```

Common Default Factories
List (grouping):
```python
employees_by_dept = defaultdict(list)
for emp in employees:
    employees_by_dept[emp.department].append(emp)
```

Set (unique grouping):

```python
users_by_role = defaultdict(set)
for user in users:
    for role in user.roles:
        users_by_role[role].add(user)
```

Int (counting):

```python
# Alternative to Counter for simple cases
counts = defaultdict(int)
for item in items:
    counts[item] += 1
```

Lambda for custom defaults:

```python
# Default to a specific value
settings = defaultdict(lambda: "unknown")
settings["theme"] = "dark"
print(settings["theme"])    # "dark"
print(settings["missing"])  # "unknown"
```

The Nested defaultdict Trick
This one is magic. Infinite nesting without initialization:
```python
def nested_dict():
    return defaultdict(nested_dict)

data = nested_dict()
data["users"]["alice"]["settings"]["theme"] = "dark"
data["users"]["bob"]["profile"]["name"] = "Bob"
# No KeyError, no pre-initialization needed!
```

I use this for building tree structures from flat data:

```python
def build_org_tree(employees):
    tree = lambda: defaultdict(tree)
    org = tree()
    for emp in employees:
        dept = emp.department
        team = emp.team
        org[dept][team][emp.name] = emp.title
    return org
```

Real-World Example: Building an Index
```python
from collections import defaultdict

def build_inverted_index(documents):
    """Build a search index from documents."""
    index = defaultdict(set)
    for doc_id, text in documents:
        words = text.lower().split()
        for word in words:
            index[word].add(doc_id)
    return index

documents = [
    (1, "Python is great for data science"),
    (2, "Data engineering with Python"),
    (3, "Machine learning and data science"),
]

index = build_inverted_index(documents)
print(index["python"])  # {1, 2}
print(index["data"])    # {1, 2, 3}
```

OrderedDict: When Order Actually Matters
"Wait," you might say, "dicts are ordered since Python 3.7. Why do I need OrderedDict?"
Good question. I wondered the same thing. Here's why OrderedDict still matters:
Equality Considers Order
Regular dicts don't care about order when comparing:
```python
d1 = {"a": 1, "b": 2}
d2 = {"b": 2, "a": 1}
print(d1 == d2)  # True
```

OrderedDict does:

```python
from collections import OrderedDict

od1 = OrderedDict([("a", 1), ("b", 2)])
od2 = OrderedDict([("b", 2), ("a", 1)])
print(od1 == od2)  # False!
```

This matters when order is semantically meaningful—like steps in a workflow.
move_to_end() and popitem()
These are the killer features:
```python
from collections import OrderedDict

od = OrderedDict([("a", 1), ("b", 2), ("c", 3)])

# Move an item to the end
od.move_to_end("a")
print(list(od.keys()))  # ['b', 'c', 'a']

# Move to the beginning
od.move_to_end("c", last=False)
print(list(od.keys()))  # ['c', 'b', 'a']

# Pop from either end
od.popitem(last=True)   # Removes 'a'
od.popitem(last=False)  # Removes 'c'
```

Real-World Example: LRU Cache
This is the classic use case—implementing a Least Recently Used cache:
```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.cache = OrderedDict()
        self.capacity = capacity

    def get(self, key):
        if key not in self.cache:
            return None
        # Move accessed item to end (most recently used)
        self.cache.move_to_end(key)
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        # Evict oldest if over capacity
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

# Usage
cache = LRUCache(3)
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)
cache.get("a")     # Access 'a', moves to end
cache.put("d", 4)  # Evicts 'b' (least recently used)
```

(In practice, use functools.lru_cache for function memoization—but understanding this pattern helps!)
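For reference, here is what the functools version looks like: a memoized function where the LRU bookkeeping is handled for you.

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def fib(n: int) -> int:
    # Each distinct n is computed once; repeats come from the cache
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))           # 832040
print(fib.cache_info())  # hit/miss/size statistics
```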
When to Use Regular dict vs OrderedDict
Use regular dict when:
- You just need fast key-value storage
- Order doesn't affect correctness
- You're on Python 3.7+
Use OrderedDict when:
- Order-aware equality matters
- You need move_to_end() or popitem(last=False)
- Implementing ordered data structures (caches, queues)
deque: The Right Tool for Queues
I used to use lists as queues. Then I learned that list.pop(0) is O(n), not O(1).
For every item you remove from the front, Python shifts every other item down. With large lists, this kills performance.
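That claim is easy to verify: doubling the list size roughly quadruples the total drain time, the signature of quadratic behavior. A rough, machine-dependent sketch (the sizes are arbitrary):

```python
from timeit import timeit

def drain(n):
    items = list(range(n))
    while items:
        items.pop(0)  # every pop shifts all remaining elements left

t1 = timeit(lambda: drain(20_000), number=1)
t2 = timeit(lambda: drain(40_000), number=1)
print(f"{t1:.4f}s vs {t2:.4f}s")  # roughly 4x slower, not 2x
```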
Enter deque (pronounced "deck")—a double-ended queue with O(1) operations on both ends:
```python
from collections import deque

q = deque()
q.append("first")     # Add to right: O(1)
q.append("second")
q.appendleft("zero")  # Add to left: O(1)
q.popleft()           # Remove from left: O(1)
q.pop()               # Remove from right: O(1)
```

The maxlen Feature
This is incredibly useful. Create a fixed-size deque, and old items automatically drop off:
```python
# Keep only the last 5 items
recent = deque(maxlen=5)
for i in range(10):
    recent.append(i)
print(list(recent))  # [5, 6, 7, 8, 9]
```

Perfect for:
- Recent activity logs
- Sliding windows
- Rolling averages
- Undo history (limited depth)
Rotation
Deques can rotate efficiently:
```python
d = deque([1, 2, 3, 4, 5])
d.rotate(2)     # Rotate right
print(list(d))  # [4, 5, 1, 2, 3]
d.rotate(-2)    # Rotate left
print(list(d))  # [1, 2, 3, 4, 5]
```

Real-World Example: BFS with deque
Breadth-first search is a classic deque use case:
```python
from collections import deque

def bfs_shortest_path(graph, start, end):
    """Find shortest path using BFS."""
    queue = deque([(start, [start])])
    visited = {start}

    while queue:
        node, path = queue.popleft()  # O(1)!
        if node == end:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [neighbor]))

    return None  # No path found

graph = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F"],
    "D": [],
    "E": ["F"],
    "F": [],
}

path = bfs_shortest_path(graph, "A", "F")
print(path)  # ['A', 'C', 'F']
```

Real-World Example: Moving Average
```python
from collections import deque

class MovingAverage:
    def __init__(self, window_size: int):
        self.window = deque(maxlen=window_size)

    def add(self, value: float) -> float:
        self.window.append(value)
        return sum(self.window) / len(self.window)

avg = MovingAverage(3)
print(avg.add(1))  # 1.0
print(avg.add(2))  # 1.5
print(avg.add(3))  # 2.0
print(avg.add(4))  # 3.0 (window is now [2, 3, 4])
```

namedtuple vs dataclasses: When to Use Which
This one confused me for a while. Both create classes for holding data. When do you use which?
namedtuple: The Quick Option
```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(3, 4)

print(p.x, p.y)  # 3 4
print(p[0])      # 3 (indexable like a tuple)
x, y = p         # Unpacking works
```

Key characteristics:
- Immutable (like tuples)
- Indexable and unpackable
- Memory efficient
- Works with tuple APIs
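Immutable doesn't mean inconvenient: namedtuple ships underscore-prefixed helpers for making modified copies and for introspection:

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(3, 4)

moved = p._replace(x=10)  # a new Point with x swapped; p is untouched
print(moved)              # Point(x=10, y=4)
print(p._asdict())        # {'x': 3, 'y': 4}
print(Point._fields)      # ('x', 'y')
```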
dataclass: The Full-Featured Option
```python
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

p = Point(3, 4)
p.x = 5  # Mutable by default
```

Key characteristics:
- Mutable by default (can be frozen)
- More features (defaults, field options, post_init)
- Better IDE support
- Type annotations are required in the definition (though not checked at runtime)
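That "can be frozen" bit is one decorator argument away, and dataclasses also get value-based equality for free. A quick sketch:

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Point:
    x: float
    y: float

p = Point(3, 4)
print(p == Point(3, 4))  # True: dataclasses generate __eq__ for you

try:
    p.x = 5  # frozen instances reject attribute assignment
except FrozenInstanceError:
    print("frozen!")
```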
My Decision Framework
Use namedtuple when:
- You need tuple behavior (indexing, unpacking)
- Interoperating with code expecting tuples
- Simple, immutable records
- You want minimal overhead
```python
# Good namedtuple use case: returning multiple values
from collections import namedtuple

Result = namedtuple("Result", ["success", "value", "error"])

def parse_config(path):
    try:
        # ... parsing logic ...
        return Result(True, config, None)
    except Exception as e:
        return Result(False, None, str(e))

# Can unpack the result
success, value, error = parse_config("config.yaml")
```

Use dataclass when:
- You need mutability
- You want defaults and computed fields
- You need validation (post_init)
- It's a domain object, not just a data carrier
```python
# Good dataclass use case: domain object with behavior
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Task:
    title: str
    priority: int = 1
    completed: bool = False
    created_at: datetime = field(default_factory=datetime.now)

    def complete(self):
        self.completed = True

    def __post_init__(self):
        if not 1 <= self.priority <= 5:
            raise ValueError(f"Priority must be 1-5, got {self.priority}")
```

The Hybrid: NamedTuple from typing
There's also a typed version of namedtuple that feels more like a dataclass:
```python
from typing import NamedTuple

class Point(NamedTuple):
    x: float
    y: float
    label: str = ""  # Defaults work!

p = Point(3, 4, "origin")
```

This gives you type hints with tuple behavior. I use it when I want namedtuple with better IDE support.
ChainMap: Layered Configuration
ChainMap is the least-known collection, but it's brilliant for layered lookups.
```python
from collections import ChainMap

defaults = {"color": "red", "size": "medium", "debug": False}
env_settings = {"color": "blue"}  # Overrides from environment
user_prefs = {"debug": True}      # User-specific overrides

config = ChainMap(user_prefs, env_settings, defaults)

print(config["color"])  # "blue" (from env_settings)
print(config["size"])   # "medium" (from defaults)
print(config["debug"])  # True (from user_prefs)
```

The lookup checks each dict in order until it finds the key. First match wins.
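The layers stay live and inspectable through the maps attribute, which is literally a list of the underlying dicts (the variable names here are made up for the sketch):

```python
from collections import ChainMap

overrides = {"debug": True}
base = {"color": "red", "debug": False}
cfg = ChainMap(overrides, base)

print(cfg.maps)  # [{'debug': True}, {'color': 'red', 'debug': False}]

base["size"] = "large"  # changes to an underlying dict show through immediately
print(cfg["size"])      # "large"
```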
Writes Go to the First Dict
This is important: when you write to a ChainMap, it only affects the first dict:
```python
config["new_key"] = "value"
print(user_prefs)  # {"debug": True, "new_key": "value"}
print(defaults)    # Unchanged
```

new_child(): Adding Layers
```python
# Create a new layer for local overrides
local_config = config.new_child()
local_config["color"] = "green"

print(local_config["color"])  # "green"
print(config["color"])        # Still "blue"
```

Real-World Example: CLI Configuration
```python
from collections import ChainMap
import os
import json

def load_config():
    """Load config with precedence: env vars > config file > defaults."""
    defaults = {
        "host": "localhost",
        "port": 8080,
        "debug": False,
        "log_level": "INFO",
    }

    # Load from config file
    file_config = {}
    if os.path.exists("config.json"):
        with open("config.json") as f:
            file_config = json.load(f)

    # Load from environment
    env_config = {}
    for key in defaults:
        env_key = f"APP_{key.upper()}"
        if env_key in os.environ:
            value = os.environ[env_key]
            # Convert types
            if key in ("port",):
                value = int(value)
            elif key in ("debug",):
                value = value.lower() == "true"
            env_config[key] = value

    return ChainMap(env_config, file_config, defaults)

config = load_config()
print(f"Running on {config['host']}:{config['port']}")
```

Real-World Example: Variable Scoping
ChainMap models variable scoping perfectly:
```python
from collections import ChainMap

class Interpreter:
    def __init__(self):
        self.scopes = ChainMap()  # Global scope

    def enter_scope(self):
        """Enter a new local scope."""
        self.scopes = self.scopes.new_child()

    def exit_scope(self):
        """Exit current scope."""
        self.scopes = self.scopes.parents

    def get_var(self, name):
        return self.scopes.get(name)

    def set_var(self, name, value):
        self.scopes[name] = value  # Sets in current scope

# Usage
interp = Interpreter()
interp.set_var("x", 1)   # Global
interp.enter_scope()
interp.set_var("y", 2)   # Local
interp.set_var("x", 10)  # Shadows global x
print(interp.get_var("x"))  # 10
interp.exit_scope()
print(interp.get_var("x"))  # 1 (back to global)
print(interp.get_var("y"))  # None (out of scope)
```

Putting It All Together
Here's a real example that uses several collections together—a simple analytics tracker:
```python
from collections import Counter, defaultdict, deque
from datetime import datetime

class AnalyticsTracker:
    def __init__(self, recent_limit: int = 100):
        # Count total events by type
        self.event_counts = Counter()
        # Group events by user
        self.user_events = defaultdict(list)
        # Keep recent events (sliding window)
        self.recent_events = deque(maxlen=recent_limit)
        # Track unique visitors per page
        self.page_visitors = defaultdict(set)

    def track(self, event_type: str, user_id: str, page: str = None):
        timestamp = datetime.now()

        # Update counters
        self.event_counts[event_type] += 1

        # Record for user
        self.user_events[user_id].append({
            "type": event_type,
            "page": page,
            "timestamp": timestamp,
        })

        # Add to recent events
        self.recent_events.append({
            "type": event_type,
            "user": user_id,
            "page": page,
            "timestamp": timestamp,
        })

        # Track unique visitors
        if page:
            self.page_visitors[page].add(user_id)

    def top_events(self, n: int = 5):
        return self.event_counts.most_common(n)

    def unique_visitors(self, page: str) -> int:
        return len(self.page_visitors[page])

    def recent_activity(self, n: int = 10):
        # Get last n events (deque makes this easy)
        return list(self.recent_events)[-n:]

# Usage
tracker = AnalyticsTracker()
tracker.track("page_view", "user_1", "/home")
tracker.track("page_view", "user_2", "/home")
tracker.track("click", "user_1", "/home")
tracker.track("page_view", "user_1", "/about")

print(tracker.top_events())              # [('page_view', 3), ('click', 1)]
print(tracker.unique_visitors("/home"))  # 2
```

Quick Reference: When to Use What
| Collection | Use When |
|---|---|
| Counter | Counting occurrences, finding most common |
| defaultdict | Grouping items, building nested structures |
| OrderedDict | Order-sensitive equality, LRU caches |
| deque | Queues, sliding windows, recent items |
| namedtuple | Simple records, tuple interoperability |
| ChainMap | Layered configs, scope chains |
Final Thoughts
Learning the collections module was one of those "level up" moments for me. Code that used to take 10 lines became 2. Performance issues from using lists as queues disappeared. And the code became more expressive—when I see Counter, I immediately know what's happening.
These aren't obscure tools. They're the right abstractions for common patterns. Use them.
The collections module is in the standard library. There's nothing to install. It's just waiting for you to import it.