When I started writing Python, I did everything with lists and dicts. Need to count things? Loop through and increment a dict. Need to group items? Initialize empty lists everywhere. Need a queue? Use a list and call pop(0).
It all worked, but it was clunky. Then a senior engineer reviewed my code and asked, "Why aren't you using Counter?"
That question sent me down a rabbit hole into Python's collections module. Turns out there's a whole toolkit of specialized containers designed for exactly the patterns I was implementing by hand. Here's everything I've learned about them.
Counter: Stop Writing Counting Loops
I used to write counting code like this:
```python
# The old way
word_counts = {}
for word in words:
    if word in word_counts:
        word_counts[word] += 1
    else:
        word_counts[word] = 1
```

Then I discovered Counter:

```python
from collections import Counter

words = ["apple", "banana", "apple", "cherry", "banana", "apple"]
counts = Counter(words)
print(counts)  # Counter({'apple': 3, 'banana': 2, 'cherry': 1})
```

That's it. One line.
The Things That Blew My Mind
Missing keys return 0, not KeyError:
```python
counts = Counter(["a", "b", "a"])
print(counts["a"])  # 2
print(counts["z"])  # 0, not KeyError!
```

This is huge. No more if key in dict checks everywhere.
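Because missing keys read as zero, incremental counting needs no membership check either. A small sketch with a made-up word list:

```python
from collections import Counter

counts = Counter()
for word in ["spam", "eggs", "spam", "spam"]:
    counts[word] += 1  # works on first sight of a word: missing keys read as 0

print(counts["spam"])  # 3
print(counts["ham"])   # 0, still no KeyError
```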
Most common elements, instantly:
```python
text = "the quick brown fox jumps over the lazy dog"
letter_counts = Counter(text.replace(" ", ""))
print(letter_counts.most_common(3))  # [('o', 4), ('e', 3), ('t', 2)]
```

I used to sort dicts manually with lambda functions. Never again.
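A related detail: calling most_common() with no argument returns every element, sorted from most to least common:

```python
from collections import Counter

letter_counts = Counter("banana")
print(letter_counts.most_common())  # [('a', 3), ('n', 2), ('b', 1)]
```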
Counter arithmetic:
This one I didn't expect. You can add and subtract Counters:
```python
inventory = Counter(apples=10, oranges=5, bananas=7)
sold = Counter(apples=3, oranges=2)
remaining = inventory - sold
print(remaining)  # Counter({'apples': 7, 'bananas': 7, 'oranges': 3})
```

And it gets better—you can find common elements (intersection) or combined totals (union):
```python
basket1 = Counter(["apple", "apple", "orange"])
basket2 = Counter(["apple", "banana", "banana"])

# What's common (minimum counts)
print(basket1 & basket2)  # Counter({'apple': 1})

# Combined (maximum counts)
print(basket1 | basket2)  # Counter({'apple': 2, 'banana': 2, 'orange': 1})
```

Real-World Example: Log Analysis
I use Counter constantly for quick data analysis:
```python
from collections import Counter

def analyze_access_log(log_lines):
    """Analyze web server access logs."""
    status_codes = Counter()
    endpoints = Counter()

    for line in log_lines:
        parts = line.split()
        status = parts[8]    # HTTP status code
        endpoint = parts[6]  # Request path
        status_codes[status] += 1
        endpoints[endpoint] += 1

    print("Top 5 endpoints:")
    for endpoint, count in endpoints.most_common(5):
        print(f"  {endpoint}: {count}")

    print("\nStatus code breakdown:")
    for status, count in sorted(status_codes.items()):
        print(f"  {status}: {count}")
```

defaultdict: No More "Initialize If Missing"
Before defaultdict, grouping items looked like this:
```python
# The old way
groups = {}
for name, category in items:
    if category not in groups:
        groups[category] = []
    groups[category].append(name)
```

With defaultdict:

```python
from collections import defaultdict

groups = defaultdict(list)
for name, category in items:
    groups[category].append(name)  # Just append. The list is auto-created.
```

Common Default Factories
List (grouping):
```python
employees_by_dept = defaultdict(list)
for emp in employees:
    employees_by_dept[emp.department].append(emp)
```

Set (unique grouping):

```python
users_by_role = defaultdict(set)
for user in users:
    for role in user.roles:
        users_by_role[role].add(user)
```

Int (counting):

```python
# Alternative to Counter for simple cases
counts = defaultdict(int)
for item in items:
    counts[item] += 1
```

Lambda for custom defaults:

```python
# Default to a specific value
settings = defaultdict(lambda: "unknown")
settings["theme"] = "dark"
print(settings["theme"])    # "dark"
print(settings["missing"])  # "unknown"
```

The Nested defaultdict Trick
This one is magic. Infinite nesting without initialization:
```python
def nested_dict():
    return defaultdict(nested_dict)

data = nested_dict()
data["users"]["alice"]["settings"]["theme"] = "dark"
data["users"]["bob"]["profile"]["name"] = "Bob"
# No KeyError, no pre-initialization needed!
```

I use this for building tree structures from flat data:

```python
def build_org_tree(employees):
    tree = lambda: defaultdict(tree)
    org = tree()
    for emp in employees:
        dept = emp.department
        team = emp.team
        org[dept][team][emp.name] = emp.title
    return org
```

Real-World Example: Building an Index
```python
from collections import defaultdict

def build_inverted_index(documents):
    """Build a search index from documents."""
    index = defaultdict(set)
    for doc_id, text in documents:
        words = text.lower().split()
        for word in words:
            index[word].add(doc_id)
    return index

documents = [
    (1, "Python is great for data science"),
    (2, "Data engineering with Python"),
    (3, "Machine learning and data science"),
]

index = build_inverted_index(documents)
print(index["python"])  # {1, 2}
print(index["data"])    # {1, 2, 3}
```

OrderedDict: When Order Actually Matters
"Wait," you might say, "dicts are ordered since Python 3.7. Why do I need OrderedDict?"
Good question. I wondered the same thing. Here's why OrderedDict still matters:
Equality Considers Order
Regular dicts don't care about order when comparing:
```python
d1 = {"a": 1, "b": 2}
d2 = {"b": 2, "a": 1}
print(d1 == d2)  # True
```

OrderedDict does:

```python
from collections import OrderedDict

od1 = OrderedDict([("a", 1), ("b", 2)])
od2 = OrderedDict([("b", 2), ("a", 1)])
print(od1 == od2)  # False!
```

This matters when order is semantically meaningful—like steps in a workflow.
move_to_end() and popitem()
These are the killer features:
```python
from collections import OrderedDict

od = OrderedDict([("a", 1), ("b", 2), ("c", 3)])

# Move an item to the end
od.move_to_end("a")
print(list(od.keys()))  # ['b', 'c', 'a']

# Move to the beginning
od.move_to_end("c", last=False)
print(list(od.keys()))  # ['c', 'b', 'a']

# Pop from either end
od.popitem(last=True)   # Removes 'a'
od.popitem(last=False)  # Removes 'c'
```

Real-World Example: LRU Cache
This is the classic use case—implementing a Least Recently Used cache:
```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.cache = OrderedDict()
        self.capacity = capacity

    def get(self, key):
        if key not in self.cache:
            return None
        # Move accessed item to end (most recently used)
        self.cache.move_to_end(key)
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        # Evict oldest if over capacity
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

# Usage
cache = LRUCache(3)
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)
cache.get("a")     # Access 'a', moves to end
cache.put("d", 4)  # Evicts 'b' (least recently used)
```

(In practice, use functools.lru_cache for function memoization—but understanding this pattern helps!)
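For reference, here is what the functools version looks like: a memoized function where the LRU bookkeeping is handled for you.

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def fib(n: int) -> int:
    # Each distinct n is computed once; repeats come from the cache
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))           # 832040
print(fib.cache_info())  # hit/miss/size statistics
```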
When to Use Regular dict vs OrderedDict
Use regular dict when:
- You just need fast key-value storage
- Order doesn't affect correctness
- You're on Python 3.7+
Use OrderedDict when:
- Order-aware equality matters
- You need move_to_end() or popitem(last=False)
- Implementing ordered data structures (caches, queues)
deque: The Right Tool for Queues
I used to use lists as queues. Then I learned that list.pop(0) is O(n), not O(1).
For every item you remove from the front, Python shifts every other item down. With large lists, this kills performance.
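That claim is easy to verify: doubling the list size roughly quadruples the total drain time, the signature of quadratic behavior. A rough, machine-dependent sketch (the sizes are arbitrary):

```python
from timeit import timeit

def drain(n):
    items = list(range(n))
    while items:
        items.pop(0)  # every pop shifts all remaining elements left

t1 = timeit(lambda: drain(20_000), number=1)
t2 = timeit(lambda: drain(40_000), number=1)
print(f"{t1:.4f}s vs {t2:.4f}s")  # roughly 4x slower, not 2x
```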
Enter deque (pronounced "deck")—a double-ended queue with O(1) operations on both ends:
```python
from collections import deque

q = deque()
q.append("first")     # Add to right: O(1)
q.append("second")
q.appendleft("zero")  # Add to left: O(1)
q.popleft()           # Remove from left: O(1)
q.pop()               # Remove from right: O(1)
```

The maxlen Feature
This is incredibly useful. Create a fixed-size deque, and old items automatically drop off:
```python
# Keep only the last 5 items
recent = deque(maxlen=5)
for i in range(10):
    recent.append(i)
print(list(recent))  # [5, 6, 7, 8, 9]
```

Perfect for:
- Recent activity logs
- Sliding windows
- Rolling averages
- Undo history (limited depth)
Rotation
Deques can rotate efficiently:
```python
d = deque([1, 2, 3, 4, 5])
d.rotate(2)     # Rotate right
print(list(d))  # [4, 5, 1, 2, 3]
d.rotate(-2)    # Rotate left
print(list(d))  # [1, 2, 3, 4, 5]
```

Real-World Example: BFS with deque
Breadth-first search is a classic deque use case:
```python
from collections import deque

def bfs_shortest_path(graph, start, end):
    """Find shortest path using BFS."""
    queue = deque([(start, [start])])
    visited = {start}

    while queue:
        node, path = queue.popleft()  # O(1)!
        if node == end:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [neighbor]))

    return None  # No path found

graph = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F"],
    "D": [],
    "E": ["F"],
    "F": [],
}

path = bfs_shortest_path(graph, "A", "F")
print(path)  # ['A', 'C', 'F']
```

Real-World Example: Moving Average
```python
from collections import deque

class MovingAverage:
    def __init__(self, window_size: int):
        self.window = deque(maxlen=window_size)

    def add(self, value: float) -> float:
        self.window.append(value)
        return sum(self.window) / len(self.window)

avg = MovingAverage(3)
print(avg.add(1))  # 1.0
print(avg.add(2))  # 1.5
print(avg.add(3))  # 2.0
print(avg.add(4))  # 3.0 (window is now [2, 3, 4])
```

namedtuple vs dataclasses: When to Use Which
This one confused me for a while. Both create classes for holding data. When do you use which?
namedtuple: The Quick Option
```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(3, 4)

print(p.x, p.y)  # 3 4
print(p[0])      # 3 (indexable like a tuple)
x, y = p         # Unpacking works
```

Key characteristics:
- Immutable (like tuples)
- Indexable and unpackable
- Memory efficient
- Works with tuple APIs
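Immutable doesn't mean inconvenient: namedtuple ships underscore-prefixed helpers for making modified copies and for introspection:

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(3, 4)

moved = p._replace(x=10)  # a new Point with x swapped; p is untouched
print(moved)              # Point(x=10, y=4)
print(p._asdict())        # {'x': 3, 'y': 4}
print(Point._fields)      # ('x', 'y')
```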
dataclass: The Full-Featured Option
```python
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

p = Point(3, 4)
p.x = 5  # Mutable by default
```

Key characteristics:
- Mutable by default (can be frozen)
- More features (defaults, field options, post_init)
- Better IDE support
- Type annotations are required in the definition (though not checked at runtime)
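That "can be frozen" bit is one decorator argument away, and dataclasses also get value-based equality for free. A quick sketch:

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Point:
    x: float
    y: float

p = Point(3, 4)
print(p == Point(3, 4))  # True: dataclasses generate __eq__ for you

try:
    p.x = 5  # frozen instances reject attribute assignment
except FrozenInstanceError:
    print("frozen!")
```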
My Decision Framework
Use namedtuple when:
- You need tuple behavior (indexing, unpacking)
- Interoperating with code expecting tuples
- Simple, immutable records
- You want minimal overhead
```python
# Good namedtuple use case: returning multiple values
from collections import namedtuple

Result = namedtuple("Result", ["success", "value", "error"])

def parse_config(path):
    try:
        # ... parsing logic ...
        return Result(True, config, None)
    except Exception as e:
        return Result(False, None, str(e))

# Can unpack the result
success, value, error = parse_config("config.yaml")
```

Use dataclass when:
- You need mutability
- You want defaults and computed fields
- You need validation (post_init)
- It's a domain object, not just a data carrier
```python
# Good dataclass use case: domain object with behavior
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Task:
    title: str
    priority: int = 1
    completed: bool = False
    created_at: datetime = field(default_factory=datetime.now)

    def complete(self):
        self.completed = True

    def __post_init__(self):
        if not 1 <= self.priority <= 5:
            raise ValueError(f"Priority must be 1-5, got {self.priority}")
```

The Hybrid: NamedTuple from typing
There's also a typed version of namedtuple that feels more like a dataclass:
```python
from typing import NamedTuple

class Point(NamedTuple):
    x: float
    y: float
    label: str = ""  # Defaults work!

p = Point(3, 4, "origin")
```

This gives you type hints with tuple behavior. I use it when I want namedtuple with better IDE support.
ChainMap: Layered Configuration
ChainMap is the least-known collection, but it's brilliant for layered lookups.
```python
from collections import ChainMap

defaults = {"color": "red", "size": "medium", "debug": False}
env_settings = {"color": "blue"}  # Overrides from environment
user_prefs = {"debug": True}      # User-specific overrides

config = ChainMap(user_prefs, env_settings, defaults)

print(config["color"])  # "blue" (from env_settings)
print(config["size"])   # "medium" (from defaults)
print(config["debug"])  # True (from user_prefs)
```

The lookup checks each dict in order until it finds the key. First match wins.
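The layers stay live and inspectable through the maps attribute, which is literally a list of the underlying dicts (the variable names here are made up for the sketch):

```python
from collections import ChainMap

overrides = {"debug": True}
base = {"color": "red", "debug": False}
cfg = ChainMap(overrides, base)

print(cfg.maps)  # [{'debug': True}, {'color': 'red', 'debug': False}]

base["size"] = "large"  # changes to an underlying dict show through immediately
print(cfg["size"])      # "large"
```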
Writes Go to the First Dict
This is important: when you write to a ChainMap, it only affects the first dict:
```python
config["new_key"] = "value"
print(user_prefs)  # {"debug": True, "new_key": "value"}
print(defaults)    # Unchanged
```

new_child(): Adding Layers
```python
# Create a new layer for local overrides
local_config = config.new_child()
local_config["color"] = "green"

print(local_config["color"])  # "green"
print(config["color"])        # Still "blue"
```

Real-World Example: CLI Configuration
```python
from collections import ChainMap
import os
import json

def load_config():
    """Load config with precedence: env vars > config file > defaults."""
    defaults = {
        "host": "localhost",
        "port": 8080,
        "debug": False,
        "log_level": "INFO",
    }

    # Load from config file
    file_config = {}
    if os.path.exists("config.json"):
        with open("config.json") as f:
            file_config = json.load(f)

    # Load from environment
    env_config = {}
    for key in defaults:
        env_key = f"APP_{key.upper()}"
        if env_key in os.environ:
            value = os.environ[env_key]
            # Convert types
            if key in ("port",):
                value = int(value)
            elif key in ("debug",):
                value = value.lower() == "true"
            env_config[key] = value

    return ChainMap(env_config, file_config, defaults)

config = load_config()
print(f"Running on {config['host']}:{config['port']}")
```

Real-World Example: Variable Scoping
ChainMap models variable scoping perfectly:
```python
from collections import ChainMap

class Interpreter:
    def __init__(self):
        self.scopes = ChainMap()  # Global scope

    def enter_scope(self):
        """Enter a new local scope."""
        self.scopes = self.scopes.new_child()

    def exit_scope(self):
        """Exit current scope."""
        self.scopes = self.scopes.parents

    def get_var(self, name):
        return self.scopes.get(name)

    def set_var(self, name, value):
        self.scopes[name] = value  # Sets in current scope

# Usage
interp = Interpreter()
interp.set_var("x", 1)   # Global
interp.enter_scope()
interp.set_var("y", 2)   # Local
interp.set_var("x", 10)  # Shadows global x
print(interp.get_var("x"))  # 10
interp.exit_scope()
print(interp.get_var("x"))  # 1 (back to global)
print(interp.get_var("y"))  # None (out of scope)
```

Putting It All Together
Here's a real example that uses several collections together—a simple analytics tracker:
```python
from collections import Counter, defaultdict, deque
from datetime import datetime

class AnalyticsTracker:
    def __init__(self, recent_limit: int = 100):
        # Count total events by type
        self.event_counts = Counter()
        # Group events by user
        self.user_events = defaultdict(list)
        # Keep recent events (sliding window)
        self.recent_events = deque(maxlen=recent_limit)
        # Track unique visitors per page
        self.page_visitors = defaultdict(set)

    def track(self, event_type: str, user_id: str, page: str = None):
        timestamp = datetime.now()

        # Update counters
        self.event_counts[event_type] += 1

        # Record for user
        self.user_events[user_id].append({
            "type": event_type,
            "page": page,
            "timestamp": timestamp,
        })

        # Add to recent events
        self.recent_events.append({
            "type": event_type,
            "user": user_id,
            "page": page,
            "timestamp": timestamp,
        })

        # Track unique visitors
        if page:
            self.page_visitors[page].add(user_id)

    def top_events(self, n: int = 5):
        return self.event_counts.most_common(n)

    def unique_visitors(self, page: str) -> int:
        return len(self.page_visitors[page])

    def recent_activity(self, n: int = 10):
        # Get last n events (deque makes this easy)
        return list(self.recent_events)[-n:]

# Usage
tracker = AnalyticsTracker()
tracker.track("page_view", "user_1", "/home")
tracker.track("page_view", "user_2", "/home")
tracker.track("click", "user_1", "/home")
tracker.track("page_view", "user_1", "/about")

print(tracker.top_events())              # [('page_view', 3), ('click', 1)]
print(tracker.unique_visitors("/home"))  # 2
```

Quick Reference: When to Use What
| Collection | Use When |
|---|---|
| Counter | Counting occurrences, finding most common |
| defaultdict | Grouping items, building nested structures |
| OrderedDict | Order-sensitive equality, LRU caches |
| deque | Queues, sliding windows, recent items |
| namedtuple | Simple records, tuple interoperability |
| ChainMap | Layered configs, scope chains |
Final Thoughts
Learning the collections module was one of those "level up" moments for me. Code that used to take 10 lines became 2. Performance issues from using lists as queues disappeared. And the code became more expressive—when I see Counter, I immediately know what's happening.
These aren't obscure tools. They're the right abstractions for common patterns. Use them.
The collections module is in the standard library. There's nothing to install. It's just waiting for you to import it.