Generators let you iterate over data without loading it all into memory. Here's how they work.
## The Problem

```python
# Loads the entire file into memory at once
lines = open("huge_file.txt").readlines()
for line in lines:
    process(line)

# Memory efficient - one line at a time
for line in open("huge_file.txt"):
    process(line)
```

The second approach is lazy: the file object is an iterator that produces one line at a time.
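In real code you'd also wrap the loop in a `with` block so the file is closed when you're done. A self-contained sketch (it writes a small sample file first, so the path here is made up for illustration):

```python
import os
import tempfile

# Create a small sample file to iterate over (illustrative only)
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

# Idiomatic lazy iteration: one line read at a time,
# and the file is closed automatically when the block exits.
lines_seen = []
with open(path) as f:
    for line in f:
        lines_seen.append(line.rstrip("\n"))

print(lines_seen)  # ['alpha', 'beta', 'gamma']
```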
## Generator Functions

Use `yield` instead of `return`:

```python
def count_up_to(n):
    i = 0
    while i < n:
        yield i
        i += 1

# Creates a generator object (no computation yet)
gen = count_up_to(5)

# Values are computed on demand
for num in gen:
    print(num)  # 0, 1, 2, 3, 4
```

Each `yield` pauses the function and hands back a value; the next iteration resumes right where it left off.
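You can see the pause-and-resume behavior by driving the generator by hand with `next()`; when the loop finishes, the generator signals exhaustion by raising `StopIteration`:

```python
def count_up_to(n):
    i = 0
    while i < n:
        yield i
        i += 1

gen = count_up_to(2)
print(next(gen))  # 0 - runs until the first yield, then pauses
print(next(gen))  # 1 - resumes right after the yield
try:
    next(gen)     # the while loop ends, so the generator is exhausted
except StopIteration:
    print("done")
```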
## Generator Expressions

Like list comprehensions, but lazy:

```python
# List comprehension - builds everything in memory
squares = [x**2 for x in range(1000000)]

# Generator expression - computed on demand
# (use parentheses, not brackets)
squares = (x**2 for x in range(1000000))
```

## Common Patterns
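One common pattern: pass a generator expression straight into an aggregating function like `sum()`, so no intermediate list is ever built (when it's the sole argument, the extra parentheses can even be dropped):

```python
# Sum of a million squares without materializing a million-element list
total = sum(x**2 for x in range(1_000_000))
print(total)  # 333332833333500000
```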
### Reading large files

```python
def read_chunks(file_path, chunk_size=8192):
    with open(file_path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk

for chunk in read_chunks("huge_file.bin"):
    process(chunk)
```

### Infinite sequences
```python
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Take the first 10
from itertools import islice
fibs = list(islice(fibonacci(), 10))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

### Transforming data
```python
def uppercase_lines(lines):
    for line in lines:
        yield line.upper()

# Chain generators
lines = open("data.txt")
upper = uppercase_lines(lines)
for line in upper:
    print(line)
```

## itertools
The standard library toolbox for iterator operations:

```python
from itertools import (
    islice,       # Take the first N items
    chain,        # Combine iterators
    cycle,        # Repeat an iterable infinitely
    repeat,       # Repeat a value
    count,        # Infinite counter
    takewhile,    # Take items while a condition is true
    dropwhile,    # Skip items while a condition is true
    groupby,      # Group consecutive items
    filterfalse,  # Keep items where the predicate is false
)

# Examples
list(islice(count(10), 5))                      # [10, 11, 12, 13, 14]
list(chain([1, 2], [3, 4]))                     # [1, 2, 3, 4]
list(takewhile(lambda x: x < 5, [1, 3, 5, 2]))  # [1, 3]
```

## Generator Methods
```python
def gen():
    while True:
        value = yield
        print(f"Received: {value}")

g = gen()
next(g)     # Start the generator (advance to the first yield)
g.send(10)  # Received: 10
g.send(20)  # Received: 20
g.close()   # Stop the generator
```

## yield from
Delegate to another generator:

```python
def flatten(nested):
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item

list(flatten([1, [2, [3, 4]], 5]))
# [1, 2, 3, 4, 5]
```

## Memory Comparison
```python
import sys

# List: stores all the values
list_data = [x for x in range(1000000)]
print(sys.getsizeof(list_data))  # ~8 MB

# Generator: stores only the generator object
gen_data = (x for x in range(1000000))
print(sys.getsizeof(gen_data))   # ~200 bytes
```

## When to Use Generators
Use generators when:
- Processing large files
- Infinite sequences
- Memory is constrained
- You only need to iterate once
- Chaining transformations
Use lists when:
- You need random access
- You need to iterate multiple times
- Data is small
- You need list methods (append, sort, etc.)
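The "iterate once" point is the classic gotcha: a generator is exhausted after a single pass, while a list can be scanned again. A minimal demonstration:

```python
gen = (x * 2 for x in range(3))
first_pass = list(gen)
second_pass = list(gen)
print(first_pass)   # [0, 2, 4]
print(second_pass)  # [] - the generator is already exhausted

data = [x * 2 for x in range(3)]
print(list(data))   # [0, 2, 4]
print(list(data))   # [0, 2, 4] - lists can be iterated again
```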
## My Patterns

```python
# Process files line by line
def process_log(path):
    with open(path) as f:
        for line in f:
            if "ERROR" in line:
                yield parse_error(line)

# Chain generators for pipelines
raw_data = read_file("data.csv")
parsed = parse_rows(raw_data)
filtered = (row for row in parsed if row["valid"])
transformed = transform(filtered)
```

Generators are Python's way of handling data streams. Master them for memory-efficient code.