When I first discovered Python's dataclasses module, I thought it was just a shortcut for writing __init__ and __repr__. Then I tried to do something slightly complex and realized I'd only scratched the surface.
This is what I wish I'd known earlier.
The Basics (Quick Recap)
from dataclasses import dataclass

@dataclass
class User:
    name: str
    email: str
    age: int = 0

This generates __init__, __repr__, __eq__, and more. Nice. But the real power is in what comes next.
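To make the recap concrete, here is a small sketch of what those generated methods give you in practice. The class mirrors the User above; the sample values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    email: str
    age: int = 0


# __init__ accepts fields positionally or by keyword, in declaration order
alice = User("Alice", "alice@example.com")
same_alice = User(name="Alice", email="alice@example.com", age=0)

# __repr__ lists every field
print(alice)  # User(name='Alice', email='alice@example.com', age=0)

# __eq__ compares field by field, so two separate instances can be equal
print(alice == same_alice)  # True
print(alice is same_alice)  # False
```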
Frozen: True Immutability
I was confused when I first saw code that mutated dataclass instances in ways I didn't expect. The problem? Mutable default values and accidental mutations.
Enter frozen=True:
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    api_key: str
    timeout: int = 30
    retries: int = 3

config = Config(api_key="secret123")
config.timeout = 60  # FrozenInstanceError!

This makes your dataclass immutable: any attempt to modify it raises an exception.
What I learned: Frozen dataclasses are perfect for configuration objects, value objects, and anything that should never change after creation. With the default eq=True they also get a generated __hash__, so as long as every field value is itself hashable, you can use them as dictionary keys or in sets.
# This works because frozen=True makes it hashable
configs = {Config(api_key="prod"), Config(api_key="staging")}

The Catch: Nested Mutability
Here's something that tripped me up:
@dataclass(frozen=True)
class Settings:
    name: str
    options: list  # Uh oh

settings = Settings(name="app", options=[1, 2, 3])
settings.options.append(4)  # This works! The list is still mutable

Frozen only prevents reassignment of the field itself, not mutation of mutable objects inside. For true deep immutability, use tuples or frozen collections.
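A minimal sketch of that advice, assuming Python 3.9+ for the built-in generic annotations. The flags field is a hypothetical addition, included only to show a frozenset next to a tuple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    name: str
    options: tuple[int, ...] = ()        # immutable sequence instead of a list
    flags: frozenset[str] = frozenset()  # hypothetical field: immutable set


settings = Settings(name="app", options=(1, 2, 3), flags=frozenset({"debug"}))

# A tuple has no append(), so the in-place mutation is no longer possible
print(hasattr(settings.options, "append"))  # False

# Every field value is hashable now, so equal instances collapse in a set
duplicate = Settings("app", (1, 2, 3), frozenset({"debug"}))
print(len({settings, duplicate}))  # 1
```

The trade-off is that callers have to pass tuples and frozensets rather than lists and sets, which is usually a fair price for a value object.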
field() Options: Fine-Grained Control
The field() function is where dataclasses get flexible. I ignored it for too long.
default_factory: Avoiding the Mutable Default Trap
Every Python developer learns this lesson eventually:
# WRONG - in a plain class, all instances would share the same list!
@dataclass
class BadTask:
    name: str
    tags: list = []  # dataclasses reject this with a ValueError, but you get the idea

# RIGHT - each instance gets its own list
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    tags: list = field(default_factory=list)

default_factory takes a callable that produces a fresh default value for each instance.
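A quick way to convince yourself that the factory runs once per instance is to mutate one instance and check the other. This is just a sketch using the Task class from above with made-up task names.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    tags: list = field(default_factory=list)  # list() is called for each new instance


first = Task("write")
second = Task("review")
first.tags.append("urgent")

print(first.tags)                 # ['urgent']
print(second.tags)                # []
print(first.tags is second.tags)  # False
```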
# More complex factories
from uuid import uuid4
from datetime import datetime

@dataclass
class Event:
    name: str
    id: str = field(default_factory=lambda: str(uuid4()))
    created_at: datetime = field(default_factory=datetime.now)

repr, compare, hash: Controlling Behavior
Sometimes you don't want a field to show up in the repr or affect equality:
@dataclass
class Document:
    title: str
    content: str
    # Internal tracking - don't show in repr or use in comparison
    _cache: dict = field(default_factory=dict, repr=False, compare=False)

doc1 = Document("Hello", "World")
doc2 = Document("Hello", "World")
doc1._cache["key"] = "value"
print(doc1)  # Document(title='Hello', content='World')
print(doc1 == doc2)  # True - _cache is ignored

What I learned: These options are invaluable for:
- repr=False: Hiding sensitive data or internal state
- compare=False: Excluding metadata from equality checks
- hash=False: Excluding fields from hash computation
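Here is a small sketch of the first two options applied to sensitive data and metadata. The Credentials class and its fields are hypothetical, invented for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Credentials:
    username: str
    password: str = field(repr=False)                     # kept out of the repr
    last_used: float = field(default=0.0, compare=False)  # ignored by ==


a = Credentials("alice", "hunter2", last_used=1.0)
b = Credentials("alice", "hunter2", last_used=2.0)

print(a)       # Credentials(username='alice', last_used=1.0)
print(a == b)  # True - last_used does not take part in the comparison
```

Keeping the password out of the repr means it won't leak into logs or tracebacks that print the object, though it is still stored on the instance.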
__post_init__: Validation and Computed Fields
This is where I had my "aha" moment with dataclasses. __post_init__ runs after the auto-generated __init__:
@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)  # Computed, not passed to __init__

    def __post_init__(self):
        if self.width <= 0 or self.height <= 0:
            raise ValueError("Dimensions must be positive")
        self.area = self.width * self.height

rect = Rectangle(10, 5)
print(rect.area)  # 50 (ints in, int out: dataclasses don't coerce types)
Rectangle(-1, 5)  # ValueError: Dimensions must be positive

Validation Patterns
import re

@dataclass
class Email:
    address: str

    def __post_init__(self):
        pattern = r'^[\w\.-]+@[\w\.-]+\.\w+$'
        if not re.match(pattern, self.address):
            raise ValueError(f"Invalid email: {self.address}")

Email("user@example.com")  # Works
Email("not-an-email")  # ValueError

Transforming Input
@dataclass
class User:
    name: str
    email: str

    def __post_init__(self):
        self.name = self.name.strip().title()
        self.email = self.email.strip().lower()

user = User(" john doe ", "JOHN@EXAMPLE.COM")
print(user)  # User(name='John Doe', email='john@example.com')

Inheritance: Patterns and Gotchas
Dataclass inheritance works, but there are traps.
The Field Order Problem
This broke my code more than once:
@dataclass
class Animal:
    name: str
    species: str = "Unknown"  # Has default

@dataclass
class Dog(Animal):
    breed: str  # No default - ERROR!

This raises TypeError: non-default argument 'breed' follows default argument. The generated __init__ lists parent fields first, so parent defaults "pollute" child field ordering.
Solution 1: Give all child fields defaults
@dataclass
class Dog(Animal):
    breed: str = "Unknown"

Solution 2: Use field() with a default

@dataclass
class Dog(Animal):
    breed: str = field(default="Unknown")

Solution 3: Rethink your hierarchy
Sometimes inheritance isn't the right tool. Composition might be cleaner.
Method Inheritance Works Fine
@dataclass
class BaseModel:
    id: int

    def save(self):
        print(f"Saving {self.__class__.__name__} with id={self.id}")

@dataclass
class Product(BaseModel):
    name: str
    price: float

product = Product(1, "Widget", 9.99)
product.save()  # "Saving Product with id=1"

Slots: Memory Optimization (Python 3.10+)
I was skeptical until I actually measured this.
@dataclass(slots=True)
class Point:
    x: float
    y: float

With slots=True, Python uses __slots__ instead of a __dict__ for instance attributes. Benefits:
- Less memory - no per-instance dictionary
- Faster attribute access - direct offset lookup
- Prevents accidental attribute creation
@dataclass(slots=True)
class SlottedPoint:
    x: float
    y: float

p = SlottedPoint(1.0, 2.0)
p.z = 3.0  # AttributeError: 'SlottedPoint' object has no attribute 'z'

Measuring the Difference
import sys
from dataclasses import dataclass

@dataclass
class RegularPoint:
    x: float
    y: float

@dataclass(slots=True)
class SlottedPoint:
    x: float
    y: float

regular = RegularPoint(1.0, 2.0)
slotted = SlottedPoint(1.0, 2.0)

# sys.getsizeof() does not follow references, so count the regular instance's __dict__ too
print(sys.getsizeof(regular) + sys.getsizeof(regular.__dict__))
print(sys.getsizeof(slotted))  # smaller: no per-instance __dict__ (exact numbers vary by Python version)

For a million instances, that's meaningful memory savings.
What I learned: Use slots=True when you're creating many instances and don't need dynamic attribute assignment.
Typing Integration: ClassVar and InitVar
ClassVar: Class-Level Constants
Fields that belong to the class, not instances:
from dataclasses import dataclass
from typing import ClassVar

@dataclass
class Counter:
    name: str
    count: int = 0
    total_created: ClassVar[int] = 0  # Not an instance field

    def __post_init__(self):
        Counter.total_created += 1

c1 = Counter("first")
c2 = Counter("second")
print(Counter.total_created)  # 2
print(c1.total_created)  # 2 (accessed via class)

ClassVar fields don't appear in __init__, aren't compared, and don't show in repr.
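You can confirm this with dataclasses.fields(), which lists only the real instance fields. The sketch below reuses the Counter class but leaves out __post_init__, so the counter stays at its initial value.

```python
from dataclasses import dataclass, fields
from typing import ClassVar


@dataclass
class Counter:
    name: str
    count: int = 0
    total_created: ClassVar[int] = 0


# fields() reports only instance fields; the ClassVar is skipped
print([f.name for f in fields(Counter)])  # ['name', 'count']

# The repr skips it too
print(Counter("first"))  # Counter(name='first', count=0)

# It is still an ordinary class attribute
print(Counter.total_created)  # 0
```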
InitVar: Init-Only Variables
Pass data to __post_init__ without storing it:
from dataclasses import dataclass, field, InitVar
import hashlib

@dataclass
class Password:
    hash: str = field(init=False)
    raw_password: InitVar[str]

    def __post_init__(self, raw_password: str):
        # raw_password is NOT stored as an attribute
        self.hash = self._hash(raw_password)

    def _hash(self, password: str) -> str:
        return hashlib.sha256(password.encode()).hexdigest()

pw = Password(raw_password="secret123")
print(pw.hash)  # 64-character SHA-256 hex digest
print(hasattr(pw, 'raw_password'))  # False - it's not stored!

What I learned: InitVar is perfect for sensitive data you need to process but shouldn't persist on the instance.
When to Use Alternatives
Dataclasses are great, but not always the right choice.
namedtuple: Simple, Immutable, Iterable
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)
x, y = p  # Unpacking works!
print(p[0])  # Indexing works!

Use namedtuple when:
- You need tuple-like behavior (iteration, unpacking, indexing)
- Simple immutable data without methods
- Memory efficiency matters (no per-instance __dict__, so it's in the same ballpark as slotted dataclasses)
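If you like the class-style syntax of dataclasses but want tuple behavior, the standard library's typing.NamedTuple offers the same thing with annotations. A short sketch, with a default value added purely for illustration:

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: float
    y: float = 0.0  # defaults work much like they do in a dataclass


p = Point(1.0, 2.0)
x, y = p                  # unpacking
print(p[0])               # 1.0
print(p._replace(y=5.0))  # Point(x=1.0, y=5.0)
print(p == (1.0, 2.0))    # True - it compares equal to a plain tuple
```

That last line is worth keeping in mind: unlike dataclasses, named tuples of different types compare equal whenever their values match.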
attrs: More Features, More Control
import attr

@attr.s(auto_attribs=True)
class User:
    name: str
    email: str = attr.ib(validator=attr.validators.matches_re(r'.+@.+'))

Use attrs when:
- You need built-in validators
- You want converters (auto-transform input)
- You need more sophisticated inheritance
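- You're on a Python older than 3.7, where dataclasses don't exist (attrs predates them, but current releases have dropped old interpreters, so you may need to pin an older version)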
Pydantic: Runtime Validation, JSON Ready
from pydantic import BaseModel, EmailStr

class User(BaseModel):
    name: str
    email: EmailStr  # Validates email format (requires the optional email-validator package)
    age: int

# Automatic parsing and validation
user = User(name="John", email="john@example.com", age="25")
print(user.age)  # 25 (int, auto-converted)
print(user.model_dump_json())  # JSON serialization built-in (Pydantic v2 API)

Use Pydantic when:
- Parsing external data (APIs, config files, user input)
- You need JSON serialization/deserialization
- Complex validation is core to your use case
- You're building FastAPI applications
My Decision Framework
Need immutability only? → namedtuple or frozen dataclass
Internal domain objects? → dataclass
External data validation? → Pydantic
Complex validation + legacy Python support? → attrs
Common Patterns
The Builder Pattern
For complex objects with many optional parameters:
from dataclasses import dataclass, field

@dataclass
class QueryBuilder:
    table: str
    columns: list = field(default_factory=lambda: ["*"])
    where: list = field(default_factory=list)
    limit: int | None = None  # the | syntax needs Python 3.10+

    # Each method returns a new builder instead of mutating this one
    def select(self, *cols):
        return QueryBuilder(
            table=self.table,
            columns=list(cols),
            where=self.where.copy(),
            limit=self.limit
        )

    def filter(self, condition):
        return QueryBuilder(
            table=self.table,
            columns=self.columns.copy(),
            where=self.where + [condition],
            limit=self.limit
        )

    def take(self, n):
        return QueryBuilder(
            table=self.table,
            columns=self.columns.copy(),
            where=self.where.copy(),
            limit=n
        )

    def build(self) -> str:
        cols = ", ".join(self.columns)
        sql = f"SELECT {cols} FROM {self.table}"
        if self.where:
            sql += " WHERE " + " AND ".join(self.where)
        if self.limit is not None:  # so that take(0) still emits LIMIT 0
            sql += f" LIMIT {self.limit}"
        return sql

query = (QueryBuilder("users")
    .select("id", "name", "email")
    .filter("active = true")
    .filter("age > 18")
    .take(10)
    .build())
# "SELECT id, name, email FROM users WHERE active = true AND age > 18 LIMIT 10"

Config Objects
from dataclasses import dataclass, field
from typing import ClassVar
import os

@dataclass(frozen=True)
class AppConfig:
    ENV_PREFIX: ClassVar[str] = "APP_"
    database_url: str
    debug: bool = False
    max_connections: int = 10

    @classmethod
    def from_env(cls):
        return cls(
            database_url=os.environ[f"{cls.ENV_PREFIX}DATABASE_URL"],
            debug=os.environ.get(f"{cls.ENV_PREFIX}DEBUG", "").lower() == "true",
            max_connections=int(os.environ.get(f"{cls.ENV_PREFIX}MAX_CONNECTIONS", 10))
        )

Data Transfer Objects (DTOs)
from dataclasses import dataclass, asdict, field
from datetime import datetime

@dataclass
class UserDTO:
    id: int
    name: str
    email: str
    created_at: datetime = field(default_factory=datetime.now)

    def to_dict(self) -> dict:
        data = asdict(self)
        data['created_at'] = self.created_at.isoformat()
        return data

    @classmethod
    def from_dict(cls, data: dict) -> 'UserDTO':
        data = dict(data)  # shallow copy so the caller's dict isn't modified
        if isinstance(data.get('created_at'), str):
            data['created_at'] = datetime.fromisoformat(data['created_at'])
        return cls(**data)

# Serialize
user = UserDTO(1, "John", "john@example.com")
payload = user.to_dict()  # Ready for JSON

# Deserialize
user_copy = UserDTO.from_dict(payload)

What I Learned (Summary)
- frozen=True creates true immutability and enables hashing: use it for configs and value objects
- field() gives you fine-grained control over defaults, representation, and comparison
- __post_init__ is your hook for validation and computed fields; use it liberally
- Inheritance has footguns; watch the field ordering and consider composition
- slots=True (3.10+) saves memory and prevents attribute typos
- ClassVar and InitVar solve specific typing needs: class-level and init-only data
- Know when to reach for alternatives: Pydantic for validation, attrs for features, namedtuple for simplicity
Dataclasses aren't magic. They're just Python generating boilerplate for you. But understanding the options turns them from a nice convenience into a powerful tool for clean, maintainable code.
Questions or patterns I missed? I'm always learning—reach out if you've got tricks I should know about.