I recently needed to save some Python objects between script runs. Not a database situation—just "remember this dictionary when I run again tomorrow." That's when I discovered pickle and shelve, Python's built-in persistence tools. Here's what I learned.

The Problem: Python Objects Are Ephemeral

When your script ends, everything in memory disappears. Variables, objects, computed results—gone. If you want data to survive, you need to persist it somehow.

# This dictionary exists only while the script runs
config = {"theme": "dark", "font_size": 14}
 
# When Python exits, config vanishes

I could write to a JSON file, but what about more complex Python objects like classes, datetime objects, or nested structures? That's where pickle comes in.

Pickle Basics: dump and load

pickle serializes Python objects to bytes. Think of it as "freeze-drying" your data.

import pickle
 
# dumps = dump to a bytes object (the trailing "s" is a naming holdover from "string")
data = {"name": "Alice", "scores": [95, 87, 92]}
serialized = pickle.dumps(data)
 
print(type(serialized))  # <class 'bytes'>
print(len(serialized))   # Some number of bytes
 
# loads = load from bytes
restored = pickle.loads(serialized)
print(restored)  # {'name': 'Alice', 'scores': [95, 87, 92]}

For files, use dump and load (no 's'):

import pickle
 
data = {"key": "value", "numbers": [1, 2, 3]}
 
# Write to file (binary mode!)
with open("data.pkl", "wb") as f:
    pickle.dump(data, f)
 
# Read from file
with open("data.pkl", "rb") as f:
    loaded = pickle.load(f)

The b in wb and rb is crucial—pickle produces binary data, not text.
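Forgetting binary mode is an easy mistake to make. If you open the file in text mode, the write fails immediately, because pickle hands bytes to a file object that expects str. A small sketch of the failure:

```python
import pickle

data = {"key": "value"}

# Text mode ("w") rejects the bytes pickle produces
try:
    with open("data.pkl", "w") as f:
        pickle.dump(data, f)
except TypeError as e:
    print(e)  # write() argument must be str, not bytes
```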

⚠️ The Security Warning Nobody Should Skip

Here's the thing that surprised me: pickle can execute arbitrary code when loading. This isn't a bug—it's how pickle handles reconstructing complex objects.

import pickle
import os
 
class Evil:
    def __reduce__(self):
        # This runs os.system when unpickled!
        return (os.system, ("echo YOU'VE BEEN HACKED",))
 
# Pickling is fine
evil_bytes = pickle.dumps(Evil())
 
# But loading runs the command!
# pickle.loads(evil_bytes)  # DON'T DO THIS with untrusted data

The __reduce__ method tells pickle how to reconstruct an object. Attackers can craft pickle data that runs any code when loaded.

Rule of thumb: Never unpickle data from untrusted sources. Treat .pkl files from the internet like executable files.

If you need to accept external serialized data, use JSON or MessagePack instead.
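If you truly must accept pickled input from a semi-trusted channel, the pickle documentation describes a pattern for restricting which globals the unpickler may resolve. It shrinks the attack surface but is not a sandbox; here's a sketch with a deliberately tiny allow-list:

```python
import builtins
import io
import pickle

# Allow-list: only these builtins may be reconstructed from a pickle stream
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the stream references a global by name
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data):
    """Like pickle.loads, but refuses to resolve unlisted globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Plain containers still round-trip fine; they never touch find_class
print(restricted_loads(pickle.dumps({"scores": [95, 87]})))  # {'scores': [95, 87]}

# A payload that smuggles in a callable is rejected at load time:
# restricted_loads(evil_bytes)  # raises pickle.UnpicklingError
```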

Pickle Protocol Versions

Pickle has evolved over time. Each protocol version adds features and improves performance.

import pickle
 
data = {"test": True}
 
# Protocol 0: ASCII, human-readable (slow, legacy)
p0 = pickle.dumps(data, protocol=0)
print(p0)  # You can sort of read it
 
# Protocol 4: Python 3.4+ (default in Python 3.8+)
p4 = pickle.dumps(data, protocol=4)
 
# Protocol 5: Python 3.8+ (efficient for large buffers)
p5 = pickle.dumps(data, protocol=5)
 
# Use highest available for best performance
best = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
 
# Check your default
print(f"Default protocol: {pickle.DEFAULT_PROTOCOL}")

In practice, I just use HIGHEST_PROTOCOL unless I need compatibility with older Python versions.
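To make the difference concrete, you can compare payload sizes across protocols. Exact byte counts vary by Python version and data, but the newer binary protocols come out consistently smaller than the legacy ASCII one:

```python
import pickle

data = {"scores": list(range(1000))}

# Serialize the same data under every supported protocol
for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    size = len(pickle.dumps(data, protocol=proto))
    print(f"protocol {proto}: {size} bytes")
```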

Custom Pickling with __getstate__ and __setstate__

Not everything can be pickled. File handles, database connections, sockets—these don't serialize. But you can control what gets pickled using special methods.

import pickle
 
class DatabaseConnection:
    def __init__(self, host, port):
        self.host = host
        self.port = port
        self.socket = self._connect()  # Can't pickle sockets
    
    def _connect(self):
        print(f"Connecting to {self.host}:{self.port}")
        return f"socket-{self.host}"  # Fake socket
    
    def __getstate__(self):
        # Called when pickling - return what to save
        state = self.__dict__.copy()
        del state["socket"]  # Remove unpicklable part
        return state
    
    def __setstate__(self, state):
        # Called when unpickling - restore from saved state
        self.__dict__.update(state)
        self.socket = self._connect()  # Reconnect
 
# Test it
conn = DatabaseConnection("localhost", 5432)
data = pickle.dumps(conn)
 
restored = pickle.loads(data)  # Prints "Connecting to..."
print(restored.host)  # localhost

This pattern is essential for objects with external resources. Save the configuration, reconnect on restore.

A simpler example: excluding computed values

import pickle
 
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height
        self._area = width * height  # Cached computation
    
    def __getstate__(self):
        # Don't persist the cached value
        state = self.__dict__.copy()
        del state["_area"]
        return state
    
    def __setstate__(self, state):
        self.__dict__.update(state)
        self._area = self.width * self.height  # Recompute
 
rect = Rectangle(10, 20)
restored = pickle.loads(pickle.dumps(rect))
print(restored._area)  # 200

Enter shelve: Dict-Like Persistence

pickle is the engine; shelve is the convenient interface. It gives you a persistent dictionary backed by pickle.

import shelve
 
# Open a shelf (creates database files)
with shelve.open("mydata") as db:
    db["name"] = "Alice"
    db["scores"] = [95, 87, 92]
    db["config"] = {"debug": True, "version": 1}
 
# Later, in another script run...
with shelve.open("mydata") as db:
    print(db["name"])    # Alice
    print(db["scores"])  # [95, 87, 92]

No explicit serialization—just assign and read like a dict. The data persists across program runs.
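Because a shelf implements the full mapping protocol, the usual dict operations work as well. One caveat: key order depends on the underlying dbm backend, so don't rely on it. (The flag="n" below just starts from an empty shelf so the output is predictable.)

```python
import shelve

# flag="n" always creates a fresh, empty shelf
with shelve.open("demo", flag="n") as db:
    db["a"] = 1
    db["b"] = 2

    print("a" in db)          # True
    print(sorted(db.keys()))  # ['a', 'b']

    del db["b"]
    print(len(db))            # 1
```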

The Writeback Gotcha

This tripped me up. By default, modifying objects retrieved from a shelf doesn't persist:

import shelve
 
# Store a list
with shelve.open("mydata") as db:
    db["items"] = [1, 2, 3]
 
# Try to append
with shelve.open("mydata") as db:
    db["items"].append(4)  # This looks like it should work...
 
# Check what we have
with shelve.open("mydata") as db:
    print(db["items"])  # [1, 2, 3] - The 4 is GONE!

What happened? When you read db["items"], shelve unpickles a copy. You modified the copy, then shelve threw it away.

The fix is writeback=True:

import shelve
 
with shelve.open("mydata", writeback=True) as db:
    db["items"] = [1, 2, 3]
    db["items"].append(4)  # Now this persists
 
with shelve.open("mydata") as db:
    print(db["items"])  # [1, 2, 3, 4]

With writeback, shelve caches all accessed objects and writes them back on close. Trade-off: uses more memory and close() is slower.
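If the memory cost of writeback bothers you, there's a middle ground: read the value out, mutate it, and assign it back explicitly. The assignment is what triggers the write:

```python
import shelve

with shelve.open("mydata") as db:
    db["items"] = [1, 2, 3]

with shelve.open("mydata") as db:
    items = db["items"]   # unpickle a copy
    items.append(4)       # modify the copy
    db["items"] = items   # assigning back triggers the write

with shelve.open("mydata") as db:
    print(db["items"])    # [1, 2, 3, 4]
```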

Shelve Limitations

A few things to know:

import shelve
 
with shelve.open("mydata") as db:
    # Keys MUST be strings
    # db[123] = "value"  # TypeError!
    db["123"] = "value"  # OK
    
    # Not thread-safe by default
    # Values must be picklable

Alternatives: When to Use What

After learning pickle and shelve, I realized they're not always the right choice.

JSON: Safe and Portable

import json
 
data = {"name": "Alice", "age": 30}
 
# Serialize to string
json_str = json.dumps(data)
 
# Works across languages, safe to load
restored = json.loads(json_str)

Use JSON when:

  • Data comes from or goes to external sources
  • You need cross-language compatibility
  • You only have basic types (dict, list, str, int, float, bool, None)
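That last restriction bites quickly. A datetime, for instance, isn't JSON-serializable out of the box, though the default hook gives you a workaround:

```python
import json
from datetime import datetime

payload = {"user": "Alice", "joined": datetime(2024, 1, 15)}

# Plain dumps fails on the datetime
try:
    json.dumps(payload)
except TypeError as e:
    print(e)  # Object of type datetime is not JSON serializable

# Workaround: convert unhandled types with a default hook
s = json.dumps(payload, default=lambda o: o.isoformat())
print(s)  # {"user": "Alice", "joined": "2024-01-15T00:00:00"}
```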

MessagePack: Fast Binary

# pip install msgpack
import msgpack
 
data = {"name": "Alice", "scores": [95, 87, 92]}
 
# Smaller and faster than JSON
packed = msgpack.packb(data)
restored = msgpack.unpackb(packed)

Use msgpack when:

  • You need speed and small size
  • Still want safety (no code execution)
  • Basic types only, but binary format is OK

When to Use Pickle/Shelve

Good for:

  • Local caching of computed results
  • Quick prototyping
  • Development and debugging
  • Saving/loading your own program's state

Avoid for:

  • Data from untrusted sources (security risk!)
  • Long-term storage (schema changes break it)
  • Cross-language systems (Python-only)
  • Network protocols (use JSON/protobuf)

My Decision Framework

Here's how I now decide what to use:

Is the data from an untrusted source?
├── Yes → Use JSON or msgpack, NEVER pickle
└── No → Continue...

Do I need complex Python objects (classes, datetime, etc.)?
├── Yes → Use pickle/shelve
└── No → Continue...

Do I need dict-like persistence?
├── Yes → Use shelve
└── No → Use pickle

Am I storing this long-term (months/years)?
├── Yes → Consider SQLite or a proper database
└── No → pickle/shelve is fine

Putting It Together: A Simple Cache

Here's a practical example combining what I learned:

import shelve
import time
import hashlib
 
class SimpleCache:
    def __init__(self, path="cache", ttl=3600):
        self.path = path
        self.ttl = ttl
    
    def _hash_key(self, key):
        # Ensure string key for shelve
        return hashlib.md5(str(key).encode()).hexdigest()
    
    def get(self, key):
        cache_key = self._hash_key(key)
        with shelve.open(self.path) as db:
            if cache_key in db:
                entry = db[cache_key]
                if time.time() - entry["time"] < self.ttl:
                    return entry["value"]
        return None
    
    def set(self, key, value):
        cache_key = self._hash_key(key)
        with shelve.open(self.path) as db:
            db[cache_key] = {
                "value": value,
                "time": time.time()
            }
 
# Usage
cache = SimpleCache()
cache.set("expensive_computation", [1, 2, 3, 4, 5])
result = cache.get("expensive_computation")
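The same idea also works as a decorator, so callers never touch the cache directly. This is my own sketch (shelve_cached and its parameters are names I made up, not anything from the standard library):

```python
import functools
import hashlib
import shelve
import time

def shelve_cached(path="func_cache", ttl=3600):
    """Memoize a function's results in a shelf (hypothetical helper)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            # Derive a stable string key from the function name and arguments
            key = hashlib.md5(repr((func.__name__, args)).encode()).hexdigest()
            with shelve.open(path) as db:
                entry = db.get(key)
                if entry is not None and time.time() - entry["time"] < ttl:
                    return entry["value"]
            result = func(*args)
            with shelve.open(path) as db:
                db[key] = {"value": result, "time": time.time()}
            return result
        return wrapper
    return decorator

@shelve_cached(path="square_cache")
def slow_square(n):
    time.sleep(0.1)  # stand-in for an expensive computation
    return n * n

print(slow_square(12))  # computed, then cached: 144
print(slow_square(12))  # served from the shelf: 144
```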

What I Learned

  1. pickle is powerful but dangerous - Never load untrusted pickle data
  2. shelve is pickle plus a dbm file behind a dict interface - Convenient for key-value persistence
  3. writeback mode matters - Without it, in-place modifications are lost
  4. JSON for external data - Safe, portable, universally understood
  5. Protocol versions matter - Use HIGHEST_PROTOCOL for best performance
  6. getstate/setstate for custom objects - Control what gets serialized

For quick local persistence, shelve is hard to beat. But the moment external data or security enters the picture, switch to JSON or msgpack.
