I've been using Python's shutil module a lot lately, and I wanted to write down everything I've learned—including the stuff that tripped me up. This is the guide I wish I had when I started.
The Copy Functions: Which One Do You Actually Need?
This confused me at first. There are four main ways to copy files, and they all do slightly different things:
copyfile: Just the Bytes
import shutil
shutil.copyfile('source.txt', 'dest.txt')This copies only the file contents. No permissions, no timestamps—just raw bytes. Use this when you're creating a new file and want to set your own permissions.
Gotcha: The destination must be a file path, not a directory. This fails:
# This raises IsADirectoryError!
shutil.copyfile('source.txt', 'some_directory/')copyfileobj: For File Objects
with open('source.txt', 'rb') as src:
with open('dest.txt', 'wb') as dst:
shutil.copyfileobj(src, dst)Why would you use this instead of copyfile? Two reasons:
- You already have file objects (maybe from a network stream or compressed archive)
- You want to control the buffer size for large files:
# Use smaller buffer to reduce memory usage
shutil.copyfileobj(src, dst, length=1024*1024) # 1MB chunkscopy: File + Permissions
shutil.copy('source.txt', 'dest.txt')
shutil.copy('source.txt', 'dest_dir/') # This works!This copies the file content AND the permission bits. You can pass either a file path or a directory as the destination (if it's a directory, the file keeps its original name).
copy2: File + All Metadata
shutil.copy2('source.txt', 'dest.txt')This is copy plus timestamps and other metadata. I use this for backups where I want to preserve the modification times.
My rule of thumb: Use copy2 for backups, copy for general copying, copyfile when you need minimal copying, and copyfileobj when working with streams.
copytree: Directory Copying Done Right
Copying a whole directory tree is surprisingly tricky to do correctly by hand. copytree handles it:
shutil.copytree('src_project', 'backup_project')Ignoring Files with Patterns
Here's where it gets useful. Say you want to copy a Python project but skip __pycache__ and .pyc files:
shutil.copytree(
'my_project',
'my_project_backup',
ignore=shutil.ignore_patterns('*.pyc', '__pycache__', '.git')
)Custom Ignore Functions
For more complex logic, write your own ignore function:
def ignore_large_files(directory, files):
"""Ignore files larger than 10MB."""
from pathlib import Path
ignored = []
for f in files:
path = Path(directory) / f
if path.is_file() and path.stat().st_size > 10 * 1024 * 1024:
ignored.append(f)
return ignored
shutil.copytree('data', 'data_backup', ignore=ignore_large_files)The function receives the directory being processed and a list of items in it. Return the names of items to skip.
Copying Into an Existing Directory
Before Python 3.8, copytree would fail if the destination existed. Now you can use dirs_exist_ok:
# This works even if 'backup' already exists
shutil.copytree('src', 'backup', dirs_exist_ok=True)Gotcha I learned the hard way: When merging into an existing directory, files are overwritten without warning. Make sure that's what you want.
rmtree: Recursive Deletion (Use With Care)
This deletes a directory and everything in it. No recycle bin. No undo.
shutil.rmtree('old_project')Error Handling
By default, rmtree stops on the first error. For cleaning up where you don't care about individual failures:
shutil.rmtree('temp_dir', ignore_errors=True)For logging errors but continuing:
def handle_error(func, path, exc_info):
"""Log errors but continue deletion."""
print(f"Warning: couldn't delete {path}: {exc_info[1]}")
shutil.rmtree('messy_dir', onerror=handle_error)Python 3.12+ note: The onerror parameter is deprecated. Use onexc instead:
# Python 3.12+
def handle_error(func, path, exc):
print(f"Couldn't delete {path}: {exc}")
shutil.rmtree('messy_dir', onexc=handle_error)A Common Pattern: Safe Cleanup
Here's a pattern I use for cleanup that might fail:
from pathlib import Path
def safe_cleanup(path):
"""Remove directory if it exists, don't fail if it doesn't."""
p = Path(path)
if p.exists():
shutil.rmtree(p, ignore_errors=True)move: Cross-Filesystem Moves
os.rename() fails when moving across filesystems. shutil.move() handles it automatically by copying then deleting:
shutil.move('local_file.txt', '/mnt/external_drive/file.txt')Gotcha: If the destination is a directory, the file moves into it:
shutil.move('report.pdf', '/home/user/documents/')
# Result: /home/user/documents/report.pdfIf the destination is a full file path, it's renamed:
shutil.move('report.pdf', '/home/user/documents/final_report.pdf')
# Result: /home/user/documents/final_report.pdfdisk_usage: Check Before You Copy
Always check disk space before large operations:
usage = shutil.disk_usage('/')
print(f"Total: {usage.total / (1024**3):.1f} GB")
print(f"Used: {usage.used / (1024**3):.1f} GB")
print(f"Free: {usage.free / (1024**3):.1f} GB")A practical example:
from pathlib import Path
def safe_copy(src, dst):
"""Copy file only if there's enough space."""
src_path = Path(src)
dst_path = Path(dst)
file_size = src_path.stat().st_size
free_space = shutil.disk_usage(dst_path.parent).free
if file_size > free_space:
raise IOError(
f"Not enough space: need {file_size:,} bytes, "
f"have {free_space:,} bytes"
)
shutil.copy2(src, dst)which: Finding Executables
This saved me from writing janky PATH parsing code:
python_path = shutil.which('python')
print(python_path) # /usr/bin/python
git_path = shutil.which('git')
if git_path is None:
print("Git is not installed!")Real-world usage—checking for required tools:
def check_dependencies():
"""Verify required tools are installed."""
required = ['git', 'docker', 'python3']
missing = []
for tool in required:
if shutil.which(tool) is None:
missing.append(tool)
if missing:
raise EnvironmentError(
f"Missing required tools: {', '.join(missing)}"
)make_archive and unpack_archive
Creating and extracting archives without touching zipfile or tarfile directly:
Creating Archives
# Create a zip file
shutil.make_archive('backup', 'zip', 'my_project')
# Creates: backup.zip
# Create a tar.gz
shutil.make_archive('backup', 'gztar', 'my_project')
# Creates: backup.tar.gzSupported formats:
zip- ZIP filetar- uncompressed targztar- gzip compressed tar (.tar.gz)bztar- bzip2 compressed tar (.tar.bz2)xztar- xz compressed tar (.tar.xz)
Extracting Archives
shutil.unpack_archive('backup.zip', 'extracted/')
shutil.unpack_archive('backup.tar.gz', 'extracted/')The format is auto-detected from the extension. Nice.
A Backup Script
Putting it together:
from datetime import datetime
from pathlib import Path
import shutil
def backup_project(project_dir, backup_dir='./backups'):
"""Create timestamped backup of a project."""
project = Path(project_dir)
backups = Path(backup_dir)
backups.mkdir(exist_ok=True)
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
archive_name = f"{project.name}_{timestamp}"
# Check space (rough estimate: assume compression is 50%)
project_size = sum(f.stat().st_size for f in project.rglob('*') if f.is_file())
estimated_archive_size = project_size // 2
free = shutil.disk_usage(backups).free
if estimated_archive_size > free * 0.9: # Leave 10% buffer
raise IOError("Not enough disk space for backup")
archive_path = shutil.make_archive(
str(backups / archive_name),
'gztar',
project_dir
)
return archive_path
# Usage
backup_file = backup_project('./my_app')
print(f"Created backup: {backup_file}")Common Gotchas I've Encountered
1. Permissions After Copy
copyfile doesn't copy permissions, so you might create a file you can't execute:
# This loses execute permissions!
shutil.copyfile('script.sh', 'script_copy.sh')
# This preserves them:
shutil.copy2('script.sh', 'script_copy.sh')2. rmtree on Symlinks
rmtree doesn't follow symlinks (good!), but it will delete the symlink itself:
# If 'link_to_important' is a symlink to '/important/data'
shutil.rmtree('link_to_important')
# The symlink is removed, but /important/data is safe3. copytree Destination Must Not Exist (Before 3.8)
Before Python 3.8, this fails:
# Pre-3.8: raises FileExistsError if dest exists
shutil.copytree('src', 'existing_dest')
# 3.8+: use dirs_exist_ok=True
shutil.copytree('src', 'existing_dest', dirs_exist_ok=True)4. move Can Overwrite
shutil.move() will silently overwrite the destination:
# If dest.txt exists, it's gone
shutil.move('src.txt', 'dest.txt')Check first if that matters:
from pathlib import Path
def safe_move(src, dst):
if Path(dst).exists():
raise FileExistsError(f"{dst} already exists")
shutil.move(src, dst)5. Archive Paths Can Be Confusing
The root_dir and base_dir parameters in make_archive are confusing. Here's what they mean:
# Archive contents will start from 'base_dir' relative to 'root_dir'
shutil.make_archive(
'backup',
'zip',
root_dir='/home/user',
base_dir='projects/myapp'
)
# Archive contains: projects/myapp/...Most of the time, just pass the directory you want to archive:
shutil.make_archive('backup', 'zip', '/home/user/projects/myapp')Quick Reference
| Function | What It Does |
|---|---|
copyfile(src, dst) | Copy file contents only |
copyfileobj(fsrc, fdst) | Copy between file objects |
copy(src, dst) | Copy file + permissions |
copy2(src, dst) | Copy file + all metadata |
copytree(src, dst) | Copy directory recursively |
rmtree(path) | Delete directory tree |
move(src, dst) | Move file/directory |
disk_usage(path) | Get disk space info |
which(cmd) | Find executable in PATH |
make_archive(...) | Create zip/tar archive |
unpack_archive(...) | Extract archive |
shutil is one of those modules that's easy to overlook because the operations seem simple. But when you need to copy a million files with correct permissions across filesystems while skipping cache directories—it's really nice to have.