I've been using Python's shutil module a lot lately, and I wanted to write down everything I've learned—including the stuff that tripped me up. This is the guide I wish I had when I started.

The Copy Functions: Which One Do You Actually Need?

This confused me at first. There are four main ways to copy files, and they all do slightly different things:

copyfile: Just the Bytes

import shutil
 
shutil.copyfile('source.txt', 'dest.txt')

This copies only the file contents. No permissions, no timestamps—just raw bytes. Use this when you're creating a new file and want to set your own permissions.

Gotcha: The destination must be a file path, not a directory. This fails:

# This raises IsADirectoryError!
shutil.copyfile('source.txt', 'some_directory/')

copyfileobj: For File Objects

with open('source.txt', 'rb') as src:
    with open('dest.txt', 'wb') as dst:
        shutil.copyfileobj(src, dst)

Why would you use this instead of copyfile? Two reasons:

  1. You already have file objects (maybe from a network stream or compressed archive)
  2. You want to control the buffer size for large files:
# Use smaller buffer to reduce memory usage
shutil.copyfileobj(src, dst, length=1024*1024)  # 1MB chunks

copy: File + Permissions

shutil.copy('source.txt', 'dest.txt')
shutil.copy('source.txt', 'dest_dir/')  # This works!

This copies the file content AND the permission bits. You can pass either a file path or a directory as the destination (if it's a directory, the file keeps its original name).

copy2: File + All Metadata

shutil.copy2('source.txt', 'dest.txt')

This is copy plus timestamps and other metadata. I use this for backups where I want to preserve the modification times.

My rule of thumb: Use copy2 for backups, copy for general copying, copyfile when you need minimal copying, and copyfileobj when working with streams.

copytree: Directory Copying Done Right

Copying a whole directory tree is surprisingly tricky to do correctly by hand. copytree handles it:

shutil.copytree('src_project', 'backup_project')

Ignoring Files with Patterns

Here's where it gets useful. Say you want to copy a Python project but skip __pycache__ and .pyc files:

shutil.copytree(
    'my_project',
    'my_project_backup',
    ignore=shutil.ignore_patterns('*.pyc', '__pycache__', '.git')
)

Custom Ignore Functions

For more complex logic, write your own ignore function:

def ignore_large_files(directory, files):
    """Ignore files larger than 10MB."""
    from pathlib import Path
    ignored = []
    for f in files:
        path = Path(directory) / f
        if path.is_file() and path.stat().st_size > 10 * 1024 * 1024:
            ignored.append(f)
    return ignored
 
shutil.copytree('data', 'data_backup', ignore=ignore_large_files)

The function receives the directory being processed and a list of items in it. Return the names of items to skip.

Copying Into an Existing Directory

Before Python 3.8, copytree would fail if the destination existed. Now you can use dirs_exist_ok:

# This works even if 'backup' already exists
shutil.copytree('src', 'backup', dirs_exist_ok=True)

Gotcha I learned the hard way: When merging into an existing directory, files are overwritten without warning. Make sure that's what you want.

rmtree: Recursive Deletion (Use With Care)

This deletes a directory and everything in it. No recycle bin. No undo.

shutil.rmtree('old_project')

Error Handling

By default, rmtree stops on the first error. For cleaning up where you don't care about individual failures:

shutil.rmtree('temp_dir', ignore_errors=True)

For logging errors but continuing:

def handle_error(func, path, exc_info):
    """Log errors but continue deletion."""
    print(f"Warning: couldn't delete {path}: {exc_info[1]}")
 
shutil.rmtree('messy_dir', onerror=handle_error)

Python 3.12+ note: The onerror parameter is deprecated. Use onexc instead:

# Python 3.12+
def handle_error(func, path, exc):
    print(f"Couldn't delete {path}: {exc}")
 
shutil.rmtree('messy_dir', onexc=handle_error)

A Common Pattern: Safe Cleanup

Here's a pattern I use for cleanup that might fail:

from pathlib import Path
 
def safe_cleanup(path):
    """Remove directory if it exists, don't fail if it doesn't."""
    p = Path(path)
    if p.exists():
        shutil.rmtree(p, ignore_errors=True)

move: Cross-Filesystem Moves

os.rename() fails when moving across filesystems. shutil.move() handles it automatically by copying then deleting:

shutil.move('local_file.txt', '/mnt/external_drive/file.txt')

Gotcha: If the destination is a directory, the file moves into it:

shutil.move('report.pdf', '/home/user/documents/')
# Result: /home/user/documents/report.pdf

If the destination is a full file path, it's renamed:

shutil.move('report.pdf', '/home/user/documents/final_report.pdf')
# Result: /home/user/documents/final_report.pdf

disk_usage: Check Before You Copy

Always check disk space before large operations:

usage = shutil.disk_usage('/')
 
print(f"Total: {usage.total / (1024**3):.1f} GB")
print(f"Used:  {usage.used / (1024**3):.1f} GB")
print(f"Free:  {usage.free / (1024**3):.1f} GB")

A practical example:

from pathlib import Path
 
def safe_copy(src, dst):
    """Copy file only if there's enough space."""
    src_path = Path(src)
    dst_path = Path(dst)
    
    file_size = src_path.stat().st_size
    free_space = shutil.disk_usage(dst_path.parent).free
    
    if file_size > free_space:
        raise IOError(
            f"Not enough space: need {file_size:,} bytes, "
            f"have {free_space:,} bytes"
        )
    
    shutil.copy2(src, dst)

which: Finding Executables

This saved me from writing janky PATH parsing code:

python_path = shutil.which('python')
print(python_path)  # /usr/bin/python
 
git_path = shutil.which('git')
if git_path is None:
    print("Git is not installed!")

Real-world usage—checking for required tools:

def check_dependencies():
    """Verify required tools are installed."""
    required = ['git', 'docker', 'python3']
    missing = []
    
    for tool in required:
        if shutil.which(tool) is None:
            missing.append(tool)
    
    if missing:
        raise EnvironmentError(
            f"Missing required tools: {', '.join(missing)}"
        )

make_archive and unpack_archive

Creating and extracting archives without touching zipfile or tarfile directly:

Creating Archives

# Create a zip file
shutil.make_archive('backup', 'zip', 'my_project')
# Creates: backup.zip
 
# Create a tar.gz
shutil.make_archive('backup', 'gztar', 'my_project')
# Creates: backup.tar.gz

Supported formats:

  • zip - ZIP file
  • tar - uncompressed tar
  • gztar - gzip compressed tar (.tar.gz)
  • bztar - bzip2 compressed tar (.tar.bz2)
  • xztar - xz compressed tar (.tar.xz)

Extracting Archives

shutil.unpack_archive('backup.zip', 'extracted/')
shutil.unpack_archive('backup.tar.gz', 'extracted/')

The format is auto-detected from the extension. Nice.

A Backup Script

Putting it together:

from datetime import datetime
from pathlib import Path
import shutil
 
def backup_project(project_dir, backup_dir='./backups'):
    """Create timestamped backup of a project."""
    project = Path(project_dir)
    backups = Path(backup_dir)
    backups.mkdir(exist_ok=True)
    
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    archive_name = f"{project.name}_{timestamp}"
    
    # Check space (rough estimate: assume compression is 50%)
    project_size = sum(f.stat().st_size for f in project.rglob('*') if f.is_file())
    estimated_archive_size = project_size // 2
    free = shutil.disk_usage(backups).free
    
    if estimated_archive_size > free * 0.9:  # Leave 10% buffer
        raise IOError("Not enough disk space for backup")
    
    archive_path = shutil.make_archive(
        str(backups / archive_name),
        'gztar',
        project_dir
    )
    
    return archive_path
 
# Usage
backup_file = backup_project('./my_app')
print(f"Created backup: {backup_file}")

Common Gotchas I've Encountered

1. Permissions After Copy

copyfile doesn't copy permissions, so you might create a file you can't execute:

# This loses execute permissions!
shutil.copyfile('script.sh', 'script_copy.sh')
 
# This preserves them:
shutil.copy2('script.sh', 'script_copy.sh')

rmtree doesn't follow symlinks (good!), but it will delete the symlink itself:

# If 'link_to_important' is a symlink to '/important/data'
shutil.rmtree('link_to_important')
# The symlink is removed, but /important/data is safe

3. copytree Destination Must Not Exist (Before 3.8)

Before Python 3.8, this fails:

# Pre-3.8: raises FileExistsError if dest exists
shutil.copytree('src', 'existing_dest')
 
# 3.8+: use dirs_exist_ok=True
shutil.copytree('src', 'existing_dest', dirs_exist_ok=True)

4. move Can Overwrite

shutil.move() will silently overwrite the destination:

# If dest.txt exists, it's gone
shutil.move('src.txt', 'dest.txt')

Check first if that matters:

from pathlib import Path
 
def safe_move(src, dst):
    if Path(dst).exists():
        raise FileExistsError(f"{dst} already exists")
    shutil.move(src, dst)

5. Archive Paths Can Be Confusing

The root_dir and base_dir parameters in make_archive are confusing. Here's what they mean:

# Archive contents will start from 'base_dir' relative to 'root_dir'
shutil.make_archive(
    'backup',
    'zip',
    root_dir='/home/user',
    base_dir='projects/myapp'
)
# Archive contains: projects/myapp/...

Most of the time, just pass the directory you want to archive:

shutil.make_archive('backup', 'zip', '/home/user/projects/myapp')

Quick Reference

FunctionWhat It Does
copyfile(src, dst)Copy file contents only
copyfileobj(fsrc, fdst)Copy between file objects
copy(src, dst)Copy file + permissions
copy2(src, dst)Copy file + all metadata
copytree(src, dst)Copy directory recursively
rmtree(path)Delete directory tree
move(src, dst)Move file/directory
disk_usage(path)Get disk space info
which(cmd)Find executable in PATH
make_archive(...)Create zip/tar archive
unpack_archive(...)Extract archive

shutil is one of those modules that's easy to overlook because the operations seem simple. But when you need to copy a million files with correct permissions across filesystems while skipping cache directories—it's really nice to have.

React to this post: