CLI tools are tricky to test: they touch the filesystem, spawn processes, and read environment variables. Here's how I approach it.
The Testing Pyramid for CLIs
Unit tests (most): Pure functions, argument parsing, data transformation.
Integration tests (some): Commands that interact with real files or processes.
End-to-end tests (few): Full CLI invocations checking actual output.
The ratio shifts compared to web apps. CLIs are often thin wrappers around logic, so unit testing the logic covers most cases.
Unit Testing the Core
Separate your business logic from CLI handling:
```python
import click

# Bad: logic mixed with CLI handling
@click.command()
@click.argument("path")
def process_file(path):
    content = open(path).read()
    result = content.upper()  # business logic buried here
    click.echo(result)

# Good: logic separated
def transform_content(content: str) -> str:
    return content.upper()

@click.command()
@click.argument("path")
def process_file(path):
    content = open(path).read()
    result = transform_content(content)
    click.echo(result)
```

Now `transform_content` is trivially testable. No mocking files, no capturing output.
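Once the logic lives in its own function, the unit test needs no runner and no mocks. A minimal sketch (redefining `transform_content` inline so the example stands alone):

```python
def transform_content(content: str) -> str:
    # The pure function extracted from the CLI handler
    return content.upper()

def test_transform_content():
    # No filesystem, no CLI runner, no captured output
    assert transform_content("hello") == "HELLO"
    assert transform_content("") == ""

test_transform_content()
```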
Testing Argument Parsing
For Click/Typer/argparse, test the parsed arguments:
```python
from click.testing import CliRunner

runner = CliRunner()

def test_parse_args():
    result = runner.invoke(cli, ["--verbose", "--output", "out.txt"])
    assert result.exit_code == 0
    # Or test the parsed values directly
```

Don't over-test the framework. If Click handles --help, you don't need to test that --help works.
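For plain argparse, "test the parsed values directly" means asserting on the namespace the parser returns. A sketch, assuming a hypothetical `build_parser` factory (not from the examples above):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the options used above
    parser = argparse.ArgumentParser(prog="mycli")
    parser.add_argument("--verbose", action="store_true")
    parser.add_argument("--output", default="out.txt")
    return parser

def test_parse_args():
    # Passing an explicit argv list avoids touching sys.argv
    args = build_parser().parse_args(["--verbose", "--output", "result.txt"])
    assert args.verbose is True
    assert args.output == "result.txt"

test_parse_args()
```

Building the parser in a factory function, rather than at module scope, is what makes this testable.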
Integration Tests with Temp Files
For commands that read/write files:
```python
import tempfile
from pathlib import Path

def test_writes_output_file():
    with tempfile.TemporaryDirectory() as tmpdir:
        output = Path(tmpdir) / "out.txt"
        result = runner.invoke(cli, ["process", "--output", str(output)])
        assert result.exit_code == 0
        assert output.exists()
        assert output.read_text() == "expected content"
```

Always use temp directories. Never write to the actual filesystem.
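The same pattern works for testing file-writing logic directly, with no CLI runner at all. A stdlib-only sketch, with a hypothetical `write_report` helper standing in for real business logic:

```python
import tempfile
from pathlib import Path

def write_report(path: Path, data: str) -> None:
    # Hypothetical business-logic function that writes a file
    path.write_text(data.upper())

def test_write_report():
    # The temp directory is deleted automatically when the block exits
    with tempfile.TemporaryDirectory() as tmpdir:
        output = Path(tmpdir) / "report.txt"
        write_report(output, "hello")
        assert output.exists()
        assert output.read_text() == "HELLO"

test_write_report()
```

If you're already on pytest, its built-in `tmp_path` fixture gives you the same isolation with less boilerplate.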
Snapshot Testing
For CLIs with complex output, snapshot tests catch regressions:
```python
def test_help_output(snapshot):
    result = runner.invoke(cli, ["--help"])
    assert result.output == snapshot
```

The first run creates the snapshot; future runs compare against it. Great for catching accidental output changes.
I use pytest-snapshot or inline snapshots with syrupy.
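The mechanics are simple enough to sketch in plain Python. A hypothetical `check_snapshot` helper (not pytest-snapshot's or syrupy's actual API), just to show the write-then-compare cycle:

```python
import tempfile
from pathlib import Path

def check_snapshot(name: str, actual: str, snapshot_dir: Path) -> None:
    # First run writes the snapshot; later runs compare against it
    snap = snapshot_dir / f"{name}.txt"
    if not snap.exists():
        snap.write_text(actual)
        return
    assert actual == snap.read_text(), f"output changed for {name}"

with tempfile.TemporaryDirectory() as d:
    check_snapshot("help", "Usage: mycli ...", Path(d))  # first run: creates snapshot
    check_snapshot("help", "Usage: mycli ...", Path(d))  # second run: matches
```

Real snapshot libraries add review workflows and an explicit update flag on top of this basic idea.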
Testing Exit Codes
Exit codes matter for scripts:
```python
def test_returns_error_on_missing_file():
    result = runner.invoke(cli, ["nonexistent.txt"])
    assert result.exit_code == 1

def test_returns_success():
    result = runner.invoke(cli, ["valid.txt"])
    assert result.exit_code == 0
```

Convention: 0 is success, 1 is a general error, 2 is a usage error.
Testing Environment Variables
Mock the environment:
```python
import os
from unittest.mock import patch

def test_reads_api_key_from_env():
    with patch.dict(os.environ, {"API_KEY": "test-key"}):
        result = runner.invoke(cli, ["auth"])
        assert result.exit_code == 0
```

Or use monkeypatch in pytest:
```python
def test_reads_api_key(monkeypatch):
    monkeypatch.setenv("API_KEY", "test-key")
    result = runner.invoke(cli, ["auth"])
    assert result.exit_code == 0
```

Testing Interactive Input
For prompts and confirmations:
```python
def test_confirms_delete():
    result = runner.invoke(cli, ["delete"], input="y\n")
    assert "Deleted" in result.output

def test_aborts_on_no():
    result = runner.invoke(cli, ["delete"], input="n\n")
    assert "Aborted" in result.output
```

The `input` parameter simulates stdin.
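Without Click's runner, you can get the same effect for a plain `input()`-based prompt by replacing `sys.stdin`. A sketch with a hypothetical `confirm_delete` prompt:

```python
import io
import sys
from unittest.mock import patch

def confirm_delete() -> str:
    # Hypothetical prompt: reads "y"/"n" from stdin
    answer = input("Delete? [y/n] ")
    return "Deleted" if answer.strip().lower() == "y" else "Aborted"

def run_with_input(text: str) -> str:
    # input() falls back to sys.stdin.readline() when stdin is replaced
    with patch("sys.stdin", io.StringIO(text)):
        return confirm_delete()

assert run_with_input("y\n") == "Deleted"
assert run_with_input("n\n") == "Aborted"
```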
What I Actually Test
For a typical CLI:
- Every command runs without error (smoke tests)
- Argument validation (required args, invalid values)
- Core business logic (unit tests, no CLI involvement)
- Output format (snapshot tests for complex output)
- Error handling (missing files, network errors, bad input)
- Exit codes (0 for success, non-zero for errors)
I don't test:
- Framework behavior (Click/argparse internals)
- Trivial wrappers that just call other functions
- Exact output formatting unless it's part of the API
Test Organization
```
tests/
├── unit/
│   ├── test_parser.py
│   └── test_transform.py
├── integration/
│   ├── test_file_commands.py
│   └── test_api_commands.py
└── snapshots/
    └── test_help_output/
```
Unit tests run fast. Integration tests run slower. Keep them separate.
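With that layout, the split falls out of the directory structure. A sketch of how I'd run each tier (assuming pytest; command names are illustrative):

```
# Fast feedback during development: unit tests only
pytest tests/unit

# Full suite, including slower integration tests, in CI
pytest tests
```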
The Goal
A well-tested CLI should give confidence that:
- Commands don't crash on valid input
- Errors are handled gracefully
- Output format is stable
- Exit codes are correct
If your tests cover those, you can refactor freely without breaking users.