Regular expressions are powerful but often overused. Here's when and how to use them.
When to Use Regex
Use regex for:
- Complex pattern matching
- Extracting multiple groups
- Validation with specific formats
- Find/replace with patterns
Use string methods for:
- Simple checks (startswith, endswith, in)
- Basic splits and joins
- Case changes
- Strip/trim
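As a rough illustration of the string-method side of this list, the sketch below handles a few common checks without importing re. The sample values (filename, csv_line) are made up for the example.

```python
# Hypothetical sample values, chosen only for illustration.
filename = "report_2024.CSV"
csv_line = "  alice, bob ,carol  "

# Simple checks
is_report = filename.startswith("report_")
is_csv = filename.lower().endswith(".csv")  # case change, then suffix check
has_year = "2024" in filename

# Basic split and join, with strip to drop stray whitespace
names = [name.strip() for name in csv_line.split(",")]
joined = "; ".join(names)

print(is_report, is_csv, has_year)  # True True True
print(names)                        # ['alice', 'bob', 'carol']
print(joined)                       # alice; bob; carol
```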
# Don't use regex
if re.match(r"^https://", url): # Overkill
# Use string methods
if url.startswith("https://"): # Better
Basic Usage
import re
# Search anywhere in string
match = re.search(r"error", "An error occurred")
if match:
    print(match.group()) # "error"
# Match at start
match = re.match(r"Hello", "Hello world")
# Find all matches
matches = re.findall(r"\d+", "1 apple, 2 oranges, 3 bananas")
# ['1', '2', '3']
# Replace
result = re.sub(r"\d+", "X", "1 apple, 2 oranges")
# "X apple, X oranges"
Common Patterns
# Digits
r"\d" # Single digit
r"\d+" # One or more digits
r"\d{3}" # Exactly 3 digits
r"\d{2,4}" # 2 to 4 digits
# Word characters
r"\w" # Letter, digit, or underscore
r"\w+" # One or more word characters
# Whitespace
r"\s" # Any whitespace
r"\s+" # One or more whitespace
# Anchors
r"^start" # Start of string
r"end$" # End of string
r"\bword\b" # Word boundary
# Character classes
r"[abc]" # a, b, or c
r"[a-z]" # Lowercase letter
r"[^abc]" # Not a, b, or c
# Quantifiers
r"a?" # Zero or one
r"a*" # Zero or more
r"a+" # One or more
r"a{3}" # Exactly 3
r"a{2,5}" # 2 to 5
Groups
# Capturing groups
match = re.search(r"(\d+)-(\d+)", "Phone: 123-4567")
if match:
    print(match.group(0)) # "123-4567" (full match)
    print(match.group(1)) # "123"
    print(match.group(2)) # "4567"
    print(match.groups()) # ('123', '4567')
# Named groups
match = re.search(r"(?P<area>\d+)-(?P<number>\d+)", "123-4567")
print(match.group("area")) # "123"
Compiled Patterns
Compile for reuse:
# Compile once
pattern = re.compile(r"\d+")
# Use many times
pattern.search("abc123")
pattern.findall("1, 2, 3")
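pattern.sub("X", "abc123")
Compiling skips the pattern-cache lookup on every call and gives the pattern a reusable name. The re module already caches recently used patterns, so the speedup is usually modest.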
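One way this tends to look in practice is a pattern compiled once at module level and reused inside a function. The sketch below assumes a hypothetical helper, extract_numbers, that is not part of the original examples.

```python
import re

# Compiled once at import time; the name documents what the pattern is for.
NUMBER_RE = re.compile(r"\d+")


def extract_numbers(lines):
    """Return the integers found in each line (hypothetical helper)."""
    return [[int(n) for n in NUMBER_RE.findall(line)] for line in lines]


result = extract_numbers(["1 apple, 2 oranges", "no digits", "room 404"])
print(result)  # [[1, 2], [], [404]]
```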
Flags
# Case insensitive
re.search(r"hello", "HELLO", re.IGNORECASE)
re.search(r"(?i)hello", "HELLO")
# Multiline (^ and $ match line boundaries)
re.findall(r"^\w+", "line1\nline2", re.MULTILINE)
# Dot matches newline
re.search(r"a.b", "a\nb", re.DOTALL)
# Verbose (allows comments)
pattern = re.compile(r"""
    \d{3} # Area code
    - # Separator
    \d{4} # Number
""", re.VERBOSE)
Common Recipes
Email (simple)
email_pattern = r"[\w.+-]+@[\w-]+\.[\w.-]+"
re.match(email_pattern, "user@example.com")
URL
url_pattern = r"https?://[\w.-]+(?:/[\w./-]*)?"
Phone number
phone_pattern = r"\d{3}[-.\s]?\d{3}[-.\s]?\d{4}"
Extract data
log_line = "2024-03-21 10:30:45 ERROR: Connection failed"
pattern = r"(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+): (.+)"
match = re.match(pattern, log_line)
if match: # re.match returns None when the line does not fit the pattern
    date, time, level, message = match.groups()
Gotchas
Raw strings
# Wrong - \b is backspace
re.search("\bword\b", text)
# Right - raw string
re.search(r"\bword\b", text)
Always use r"..." for regex patterns.
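To see why the non-raw version fails, it can help to look at what Python actually stores in each string. A small check, using a made-up sample text:

```python
import re

text = "a word here"  # sample input, chosen for illustration

plain = "\bword\b"   # Python turns each \b into a backspace character (\x08)
raw = r"\bword\b"    # backslashes are kept, so the regex engine sees \b

print(len(plain), len(raw))  # 6 8
print(plain[0] == "\x08")    # True

no_match = re.search(plain, text)  # looks for backspace + "word" + backspace
match = re.search(raw, text)       # looks for "word" at word boundaries

print(no_match)       # None
print(match.group())  # word
```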
Greedy vs lazy
text = "<tag>content</tag>"
# Greedy (default) - matches as much as possible
re.search(r"<.*>", text).group() # "<tag>content</tag>"
# Lazy - matches as little as possible
re.search(r"<.*?>", text).group() # "<tag>"
My Rules
- Try string methods first: simpler is better
- Use raw strings: always r"pattern"
- Compile if reused: for performance
- Comment complex patterns: use re.VERBOSE
- Test thoroughly: edge cases matter
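For the last rule, a handful of assertions often goes a long way. The sketch below reuses the phone pattern from the recipes and checks it against some edge cases. The is_phone helper and the sample inputs are assumptions made for this example. It uses re.fullmatch so that extra characters before or after the number are rejected.

```python
import re

PHONE_RE = re.compile(r"\d{3}[-.\s]?\d{3}[-.\s]?\d{4}")


def is_phone(text):
    """True if the whole string is a phone number (hypothetical helper)."""
    return PHONE_RE.fullmatch(text) is not None


# Formats that should pass
assert is_phone("555-123-4567")
assert is_phone("555.123.4567")
assert is_phone("555 123 4567")
assert is_phone("5551234567")

# Edge cases that should fail
assert not is_phone("")
assert not is_phone("555-123-456")        # too short
assert not is_phone("555-123-45678")      # trailing digit
assert not is_phone("call 555-123-4567")  # leading text

# search() is more permissive: it finds a number inside longer text
found = PHONE_RE.search("call 555-123-4567 now")
assert found is not None and found.group() == "555-123-4567"
print("all checks passed")
```

Which of the two fits depends on the job: fullmatch suits validating a single field, while search or findall suits pulling numbers out of free text.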
Regex is a tool. Use it when appropriate, not everywhere.