When I first needed to parse XML in Python, I reached for BeautifulSoup out of habit. Then I discovered that Python ships with a perfectly capable XML parser in the standard library: xml.etree.ElementTree. No pip install required.

Here's everything I've learned about it.

Why ElementTree?

ElementTree provides a simple, Pythonic API for XML. It's:

  • Built-in: No external dependencies
  • Lightweight: Builds a simple element tree that is lighter than a full DOM such as xml.dom.minidom
  • Easy to use: Elements behave like lists, attributes like dicts

import xml.etree.ElementTree as ET

That import is your starting point for everything that follows.

Parsing XML

From a String

import xml.etree.ElementTree as ET
 
xml_string = """
<library>
    <book isbn="978-0-13-468599-1">
        <title>The Pragmatic Programmer</title>
        <author>David Thomas</author>
        <year>2019</year>
    </book>
    <book isbn="978-0-596-00712-6">
        <title>Head First Design Patterns</title>
        <author>Eric Freeman</author>
        <year>2004</year>
    </book>
</library>
"""
 
root = ET.fromstring(xml_string)
print(root.tag)  # library

The fromstring() function returns the root element directly.

From a File

import xml.etree.ElementTree as ET
 
# Parse returns an ElementTree object
tree = ET.parse('library.xml')
root = tree.getroot()
 
# Now work with root as usual
for book in root:
    print(book.get('isbn'))

The difference: parse() returns an ElementTree object (which wraps the root), while fromstring() returns an Element directly. If you need to write back to a file later, keep the tree around.
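
To make that difference concrete, here's a small sketch. parse() also accepts a file-like object, so io.StringIO stands in for a file on disk; the one-book document is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for a file on disk
xml_text = "<library><book isbn='123'/></library>"

# fromstring() hands back the root Element directly
root_from_string = ET.fromstring(xml_text)

# parse() accepts a filename or a file-like object and returns an ElementTree
tree = ET.parse(io.StringIO(xml_text))
root_from_file = tree.getroot()

print(type(tree).__name__)              # ElementTree
print(type(root_from_string).__name__)  # Element
print(root_from_file.tag)               # library
```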

Iterative Parsing for Large Files

For huge XML files, parsing everything into memory won't work. Use iterparse():

import xml.etree.ElementTree as ET
 
# Process elements as they're parsed
for event, elem in ET.iterparse('huge_file.xml', events=['end']):
    if elem.tag == 'book':
        print(elem.find('title').text)
        elem.clear()  # Free memory after processing

The clear() call is crucial—without it, you're still building the full tree.
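
One caveat: clear() empties an element, but the emptied shell stays attached to the root. A common pattern, sketched here with an in-memory stream in place of a real file (the data is invented), is to grab the root from the first start event and clear it after each record.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-in for a large file
data = io.BytesIO(
    b"<library>"
    b"<book><title>One</title></book>"
    b"<book><title>Two</title></book>"
    b"</library>"
)

titles = []
context = ET.iterparse(data, events=("start", "end"))

# The first event is the start of the root element
_, root = next(context)

for event, elem in context:
    if event == "end" and elem.tag == "book":
        titles.append(elem.findtext("title"))
        # Drop finished children from the root so they can be garbage collected
        root.clear()

print(titles)  # ['One', 'Two']
```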

Navigating the Tree

Once you have the root element, you need to find things in it.

Direct Children

Elements are iterable. Loop over them to get direct children:

root = ET.fromstring(xml_string)
 
for child in root:
    print(f"{child.tag}: {child.attrib}")
# book: {'isbn': '978-0-13-468599-1'}
# book: {'isbn': '978-0-596-00712-6'}

find() - First Match

Returns the first matching element, or None:

# Find first book
first_book = root.find('book')
print(first_book.find('title').text)  # The Pragmatic Programmer
 
# Careful with None!
missing = root.find('magazine')
print(missing)  # None
 
# This will crash:
# missing.find('title')  # AttributeError

Always check for None when using find().
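
When all you want is the text of a child, findtext() with a default saves a None check. A minimal sketch on a trimmed-down version of the sample document:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<library><book><title>The Pragmatic Programmer</title></book></library>"
)

book = root.find("book")

# findtext() returns the text of the first match, or the default if no match
title = book.findtext("title", default="(untitled)")
year = book.findtext("year", default="unknown")

print(title)  # The Pragmatic Programmer
print(year)   # unknown

# For chained lookups, guard each step explicitly
magazine = root.find("magazine")
magazine_title = magazine.findtext("title") if magazine is not None else None
print(magazine_title)  # None
```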

findall() - All Matches

Returns a list of all matching elements:

books = root.findall('book')
print(len(books))  # 2
 
for book in books:
    title = book.find('title').text
    year = book.find('year').text
    print(f"{title} ({year})")

iter() - All Descendants

Recursively iterate over the element itself and all its descendants, optionally filtering by tag:

# All title elements, anywhere in the tree
for title in root.iter('title'):
    print(title.text)
 
# All elements, period
for elem in root.iter():
    print(f"{elem.tag}: {elem.text}")

iter() is great when you don't care about tree structure—just give me all the things.

iterfind() - Lazy findall()

Like findall(), but returns an iterator instead of a list:

# Yields matches one at a time instead of building a list
for book in root.iterfind('book'):
    print(book.get('isbn'))

Element Properties

Every element has these:

book = root.find('book')
 
# Tag name
book.tag  # 'book'
 
# Text content
book.find('title').text  # 'The Pragmatic Programmer'
 
# Tail (text after closing tag)
book.tail  # Usually whitespace
 
# Attributes (dict-like)
book.attrib  # {'isbn': '978-0-13-468599-1'}
book.get('isbn')  # '978-0-13-468599-1'
book.get('missing', 'default')  # 'default'

Text vs Tail

This confused me at first:

<p>Hello <b>world</b>!</p>
p = ET.fromstring('<p>Hello <b>world</b>!</p>')
print(p.text)          # 'Hello '
b = p.find('b')
print(b.text)          # 'world'
print(b.tail)          # '!'

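text is the content between an element's start tag and its first child (or its end tag, if it has no children). tail is the content between the element's end tag and the next sibling's start tag, or the parent's end tag if there is no next sibling.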
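
If you just want all the text back in reading order, itertext() stitches the text and tail pieces together for you. A quick sketch with the same paragraph:

```python
import xml.etree.ElementTree as ET

p = ET.fromstring("<p>Hello <b>world</b>!</p>")

# itertext() walks the subtree and yields text and tail pieces in document order
pieces = list(p.itertext())
print(pieces)           # ['Hello ', 'world', '!']
print("".join(pieces))  # Hello world!
```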

XPath Basics

ElementTree supports a subset of XPath. It's powerful enough for most tasks.

Path Syntax

# Direct child
root.find('book')
 
# Any descendant
root.find('.//title')  # Title anywhere below root
 
# Specific path
root.find('./book/title')
 
# Current element (rarely needed)
root.find('.')

Attribute Predicates

# Find by attribute value
root.find(".//book[@isbn='978-0-13-468599-1']")
 
# Element with specific attribute (any value)
root.find(".//book[@isbn]")
 
# Element WITHOUT an attribute
root.findall(".//book[not(@isbn)]")  # Not supported! Raises SyntaxError

That last one doesn't work: ElementTree's XPath support is limited, and unsupported predicates raise SyntaxError.
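
When a predicate isn't supported, filtering in Python is a simple fallback. A minimal sketch, using a made-up two-book document in which one book has no isbn attribute:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the second book has no isbn attribute
root = ET.fromstring(
    "<library>"
    "<book isbn='123'><title>Book One</title></book>"
    "<book><title>Book Two</title></book>"
    "</library>"
)

# The unsupported predicate is rejected with SyntaxError
try:
    root.findall(".//book[not(@isbn)]")
except SyntaxError as exc:
    print(f"Rejected: {exc}")

# Filtering in Python gives the intended result
without_isbn = [b for b in root.iter("book") if "isbn" not in b.attrib]
print([b.findtext("title") for b in without_isbn])  # ['Book Two']
```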

Position Predicates

# First book (1-indexed!)
root.find('.//book[1]')
 
# Last book
root.find('.//book[last()]')
 
# Second to last
root.find('.//book[last()-1]')
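
One thing worth knowing: positions are counted among siblings under the same parent, not across the whole document. A sketch with an invented layout where books sit under two shelves:

```python
import xml.etree.ElementTree as ET

# Hypothetical layout with books grouped under two shelves
root = ET.fromstring(
    "<library>"
    "<shelf><book>A</book><book>B</book></shelf>"
    "<shelf><book>C</book><book>D</book></shelf>"
    "</library>"
)

# [1] means "first book within its parent", so each shelf contributes one match
first_per_shelf = [b.text for b in root.findall(".//book[1]")]
print(first_per_shelf)  # ['A', 'C']

last_per_shelf = [b.text for b in root.findall(".//book[last()]")]
print(last_per_shelf)   # ['B', 'D']
```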

Text Predicates

# Book with specific title text
root.find(".//book[title='The Pragmatic Programmer']")
 
# Book from a specific year
root.find(".//book[year='2019']")

What's NOT Supported

ElementTree's XPath is a subset; these don't work:

  • Axes like ancestor::, following-sibling::
  • Functions like contains(), starts-with()
  • Boolean operators in predicates
  • Arithmetic expressions
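
Before reaching for another library, note that plain Python can often stand in for these. A sketch that emulates contains() combined with a numeric comparison, using a trimmed copy of the sample library:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<library>"
    "<book><title>The Pragmatic Programmer</title><year>2019</year></book>"
    "<book><title>Head First Design Patterns</title><year>2004</year></book>"
    "</library>"
)

# Rough equivalent of [contains(title, 'Design') and year < 2010], done in Python
matches = [
    book
    for book in root.iter("book")
    if "Design" in book.findtext("title", default="")
    and int(book.findtext("year", default="0")) < 2010
]
print([b.findtext("title") for b in matches])  # ['Head First Design Patterns']
```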

For full XPath, use lxml:

from lxml import etree
 
root = etree.fromstring(xml_string)
# Now full XPath works (matches the first book in the sample data)
root.xpath(".//book[contains(title, 'Pragmatic')]")

Creating XML Documents

Building from Scratch

import xml.etree.ElementTree as ET
 
# Create root
root = ET.Element('library')
 
# Add a book
book = ET.SubElement(root, 'book')
book.set('isbn', '978-0-13-468599-1')
 
# Add book children
title = ET.SubElement(book, 'title')
title.text = 'The Pragmatic Programmer'
 
author = ET.SubElement(book, 'author')
author.text = 'David Thomas'
 
# Convert to string
xml_str = ET.tostring(root, encoding='unicode')
print(xml_str)
# <library><book isbn="978-0-13-468599-1"><title>The Pragmatic Programmer</title><author>David Thomas</author></book></library>

Pretty Printing (Python 3.9+)

That output is ugly. Fix it:

ET.indent(root, space="  ")
xml_str = ET.tostring(root, encoding='unicode')
print(xml_str)

Output:

<library>
  <book isbn="978-0-13-468599-1">
    <title>The Pragmatic Programmer</title>
    <author>David Thomas</author>
  </book>
</library>

Before Python 3.9, you'd need minidom or a helper function.
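
Here's a sketch of the minidom route. It serializes with ElementTree and re-parses with minidom, so treat it as something for trusted input only.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

root = ET.Element("library")
book = ET.SubElement(root, "book", isbn="978-0-13-468599-1")
ET.SubElement(book, "title").text = "The Pragmatic Programmer"

# Serialize with ElementTree, then re-parse and pretty-print with minidom
rough = ET.tostring(root, encoding="unicode")
pretty = minidom.parseString(rough).toprettyxml(indent="  ")
print(pretty)
```

Unlike ET.indent(), toprettyxml() always prepends an XML declaration.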

Writing to File

tree = ET.ElementTree(root)
 
# Basic write
tree.write('output.xml')
 
# With XML declaration and encoding
tree.write(
    'output.xml',
    encoding='utf-8',
    xml_declaration=True
)

The file will start with <?xml version='1.0' encoding='utf-8'?>.
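
To check that for yourself, here's a small round-trip sketch. It writes to a temporary directory (my choice, to keep the example self-contained) and reads the file back.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

root = ET.Element("library")
ET.SubElement(root, "book", isbn="123")
tree = ET.ElementTree(root)

# Write to a temporary directory so the sketch leaves nothing behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "output.xml")
    tree.write(path, encoding="utf-8", xml_declaration=True)

    with open(path, encoding="utf-8") as fh:
        first_line = fh.readline().rstrip("\n")

    # Parse the file again to confirm the round trip
    reloaded = ET.parse(path).getroot()

print(first_line)                         # <?xml version='1.0' encoding='utf-8'?>
print(reloaded.find("book").get("isbn"))  # 123
```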

Generating from Data

Real-world use case—turning a list of dicts into XML:

def books_to_xml(books):
    root = ET.Element('library')
    
    for book_data in books:
        book = ET.SubElement(root, 'book')
        book.set('isbn', book_data['isbn'])
        
        for field in ['title', 'author', 'year']:
            if field in book_data:
                elem = ET.SubElement(book, field)
                elem.text = str(book_data[field])
    
    return root
 
books = [
    {'isbn': '123', 'title': 'Book One', 'author': 'Alice', 'year': 2020},
    {'isbn': '456', 'title': 'Book Two', 'author': 'Bob', 'year': 2021},
]
 
root = books_to_xml(books)
ET.indent(root)
print(ET.tostring(root, encoding='unicode'))
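
The reverse direction is just as short. A sketch of a hypothetical xml_to_books() helper (the name is mine) that turns book elements back into dicts; note that every value comes back as a string.

```python
import xml.etree.ElementTree as ET

def xml_to_books(root):
    """Turn <book> children of root into a list of dicts (all values are strings)."""
    books = []
    for book in root.findall("book"):
        record = {"isbn": book.get("isbn")}
        for child in book:
            record[child.tag] = child.text
        books.append(record)
    return books

root = ET.fromstring(
    "<library>"
    "<book isbn='123'><title>Book One</title><year>2020</year></book>"
    "</library>"
)
print(xml_to_books(root))
# [{'isbn': '123', 'title': 'Book One', 'year': '2020'}]
```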

Modifying Existing XML

Changing Text and Attributes

root = ET.fromstring(xml_string)
 
# Update text
for title in root.iter('title'):
    title.text = title.text.upper()
 
# Update attributes
for book in root.findall('book'):
    book.set('updated', '2024-01-15')
    
# Remove attribute
for book in root.findall('book'):
    if 'updated' in book.attrib:
        del book.attrib['updated']

Adding Elements

# Add as last child
new_book = ET.SubElement(root, 'book')
new_book.set('isbn', '789')
 
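# Insert at specific position (use a separate element; inserting
# new_book again would put the same element in the tree twice)
first_book = ET.Element('book', {'isbn': '000'})
root.insert(0, first_book)  # Insert at beginning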
 
# Copy an element
import copy
book_copy = copy.deepcopy(root.find('book'))
root.append(book_copy)

Removing Elements

# Remove specific element (remove() only works on direct children)
for book in root.findall("book[@isbn='978-0-596-00712-6']"):
    root.remove(book)

# Remove all books from 2004
for book in root.findall('book'):
    year = book.find('year')
    if year is not None and year.text == '2004':
        root.remove(book)

Gotcha: Don't modify a list while iterating over it!

# WRONG - will skip elements
for book in root:
    if some_condition(book):
        root.remove(book)
 
# RIGHT - iterate over a copy
for book in list(root):
    if some_condition(book):
        root.remove(book)
 
# OR - collect then remove
to_remove = [b for b in root if some_condition(b)]
for book in to_remove:
    root.remove(book)
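
remove() only accepts direct children, so nested elements have to be removed through their own parent. One way to do that, sketched with an invented shelf layout, is to walk every potential parent:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<library>"
    "<shelf><book><year>2004</year></book><book><year>2019</year></book></shelf>"
    "<shelf><book><year>2004</year></book></shelf>"
    "</library>"
)

# Snapshot the parents and each child list before removing anything
for parent in list(root.iter()):
    for child in list(parent):
        if child.tag == "book" and child.findtext("year") == "2004":
            parent.remove(child)

print([y.text for y in root.iter("year")])  # ['2019']
```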

Replacing Elements

old_book = root.find("book[@isbn='123']")  # must be a direct child of root
new_book = ET.Element('book')
new_book.set('isbn', '123-new')
ET.SubElement(new_book, 'title').text = 'Replacement Book'

# Find index and replace (skip if the old element doesn't exist)
if old_book is not None:
    idx = list(root).index(old_book)
    root.remove(old_book)
    root.insert(idx, new_book)
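
Since elements behave like lists, item assignment does the same job in one step. A sketch with a made-up two-book document:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<library><book isbn='123'/><book isbn='456'/></library>"
)

old_book = root.find("book[@isbn='123']")
new_book = ET.Element("book", isbn="123-new")

if old_book is not None:
    # Elements support list-style item assignment
    idx = list(root).index(old_book)
    root[idx] = new_book

print([b.get("isbn") for b in root])  # ['123-new', '456']
```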

Handling Namespaces

This is where XML gets annoying.

The Problem

<root xmlns="http://example.com/default"
      xmlns:custom="http://example.com/custom">
    <item>Default namespace</item>
    <custom:item>Custom namespace</custom:item>
</root>

Try to find elements normally:

root = ET.fromstring(xml_with_namespaces)
print(root.find('item'))  # None! Where did it go?

Namespaces change how tags work internally:

for child in root:
    print(child.tag)
# {http://example.com/default}item
# {http://example.com/custom}item

The full tag is {namespace}localname.
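
That means you can always search with the fully qualified tag, no prefix mapping needed. A sketch using the document above, plus a hypothetical split_tag() helper (my name, not part of the library) for pulling the two parts apart:

```python
import xml.etree.ElementTree as ET

xml_with_namespaces = """
<root xmlns="http://example.com/default"
      xmlns:custom="http://example.com/custom">
    <item>Default namespace</item>
    <custom:item>Custom namespace</custom:item>
</root>
"""
root = ET.fromstring(xml_with_namespaces)

# Searching with the fully qualified tag works without a prefix mapping
item = root.find("{http://example.com/default}item")
print(item.text)  # Default namespace

def split_tag(tag):
    """Split '{uri}local' into (uri, local); uri is None for un-namespaced tags."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag

print(split_tag(item.tag))  # ('http://example.com/default', 'item')
```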

Solution: Namespace Dicts

ns = {
    'default': 'http://example.com/default',
    'custom': 'http://example.com/custom',
}
 
# Now finding works
root.find('default:item', ns)
root.find('custom:item', ns)
root.findall('.//default:item', ns)

Dealing with Default Namespaces

When there's no prefix in the XML (just xmlns=), you still need one in your dict:

<feed xmlns="http://www.w3.org/2005/Atom">
    <entry><title>Hello</title></entry>
</feed>
ns = {'atom': 'http://www.w3.org/2005/Atom'}
 
root = ET.fromstring(atom_feed)
entries = root.findall('atom:entry', ns)
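
On Python 3.8 and later there's also a wildcard form, {*}tag, which matches a tag name in any namespace. A short sketch with the same feed:

```python
import xml.etree.ElementTree as ET

atom_feed = """
<feed xmlns="http://www.w3.org/2005/Atom">
    <entry><title>Hello</title></entry>
</feed>
"""
root = ET.fromstring(atom_feed)

# {*} matches the tag name in any namespace (or none); requires Python 3.8+
entries = root.findall("{*}entry")
print(len(entries))  # 1

titles = [t.text for t in root.iterfind(".//{*}title")]
print(titles)  # ['Hello']
```

The wildcard is convenient for quick scripts, but the explicit dict is safer when the same local name appears in more than one namespace.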

Creating Namespaced XML

# Register namespace prefix
ET.register_namespace('custom', 'http://example.com/custom')
 
root = ET.Element('{http://example.com/custom}root')
item = ET.SubElement(root, '{http://example.com/custom}item')
item.text = 'Hello'
 
print(ET.tostring(root, encoding='unicode'))
# <custom:root xmlns:custom="http://example.com/custom"><custom:item>Hello</custom:item></custom:root>

Stripping Namespaces

Sometimes you just want to ignore them:

def strip_namespaces(root):
    """Remove all namespaces from element tags."""
    for elem in root.iter():
        if '}' in elem.tag:
            elem.tag = elem.tag.split('}', 1)[1]
    return root
 
root = strip_namespaces(root)
# Now root.find('item') works

Use with caution—you lose namespace information.

Common Gotchas

1. find() Returns None Silently

book = root.find('nonexistent')
title = book.find('title')  # AttributeError: 'NoneType' object has no attribute 'find'

Always check:

book = root.find('book')
if book is not None:
    title = book.find('title')

Or use findall(), which returns an empty list instead of None.

2. Boolean Testing Elements

# This looks right but isn't
elem = root.find('book')
if elem:  # False if book has no children!
    print("Found")

An element with no children is "falsy", and Python 3.12+ emits a DeprecationWarning when you test an element's truth value. Use an explicit None check:

if elem is not None:
    print("Found")
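
The two questions, "does it exist?" and "does it have children?", are easier to keep apart when you spell them out. A small sketch:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<library><book><title>Book One</title></book></library>")

title = root.find("book/title")  # exists, but has no children
missing = root.find("magazine")  # does not exist

# Presence check: compare against None
print(title is not None)    # True
print(missing is not None)  # False

# Child check: use len() rather than truthiness
print(len(title))           # 0
print(len(root))            # 1
```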

3. Text is None, Not Empty String

empty = ET.fromstring('<tag></tag>')
print(empty.text)  # None, not ""
 
print(empty.text or "")  # Safe way to get empty string

4. Encoding Gotchas

# tostring returns bytes by default
ET.tostring(root)  # b'<root>...</root>'
 
# For string, specify encoding
ET.tostring(root, encoding='unicode')  # '<root>...</root>'
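
The difference shows up as soon as the document contains non-ASCII text. A sketch with a one-element document I made up for the purpose:

```python
import xml.etree.ElementTree as ET

# "\u00fc" is the letter u with an umlaut, written as an escape for portability
root = ET.fromstring("<city>Z\u00fcrich</city>")

as_ascii = ET.tostring(root)                     # default: ASCII bytes
as_utf8 = ET.tostring(root, encoding="utf-8")    # UTF-8 bytes
as_text = ET.tostring(root, encoding="unicode")  # str

print(as_ascii)                # b'<city>Z&#252;rich</city>'
print(as_utf8)                 # b'<city>Z\xc3\xbcrich</city>'
print(type(as_text).__name__)  # str
```

With the default encoding, characters outside ASCII are written as numeric character references, which is valid XML but harder to read.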

5. Modifying During Iteration

# BROKEN
for child in root:
    root.remove(child)  # Skips elements!
 
# FIXED
for child in list(root):
    root.remove(child)

Security: XXE Attacks

This is critical if you parse untrusted XML.

What's XXE?

XML External Entity attacks let attackers read files or make network requests from your server:

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>&xxe;</root>

With a vulnerable parser, &xxe; expands to the contents of /etc/passwd.

ElementTree's Default Behavior

Good news: xml.etree.ElementTree does not resolve external entities. Instead of reading the file, the basic XXE attack fails with a ParseError:

import xml.etree.ElementTree as ET

malicious = '''<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>&xxe;</root>'''

try:
    root = ET.fromstring(malicious)
except ET.ParseError as exc:
    print(exc)  # undefined entity &xxe;: ...

But Other Attacks Exist

  • Billion laughs attack (denial of service through nested entity expansion; Expat versions before 2.4.1 are vulnerable)
  • External DTD fetching
  • Parameter entity expansion

The Safe Solution: defusedxml

For untrusted input, use defusedxml:

pip install defusedxml

import defusedxml.ElementTree as ET
 
# Same API, but safe
root = ET.fromstring(untrusted_xml)
tree = ET.parse(untrusted_file)

defusedxml blocks:

  • External entity processing
  • DTD retrieval
  • Entity expansion attacks

Rule of thumb: If you're parsing user-supplied XML, use defusedxml.

Security Checklist

  1. ✅ Use defusedxml for untrusted input
  2. ✅ Validate XML against a schema if possible
  3. ✅ Set reasonable size limits on input
  4. ✅ Don't use xml.etree.ElementTree with XMLParser entities enabled
  5. ✅ Consider JSON instead if you control both ends
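
The size-limit item is easy to sketch. Both parse_with_limit() and the MAX_XML_BYTES value are hypothetical, so pick a limit that suits your application.

```python
import xml.etree.ElementTree as ET

MAX_XML_BYTES = 1_000_000  # hypothetical limit; tune for your application

def parse_with_limit(data: bytes, max_bytes: int = MAX_XML_BYTES):
    """Reject oversized payloads before handing them to the parser."""
    if len(data) > max_bytes:
        raise ValueError(f"XML payload too large: {len(data)} bytes")
    # For untrusted input, swap this for defusedxml.ElementTree.fromstring
    return ET.fromstring(data)

print(parse_with_limit(b"<library/>").tag)  # library
```

A size check complements defusedxml rather than replacing it, since a small document can still carry an entity-expansion payload.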

Quick Reference

import xml.etree.ElementTree as ET
 
# Parsing
root = ET.fromstring(xml_string)     # From string
tree = ET.parse('file.xml')          # From file
root = tree.getroot()
 
# Navigation
root.find('tag')                     # First match (or None)
root.findall('tag')                  # All matches (list)
root.iter('tag')                     # All descendants (iterator)
root.findall('.//tag')               # Descendants via XPath
 
# Element properties
elem.tag                             # Tag name
elem.text                            # Text content
elem.tail                            # Text after close tag
elem.attrib                          # Attributes dict
elem.get('attr', default)            # Get attribute
 
# XPath
root.find('./child/grandchild')      # Path
root.find('.//tag')                  # Any descendant
root.find(".//tag[@attr='val']")     # By attribute
root.find('.//tag[1]')               # By position
 
# Creating
root = ET.Element('root')            # New element
child = ET.SubElement(root, 'child') # Add child
child.text = 'content'               # Set text
child.set('attr', 'value')           # Set attribute
ET.indent(root)                      # Pretty print
 
# Writing
ET.tostring(root, encoding='unicode')
tree = ET.ElementTree(root)
tree.write('out.xml', encoding='utf-8', xml_declaration=True)
 
# Namespaces
ns = {'prefix': 'http://example.com'}
root.find('prefix:tag', ns)

When to Use What

Task | Tool
Simple parsing | xml.etree.ElementTree
Large files | iterparse()
Pretty printing | minidom or ET.indent()
Full XPath | lxml
Untrusted XML | defusedxml
Speed critical | lxml

Wrapping Up

ElementTree covers most everyday XML tasks with zero dependencies. The API is intuitive once you understand:

  • Elements are list-like (iterate children)
  • Attributes are dict-like (.get(), .set())
  • find() returns None, findall() returns empty list
  • Namespaces need explicit handling
  • Use defusedxml for untrusted input

Start with fromstring() and find()/findall(). Add complexity only when needed.
