In software development, you’ll run into XML (Extensible Markup Language) when working with configuration files, API responses, data exports, and more. Although there are powerful third-party libraries for parsing XML, Python’s standard library already includes everything you need.
In this tutorial, you’ll learn how to parse XML using Python’s built-in xml.etree.ElementTree module. You don’t need to install anything with pip.
🔗 You can find the code on GitHub.
Prerequisites
To follow along with this tutorial, you should have:
Python 3.7 or later installed on your system
Basic understanding of Python syntax and data structures
Familiarity with basic programming concepts such as loops and conditionals
A text editor or IDE for writing Python code
No external libraries are required, as we will be using Python’s built-in xml.etree.ElementTree module.
How to read an XML string
Let’s start simple. To understand the basic concepts we will parse the XML directly from a string.
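import xml.etree.ElementTree as ET

xml_string = """
<catalog>
    <product id="101">
        <name>Wireless Keyboard</name>
        <price currency="USD">29.99</price>
    </product>
</catalog>
"""

# Parse the string; fromstring() returns the root element
root = ET.fromstring(xml_string)
print(f"Root tag: {root.tag}")
print(f"Root attributes: {root.attrib}")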
How it works:
We import xml.etree.ElementTree and give it the alias ET (this is the convention).
ET.fromstring() parses an XML string and returns the root element.
Every element has a .tag property (the element name) and an .attrib dictionary (its attributes).
The root object represents the <catalog> element in our XML.
For the above example, you will see the following output:
Root tag: catalog
Root attributes: {}
Here, root.attrib is empty because the root element <catalog> in the provided xml_string has no attributes. Attributes are key-value pairs inside the opening tag of an XML element, such as id="101" or currency="USD". Since the opening <catalog> tag contains only the tag name and no additional information, its attribute dictionary is empty.
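To make the point about attributes concrete, here is a minimal sketch that compares a root element with and without attributes. The version attribute is made up purely for illustration and is not part of the tutorial's catalog.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the "version" attribute is invented for this sketch.
with_attrs = ET.fromstring('<catalog version="1.0"><product id="101"/></catalog>')
without_attrs = ET.fromstring('<catalog><product id="101"/></catalog>')

print(with_attrs.attrib)      # {'version': '1.0'}
print(without_attrs.attrib)   # {}

# Child elements carry their own attribute dictionaries.
print(with_attrs[0].attrib)   # {'id': '101'}
```

Attribute values always come back as strings, so '101' here is text, not an integer.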
How to read an XML file
In real applications, you will typically read XML from files. Say you have a products.xml file. Here’s how you can read from it:
tree = ET.parse('products.xml')
root = tree.getroot()
print(f"Root element: {root.tag}")
Before we run the code and check the output, let’s note the differences between reading XML from strings and from files:
ET.parse() reads from a file and returns an ElementTree object.
We call .getroot() to get the root element.
Use ET.parse() for files and ET.fromstring() for strings.
Running the above code should give you:
Root element: catalog
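If you don't have a products.xml file at hand, the following sketch writes a small one to a temporary directory and then reads it back both ways. The sample content is an assumption made for illustration; your real file may look different.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Assumed sample data, not the tutorial's actual products.xml.
sample_xml = """<catalog>
    <product id="101">
        <name>Wireless Keyboard</name>
    </product>
</catalog>"""

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "products.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample_xml)

    tree = ET.parse(path)            # returns an ElementTree object
    root_from_file = tree.getroot()  # the root Element of that tree

root_from_string = ET.fromstring(sample_xml)  # returns the root Element directly

print(type(tree).__name__)                       # ElementTree
print(root_from_file.tag, root_from_string.tag)  # catalog catalog
```

Either way you end up with the same kind of root element, so everything that follows works for both sources.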
How to find elements in an XML tree
ElementTree gives you three main ways to search for elements. It’s worth understanding how each one behaves before you use it.
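import xml.etree.ElementTree as ET

xml_data = """
<catalog>
    <product id="101">
        <name>Wireless Keyboard</name>
        <categories>
            <category>Electronics</category>
            <category>Accessories</category>
        </categories>
    </product>
    <product id="102">
        <name>USB Mouse</name>
        <categories>
            <category>Electronics</category>
        </categories>
    </product>
</catalog>
"""
root = ET.fromstring(xml_data)

# find(): first matching direct child
first_product = root.find('product')
print(f"First product ID: {first_product.get('id')}")
# findall(): all matching direct children
all_products = root.findall('product')
print(f"Total products: {len(all_products)}")
# iter(): all matching elements anywhere in the tree
all_categories = root.iter('category')
category_list = [cat.text for cat in all_categories]
print(f"All categories: {category_list}")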
Now let’s understand how the three methods work:
find() stops at the first match. Use it when you only need one element.
findall() finds only direct children (one level deep). Use it for immediate child elements.
iter() recursively searches the entire tree. Use it when elements can be nested anywhere in the tree.
This distinction matters: root.findall('category') would find nothing because <catalog> has no direct <category> children. But root.iter('category') finds every category no matter how deeply it is nested. So when you run the above code, you will get:
First product ID: 101
Total products: 2
All categories: ['Electronics', 'Accessories', 'Electronics']
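The difference between direct-child and whole-tree searches is easy to check for yourself. Here is a small sketch on an assumed one-product catalog, where the categories sit two levels below the root:

```python
import xml.etree.ElementTree as ET

# Assumed sample data, shaped like the catalog used in this section.
root = ET.fromstring("""
<catalog>
    <product id="101">
        <categories>
            <category>Electronics</category>
            <category>Accessories</category>
        </categories>
    </product>
</catalog>
""")

direct = root.findall('category')                     # direct children of <catalog> only
nested = [cat.text for cat in root.iter('category')]  # anywhere in the tree

print(direct)                 # []
print(root.find('category'))  # None
print(nested)                 # ['Electronics', 'Accessories']
```

Note that an empty findall() result is an empty list, while a failed find() is None, so the two need different checks.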
How to extract text and attributes from XML
Now let’s extract the actual data from our XML. This is where you convert structured XML into Python data you can work with.
xml_data = """
<catalog>
    <product id="101">
        <name>Wireless Keyboard</name>
        <price currency="USD">29.99</price>
        <stock>45</stock>
    </product>
</catalog>
"""
root = ET.fromstring(xml_data)
product = root.find('product')

# Element text: the content between the opening and closing tags
product_name = product.find('name').text
price_text = product.find('price').text
stock_text = product.find('stock').text

# Attributes: .get() returns None if missing, .attrib[...] raises KeyError
product_id = product.get('id')
product_id_alt = product.attrib['id']
price_element = product.find('price')
currency = price_element.get('currency')

print(f"Product: {product_name}")
print(f"ID: {product_id}")
print(f"Price: {currency} {price_text}")
print(f"Stock: {stock_text}")
This outputs:
Product: Wireless Keyboard
ID: 101
Price: USD 29.99
Stock: 45
Here’s what’s going on:
.text gives you the text content between the opening and closing tags.
.get('attribute_name') safely retrieves an attribute (returns None if it is missing).
.attrib['attribute_name'] accesses the attribute dictionary directly (raises KeyError if it is missing).
Use .get() when an attribute may be optional, and .attrib[] when it is required.
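The sketch below shows both access styles side by side on a tiny element. The sku attribute is a made-up name used only to demonstrate the missing-attribute case.

```python
import xml.etree.ElementTree as ET

product = ET.fromstring('<product id="101"><name>Wireless Keyboard</name></product>')

print(product.get('id'))          # 101
print(product.get('sku'))         # None  ('sku' is a hypothetical attribute)
print(product.get('sku', 'n/a'))  # n/a   (second argument is the default)

# Dictionary-style access fails loudly when the attribute is absent.
try:
    sku = product.attrib['sku']
except KeyError:
    sku = None
    print("Required attribute 'sku' is missing")
```

Failing loudly is often what you want for required attributes, since a silent None can travel a long way before it causes a confusing error.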
How to Create a Simple XML Parser
Let’s put it all together with a practical example. We will parse an entire product catalog and convert it into a Python list of dictionaries.
def parse_product_catalog(xml_file):
    """Parse an XML product catalog and return a list of product dictionaries."""
    tree = ET.parse(xml_file)
    root = tree.getroot()
    products = []

    for product_element in root.findall('product'):
        # Extract text and attributes, converting numeric strings as we go
        product = {
            'id': product_element.get('id'),
            'name': product_element.find('name').text,
            'price': float(product_element.find('price').text),
            'currency': product_element.find('price').get('currency'),
            'stock': int(product_element.find('stock').text),
            'categories': []
        }
        # <categories> is optional, so check that it exists before iterating
        categories_element = product_element.find('categories')
        if categories_element is not None:
            for category in categories_element.findall('category'):
                product['categories'].append(category.text)
        products.append(product)

    return products
Breaking down this parser:
We iterate through all <product> elements using findall().
For each product, we extract the text and attributes into a dictionary.
We convert the numeric strings to the appropriate types (float for the price, int for the stock).
For the nested categories, we first check whether the <categories> element exists. Then we iterate through the child <category> elements and collect their text.
The result is a clean data structure that you can easily work with. Now you can use the parser like this:
products = parse_product_catalog('products.xml')
for product in products:
    print(f"\nProduct: {product['name']}")
    print(f"  ID: {product['id']}")
    print(f"  Price: {product['currency']} {product['price']}")
    print(f"  Stock: {product['stock']}")
    print(f"  Categories: {', '.join(product['categories'])}")
Output:
Product: Wireless Keyboard
  ID: 101
  Price: USD 29.99
  Stock: 45
  Categories: Electronics, Accessories

Product: USB Mouse
  ID: 102
  Price: USD 15.99
  Stock: 120
  Categories: Electronics
How to handle missing data
Real-world XML is messy (no surprise there!). Elements may be missing, text may be empty, or attributes may not exist. Here’s how to handle it gracefully.
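xml_data = """
<catalog>
    <product id="101">
        <name>Wireless Keyboard</name>
        <price currency="USD">29.99</price>
    </product>
    <product id="102">
        <name>USB Mouse</name>
    </product>
</catalog>
"""
root = ET.fromstring(xml_data)

for product in root.findall('product'):
    name = product.find('name').text
    # find() returns None when the element is missing
    price_element = product.find('price')
    if price_element is not None:
        price = float(price_element.text)
        # Fall back to 'USD' if the currency attribute is absent
        currency = price_element.get('currency', 'USD')
        print(f"{name}: {currency} {price}")
    else:
        print(f"{name}: Price not available")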
Here, we handle missing data by:
Using product.find('price') to look for the <price> element within the current <product> element.
Checking whether the result of find() is None. If no element is found, find() returns None.
Using an if price_element is not None: condition so that we only access the text (price_element.text) and attributes (price_element.get('currency', 'USD')) of the <price> element if it was actually found.
Adding an else block to handle the case where the <price> element is missing, printing “Price not available”.
This approach prevents errors caused by trying to access .text or .get() on a None object. For the above code snippet, you will get:
Wireless Keyboard: USD 29.99
USB Mouse: Price not available
Here are a few more error-handling strategies:
Always check whether find() returned None before accessing .text or .get().
Use .get('attr', 'default') to provide default values for missing attributes.
Consider wrapping parsing in try-except blocks for production code.
Validate your data after parsing rather than assuming the XML structure is correct.
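As one possible way to combine these strategies, here is a minimal sketch of a defensive helper. The name safe_parse_price is hypothetical, and returning None for every failure is a simplifying assumption; production code might log or raise instead.

```python
import xml.etree.ElementTree as ET

def safe_parse_price(xml_text):
    """Return the <price> text as a float, or None if it is missing or invalid.

    Hypothetical helper for illustration only.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None  # the input is not well-formed XML

    # findtext() returns None (the default) when no <price> child exists
    price_text = root.findtext('price')
    if price_text is None:
        return None

    try:
        return float(price_text)
    except ValueError:
        return None  # e.g. empty or non-numeric text

print(safe_parse_price('<product><price>29.99</price></product>'))    # 29.99
print(safe_parse_price('<product><name>USB Mouse</name></product>'))  # None
print(safe_parse_price('<product><price>29.99</product>'))            # None (mismatched tags)
print(safe_parse_price('<product><price>abc</price></product>'))      # None
```

The two try blocks are kept separate on purpose: one guards the XML structure, the other guards the data inside it.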
Conclusion
Now you know how to parse XML in Python without installing any external libraries. You learned:
How to read XML from strings and files
The difference between find(), findall(), and iter()
How to safely extract text content and attributes
How to handle nested elements and missing data
The xml.etree.ElementTree module is enough for most XML parsing needs, and it is always available in Python’s standard library.
For more advanced XML navigation and selection, you can explore XPath expressions. XPath works well for selecting nodes in an XML document and can be very useful for complex structures. We will cover this in another tutorial.
Until then, happy parsing!