How to parse XML in Python without using external libraries

by SkillAiNest

In software development, you’ll run into XML (Extensible Markup Language) when working with configuration files, API responses, data exports, and more. Although there are powerful third-party libraries for parsing XML, Python’s standard library already includes everything you need.

In this tutorial, you’ll learn how to parse XML using Python’s built-in xml.etree.ElementTree module, which does not require installing anything with pip.

🔗 You can find the code on GitHub.

Prerequisites

To follow along with this tutorial, you should have:

  • Python 3.7 or later installed on your system

  • Basic understanding of Python syntax and data structures

  • Familiarity with basic programming concepts such as loops and conditionals

  • A text editor or IDE for writing Python code

No external libraries are required, as we will be using Python’s built-in xml.etree.ElementTree module.

Table of Contents

  1. How to read an XML string

  2. How to read an XML file

  3. How to find elements in an XML tree

  4. How to extract text and attributes from XML

  5. How to Create a Simple XML Parser

  6. How to handle missing data

How to read an XML string

Let’s start simple. To understand the basic concepts, we will parse XML directly from a string.

import xml.etree.ElementTree as ET

xml_string = """
<catalog>
    <product id="101">
        <name>Wireless Keyboard</name>
        <price currency="USD">29.99</price>
    </product>
</catalog>
"""

root = ET.fromstring(xml_string)
print(f"Root tag: {root.tag}")
print(f"Root attributes: {root.attrib}")

How it works:

  • We import xml.etree.ElementTree and give it the alias ET (this is a common convention)

  • ET.fromstring() parses an XML string and returns its root element

  • Every element has a .tag property (the element name) and an .attrib dictionary (its attributes)

  • root represents the <catalog> element, the outermost element in our XML

For the above example, you will see the following output:

Root tag: catalog
Root attributes: {}

Here, root.attrib is empty because the root <catalog> element in xml_string has no attributes. Attributes are key-value pairs inside the opening tag of an XML element, such as id="101" in <product id="101"> or currency="USD" in <price currency="USD">. Since the <catalog> element’s opening tag contains only the tag name and no additional information, its attribute dictionary is empty.
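To see the difference between elements with and without attributes, here is a minimal sketch that walks every element with iter() and prints its tag and attribute dictionary. It assumes a small catalog with an id attribute on <product> and a currency attribute on <price>.

```python
import xml.etree.ElementTree as ET

# Assumed sample: only <product> and <price> carry attributes
xml_string = """
<catalog>
    <product id="101">
        <name>Wireless Keyboard</name>
        <price currency="USD">29.99</price>
    </product>
</catalog>
"""

root = ET.fromstring(xml_string)

# iter() with no argument visits the root and every descendant in document order
for element in root.iter():
    print(f"{element.tag}: {element.attrib}")
```

Only <product> and <price> end up with non-empty dictionaries; <catalog> and <name> print {}.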

How to read an XML file

In real applications, you will typically read XML from files. Say you have a products.xml file. Here’s how you can read it:


import xml.etree.ElementTree as ET

tree = ET.parse('products.xml')
root = tree.getroot()

print(f"Root element: {root.tag}")

Before we move on to running and checking the output, let’s note the differences between reading XML from strings vs. files:

  • ET.parse() reads from a file and returns an ElementTree object

  • We call .getroot() to get the root element

  • Use ET.parse() for files and ET.fromstring() for strings
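Since ET.parse() accepts an open file object as well as a filename, you can compare the two entry points without touching the disk. In this sketch io.StringIO stands in for a real file, an assumption made only for illustration:

```python
import io
import xml.etree.ElementTree as ET

xml_text = "<catalog><product id='101'/></catalog>"

# ET.parse() returns an ElementTree; call getroot() to reach the root Element
tree = ET.parse(io.StringIO(xml_text))
root_from_file = tree.getroot()

# ET.fromstring() returns the root Element directly
root_from_string = ET.fromstring(xml_text)

print(type(tree).__name__)                          # ElementTree
print(type(root_from_file).__name__)                # Element
print(root_from_file.tag == root_from_string.tag)   # True
```

Either way you end up with the same kind of root element; the only extra step for files is getroot().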

Running the above code should give you:

Root element: catalog
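The tutorial does not show the contents of products.xml. The sketch below writes an assumed version to disk, reconstructed from the catalog parser’s printed output later in this tutorial (products 101 and 102), so the file-based examples have something to read. The exact layout of the original file is an assumption.

```python
import xml.etree.ElementTree as ET

# Assumed contents of products.xml, reconstructed from the parser output shown later
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<catalog>
    <product id="101">
        <name>Wireless Keyboard</name>
        <price currency="USD">29.99</price>
        <stock>45</stock>
        <categories>
            <category>Electronics</category>
            <category>Accessories</category>
        </categories>
    </product>
    <product id="102">
        <name>USB Mouse</name>
        <price currency="USD">15.99</price>
        <stock>120</stock>
        <categories>
            <category>Electronics</category>
        </categories>
    </product>
</catalog>
"""

# Write the sample file into the current working directory
with open("products.xml", "w", encoding="utf-8") as f:
    f.write(SAMPLE_XML)

tree = ET.parse("products.xml")
root = tree.getroot()
print(f"Root element: {root.tag}")
```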

How to find elements in an XML tree

ElementTree gives you three main ways to search for elements. It’s important to understand when to use each one.

import xml.etree.ElementTree as ET

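xml_data = """
<catalog>
    <product id="101">
        <name>Wireless Keyboard</name>
        <categories>
            <category>Electronics</category>
            <category>Accessories</category>
        </categories>
    </product>
    <product id="102">
        <name>USB Mouse</name>
        <categories>
            <category>Electronics</category>
        </categories>
    </product>
</catalog>
"""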

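root = ET.fromstring(xml_data)

# find() returns the first matching direct child (or None)
first_product = root.find('product')
print(f"First product ID: {first_product.get('id')}")

# findall() returns a list of all matching direct children
all_products = root.findall('product')
print(f"Total products: {len(all_products)}")

# iter() walks the whole tree, matching elements at any depth
all_categories = root.iter('category')
category_list = [cat.text for cat in all_categories]
print(f"All categories: {category_list}")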

Now let’s understand how the three methods work:

  • find() stops at the first matching direct child. Use it when you only need one element.

  • findall() finds all matching direct children (one level deep). Use it for immediate child elements.

  • iter() searches recursively through the entire tree. Use it when elements may be nested anywhere in the document.

This distinction matters: root.findall('category') would find nothing, because the root has no direct <category> children. But root.iter('category') finds every category, regardless of how deeply it is nested. So when you run the code above, you will get:

First product ID: 101
Total products: 2
All categories: ['Electronics', 'Accessories', 'Electronics']
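To check the distinction described above, this sketch runs findall() and iter() on the same tree. It also uses the './/category' path, which ElementTree’s find() and findall() accept to search all descendants instead of only direct children.

```python
import xml.etree.ElementTree as ET

xml_data = """
<catalog>
    <product id="101">
        <categories>
            <category>Electronics</category>
            <category>Accessories</category>
        </categories>
    </product>
    <product id="102">
        <categories>
            <category>Electronics</category>
        </categories>
    </product>
</catalog>
"""

root = ET.fromstring(xml_data)

# findall() with a bare tag only looks at the direct children of root
direct = root.findall('category')

# iter() walks every descendant, however deeply nested
everywhere = [cat.text for cat in root.iter('category')]

# A './/' path makes findall() search all descendants too
by_path = [cat.text for cat in root.findall('.//category')]

print(direct)      # []
print(everywhere)  # ['Electronics', 'Accessories', 'Electronics']
print(by_path)     # ['Electronics', 'Accessories', 'Electronics']
```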

How to extract text and attributes from XML

Now let’s extract the actual data from our XML. This is where you convert the structured XML into Python data you can work with.

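import xml.etree.ElementTree as ET

xml_data = """
<catalog>
    <product id="101">
        <name>Wireless Keyboard</name>
        <price currency="USD">29.99</price>
        <stock>45</stock>
    </product>
</catalog>
"""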

root = ET.fromstring(xml_data)
product = root.find('product')

# .text holds the text between an element's opening and closing tags
product_name = product.find('name').text
price_text = product.find('price').text
stock_text = product.find('stock').text

# .get() returns None for a missing attribute; .attrib[...] raises KeyError
product_id = product.get('id')
product_id_alt = product.attrib['id']

# Attributes belong to the element that declares them, here <price>
price_element = product.find('price')
currency = price_element.get('currency')

print(f"Product: {product_name}")
print(f"ID: {product_id}")
print(f"Price: {currency} {price_text}")
print(f"Stock: {stock_text}")

This outputs:

Product: Wireless Keyboard
ID: 101
Price: USD 29.99
Stock: 45

Here’s what’s going on:

  • .text holds the text content between an element’s opening and closing tags

  • .get('attribute_name') safely retrieves an attribute (returns None if missing)

  • .attrib['attribute_name'] directly accesses the attribute dictionary (raises KeyError if missing)

  • Use .get() when an attribute is optional, and .attrib[...] when it is required
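The accessors above differ most when data is missing. This minimal sketch uses a hypothetical <product> whose <price> has no currency attribute and whose <stock/> element is empty, to show what each one returns:

```python
import xml.etree.ElementTree as ET

# Hypothetical product: <price> has no currency attribute and <stock/> is empty
product = ET.fromstring('<product id="101"><price>29.99</price><stock/></product>')
price = product.find('price')

price_text = price.text                              # '29.99' (text is always a string)
stock_text = product.find('stock').text              # None: an empty element has no text
currency = price.get('currency')                     # None: the attribute is missing
currency_or_default = price.get('currency', 'USD')   # 'USD': the fallback value

# Indexing .attrib directly raises KeyError for a missing attribute
try:
    price.attrib['currency']
    missing_key_raised = False
except KeyError:
    missing_key_raised = True

print(price_text, stock_text, currency, currency_or_default, missing_key_raised)
# 29.99 None None USD True
```

Note that .text is a string even for numbers, which is why the parser in the next section converts it with float() and int().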

How to Create a Simple XML Parser

Let’s put it all together with a practical example. We will parse the entire product catalog and convert it into a Python list of dictionaries.

import xml.etree.ElementTree as ET


def parse_product_catalog(xml_file):
    """Parse an XML product catalog and return a list of product dictionaries."""
    tree = ET.parse(xml_file)
    root = tree.getroot()

    products = []

    for product_element in root.findall('product'):
        # Extract text and attributes, converting numeric strings to numbers
        product = {
            'id': product_element.get('id'),
            'name': product_element.find('name').text,
            'price': float(product_element.find('price').text),
            'currency': product_element.find('price').get('currency'),
            'stock': int(product_element.find('stock').text),
            'categories': []
        }

        # <categories> is nested and optional, so check that it exists first
        categories_element = product_element.find('categories')
        if categories_element is not None:
            for category in categories_element.findall('category'):
                product['categories'].append(category.text)

        products.append(product)

    return products

Breaking down this parser:

  • We iterate through all <product> elements using findall()

  • For each product, we extract the text and attributes into a dictionary, converting the numeric strings to the appropriate types (float for the price, int for the stock)

  • For the nested categories, we first check whether the <categories> element exists. Then we iterate through its child <category> elements and collect their text

The result is a clean data structure that you can easily work with. Now you can use the parser like this:

products = parse_product_catalog('products.xml')

for product in products:
    print(f"\nProduct: {product['name']}")
    print(f"  ID: {product['id']}")
    print(f"  Price: {product['currency']} {product['price']}")
    print(f"  Stock: {product['stock']}")
    print(f"  Categories: {', '.join(product['categories'])}")

Output:

Product: Wireless Keyboard
  ID: 101
  Price: USD 29.99
  Stock: 45
  Categories: Electronics, Accessories

Product: USB Mouse
  ID: 102
  Price: USD 15.99
  Stock: 120
  Categories: Electronics

How to handle missing data

Real-world XML is messy (no surprise there!). Elements may be missing, text may be empty, or attributes may not exist. Here’s how to handle it gracefully.

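import xml.etree.ElementTree as ET

xml_data = """
<catalog>
    <product id="101">
        <name>Wireless Keyboard</name>
        <price currency="USD">29.99</price>
    </product>
    <product id="102">
        <name>USB Mouse</name>
        <!-- price is missing for this product -->
    </product>
</catalog>
"""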

root = ET.fromstring(xml_data)

for product in root.findall('product'):
    name = product.find('name').text

    # find() returns None when the element is missing
    price_element = product.find('price')
    if price_element is not None:
        price = float(price_element.text)
        currency = price_element.get('currency', 'USD')  # fall back to USD if absent
        print(f"{name}: {currency} {price}")
    else:
        print(f"{name}: Price not available")

Here, we handle missing data by:

  1. Using product.find('price') to look for a <price> element within the current <product> element

  2. Checking whether the result of find() is None. If no matching element is found, find() returns None.

  3. Using an if price_element is not None: condition so that we access the element’s text (price_element.text) and attributes (price_element.get('currency', 'USD')) only if it was actually found.

  4. Adding an else block to handle the case where the <price> element is missing, printing “Price not available”.

This approach prevents the AttributeError you would otherwise get from accessing .text or .get() on None. For the code above, you will get:

Wireless Keyboard: USD 29.99
USB Mouse: Price not available
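The steps above guard against a missing <price> element, but as noted earlier, text may also be empty: an element like <price></price> exists yet its .text is None, so float() would raise TypeError. Here is a sketch of one way to cover both cases. read_price is a hypothetical helper, not part of ElementTree, and the "Webcam" product is invented for illustration; findtext() is a real Element method that returns a default when the element is not found.

```python
import xml.etree.ElementTree as ET


def read_price(product):
    """Hypothetical helper: return (currency, price), or None if the price is missing or empty."""
    price_element = product.find('price')
    if price_element is None:
        return None  # no <price> element at all
    text = (price_element.text or '').strip()
    if not text:
        return None  # <price> exists but has no usable text
    return price_element.get('currency', 'USD'), float(text)


xml_data = """
<catalog>
    <product><name>Wireless Keyboard</name><price currency="USD">29.99</price></product>
    <product><name>USB Mouse</name></product>
    <product><name>Webcam</name><price></price></product>
</catalog>
"""

root = ET.fromstring(xml_data)
for product in root.findall('product'):
    # findtext() returns the default only when <name> itself is missing
    name = product.findtext('name', default='Unknown product')
    result = read_price(product)
    if result is None:
        print(f"{name}: Price not available")
    else:
        currency, price = result
        print(f"{name}: {currency} {price}")
```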

Here are some other error-handling strategies:

  • Always check whether find() returns None before accessing .text or .get()

  • Use .get('attr', 'default') to provide default values for missing attributes

  • Consider wrapping parsing in try/except blocks for production code

  • Validate your data after parsing rather than assuming the XML structure is always what you expect
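For the try/except suggestion above, ElementTree raises xml.etree.ElementTree.ParseError when the input is malformed. A minimal sketch, where load_catalog is a hypothetical wrapper name:

```python
import xml.etree.ElementTree as ET


def load_catalog(xml_text):
    """Hypothetical wrapper: return the root element, or None if the XML is malformed."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # exc.position holds the (line, column) where parsing failed
        print(f"Could not parse XML: {exc}")
        return None


good = load_catalog("<catalog><product id='101'/></catalog>")
bad = load_catalog("<catalog><product id='101'></catalog>")  # mismatched closing tag

print(good.tag if good is not None else None)  # catalog
print(bad)                                     # None
```

Catching ParseError separately from missing-element checks keeps "the file is broken" distinct from "the data is incomplete".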

Conclusion

Now you know how to parse XML in Python without installing any external libraries. You learned:

  • How to read XML from strings and files

  • The difference between find(), findall(), and iter()

  • How to safely extract text content and attributes

  • How to handle nested elements and missing data

The xml.etree.ElementTree module handles most XML parsing needs and is always available in Python’s standard library.

For more advanced XML navigation and selection, you can explore XPath expressions. XPath works well for selecting nodes in an XML document and can be very useful for complex structures. We will cover this in another tutorial.
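As a small preview, ElementTree’s find(), findall(), and iterfind() already understand a limited subset of XPath, including './/' for any depth and [@attr='value'] predicates. A short sketch on an assumed two-product catalog:

```python
import xml.etree.ElementTree as ET

# Assumed two-product catalog for illustration
root = ET.fromstring("""
<catalog>
    <product id="101"><name>Wireless Keyboard</name></product>
    <product id="102"><name>USB Mouse</name></product>
</catalog>
""")

# Select a product anywhere in the tree by attribute value
mouse = root.find(".//product[@id='102']")
print(mouse.findtext('name'))   # USB Mouse

# Collect every <name> element at any depth
names = [el.text for el in root.findall('.//name')]
print(names)                    # ['Wireless Keyboard', 'USB Mouse']
```

Full XPath 1.0 (functions, axes, arithmetic) is not supported by ElementTree; this subset covers many everyday selections.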

Until then, happy parsing!


@2025 Skillainest.Designed and Developed by Pro