How to convert XML to JSON in Python

Learn how to convert XML to JSON in Python. This guide covers various methods, tips, real-world applications, and common error debugging.

Published on: Wed, Mar 25, 2026

Updated on: Thu, Mar 26, 2026

The Replit Team

XML and JSON are common data formats, but JSON's simplicity often makes it preferable for modern applications. Python offers robust libraries to convert XML data into a more manageable JSON structure.

In this article, you'll explore several conversion techniques with practical examples. You'll also find implementation tips, real-world applications, and debugging advice to select the right approach for your specific needs.

Basic approach using `xmltodict`

    import xmltodict
    import json

    xml_data = '<root><person><name>John</name><age>30</age></person></root>'
    dict_data = xmltodict.parse(xml_data)
    json_data = json.dumps(dict_data, indent=2)
    print(json_data)

Output:

    {
      "root": {
        "person": {
          "name": "John",
          "age": "30"
        }
      }
    }

The xmltodict library offers a direct path for conversion by treating XML as a dictionary. The process is straightforward:

The xmltodict.parse() function ingests the XML string and translates its hierarchical structure into a native Python dictionary.
With the data now in a dictionary format, you can use the standard json.dumps() method to serialize it into a JSON string.

The indent parameter in json.dumps() is used here to format the output, making it easier to read and debug.

Different methods for XML to JSON conversion

Although xmltodict provides a simple route, you'll find that other methods give you more flexibility for tackling intricate XML files and their attributes.

Using `ElementTree` and the `json` module

    import xml.etree.ElementTree as ET
    import json

    xml_string = '<root><person><name>John</name><age>30</age></person></root>'
    root = ET.fromstring(xml_string)
    result = {
        root.tag: {
            child.tag: {elem.tag: elem.text for elem in child}
            for child in root
        }
    }
    print(json.dumps(result, indent=2))

Output:

    {
      "root": {
        "person": {
          "name": "John",
          "age": "30"
        }
      }
    }

For more fine-tuned control, you can use Python's built-in xml.etree.ElementTree module. First, ET.fromstring() parses the XML data into a tree structure. This gives you a collection of elements that you can loop through and manipulate directly, which is perfect for handling complex or non-standard XML files.

The provided example uses nested dictionary comprehensions to walk the XML tree, which works because this document has a fixed two-level structure under the root.
It uses an element's .tag property for the new dictionary key.
It uses the element's .text content for the value.
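The comprehension above handles one <person>. As a minimal sketch of how the same ElementTree calls extend to repeated elements, the following uses findall() to collect every <person> into a list. The two-person sample XML and the "people" key are illustrative assumptions, not part of the original example.

```python
import json
import xml.etree.ElementTree as ET

# Illustrative input: two <person> elements instead of one
xml_string = (
    "<root>"
    "<person><name>John</name><age>30</age></person>"
    "<person><name>Jane</name><age>25</age></person>"
    "</root>"
)

root = ET.fromstring(xml_string)

# Build one dictionary per <person>, keyed by each child's tag
people = [
    {field.tag: field.text for field in person}
    for person in root.findall("person")
]

result = {root.tag: {"people": people}}
print(json.dumps(result, indent=2))
```

Building a list explicitly means the JSON shape stays the same whether the file contains one person or many.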

Converting with `BeautifulSoup`

    from bs4 import BeautifulSoup
    import json

    xml_string = '<root><person><name>John</name><age>30</age></person></root>'

    # The 'xml' parser relies on the lxml package being installed
    soup = BeautifulSoup(xml_string, 'xml')

    person = {'name': soup.find('name').text, 'age': soup.find('age').text}
    result = {'root': {'person': person}}
    print(json.dumps(result, indent=2))

Output:

    {
      "root": {
        "person": {
          "name": "John",
          "age": "30"
        }
      }
    }

While it's famous for web scraping, BeautifulSoup is also a powerful tool for parsing XML. You create a "soup" object by passing your XML string and specifying the 'xml' parser. This gives you a flexible way to navigate your data structure.

The find() method lets you locate specific elements by their tag name.
Once you find an element, you can grab its content using the .text attribute.

This approach is more manual since you build the dictionary yourself, but it offers precise control over which elements you extract and how you structure the final JSON output.

Working with XML attributes

    import xmltodict
    import json

    xml_with_attrs = '<root><person id="123"><name>John</name><age>30</age></person></root>'
    dict_data = xmltodict.parse(xml_with_attrs)
    json_data = json.dumps(dict_data, indent=2)
    print(json_data)

Output:

    {
      "root": {
        "person": {
          "@id": "123",
          "name": "John",
          "age": "30"
        }
      }
    }

XML elements often include attributes, which provide metadata about the element. The xmltodict library has a built-in way to manage this during conversion. It preserves attributes by treating them as distinct key-value pairs within the JSON structure.

By default, it prefixes attribute names with an @ symbol. In the example, the id="123" attribute from the <person> tag is translated into the "@id": "123" pair, clearly separating it from nested elements like name and age.
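If you are working with ElementTree rather than xmltodict, attributes live in each element's .attrib mapping and are not copied anywhere automatically. The sketch below imitates the @ prefix convention by hand. The helper name element_with_attrs is hypothetical, and it only looks one level deep.

```python
import json
import xml.etree.ElementTree as ET


def element_with_attrs(elem):
    """Hypothetical helper: flatten one element's attributes and children.

    Attributes get an '@' prefix; child elements are stored by tag name.
    Only one level deep, and repeated child tags would overwrite each other.
    """
    data = {f"@{name}": value for name, value in elem.attrib.items()}
    for child in elem:
        data[child.tag] = child.text
    return data


xml_with_attrs = '<root><person id="123"><name>John</name><age>30</age></person></root>'
root = ET.fromstring(xml_with_attrs)
person = root.find("person")

result = {root.tag: {person.tag: element_with_attrs(person)}}
print(json.dumps(result, indent=2))
```

Keeping the same @ prefix makes the output interchangeable with the xmltodict result shown above, which can help if you switch parsers later.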

Advanced XML to JSON techniques

When dealing with deeply nested data, XML namespaces, or large files that strain memory, you'll need to adopt more specialized conversion strategies than the basic methods allow.

Custom recursive conversion with `ElementTree`

    import xml.etree.ElementTree as ET
    import json

    def elem_to_dict(elem):
        """Recursively convert an element into a {tag: value} dictionary."""
        children = list(elem)
        if not children:
            # Leaf element: keep only its text content
            return {elem.tag: elem.text}
        child_dict = {}
        for child in children:
            # Note: attributes are ignored, and repeated sibling tags
            # overwrite each other because update() reuses the same key
            child_dict.update(elem_to_dict(child))
        return {elem.tag: child_dict}

    xml_string = '<root><person><name>John</name><age>30</age></person></root>'
    root = ET.fromstring(xml_string)
    json_data = json.dumps(elem_to_dict(root), indent=2)
    print(json_data)

Output:

    {
      "root": {
        "person": {
          "name": "John",
          "age": "30"
        }
      }
    }

For deeply nested XML, a custom recursive function gives you complete control over the conversion. This approach lets you define exactly how to walk the XML tree and build your dictionary structure from the ground up.

The elem_to_dict function processes each XML element one by one.
If an element contains other elements, the function calls itself for each child, diving deeper into the structure.
When it reaches an element with no children, it simply captures the text content and returns, unwinding the recursion.
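One limitation of merging children with update() is that repeated sibling tags share a key, so only the last one survives. The following sketch shows one possible way to group repeats into a list during the recursion. The function name elem_to_value and the <hobby> sample data are assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET


def elem_to_value(elem):
    """Return the text of a leaf element, or a dict of its children."""
    children = list(elem)
    if not children:
        return elem.text
    grouped = {}
    for child in children:
        value = elem_to_value(child)
        if child.tag in grouped:
            # Second or later occurrence: promote the entry to a list
            existing = grouped[child.tag]
            if not isinstance(existing, list):
                grouped[child.tag] = [existing]
            grouped[child.tag].append(value)
        else:
            grouped[child.tag] = value
    return grouped


xml_string = (
    "<root>"
    "<person><name>John</name><hobby>Reading</hobby><hobby>Cycling</hobby></person>"
    "</root>"
)
root = ET.fromstring(xml_string)
result = {root.tag: elem_to_value(root)}
print(json.dumps(result, indent=2))
```

This mirrors the "single value or list" behavior described later in the article, so the same caveat applies: a tag that appears once still comes back as a plain value.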

Handling XML namespaces

    import xmltodict
    import json

    ns_xml = '<root xmlns:h="http://www.example.com/h"><h:person><h:name>John</h:name></h:person></root>'
    dict_data = xmltodict.parse(ns_xml, process_namespaces=True)
    json_data = json.dumps(dict_data, indent=2)
    print(json_data)

Output:

    {
      "root": {
        "http://www.example.com/h:person": {
          "http://www.example.com/h:name": "John"
        }
      }
    }

XML namespaces prevent naming conflicts when you're mixing data from different sources. The xmltodict library can handle these with a simple flag. By setting process_namespaces=True in the parse() function, you tell the library to expand the namespace prefixes into unique keys.

The full namespace URI replaces the prefix, turning a tag like h:person into the distinct JSON key http://www.example.com/h:person.
The xmlns:h declaration itself is consumed by the parser during expansion, so it does not appear as an @xmlns:h attribute in this output. Without the flag, you would instead get h:person keys plus an @xmlns:h entry.
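Full URIs make for long JSON keys. As a sketch of an alternative that uses only the standard library: ElementTree reports a namespaced tag as {uri}local, which you can rewrite to a short prefix yourself. The PREFIXES mapping and the short_tag and to_dict helpers are hypothetical names introduced for this example.

```python
import json
import xml.etree.ElementTree as ET

ns_xml = (
    '<root xmlns:h="http://www.example.com/h">'
    "<h:person><h:name>John</h:name></h:person>"
    "</root>"
)

# Assumed mapping from namespace URI to the short prefix wanted in JSON keys
PREFIXES = {"http://www.example.com/h": "h"}


def short_tag(tag):
    """Turn '{uri}local' into 'prefix:local'; leave un-namespaced tags alone."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        prefix = PREFIXES.get(uri)
        return f"{prefix}:{local}" if prefix else local
    return tag


def to_dict(elem):
    """Recursively convert an element, shortening namespaced tags."""
    children = list(elem)
    if not children:
        return elem.text
    return {short_tag(child.tag): to_dict(child) for child in children}


root = ET.fromstring(ns_xml)
result = {short_tag(root.tag): to_dict(root)}
print(json.dumps(result, indent=2))
```

Choosing the prefix in your own mapping, rather than trusting the one in the file, keeps keys stable even when different sources bind the same URI to different prefixes.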

Streaming large XML files with `iterparse`

    import xml.etree.ElementTree as ET
    import json
    import io

    xml_file = io.StringIO('<root><person><name>John</name></person><person><name>Jane</name></person></root>')
    names = []

    for event, elem in ET.iterparse(xml_file):
        if elem.tag == 'name':
            names.append(elem.text)
        elem.clear()

    print(json.dumps({"names": names}, indent=2))

Output:

    {
      "names": [
        "John",
        "Jane"
      ]
    }

When you're working with massive XML files, loading the entire document into memory can exhaust available RAM. The iterparse function from ElementTree provides a memory-efficient solution by parsing the file incrementally. By default it yields each element once its closing tag has been read, letting you process elements as they're found.

The loop iterates over parsing events, and you can check the elem.tag to find the specific data you need.
After you've extracted the necessary information—like adding elem.text to a list—you call elem.clear().
Calling clear() matters because it drops the element's text, attributes, and children once you are done with them, which keeps memory use far lower than holding the full tree.
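The same loop can emit whole records instead of single fields. The sketch below, which assumes each <person> is small enough to hold in memory on its own, converts one record at a time into a line of JSON. The added <age> fields and the json_lines list are illustrative; in practice you would likely write each line to a file as you go.

```python
import io
import json
import xml.etree.ElementTree as ET

xml_file = io.StringIO(
    "<root>"
    "<person><name>John</name><age>30</age></person>"
    "<person><name>Jane</name><age>25</age></person>"
    "</root>"
)

json_lines = []
# 'end' fires once an element and all of its children have been read
for event, elem in ET.iterparse(xml_file, events=("end",)):
    if elem.tag == "person":
        record = {child.tag: child.text for child in elem}
        json_lines.append(json.dumps(record))
        # Release the finished <person> subtree (children, text, attributes)
        elem.clear()

print("\n".join(json_lines))
```

Clearing only after the closing </person> tag means the child elements are still intact when the record is built.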

Move faster with Replit

Replit is an AI-powered development platform that transforms natural language into working applications. It’s designed to empower anyone to turn their ideas into software, regardless of their technical background.

Replit Agent can turn the conversion techniques in this article into production applications. Describe what you want to build, and it creates the app, complete with databases, APIs, and deployment.

Build a data migration tool that converts legacy XML product catalogs into JSON for a modern e-commerce platform.
Create an RSS feed aggregator that parses multiple XML feeds and displays the content in a unified, JSON-powered dashboard.
Deploy a utility that translates complex XML configuration files into a clean JSON format for new software systems.

Describe your app idea, and Replit Agent writes the code, tests it, and fixes issues automatically, all in your browser.

Common errors and challenges

Converting XML to JSON can introduce tricky issues with data consistency, formatting errors, and mismatched data types.

Handling malformed XML with `try-except`

Your conversion script can easily break if it encounters malformed XML. A missing closing tag or an invalid character will raise a parsing error and halt execution. To prevent this, wrap your parsing logic in a try-except block. This allows you to catch exceptions, log the problematic data for later review, and continue processing other files without crashing your entire application.

Using `force_list` parameter for repeated elements

A common headache is handling elements that might appear once or multiple times. For example, an order with a single <item> tag converts differently than an order with several, which complicates your code. Libraries like xmltodict offer a force_list parameter to solve this. By specifying which tags should always be lists, you ensure a predictable structure every time.

Converting string values to appropriate data types

Since XML treats everything as a string, you'll need to post-process the converted dictionary to get the right data types. A number like 30 becomes the string "30", which you can't use for calculations until it's converted.

Convert numerical strings to integers or floats using functions like int() or float().
Check for strings like "true" or "false" and change them to proper boolean values.
Identify and transform date strings into a standardized format for consistency.

Handling malformed XML with `try-except`

A single formatting error in an XML file can halt your entire script. To build a more robust converter, you can catch these issues using a try-except block. This lets you handle errors gracefully without crashing. See how it works below.

    import xmltodict
    import json

    xml_data = '<root><person><name>John</name><age>30</person></root>'
    dict_data = xmltodict.parse(xml_data)
    json_data = json.dumps(dict_data, indent=2)
    print(json_data)

This code is brittle. The xml_data string is malformed (the <age> element is never closed), so xmltodict.parse() raises an exception and the script stops before it prints anything. The next example shows how to prevent this.

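    import json
    from xml.parsers.expat import ExpatError

    import xmltodict

    # The <age> element is never closed, so this XML is malformed
    xml_data = '<root><person><name>John</name><age>30</person></root>'

    try:
        dict_data = xmltodict.parse(xml_data)
        json_data = json.dumps(dict_data, indent=2)
        print(json_data)
    except ExpatError as e:
        # xmltodict parses with expat, which raises ExpatError on malformed input
        print(f"Error parsing XML: {e}")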

By wrapping the conversion logic in a try-except block, you can gracefully handle parsing errors. The code attempts to run xmltodict.parse(), and if the XML is malformed, the except block catches the exception. Instead of crashing, your program prints a helpful error message and can continue running. This approach is essential when you're processing XML from external sources or files you don't control, since you can't always guarantee their quality.
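When you parse with the standard library's ElementTree instead, malformed input raises ET.ParseError, which also carries the line and column of the failure. The sketch below uses that to process a batch of documents and report the bad ones without stopping. The function name convert_or_report and the sample documents are assumptions for illustration.

```python
import json
import xml.etree.ElementTree as ET


def convert_or_report(xml_text):
    """Return (json_string, None) on success or (None, error_message) on failure."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return None, f"Malformed XML at line {line}, column {column}: {err}"
    record = {child.tag: child.text for child in root}
    return json.dumps({root.tag: record}), None


documents = [
    "<person><name>John</name><age>30</age></person>",
    "<person><name>Jane</name><age>25</person>",  # missing </age>
]

results = [convert_or_report(doc) for doc in documents]
for json_text, error in results:
    print(json_text if error is None else error)
```

Returning the error alongside the result, rather than printing inside the function, leaves the caller free to log it, retry, or skip the file.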

Using `force_list` parameter for repeated elements

Inconsistent data structures are a frequent issue. An XML element that appears once converts to a string or object, but multiple occurrences become a list. This forces you to write code that handles two different data types. See this in action below.

    import xmltodict
    import json

    xml_data = '<root><person><hobby>Reading</hobby><hobby>Cycling</hobby></person></root>'
    dict_data = xmltodict.parse(xml_data)
    print(json.dumps(dict_data, indent=2))

Because the XML contains two <hobby> tags, xmltodict converts them into a list. A single tag would have produced a string, forcing you to write logic for both cases. The following code shows how to ensure a predictable structure.

    import xmltodict
    import json

    xml_data = '<root><person><hobby>Reading</hobby><hobby>Cycling</hobby></person></root>'
    dict_data = xmltodict.parse(xml_data, force_list=('hobby',))
    print(json.dumps(dict_data, indent=2))

By passing the force_list parameter to the xmltodict.parse() function, you guarantee that certain elements—like hobby in this case—will always be converted into a list. This creates a predictable JSON structure, even if only one such tag exists in the XML. You won't have to write extra code to handle both a single item and a list. This is especially useful when dealing with optional or repeating fields in your XML data.
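If the dictionary has already been parsed without force_list, or comes from a parser that has no such option, you can normalize at the point of use instead. This is a small sketch of that idea. The as_list helper is hypothetical, and the three dictionaries stand in for parsed results with one, several, and no <hobby> tags.

```python
def as_list(value):
    """Hypothetical helper: wrap a lone value in a list; treat None as empty."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


# Shapes a parser typically produces for one vs. several <hobby> tags
one_hobby = {"person": {"hobby": "Reading"}}
two_hobbies = {"person": {"hobby": ["Reading", "Cycling"]}}
no_hobby = {"person": {}}

for parsed in (one_hobby, two_hobbies, no_hobby):
    hobbies = as_list(parsed["person"].get("hobby"))
    print(hobbies)
```

The trade-off is that every read site has to remember to call the helper, whereas force_list fixes the shape once at parse time.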

Converting string values to appropriate data types

XML doesn't distinguish between data types, so everything becomes a string during conversion. This means numbers like 30 and booleans like true are treated as text, which can break your application's logic. The following code shows this in action.

    import xmltodict
    import json

    xml_data = '<root><person><age>30</age><active>true</active></person></root>'
    dict_data = xmltodict.parse(xml_data)
    print(json.dumps(dict_data, indent=2))

The output shows both age and active as strings, which prevents you from using them in mathematical or logical operations. The following example shows how to implement a post-processing step to correct the data types.

    import xmltodict
    import json

    xml_data = '<root><person><age>30</age><active>true</active></person></root>'
    dict_data = xmltodict.parse(xml_data)

    person = dict_data['root']['person']
    person['age'] = int(person['age'])
    person['active'] = person['active'] == 'true'

    print(json.dumps(dict_data, indent=2))

After parsing, you'll need to manually correct the data types since XML treats everything as a string. The code targets specific dictionary values and applies Python's type conversion. For instance, it uses int() to change the age string to a number and a comparison like person['active'] == 'true' to create a true boolean value. This post-processing step is crucial whenever your application logic depends on numerical calculations or boolean checks, ensuring your data is usable.
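Converting fields one by one gets tedious for larger documents. As a rough sketch, a recursive function can apply best-effort conversion to every string in the parsed dictionary. The coerce function and the extra name and score fields are assumptions for illustration, not part of the original example.

```python
import json


def coerce(value):
    """Best-effort conversion of parsed XML strings to Python types."""
    if isinstance(value, dict):
        return {key: coerce(item) for key, item in value.items()}
    if isinstance(value, list):
        return [coerce(item) for item in value]
    if not isinstance(value, str):
        return value
    lowered = value.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    # Try the narrower numeric type first, then fall back to float
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


# Dictionary in the shape a parser returns: every leaf is a string
parsed = {"root": {"person": {"name": "John", "age": "30", "score": "4.5", "active": "true"}}}
typed = coerce(parsed)
print(json.dumps(typed, indent=2))
```

Blanket conversion can be too eager: an identifier such as "00123" would become the integer 123. For data like that, converting only a known set of keys is probably the safer design.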

Real-world applications

These conversion techniques are put to work in everyday applications, from parsing simple RSS feeds to handling complex, attribute-heavy weather data.

Extracting data from an RSS feed with `xmltodict`

Parsing common XML formats like RSS feeds is straightforward with xmltodict, allowing you to quickly extract key information like article titles and links.

    import xmltodict
    import json

    rss_data = '''<rss><channel><item><title>First news</title>
    <link>https://example.com/1</link><pubDate>Jun 28, 2023</pubDate>
    </item></channel></rss>'''

    dict_data = xmltodict.parse(rss_data)
    news_item = dict_data['rss']['channel']['item']
    print(json.dumps(news_item, indent=2))

This code snippet transforms an RSS feed using the xmltodict library. The xmltodict.parse() function converts the entire XML string into a nested Python dictionary, which mirrors the original structure.

After conversion, you can navigate the data using standard dictionary keys.
The example drills down to the item dictionary and then uses json.dumps() to serialize just that portion into a formatted JSON string.

This technique effectively isolates specific parts of an XML document for further processing.
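Once the item is isolated, you will often want to reshape it before passing it on. The sketch below starts from a dictionary in the shape of news_item and normalizes the date to ISO 8601. The summary keys are illustrative, and the date format string assumes the "Jun 28, 2023" style used in the sample and an English locale.

```python
from datetime import datetime
import json

# Shape of the news_item dictionary from the example above
news_item = {
    "title": "First news",
    "link": "https://example.com/1",
    "pubDate": "Jun 28, 2023",
}

# Assumes the 'Mon DD, YYYY' format used in the sample feed
published = datetime.strptime(news_item["pubDate"], "%b %d, %Y")

summary = {
    "title": news_item["title"],
    "url": news_item["link"],
    "published": published.date().isoformat(),
}
print(json.dumps(summary, indent=2))
```

Real feeds commonly use RFC 822 style dates such as "Wed, 28 Jun 2023 10:00:00 GMT". For those, email.utils.parsedate_to_datetime from the standard library is likely a better fit than a hand-written format string.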

Processing weather data with mixed content and attributes

Weather data often mixes text content with attributes, and xmltodict handles this structure without any extra configuration.

    import xmltodict
    import json

    weather_xml = '''<weather location="New York">
      <temperature unit="F">72</temperature>
      <condition>Partly Cloudy</condition>
      <forecast days="3">
        <day>Sunny</day><day>Rainy</day><day>Cloudy</day>
      </forecast>
    </weather>'''

    weather_dict = xmltodict.parse(weather_xml)
    print(json.dumps(weather_dict, indent=2))

This example shows how xmltodict.parse() intelligently converts XML containing both elements and attributes. The function processes the weather_xml string, creating a Python dictionary that mirrors the XML's hierarchy.

Attributes like location and unit are preserved as key-value pairs, with keys prefixed by an @ symbol.
The text inside an element normally becomes its value. Because <temperature> also carries an attribute, its text 72 is stored under a #text key next to @unit.
Repeated elements, such as the multiple <day> tags, are automatically grouped into a list. A single <day> would still come back as a plain string unless you pass force_list.
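To make the @ and #text conventions concrete, here is a sketch that reads values back out of the converted dictionary. The weather literal is written by hand to match the shape xmltodict's defaults are expected to produce for the XML above, and the summary keys and the Celsius conversion are illustrative additions.

```python
# Hand-written dictionary in the shape expected from the weather XML above
# (every value is still a string at this point)
weather = {
    "weather": {
        "@location": "New York",
        "temperature": {"@unit": "F", "#text": "72"},
        "condition": "Partly Cloudy",
        "forecast": {"@days": "3", "day": ["Sunny", "Rainy", "Cloudy"]},
    }
}

report = weather["weather"]
temp = report["temperature"]

# Element text sits under '#text' because <temperature> also has an attribute
fahrenheit = float(temp["#text"])
celsius = round((fahrenheit - 32) * 5 / 9, 1)

summary = {
    "location": report["@location"],
    "temperature_c": celsius,
    "forecast": report["forecast"]["day"],
}
print(summary)
```

Flattening into a purpose-built summary like this keeps the @ and #text markers out of the JSON your own application consumes.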

Get started with Replit

Put your new skills to use by building a real tool. Tell Replit Agent what you want, like "a utility that converts an RSS feed to JSON" or "an app that parses XML product data."

It writes the code, tests for errors, and deploys your application, letting you focus on the idea. Start building with Replit.
