How to convert XML to JSON in Python
Learn how to convert XML to JSON in Python. This guide covers various methods, tips, real-world applications, and common error debugging.

XML and JSON are common data formats, but JSON's simplicity often makes it preferable for modern applications. Python offers robust libraries to convert XML data into a more manageable JSON structure.
In this article, you'll explore several conversion techniques with practical examples. You'll also find implementation tips, real-world applications, and debugging advice to select the right approach for your specific needs.
Basic approach using xmltodict
import xmltodict
import json
xml_data = '<root><person><name>John</name><age>30</age></person></root>'
dict_data = xmltodict.parse(xml_data)
json_data = json.dumps(dict_data, indent=2)
print(json_data)
Output:
{
  "root": {
    "person": {
      "name": "John",
      "age": "30"
    }
  }
}
The xmltodict library, a third-party package you can install with pip install xmltodict, offers a direct path for conversion by treating XML as a dictionary. The process is straightforward:
- The xmltodict.parse() function ingests the XML string and translates its hierarchical structure into a native Python dictionary.
- With the data in dictionary form, the standard json.dumps() function serializes it into a JSON string.
The indent parameter in json.dumps() is used here to format the output, making it easier to read and debug.
Different methods for XML to JSON conversion
Although xmltodict provides a simple route, you'll find that other methods give you more flexibility for tackling intricate XML files and their attributes.
Using ElementTree and the json module
import xml.etree.ElementTree as ET
import json
xml_string = '<root><person><name>John</name><age>30</age></person></root>'
root = ET.fromstring(xml_string)
result = {root.tag: {child.tag: {elem.tag: elem.text for elem in child} for child in root}}
print(json.dumps(result, indent=2))
Output:
{
  "root": {
    "person": {
      "name": "John",
      "age": "30"
    }
  }
}
For more fine-tuned control, you can use Python's built-in xml.etree.ElementTree module. First, ET.fromstring() parses the XML data into a tree structure. This gives you a collection of elements that you can loop through and manipulate directly, which helps when an XML file needs custom handling.
- The provided example uses a nested dictionary comprehension to walk the XML tree. It assumes exactly three levels of nesting and unique tag names at each level.
- It uses an element's .tag attribute for the new dictionary key.
- It uses the element's .text content for the value.
Converting with BeautifulSoup
from bs4 import BeautifulSoup
import json
xml_string = '<root><person><name>John</name><age>30</age></person></root>'
soup = BeautifulSoup(xml_string, 'xml')
person = {'name': soup.find('name').text, 'age': soup.find('age').text}
result = {'root': {'person': person}}
print(json.dumps(result, indent=2))
Output:
{
  "root": {
    "person": {
      "name": "John",
      "age": "30"
    }
  }
}
While it's famous for web scraping, BeautifulSoup is also a capable tool for parsing XML. You create a "soup" object by passing your XML string and specifying the 'xml' parser, which requires the lxml package to be installed. This gives you a flexible way to navigate your data structure.
- The find() method locates the first element that matches a tag name.
- Once you find an element, you can grab its content using the .text attribute.
This approach is more manual since you build the dictionary yourself, but it offers precise control over which elements you extract and how you structure the final JSON output.
Working with XML attributes
import xmltodict
import json
xml_with_attrs = '<root><person id="123"><name>John</name><age>30</age></person></root>'
dict_data = xmltodict.parse(xml_with_attrs)
json_data = json.dumps(dict_data, indent=2)
print(json_data)
Output:
{
  "root": {
    "person": {
      "@id": "123",
      "name": "John",
      "age": "30"
    }
  }
}
XML elements often include attributes, which provide metadata about the element. The xmltodict library has a built-in way to manage this during conversion. It preserves attributes by treating them as distinct key-value pairs within the JSON structure.
- By default, it prefixes attribute names with an @ symbol. In the example, the id="123" attribute from the <person> tag is translated into the "@id": "123" pair, clearly separating it from nested elements like name and age.
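If you prefer to stay within the standard library, the same attribute convention can be approximated with ElementTree. The sketch below is an illustration rather than xmltodict's actual implementation: the helper name element_with_attrs is hypothetical, and the "@" prefix and "#text" key are chosen only to mirror the convention described above.

```python
import json
import xml.etree.ElementTree as ET


def element_with_attrs(elem):
    """Convert an element to a dict or string, keeping attributes under '@' keys."""
    node = {f"@{name}": value for name, value in elem.attrib.items()}
    for child in elem:
        # Assumes child tags are unique within a parent; duplicates would overwrite.
        node[child.tag] = element_with_attrs(child)
    text = (elem.text or "").strip()
    if not node:
        # No attributes and no children: return the bare text (None when empty).
        return text or None
    if text:
        # Attributes or children plus text: keep the text under '#text'.
        node["#text"] = text
    return node


xml_with_attrs = '<root><person id="123"><name>John</name><age>30</age></person></root>'
root = ET.fromstring(xml_with_attrs)
attr_result = {root.tag: element_with_attrs(root)}
print(json.dumps(attr_result, indent=2))
```

For this input, the result has the same shape as the xmltodict output shown above, with "@id" sitting next to name and age.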
Advanced XML to JSON techniques
When dealing with deeply nested data, XML namespaces, or large files that strain memory, you'll need to adopt more specialized conversion strategies than the basic methods allow.
Custom recursive conversion with ElementTree
import xml.etree.ElementTree as ET
import json
def elem_to_dict(elem):
    children = list(elem)
    if children:
        # Recurse into each child and merge the results under this element's tag.
        # Note: attributes are ignored, and repeated sibling tags overwrite one another.
        child_dict = {}
        for child in children:
            child_dict.update(elem_to_dict(child))
        return {elem.tag: child_dict}
    # Leaf element: keep its text content.
    return {elem.tag: elem.text}
xml_string = '<root><person><name>John</name><age>30</age></person></root>'
root = ET.fromstring(xml_string)
json_data = json.dumps(elem_to_dict(root), indent=2)
print(json_data)
Output:
{
  "root": {
    "person": {
      "name": "John",
      "age": "30"
    }
  }
}
For deeply nested XML, a custom recursive function gives you complete control over the conversion. This approach lets you define exactly how to walk the XML tree and build your dictionary structure from the ground up.
- The elem_to_dict function processes each XML element one by one.
- If an element contains other elements, the function calls itself for each child, diving deeper into the structure.
- When it reaches an element with no children, it simply captures the text content and returns, unwinding the recursion.
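One thing a recursive converter usually has to decide is what to do with repeated sibling tags. The following sketch shows one possible policy, grouping repeats into a list. The function name elem_to_value and the sample XML with two person elements are assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET


def elem_to_value(elem):
    """Return text for leaf elements, or a dict of children for parent elements."""
    children = list(elem)
    if not children:
        return elem.text
    grouped = {}
    for child in children:
        value = elem_to_value(child)
        if child.tag in grouped:
            # Second or later occurrence: promote the existing value to a list.
            if not isinstance(grouped[child.tag], list):
                grouped[child.tag] = [grouped[child.tag]]
            grouped[child.tag].append(value)
        else:
            grouped[child.tag] = value
    return grouped


xml_string = (
    "<root><person><name>John</name></person>"
    "<person><name>Jane</name></person></root>"
)
root = ET.fromstring(xml_string)
people_result = {root.tag: elem_to_value(root)}
print(json.dumps(people_result, indent=2))
```

A tag that appears once still becomes a plain value here, so downstream code has to handle both shapes unless you always wrap values in a list.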
Handling XML namespaces
import xmltodict
import json
ns_xml = '<root xmlns:h="http://www.example.com/h"><h:person><h:name>John</h:name></h:person></root>'
dict_data = xmltodict.parse(ns_xml, process_namespaces=True)
json_data = json.dumps(dict_data, indent=2)
print(json_data)
Output:
{
  "root": {
    "http://www.example.com/h:person": {
      "http://www.example.com/h:name": "John"
    }
  }
}
XML namespaces prevent naming conflicts when you're mixing data from different sources. The xmltodict library can handle these with a simple flag. By setting process_namespaces=True in the parse() function, you tell the library to expand the namespace prefixes into unique keys.
- The full namespace URI gets prepended to the tag name, joined by a colon, turning a tag like h:person into the key http://www.example.com/h:person.
- The xmlns:h declaration is consumed by the namespace-aware parser to resolve prefixes, so it is not emitted as an ordinary "@xmlns:h" attribute. Some xmltodict releases may record declarations under an "@xmlns" key instead, so check the output of the version you have installed.
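When the full URIs make the JSON keys unwieldy, one option is to drop the namespace part entirely. The sketch below does this with the standard library's ElementTree, which stores namespaced tags as "{uri}local". The helpers local_name and strip_ns are hypothetical names, and the approach assumes that local names do not collide across namespaces and that sibling tags are unique.

```python
import json
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{uri}' prefix that ElementTree adds to namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def strip_ns(elem):
    """Convert an element tree to nested dicts keyed by local tag names."""
    children = list(elem)
    if not children:
        return elem.text
    return {local_name(child.tag): strip_ns(child) for child in children}


ns_xml = '<root xmlns:h="http://www.example.com/h"><h:person><h:name>John</h:name></h:person></root>'
root = ET.fromstring(ns_xml)
ns_result = {local_name(root.tag): strip_ns(root)}
print(json.dumps(ns_result, indent=2))
```

This trades uniqueness for readability, so it only suits documents where the namespaces are not needed to tell elements apart.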
Streaming large XML files with iterparse
import xml.etree.ElementTree as ET
import json
import io
xml_file = io.StringIO('<root><person><name>John</name></person><person><name>Jane</name></person></root>')
names = []
for event, elem in ET.iterparse(xml_file):
    if elem.tag == 'name':
        names.append(elem.text)
    elem.clear()
print(json.dumps({"names": names}, indent=2))
Output:
{
  "names": [
    "John",
    "Jane"
  ]
}
When you're working with massive XML files, loading the entire document into memory can be a recipe for disaster. The iterparse function from ElementTree provides a memory-efficient solution by parsing the file incrementally. It reads the XML piece by piece, letting you process elements as they're found.
- The loop iterates over parsing events, and you can check elem.tag to find the specific data you need.
- After you've extracted the necessary information, such as adding elem.text to a list, you call elem.clear().
- The clear() method matters because it drops the element's text, attributes, and children, which keeps memory use from growing with the size of the file.
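The same pattern extends to whole records. As a sketch, the loop below waits for the end of each person element, converts it to one JSON line, and only then clears it. The record layout, with name and age children, is assumed sample data.

```python
import io
import json
import xml.etree.ElementTree as ET

xml_file = io.BytesIO(
    b"<root><person><name>John</name><age>30</age></person>"
    b"<person><name>Jane</name><age>25</age></person></root>"
)

json_lines = []
for event, elem in ET.iterparse(xml_file, events=("end",)):
    if elem.tag == "person":
        # The "end" event fires once the record's children are fully parsed.
        record = {child.tag: child.text for child in elem}
        json_lines.append(json.dumps(record))
        # Clear only after the record has been read, so nothing is lost.
        elem.clear()

print("\n".join(json_lines))
```

In a real pipeline you would likely write each line to a file instead of collecting them in a list, so that the output does not accumulate in memory either.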
Move faster with Replit
Replit is an AI-powered development platform that transforms natural language into working applications. It’s designed to empower anyone to turn their ideas into software, regardless of their technical background.
For the conversion techniques in this article, Replit Agent can turn them into production applications. Describe what you want to build, and it creates the app—complete with databases, APIs, and deployment.
- Build a data migration tool that converts legacy XML product catalogs into JSON for a modern e-commerce platform.
- Create an RSS feed aggregator that parses multiple XML feeds and displays the content in a unified, JSON-powered dashboard.
- Deploy a utility that translates complex XML configuration files into a clean JSON format for new software systems.
Describe your app idea, and Replit Agent writes the code, tests it, and fixes issues automatically, all in your browser.
Common errors and challenges
Converting XML to JSON can introduce tricky issues with data consistency, formatting errors, and mismatched data types.
Handling malformed XML with try-except
Your conversion script can easily break if it encounters malformed XML. A missing closing tag or an invalid character will raise a parsing error and halt execution. To prevent this, wrap your parsing logic in a try-except block. This allows you to catch exceptions, log the problematic data for later review, and continue processing other files without crashing your entire application.
Using force_list parameter for repeated elements
A common headache is handling elements that might appear once or multiple times. For example, an order with a single <item> tag converts differently than an order with several, which complicates your code. Libraries like xmltodict offer a force_list parameter to solve this. By specifying which tags should always be lists, you ensure a predictable structure every time.
Converting string values to appropriate data types
Since XML treats everything as a string, you'll need to post-process the converted dictionary to get the right data types. A number like 30 becomes the string "30", which you can't use for calculations until it's converted.
- Convert numerical strings to integers or floats using functions like int() or float().
- Check for strings like "true" or "false" and change them to proper boolean values.
- Identify and transform date strings into a standardized format for consistency.
Handling malformed XML with try-except
A single formatting error in an XML file can halt your entire script. To build a more robust converter, you can catch these issues using a try-except block. This lets you handle errors gracefully without crashing. The first snippet below shows the failure, and the second adds the error handling.
import xmltodict
import json
xml_data = '<root><person><name>John</name><age>30</person></root>'
dict_data = xmltodict.parse(xml_data)
json_data = json.dumps(dict_data, indent=2)
print(json_data)
This code is brittle. The xml_data string here is malformed because the <age> element is never closed, so xmltodict.parse() raises an exception and the script stops before anything is printed. The next example shows how to prevent this.
import xmltodict
import json
from xml.parsers.expat import ExpatError
xml_data = '<root><person><name>John</name><age>30</person></root>'
try:
    dict_data = xmltodict.parse(xml_data)
    json_data = json.dumps(dict_data, indent=2)
    print(json_data)
except ExpatError as e:
    # xmltodict parses with expat, so malformed XML surfaces as ExpatError.
    print(f"Error parsing XML: {e}")
By wrapping the conversion logic in a try-except block, you can gracefully handle parsing errors. The code attempts to run xmltodict.parse(), and if the XML is malformed, the except block catches the exception. Instead of crashing, your program prints a helpful error message and can continue running. This approach is essential when you're processing XML from external sources or files you don't control, since you can't always guarantee their quality.
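To illustrate the "continue running" part, here is a minimal sketch that converts a batch of documents and sets the bad ones aside. It uses the standard library's ElementTree and its ParseError instead of xmltodict, and the labels good.xml and bad.xml are made-up sample names.

```python
import json
import xml.etree.ElementTree as ET

documents = {
    "good.xml": "<person><name>John</name><age>30</age></person>",
    "bad.xml": "<person><name>Jane</name><age>25</person>",  # <age> is never closed
}

converted, failed = {}, {}
for label, xml_text in documents.items():
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Record the problem and move on to the next document.
        failed[label] = str(exc)
        continue
    converted[label] = {root.tag: {child.tag: child.text for child in root}}

print(json.dumps(converted, indent=2))
print("Failed:", list(failed))
```

Catching the parser's specific exception, rather than a bare Exception, keeps unrelated bugs from being silently reported as XML errors.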
Using force_list parameter for repeated elements
Inconsistent data structures are a frequent issue. An XML element that appears once converts to a string or object, but multiple occurrences become a list. This forces you to write code that handles two different data types. See this in action below.
import xmltodict
import json
xml_data = '<root><person><hobby>Reading</hobby><hobby>Cycling</hobby></person></root>'
dict_data = xmltodict.parse(xml_data)
print(json.dumps(dict_data, indent=2))
Because the XML contains two <hobby> tags, xmltodict converts them into a list. A single tag would have produced a string, forcing you to write logic for both cases. The following code shows how to ensure a predictable structure.
import xmltodict
import json
xml_data = '<root><person><hobby>Reading</hobby><hobby>Cycling</hobby></person></root>'
dict_data = xmltodict.parse(xml_data, force_list=('hobby',))
print(json.dumps(dict_data, indent=2))
By passing the force_list parameter to the xmltodict.parse() function, you guarantee that certain elements—like hobby in this case—will always be converted into a list. This creates a predictable JSON structure, even if only one such tag exists in the XML. You won't have to write extra code to handle both a single item and a list. This is especially useful when dealing with optional or repeating fields in your XML data.
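If the dictionary comes from code you don't control and force_list was not set at parse time, a small normalizing helper can achieve a similar effect when you read the data. The as_list function below is a hypothetical helper, and the dictionaries are hand-written stand-ins for parsed output.

```python
def as_list(value):
    """Wrap a single value in a list, pass lists through, treat None as empty."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


# Hand-written stand-ins for converted documents with one, two, or no hobbies.
one_hobby = {"person": {"hobby": "Reading"}}
two_hobbies = {"person": {"hobby": ["Reading", "Cycling"]}}
no_hobby = {"person": {}}

for doc in (one_hobby, two_hobbies, no_hobby):
    hobbies = as_list(doc["person"].get("hobby"))
    print(hobbies)
```

Every branch now yields a list, so the loop body never needs to check which shape it received.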
Converting string values to appropriate data types
XML doesn't distinguish between data types, so everything becomes a string during conversion. This means numbers like 30 and booleans like true are treated as text, which can break your application's logic. The following code shows this in action.
import xmltodict
import json
xml_data = '<root><person><age>30</age><active>true</active></person></root>'
dict_data = xmltodict.parse(xml_data)
print(json.dumps(dict_data, indent=2))
The output shows both age and active as strings, which prevents you from using them in mathematical or logical operations. The following example shows how to implement a post-processing step to correct the data types.
import xmltodict
import json
xml_data = '<root><person><age>30</age><active>true</active></person></root>'
dict_data = xmltodict.parse(xml_data)
person = dict_data['root']['person']
person['age'] = int(person['age'])
person['active'] = person['active'] == 'true'
print(json.dumps(dict_data, indent=2))
After parsing, you'll need to manually correct the data types since XML treats everything as a string. The code targets specific dictionary values and applies Python's type conversion. For instance, it uses int() to change the age string to a number and a comparison like person['active'] == 'true' to create a true boolean value. This post-processing step is crucial whenever your application logic depends on numerical calculations or boolean checks, ensuring your data is usable.
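When there are many fields, converting them one by one gets tedious. A recursive pass over the whole dictionary is one alternative, sketched below. The coerce function is a hypothetical helper and the input is a hand-written stand-in for parsed output. Blanket coercion is an assumption that does not suit every dataset: identifiers with leading zeros would lose them, and words such as "nan" would turn into floats.

```python
import json


def coerce(value):
    """Recursively convert numeric and boolean strings to native Python types."""
    if isinstance(value, dict):
        return {key: coerce(item) for key, item in value.items()}
    if isinstance(value, list):
        return [coerce(item) for item in value]
    if not isinstance(value, str):
        return value
    lowered = value.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    # Not a boolean or a number: leave the string unchanged.
    return value


parsed = {"root": {"person": {"name": "John", "age": "30", "height": "1.82", "active": "true"}}}
typed = coerce(parsed)
print(json.dumps(typed, indent=2))
```

For stricter control, you could pass in a mapping of field names to converters and leave every other value as a string.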
Real-world applications
These conversion techniques are put to work in everyday applications, from parsing simple RSS feeds to handling complex, attribute-heavy weather data.
Extracting data from an RSS feed with xmltodict
Parsing common XML formats like RSS feeds is straightforward with xmltodict, allowing you to quickly extract key information like article titles and links.
import xmltodict
import json
rss_data = '''<rss><channel><item><title>First news</title>
<link>https://example.com/1</link><pubDate>Jun 28, 2023</pubDate>
</item></channel></rss>'''
dict_data = xmltodict.parse(rss_data)
news_item = dict_data['rss']['channel']['item']
print(json.dumps(news_item, indent=2))
This code snippet transforms an RSS feed using the xmltodict library. The xmltodict.parse() function converts the entire XML string into a nested Python dictionary, which mirrors the original structure.
- After conversion, you can navigate the data using standard dictionary keys.
- The example drills down to the item dictionary and then uses json.dumps() to serialize just that portion into a formatted JSON string.
This technique effectively isolates specific parts of an XML document for further processing.
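Real feeds usually contain several items. As a sketch of how the extraction might scale, the snippet below uses the standard library's ElementTree to collect every item into a list, so the result has the same shape whether the feed holds one entry or many. The second item is invented sample data.

```python
import json
import xml.etree.ElementTree as ET

rss_data = """<rss><channel>
<item><title>First news</title><link>https://example.com/1</link></item>
<item><title>Second news</title><link>https://example.com/2</link></item>
</channel></rss>"""

root = ET.fromstring(rss_data)
# root.iter("item") yields every <item> element in document order.
items = [
    {field.tag: field.text for field in item}
    for item in root.iter("item")
]
print(json.dumps(items, indent=2))
```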
Processing weather data with mixed content and attributes
Weather data often mixes text content with attributes, and xmltodict handles this structure without any extra configuration.
import xmltodict
import json
weather_xml = '''<weather location="New York">
<temperature unit="F">72</temperature>
<condition>Partly Cloudy</condition>
<forecast days="3">
<day>Sunny</day><day>Rainy</day><day>Cloudy</day>
</forecast>
</weather>'''
weather_dict = xmltodict.parse(weather_xml)
print(json.dumps(weather_dict, indent=2))
This example shows how xmltodict.parse() intelligently converts XML containing both elements and attributes. The function processes the weather_xml string, creating a Python dictionary that mirrors the XML's hierarchy.
- Attributes like location and unit are preserved as key-value pairs, with keys prefixed by an @ symbol.
- The text inside an element that also has attributes, like 72 in <temperature>, is stored under a "#text" key next to the "@unit" attribute.
- Repeated elements, such as the multiple <day> tags, are automatically grouped into a list.
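The "@" and "#text" keys are faithful to the XML but awkward for consumers of the JSON. A common follow-up step is to reshape the dictionary into friendlier names, as in the sketch below. The weather dictionary is hand-written to mirror the shape described above rather than produced by running the parser, and the output field names are arbitrary choices.

```python
import json

# Hand-written to mirror the parsed structure described in the list above.
weather = {
    "weather": {
        "@location": "New York",
        "temperature": {"@unit": "F", "#text": "72"},
        "condition": "Partly Cloudy",
        "forecast": {"@days": "3", "day": ["Sunny", "Rainy", "Cloudy"]},
    }
}

report = weather["weather"]
summary = {
    "location": report["@location"],
    "temperature": {
        "value": int(report["temperature"]["#text"]),
        "unit": report["temperature"]["@unit"],
    },
    "condition": report["condition"],
    "forecast": report["forecast"]["day"],
}
print(json.dumps(summary, indent=2))
```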
Get started with Replit
Put your new skills to use by building a real tool. Describe what you want to Replit Agent, like "a utility that converts an RSS feed to JSON" or "an app that parses XML product data."
It writes the code, tests for errors, and deploys your application, letting you focus on the idea. Start building with Replit.
Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.