5 Useful DIY Python Functions for JSON Parsing and Processing

by SkillAiNest


# Introduction

Working with JSON in Python is often difficult. The basic json.loads() only gets you so far.

API responses, configuration files, and data exports often contain JSON that is messy or poorly structured. You need to flatten nested objects, safely extract values without KeyError exceptions, merge multiple JSON files, or convert between JSON and other formats. These tasks come up constantly in web scraping, API integration, and data processing. This article walks you through five practical functions to handle common JSON parsing and processing tasks.

You can find the code for these functions on GitHub.

# 1. Safely extracting nested values

JSON objects are often nested several levels deep. Accessing deeply nested values with bracket notation quickly becomes tedious, and if any key along the way is missing, you get a KeyError.

Here’s a function that lets you access nested values using dot notation, with a fallback for missing keys:

def get_nested_value(data, path, default=None):
    """
    Safely extract nested values from JSON using dot notation.

    Args:
        data: Dictionary or JSON object
        path: Dot-separated string like "user.profile.email"
        default: Value to return if path doesn't exist

    Returns:
        The value at the path, or default if not found
    """
    keys = path.split('.')
    current = data

    for key in keys:
        if isinstance(current, dict):
            current = current.get(key)
            if current is None:
                return default
        elif isinstance(current, list):
            try:
                index = int(key)
                current = current[index]  # index into the list
            except (ValueError, IndexError):
                return default
        else:
            return default

    return current

Let’s test this with a complex nested structure:

# Sample JSON data
user_data = {
    "user": {
        "id": 123,
        "profile": {
            "name": "Allie",
            "email": "allie@example.com",
            "settings": {
                "theme": "dark",
                "notifications": True
            }
        },
        "posts": [
            {"id": 1, "title": "First Post"},
            {"id": 2, "title": "Second Post"}
        ]
    }
}

# Extract values
email = get_nested_value(user_data, "user.profile.email")
theme = get_nested_value(user_data, "user.profile.settings.theme")
first_post = get_nested_value(user_data, "user.posts.0.title")
missing = get_nested_value(user_data, "user.profile.age", default=25)

print(f"Email: {email}")
print(f"Theme: {theme}")
print(f"First post: {first_post}")
print(f"Age (default): {missing}")

Output:

Email: allie@example.com
Theme: dark
First post: First Post
Age (default): 25

The function splits the path string on dots and walks through the data structure one key at a time. At each level, it checks whether the current value is a dictionary or a list. For dictionaries, it uses .get(key), which returns None for missing keys instead of raising an error. For lists, it tries to convert the key to an integer index, and returns the default if the key is not a number or the index is out of range.
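Because .get(key) returns None both when a key is missing and when it is present with a JSON null value, get_nested_value treats those two cases the same way. If you need to tell them apart, one option is a sentinel object. The sketch below is a hypothetical variant; get_nested_value_strict and _MISSING are illustrative names, not part of the original code:

```python
# Hypothetical variant of get_nested_value: a unique sentinel object marks
# "key absent", so a key explicitly set to null (None) is returned as None.
_MISSING = object()


def get_nested_value_strict(data, path, default=None):
    current = data
    for key in path.split('.'):
        if isinstance(current, dict):
            # The sentinel comes back only when the key is truly absent
            current = current.get(key, _MISSING)
            if current is _MISSING:
                return default
        elif isinstance(current, list):
            try:
                current = current[int(key)]
            except (ValueError, IndexError):
                return default
        else:
            return default
    return current


data = {"user": {"nickname": None, "tags": ["a", "b"]}}
print(get_nested_value_strict(data, "user.nickname", default="n/a"))  # None
print(get_nested_value_strict(data, "user.missing", default="n/a"))   # n/a
print(get_nested_value_strict(data, "user.tags.1"))                   # b
```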

The default parameter provides a fallback when any part of the path does not exist. This prevents your code from crashing when dealing with incomplete or inconsistent JSON data from APIs.

This pattern is particularly useful when processing API responses where some fields are optional or only present under certain conditions.
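As a sketch of that use case, the block below parses a hypothetical API payload with json.loads() and pulls an optional nested field from each record. It repeats a compact copy of get_nested_value so it runs on its own; the payload and field names are made up for illustration.

```python
import json


def get_nested_value(data, path, default=None):
    # Compact copy of get_nested_value from above, so this block runs standalone
    current = data
    for key in path.split('.'):
        if isinstance(current, dict):
            current = current.get(key)
            if current is None:
                return default
        elif isinstance(current, list):
            try:
                current = current[int(key)]
            except (ValueError, IndexError):
                return default
        else:
            return default
    return current


# Hypothetical API payload: the second user has no "address" object
raw = '''
{"results": [
  {"name": "Allie", "address": {"city": "Austin"}},
  {"name": "Cayla"}
]}
'''
payload = json.loads(raw)

cities = [
    get_nested_value(user, "address.city", default="unknown")
    for user in get_nested_value(payload, "results", default=[])
]
print(cities)  # ['Austin', 'unknown']
```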

# 2. Flattening nested JSON into single-level dictionaries

Machine learning models, CSV exports, and database inputs often require flat data structures. But API responses and configuration files use nested JSON. Converting nested objects to flat key-value pairs is a common task.

Here’s a function that flattens nested JSON with a custom separator:

def flatten_json(data, parent_key='', separator="_"):
    """
    Flatten nested JSON into a single-level dictionary.

    Args:
        data: Nested dictionary or JSON object
        parent_key: Prefix for keys (used in recursion)
        separator: String to join nested keys

    Returns:
        Flattened dictionary with concatenated keys
    """
    items = []

    if isinstance(data, dict):
        for key, value in data.items():
            new_key = f"{parent_key}{separator}{key}" if parent_key else key

            if isinstance(value, dict):
                # Recursively flatten nested dicts
                items.extend(flatten_json(value, new_key, separator).items())
            elif isinstance(value, list):
                # Flatten lists with indexed keys
                for i, item in enumerate(value):
                    list_key = f"{new_key}{separator}{i}"
                    if isinstance(item, (dict, list)):
                        items.extend(flatten_json(item, list_key, separator).items())
                    else:
                        items.append((list_key, item))
            else:
                items.append((new_key, value))
    else:
        items.append((parent_key, data))

    return dict(items)

Now flatten a complex nested structure:

# Complex nested JSON
product_data = {
    "product": {
        "id": 456,
        "name": "Laptop",
        "specs": {
            "cpu": "Intel i7",
            "ram": "16GB",
            "storage": {
                "type": "SSD",
                "capacity": "512GB"
            }
        },
        "reviews": [
            {"rating": 5, "comment": "Excellent"},
            {"rating": 4, "comment": "Good value"}
        ]
    }
}

flattened = flatten_json(product_data)

for key, value in flattened.items():
    print(f"{key}: {value}")

Output:

product_id: 456
product_name: Laptop
product_specs_cpu: Intel i7
product_specs_ram: 16GB
product_specs_storage_type: SSD
product_specs_storage_capacity: 512GB
product_reviews_0_rating: 5
product_reviews_0_comment: Excellent
product_reviews_1_rating: 4
product_reviews_1_comment: Good value

The function uses recursion to handle an arbitrary depth of nesting. When it encounters a dictionary, it processes each key-value pair and builds a flattened key by joining the parent key and the current key with the separator.

For lists, it uses the index as part of the key. This preserves the order and position of array elements in the flat output. The pattern reviews_0_rating tells you this is the rating from the first review.
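Because flatten_json calls itself once for every nested dictionary or list, a pathologically deep document could exceed CPython's default recursion limit of 1,000 frames and raise RecursionError. Real API payloads rarely get that deep, but if yours might, the same depth-first flattening can be done with an explicit stack. This is a sketch under that assumption; flatten_json_iterative is a hypothetical name, and unlike the original it also expands a top-level list and lists nested directly inside lists:

```python
def flatten_json_iterative(data, separator="_"):
    """Flatten nested dicts and lists using an explicit stack instead of recursion."""
    flat = {}
    # Each stack entry pairs the key built so far with the value found there
    stack = [("", data)]
    while stack:
        prefix, value = stack.pop()
        if isinstance(value, dict):
            children = list(value.items())
        elif isinstance(value, list):
            children = list(enumerate(value))
        else:
            # Leaf value: store it under its accumulated key
            flat[prefix] = value
            continue
        # Push children in reverse so they are popped, and emitted, in original order
        for key, child in reversed(children):
            new_key = f"{prefix}{separator}{key}" if prefix else str(key)
            stack.append((new_key, child))
    return flat


# A chain 3,000 levels deep would exceed CPython's default recursion limit
deep = leaf = {}
for _ in range(3000):
    leaf["k"] = {}
    leaf = leaf["k"]
leaf["k"] = "bottom"

flat = flatten_json_iterative(deep)
print(len(flat), list(flat.values()))  # 1 ['bottom']
```

The key order matches the recursive version because children are pushed in reverse, so the first child is always processed next.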

The separator parameter lets you customize the output format. Use dots for dot notation, underscores for snake_case-style keys, or slashes for path-like keys, depending on your needs.
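To see the effect of separator, the sketch below flattens one record with three different separators. It uses a condensed, self-contained variant of flatten_json that treats lists and dictionaries the same way (so, unlike the original, lists nested inside lists are expanded too). It also shows a pitfall worth knowing: if the separator appears inside your keys, two different paths can produce the same flat key, and the later value silently wins.

```python
def flatten_json(data, parent_key='', separator="_"):
    # Condensed variant of flatten_json above; it also expands lists nested in lists
    if isinstance(data, dict):
        pairs = data.items()
    elif isinstance(data, list):
        pairs = enumerate(data)
    else:
        return {parent_key: data}
    items = []
    for key, value in pairs:
        new_key = f"{parent_key}{separator}{key}" if parent_key else str(key)
        if isinstance(value, (dict, list)):
            items.extend(flatten_json(value, new_key, separator).items())
        else:
            items.append((new_key, value))
    return dict(items)


record = {"user": {"name": "Allie", "tags": ["admin", "dev"]}}
for sep in (".", "_", "/"):
    print(flatten_json(record, separator=sep))
# {'user.name': 'Allie', 'user.tags.0': 'admin', 'user.tags.1': 'dev'}
# {'user_name': 'Allie', 'user_tags_0': 'admin', 'user_tags_1': 'dev'}
# {'user/name': 'Allie', 'user/tags/0': 'admin', 'user/tags/1': 'dev'}

# Pitfall: the key "a_b" and the path a -> b both flatten to "a_b"
print(flatten_json({"a_b": 1, "a": {"b": 2}}))  # {'a_b': 2}
```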

This function is especially useful when you need to convert JSON API responses into data frames or CSV rows where each column needs a unique name.
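For the CSV case, records often have different optional fields, so their flattened keys differ. One way to handle this, sketched below with the standard csv module, is to take the union of all flattened keys as the header and leave missing cells empty through DictWriter's restval argument. The records and the condensed flatten_json helper are illustrative assumptions:

```python
import csv
import io


def flatten_json(data, parent_key='', separator="_"):
    # Condensed, self-contained variant of flatten_json from above
    if isinstance(data, dict):
        pairs = data.items()
    elif isinstance(data, list):
        pairs = enumerate(data)
    else:
        return {parent_key: data}
    items = []
    for key, value in pairs:
        new_key = f"{parent_key}{separator}{key}" if parent_key else str(key)
        if isinstance(value, (dict, list)):
            items.extend(flatten_json(value, new_key, separator).items())
        else:
            items.append((new_key, value))
    return dict(items)


records = [
    {"id": 1, "specs": {"cpu": "Intel i7", "ram": "16GB"}},
    {"id": 2, "specs": {"cpu": "M2"}, "tags": ["sale"]},
]
rows = [flatten_json(r) for r in records]

# Union of keys in first-seen order, so every row shares the same columns
fieldnames = list(dict.fromkeys(key for row in rows for key in row))

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
# id,specs_cpu,specs_ram,tags_0
# 1,Intel i7,16GB,
# 2,M2,,sale
```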

# 3. Deep merging multiple JSON objects

Configuration management often requires merging multiple JSON files containing default settings, environment-specific configurations, user preferences, and more. A simple dict.update() only handles the top level. You need a deep merge that recursively combines nested structures.
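To see the problem concretely, here is what a plain update does with nested configuration: the whole nested dictionary is replaced, so the default port disappears. The unpacking form {**defaults, **overrides} behaves the same way.

```python
defaults = {"database": {"host": "localhost", "port": 5432}}
overrides = {"database": {"host": "prod-db.example.com"}}

shallow = dict(defaults)   # copy, so defaults itself is not modified
shallow.update(overrides)  # replaces the entire "database" value
print(shallow)  # {'database': {'host': 'prod-db.example.com'}}
```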

Here’s a function that deep-merges JSON objects:

def deep_merge_json(base, override):
    """
    Deep merge two JSON objects, with override taking precedence.

    Args:
        base: Base dictionary
        override: Dictionary with values to override/add

    Returns:
        New dictionary with merged values
    """
    result = base.copy()

    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            # Recursively merge nested dictionaries
            result[key] = deep_merge_json(result[key], value)
        else:
            # Override or add the value
            result[key] = value

    return result

Let’s merge some sample configuration data:

import json

# Default configuration
default_config = {
    "database": {
        "host": "localhost",
        "port": 5432,
        "timeout": 30,
        "pool": {
            "min": 2,
            "max": 10
        }
    },
    "cache": {
        "enabled": True,
        "ttl": 300
    },
    "logging": {
        "level": "INFO"
    }
}

# Production overrides
prod_config = {
    "database": {
        "host": "prod-db.example.com",
        "pool": {
            "min": 5,
            "max": 50
        }
    },
    "cache": {
        "ttl": 600
    },
    "monitoring": {
        "enabled": True
    }
}

merged = deep_merge_json(default_config, prod_config)

print(json.dumps(merged, indent=2))

Output:

{
  "database": {
    "host": "prod-db.example.com",
    "port": 5432,
    "timeout": 30,
    "pool": {
      "min": 5,
      "max": 50
    }
  },
  "cache": {
    "enabled": true,
    "ttl": 600
  },
  "logging": {
    "level": "INFO"
  },
  "monitoring": {
    "enabled": true
  }
}

The function recursively merges nested dictionaries. When both base and override contain a dictionary under the same key, it merges those dictionaries instead of replacing one with the other. Values that are not explicitly overridden are preserved.

Notice how database.port and database.timeout keep their default values, while database.host is overridden. The pool settings are merged at the nested level, so both min and max are updated.

The function also adds new keys that are not present in the base config, such as the monitoring section in the production overrides.
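One subtlety: base.copy() makes a shallow copy. Nested dictionaries that the override never touches, such as the logging section above, are the same objects in both merged and default_config, so mutating merged["logging"]["level"] would also change the defaults. Override values are likewise inserted by reference. If the merged result must be fully independent, a variant can deep-copy the base once and merge into that copy. deep_merge_json_isolated and _merge_into are hypothetical names used for this sketch:

```python
import copy


def deep_merge_json(base, override):
    # Same logic as deep_merge_json above (shallow copy of base)
    result = base.copy()
    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = deep_merge_json(result[key], value)
        else:
            result[key] = value
    return result


def _merge_into(target, override):
    # Merge override into target in place; target is already a private copy
    for key, value in override.items():
        if isinstance(target.get(key), dict) and isinstance(value, dict):
            _merge_into(target[key], value)
        else:
            # Copy override values too, so later edits to override don't leak in
            target[key] = copy.deepcopy(value)


def deep_merge_json_isolated(base, override):
    result = copy.deepcopy(base)  # one deep copy up front
    _merge_into(result, override)
    return result


defaults = {"logging": {"level": "INFO"}, "cache": {"ttl": 300}}
prod = {"cache": {"ttl": 600}}

shared = deep_merge_json(defaults, prod)
print(shared["logging"] is defaults["logging"])  # True: the same dict object

isolated = deep_merge_json_isolated(defaults, prod)
isolated["logging"]["level"] = "DEBUG"
print(defaults["logging"]["level"])  # INFO: the base is untouched
```

Copying once up front, rather than calling deepcopy at every recursion level, avoids re-copying subtrees that were already copied.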

You can chain multiple merges to build a layered configuration:

# user_preferences is assumed to be another dict of overrides
final_config = deep_merge_json(
    deep_merge_json(default_config, prod_config),
    user_preferences
)

This pattern is common in application configurations where you have defaults, environment-specific settings, and runtime overrides.
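For more than two layers, nesting calls gets awkward. A minimal sketch, assuming each layer is a plain dictionary listed from lowest to highest priority, folds them together with functools.reduce. merge_layers is a hypothetical helper name, and the layer contents are made up:

```python
from functools import reduce


def deep_merge_json(base, override):
    # Same logic as deep_merge_json above
    result = base.copy()
    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = deep_merge_json(result[key], value)
        else:
            result[key] = value
    return result


def merge_layers(*layers):
    # Fold left to right, so later layers take precedence; no layers gives {}
    return reduce(deep_merge_json, layers, {})


defaults = {"logging": {"level": "INFO", "format": "plain"}}
production = {"logging": {"level": "WARNING"}}
user_preferences = {"logging": {"format": "json"}}

config = merge_layers(defaults, production, user_preferences)
print(config)  # {'logging': {'level': 'WARNING', 'format': 'json'}}
```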

# 4. Filtering JSON by schema or whitelist

APIs often return more data than you need. Large JSON responses make your code harder to read. Sometimes you only want specific fields, or you need to remove sensitive data before logging.

Here’s a function that filters the JSON to contain only certain fields:

def filter_json(data, schema):
    """
    Filter JSON to keep only fields specified in schema.

    Args:
        data: Dictionary or JSON object to filter
        schema: Dictionary defining which fields to keep
                Use True to keep a field, nested dict for nested filtering

    Returns:
        Filtered dictionary containing only specified fields
    """
    if not isinstance(data, dict) or not isinstance(schema, dict):
        return data

    result = {}

    for key, value in schema.items():
        if key not in data:
            continue

        if value is True:
            # Keep this field as-is
            result[key] = data[key]
        elif isinstance(value, dict):
            # Recursively filter nested object
            if isinstance(data[key], dict):
                filtered_nested = filter_json(data[key], value)
                if filtered_nested:
                    result[key] = filtered_nested
            elif isinstance(data[key], list):
                # Filter each item in the list
                filtered_list = []
                for item in data[key]:
                    if isinstance(item, dict):
                        filtered_item = filter_json(item, value)
                        if filtered_item:
                            filtered_list.append(filtered_item)
                    else:
                        filtered_list.append(item)
                if filtered_list:
                    result[key] = filtered_list

    return result

Let’s filter a sample API response:

import json
# Sample API response
api_response = {
    "user": {
        "id": 789,
        "username": "Cayla",
        "email": "cayla@example.com",
        "password_hash": "secret123",
        "profile": {
            "name": "Cayla Smith",
            "bio": "Software developer",
            "avatar_url": "
            "private_notes": "Internal notes"
        },
        "posts": [
            {
                "id": 1,
                "title": "Hello World",
                "content": "My first post",
                "views": 100,
                "internal_score": 0.85
            },
            {
                "id": 2,
                "title": "Python Tips",
                "content": "Some tips",
                "views": 250,
                "internal_score": 0.92
            }
        ]
    },
    "metadata": {
        "request_id": "abc123",
        "server": "web-01"
    }
}

# Schema defining what to keep
public_schema = {
    "user": {
        "id": True,
        "username": True,
        "profile": {
            "name": True,
            "avatar_url": True
        },
        "posts": {
            "id": True,
            "title": True,
            "views": True
        }
    }
}

filtered = filter_json(api_response, public_schema)

print(json.dumps(filtered, indent=2))

Output:

{
  "user": {
    "id": 789,
    "username": "Cayla",
    "profile": {
      "name": "Cayla Smith",
      "avatar_url": "
    },
    "posts": [
      {
        "id": 1,
        "title": "Hello World",
        "views": 100
      },
      {
        "id": 2,
        "title": "Python Tips",
        "views": 250
      }
    ]
  }
}

The schema acts as a whitelist. Setting a field to True includes it in the output. Using a nested dictionary lets you filter nested objects, and the function applies the schema recursively to the nested structure.

For arrays, the schema applies to each item. For example, the posts array is filtered so each post includes only id, title, and views, while content and internal_score are excluded.

Notice how sensitive fields like password_hash and private_notes do not appear in the output, and the top-level metadata object is dropped because the schema does not mention it. This makes the function useful for sanitizing data before sending it to logs or front-end applications.
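filter_json is an allowlist: anything not named in the schema is dropped, which is the safer default. For logging, where you usually want to keep the overall shape and hide only a few fields, the opposite approach (a denylist) can be handier. A minimal sketch of that alternative, with redact_json and the SENSITIVE_KEYS set as hypothetical names you would adapt to your own data:

```python
import json

# Hypothetical denylist: values under these keys are masked at any depth
SENSITIVE_KEYS = {"password_hash", "private_notes", "internal_score"}


def redact_json(data, sensitive_keys=SENSITIVE_KEYS, mask="***"):
    """Return a copy of data with the values of sensitive keys replaced by mask."""
    if isinstance(data, dict):
        return {
            key: mask if key in sensitive_keys else redact_json(value, sensitive_keys, mask)
            for key, value in data.items()
        }
    if isinstance(data, list):
        return [redact_json(item, sensitive_keys, mask) for item in data]
    return data  # scalars pass through unchanged


event = {
    "user": {"id": 789, "password_hash": "secret123"},
    "posts": [{"id": 1, "internal_score": 0.85}],
}
print(json.dumps(redact_json(event)))
# {"user": {"id": 789, "password_hash": "***"}, "posts": [{"id": 1, "internal_score": "***"}]}
```

The trade-off is that a denylist lets through any sensitive field you forgot to list, while the allowlist schema fails closed.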

You can configure different schemas for different use cases, such as a minimal schema for list views, a detailed schema for single-item views, and an admin schema that includes everything.
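Here is a sketch of that idea, with per-view schemas stored in a dictionary. The view names and the post record are hypothetical. One detail of the original function makes the admin case easy: when schema is not a dictionary, filter_json returns the data unchanged, so None can stand for "no filtering". The block repeats a condensed filter_json so it runs standalone:

```python
def filter_json(data, schema):
    # Condensed version of filter_json above, repeated so this block runs standalone
    if not isinstance(data, dict) or not isinstance(schema, dict):
        return data
    result = {}
    for key, rule in schema.items():
        if key not in data:
            continue
        value = data[key]
        if rule is True:
            result[key] = value
        elif isinstance(rule, dict) and isinstance(value, dict):
            nested = filter_json(value, rule)
            if nested:
                result[key] = nested
        elif isinstance(rule, dict) and isinstance(value, list):
            kept = []
            for item in value:
                if isinstance(item, dict):
                    item = filter_json(item, rule)
                    if not item:
                        continue  # drop list items with no matching fields
                kept.append(item)
            if kept:
                result[key] = kept
    return result


post = {"id": 2, "title": "Python Tips", "content": "Some tips",
        "views": 250, "internal_score": 0.92}

SCHEMAS = {
    "list": {"id": True, "title": True},
    "detail": {"id": True, "title": True, "content": True, "views": True},
    "admin": None,  # not a dict, so filter_json returns the data unchanged
}

for view, schema in SCHEMAS.items():
    print(view, filter_json(post, schema))
```

Note that the admin view returns the original object rather than a copy, so callers should not mutate it.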

# 5. Converting between nested JSON and dot notation

Some systems use flat key-value stores, but you may want to work with nested JSON in your code. Converting between flat dot-notation keys and nested structures lets you do both.

Here is a pair of functions for two-way conversion.

// Converting JSON to dot notation

def json_to_dot_notation(data, parent_key=''):
    """
    Convert nested JSON to flat dot-notation dictionary.

    Args:
        data: Nested dictionary
        parent_key: Prefix for keys (used in recursion)

    Returns:
        Flat dictionary with dot-notation keys
    """
    items = {}

    if isinstance(data, dict):
        for key, value in data.items():
            new_key = f"{parent_key}.{key}" if parent_key else key

            if isinstance(value, dict):
                items.update(json_to_dot_notation(value, new_key))
            else:
                items[new_key] = value
    else:
        items[parent_key] = data

    return items

// Converting dot notation to JSON

def dot_notation_to_json(flat_data):
    """
    Convert flat dot-notation dictionary to nested JSON.

    Args:
        flat_data: Dictionary with dot-notation keys

    Returns:
        Nested dictionary
    """
    result = {}

    for key, value in flat_data.items():
        parts = key.split('.')
        current = result

        # Walk every part except the last, creating intermediate dicts as needed
        for part in parts[:-1]:
            if part not in current:
                current[part] = {}
            current = current[part]

        current[parts[-1]] = value

    return result

Let’s examine the round-trip conversion:

import json
# Original nested JSON
config = {
    "app": {
        "name": "MyApp",
        "version": "1.0.0"
    },
    "database": {
        "host": "localhost",
        "credentials": {
            "username": "admin",
            "password": "secret"
        }
    },
    "features": {
        "analytics": True,
        "notifications": False
    }
}

# Convert to dot notation (for environment variables)
flat = json_to_dot_notation(config)
print("Flat format:")
for key, value in flat.items():
    print(f"  {key} = {value}")

print("\n" + "="*50 + "\n")

# Convert back to nested JSON
nested = dot_notation_to_json(flat)

print("Nested format:")
print(json.dumps(nested, indent=2))

Output:

Flat format:
  app.name = MyApp
  app.version = 1.0.0
  database.host = localhost
  database.credentials.username = admin
  database.credentials.password = secret
  features.analytics = True
  features.notifications = False

==================================================

Nested format:
{
  "app": {
    "name": "MyApp",
    "version": "1.0.0"
  },
  "database": {
    "host": "localhost",
    "credentials": {
      "username": "admin",
      "password": "secret"
    }
  },
  "features": {
    "analytics": true,
    "notifications": false
  }
}

The json_to_dot_notation function flattens the structure by recursing through nested dictionaries and joining keys with dots. Unlike the earlier flatten_json function, it does not expand arrays: a list is stored as a single value under its key. This works best for configuration data made up of plain key-value pairs.
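To make the array behavior concrete: a list is kept as a single value under its dotted key, so it survives the round trip unchanged. The sketch below repeats both functions so it runs on its own; dot_notation_to_json uses dict.setdefault here, which does the same job as the if-not-in check. It also shows a limitation: a key that itself contains a dot does not round-trip, because the reverse step splits it.

```python
def json_to_dot_notation(data, parent_key=''):
    # Same logic as json_to_dot_notation above: recurse into dicts only
    items = {}
    if isinstance(data, dict):
        for key, value in data.items():
            new_key = f"{parent_key}.{key}" if parent_key else key
            if isinstance(value, dict):
                items.update(json_to_dot_notation(value, new_key))
            else:
                items[new_key] = value  # lists and scalars are stored as-is
    else:
        items[parent_key] = data
    return items


def dot_notation_to_json(flat_data):
    result = {}
    for key, value in flat_data.items():
        parts = key.split('.')
        current = result
        for part in parts[:-1]:
            # Create the intermediate dict if missing, then descend into it
            current = current.setdefault(part, {})
        current[parts[-1]] = value
    return result


config = {"server": {"hosts": ["a.example.com", "b.example.com"], "port": 8080}}
flat = json_to_dot_notation(config)
print(flat)  # {'server.hosts': ['a.example.com', 'b.example.com'], 'server.port': 8080}
print(dot_notation_to_json(flat) == config)  # True

# Limitation: a key containing a dot is split on the way back
tricky = {"metrics": {"p99.latency": 120}}
print(dot_notation_to_json(json_to_dot_notation(tricky)))  # {'metrics': {'p99': {'latency': 120}}}
```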

The dot_notation_to_json function reverses the process. It splits each key on dots and rebuilds the nested structure, creating intermediate dictionaries as needed. The loop handles every part except the last, descending one level per part, and then assigns the value to the final key.
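This reconstruction assumes no flat key is a prefix of another. Given {"db": "sqlite", "db.path": "/tmp/app.db"}, the loop reaches the string "sqlite" where it expects a dictionary, and Python raises TypeError. If the keys arrive in the opposite order, the scalar silently replaces the nested dictionary instead. A sketch of a stricter variant that reports both cases as ValueError (dot_notation_to_json_checked is a hypothetical name):

```python
def dot_notation_to_json_checked(flat_data):
    """Like dot_notation_to_json, but raise ValueError on key prefix conflicts."""
    result = {}
    for key, value in flat_data.items():
        parts = key.split('.')
        current = result
        for depth, part in enumerate(parts[:-1]):
            current = current.setdefault(part, {})
            if not isinstance(current, dict):
                # An earlier key already stored a scalar at this prefix
                prefix = '.'.join(parts[:depth + 1])
                raise ValueError(f"{key!r} conflicts with scalar value at {prefix!r}")
        leaf = parts[-1]
        if isinstance(current.get(leaf), dict):
            # An earlier key already created nested keys under this one
            raise ValueError(f"{key!r} would overwrite nested keys under it")
        current[leaf] = value
    return result


print(dot_notation_to_json_checked({"app.name": "MyApp", "debug": False}))
# {'app': {'name': 'MyApp'}, 'debug': False}

errors = []
for bad in ({"db": "sqlite", "db.path": "/tmp/app.db"},
            {"db.path": "/tmp/app.db", "db": "sqlite"}):
    try:
        dot_notation_to_json_checked(bad)
    except ValueError as err:
        errors.append(str(err))
print(errors)
```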

This approach keeps your configuration readable and maintainable while working within the constraints of a flat key-value system.

# Wrapping up

JSON processing goes beyond basic json.loads(). In most projects, you’ll need tools to access nested structures safely, flatten them, merge configurations, filter fields, and convert between formats.

The techniques in this article carry over to other data processing tasks. You can adapt these patterns for XML, YAML, or custom data formats.

Start with the safe access function to prevent KeyError exceptions in your code, and add the others as specific needs come up. Happy coding!

Bala Priya C is a developer and technical writer from India. She loves working at the intersection of mathematics, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she is working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.
