How to transform JSON data to match any schema

by SkillAiNest

Whether you are transferring data between APIs or just preparing JSON data for import, mismatched schemas can break your workflow. Learning how to clean and normalize JSON data makes for smooth, error-free data transfer.

This tutorial shows how to clean dirty JSON and export the results to a new file, based on a predefined schema. The JSON file we will clean contains 200 artificial customer records.

In this tutorial, we will apply two ways to clean the input data:

  • With pure Python

  • With pandas

You can apply either of them in your code, but the pandas method is better suited for large, complex data sets. Let's jump straight into the process.

Here's what we will cover:

Prerequisites

To follow this tutorial, you should have a basic understanding of:

  • Dictionaries, lists, and loops

  • JSON data structure (keys, values, and nesting)

  • How to read and write JSON files with Python's json module

Load and inspect the JSON file

Before writing any code, make sure the .json file you want to clean is in your project directory. This makes it easier to load into your script using just the file name.

Now you can inspect the file, either by opening it directly or by loading it into your script with the json module.

Here's how (assuming the file is named "old_customers.json"):

Code to view or print raw JSON file content in the terminal

This shows you whether the JSON file is structured as a dictionary or a list, and prints the entire file to your terminal. Mine is a dictionary that maps to a list of 200 customer entries. You should also open the raw JSON file in your IDE to look closely at its structure and schema.
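For illustration, here is a minimal sketch of that inspection step. It uses a tiny inline sample (hypothetical, standing in for old_customers.json); in practice you would pass an open file handle to json.load() instead of calling json.loads() on a string:

```python
import json

# Hypothetical inline sample standing in for old_customers.json
raw = '{"customers": [{"customer_id": 1, "name": "Ada Lovelace"}]}'
data = json.loads(raw)

print(type(data).__name__)     # top-level structure: dict or list
print(list(data.keys()))       # top-level keys, e.g. ['customers']
print(len(data["customers"]))  # how many customer entries there are
```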

Define the target schema

If someone asks you to clean JSON data, it usually means the current schema is not suitable for its intended purpose. At this point, you want to clarify exactly what the final exported JSON should look like.

A JSON schema is essentially a blueprint that describes:

  • The required fields

  • The field names

  • The data type for each field

  • Standard formats (for example, lowercased emails, trimmed whitespace, and so on)
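As a quick illustration of "standard formats", here is a hedged sketch of a normalization helper (the function name is illustrative, not part of the tutorial's code):

```python
def normalize_email(email: str) -> str:
    # Apply standard formats: trim surrounding whitespace, lowercase
    return email.strip().lower()

print(normalize_email("  Jane.Doe@Example.COM "))  # jane.doe@example.com
```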

Here's what the old schema looks like compared with the target schema:

The old JSON schema

Expected JSON schema

As you can see, the goal is to delete the "customer_id" and "address" fields and rename the rest in each entry:

  • "name" to "full_name"

  • "email" to "email_address"

  • "phone" to "mobile"

  • "membership_level" to "tier"

The output should contain 4 fields per record instead of 6, all renamed to meet the project requirements.

How to clean JSON data with pure Python

Let's explore the built-in json module to align the raw data with the target schema.

Step 1: Import the json and time modules

Importing json is essential because we are working with JSON files. We will also use time to measure how long the data cleaning process takes.

import json
import time

Step 2: Load the file with json.load()

start_time = time.time()
with open('old_customers.json') as file:
    crm_data = json.load(file)

Step 3: Write a function that loops through and cleans each customer entry in the dictionary

def clean_data(records):
    transformed_records = []
    for customer in records["customers"]:
        transformed_records.append({
            "full_name": customer["name"],
            "email_address": customer["email"],
            "mobile": customer["phone"],
            "tier": customer["membership_level"],
        })
    return {"customers": transformed_records}

new_data = clean_data(crm_data)

clean_data() takes in the original data (temporarily held in the records parameter) and transforms it to meet our target schema.

Since our JSON file is a dictionary containing a "customers" key that maps to the list of customer entries, we access this key and loop through each entry in the list.

Inside the for loop, we rename the relevant fields and store the cleaned entries in a new list called "transformed_records".

Then we wrap the list back in a dictionary so the "customers" key stays intact.

Step 4: Save the output to a .json file

Decide on a name for your cleaned JSON data and assign it to an output_file variable, like:

output_file = "transformed_data.json"
with open(output_file, "w") as f:
    json.dump(new_data, f, indent=4)

You can also add a print() statement below this block to confirm that the file has been saved in your project directory.

Step 5: Time the data cleaning process

At the beginning of this process, we imported the time module to measure how long it takes to clean the JSON data using pure Python. To track the run time, we saved the current time in a start_time variable before cleaning, and now we will add an end_time variable at the end of the script.

The difference between the end_time and start_time values gives you the total run time in seconds.

end_time = time.time()
elapsed_time = end_time - start_time

print(f"Transformed data saved to {output_file}")
print(f"Processing data took {elapsed_time:.2f} seconds")

Here's how long the data cleaning process took with the pure Python approach:

The script's run time shown in the terminal

How to clean JSON data with pandas

Now we'll try to achieve the same results as above using a third-party library, pandas. Pandas is an open source library used for data manipulation and analysis.

To start, you need to install the pandas library. In your terminal, run:

pip install pandas

Then follow these steps:

Step 1: Import relevant libraries

import json
import time
import pandas as pd

Step 2: Load the file and extract the customer entries

Unlike the pure Python method, where we simply accessed the "customers" key to reach the customer data list, working with pandas requires a slightly different approach.

We must extract the list before loading it into a DataFrame because pandas expects structured data. Extracting the list of customer dictionaries ensures that we only isolate and clean the relevant records, preventing errors caused by nested or unrelated JSON data.

start_time = time.time()
with open('old_customers.json', 'r') as f:
    crm_data = json.load(f)


clients = crm_data.get("customers", [])

Step 3: Load the customer entries into a DataFrame

Once you have a clean list of customer dictionaries, load it into a DataFrame and assign it to a variable, like:


df = pd.DataFrame(clients)

This produces a table-like, spreadsheet-style structure, where each row represents a customer. Loading the list into a DataFrame also gives you access to pandas' powerful data-cleaning methods, such as:

  • drop_duplicates(): removes duplicate rows or entries from the DataFrame

  • dropna(): drops rows with any missing or null values

  • fillna(value): replaces all missing or null values with a specified value

  • drop(columns): explicitly drops unwanted columns
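To see a couple of these methods in action, here is a small hedged sketch on a hypothetical mini-DataFrame (named demo so it doesn't clash with the tutorial's df):

```python
import pandas as pd

# Hypothetical mini-frame with a duplicated row and a missing phone
demo = pd.DataFrame([
    {"name": "Ada", "phone": "123"},
    {"name": "Ada", "phone": "123"},
    {"name": "Bob", "phone": None},
])

demo = demo.drop_duplicates()                    # removes the repeated Ada row
demo["phone"] = demo["phone"].fillna("Unknown")  # fills the missing value
print(demo.to_dict(orient="records"))
```

After these two calls, the duplicate row is gone and Bob's missing phone reads "Unknown".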

Step 4: Write a custom function to rename the relevant fields

At this point, we need a function that takes a single customer (a row) and returns a cleaned version that matches the target schema ("full_name", "email_address", "mobile", and "tier").

The function should also handle missing data by setting default values like "Unknown" or "N/A" when a field is absent.

PS: At first, I used drop(columns) to explicitly remove the "address" and "customer_id" fields. But it isn't needed in this case, since the transform_fields() function only selects and renames the desired fields. Any extra columns are automatically excluded from the cleaned data.
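The tutorial references transform_fields() in the later steps without showing its body, so here is a minimal sketch consistent with the description above (field renames plus "Unknown"/"N/A" defaults). This is one possible implementation, not necessarily the author's; the helper value_or is an assumption:

```python
import pandas as pd

def transform_fields(row):
    # Works for both a pandas row (Series) and a plain dict
    def value_or(key, default):
        v = row.get(key)
        return default if v is None or pd.isna(v) else v

    return {
        "full_name": value_or("name", "Unknown"),
        "email_address": value_or("email", "N/A"),
        "mobile": value_or("phone", "N/A"),
        "tier": value_or("membership_level", "Unknown"),
    }

print(transform_fields({"name": "Ada", "phone": None, "membership_level": "gold"}))
```

Because it reads fields with .get(), the same function works with both df.apply(transform_fields, axis=1) and the list-comprehension variant shown in the next steps.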

Step 5: Apply the schema change to all rows

We will use pandas' apply() method to run our custom function on each row in the DataFrame. This creates a Series (for example, 0 → {…}, 1 → {…}, 2 → {…}), which is not JSON friendly.

Since json.dump() expects a list, not a pandas Series, we apply tolist() to convert the Series into a list of dictionaries.


transformed_df = df.apply(transform_fields, axis=1)


transformed_data = transformed_df.tolist()

Another way to achieve this is with a list comprehension. Instead of using apply(), you can write:

transformed_data = [transform_fields(row) for row in df.to_dict(orient="records")]

orient="records" is an argument to df.to_dict that tells pandas to convert the DataFrame into a list of dictionaries, where each dictionary represents a single customer record (that is, a row).

Here, the for loop iterates through every customer record in the list, calling the custom function on each row. Finally, the list comprehension ([…]) collects the cleaned rows into a new list.

Step 6: Save the output to a .json file


output_data = {"customers": transformed_data}
output_file = "applypandas_customer.json"
with open(output_file, "w") as f:
    json.dump(output_data, f, indent=4)

I suggest choosing a different file name for your pandas output so you can inspect both files side by side and confirm that this output matches the pure Python cleaning results.

Step 7: Track the run time

Once again, check the difference between the start and end times to determine the program's execution time.

end_time = time.time()
elapsed_time = end_time - start_time


print(f"Transformed data saved to {output_file}")
print(f"Processing data took {elapsed_time:.2f} seconds")

When I used the list comprehension to apply the custom function, my script's run time was 0.03 seconds. But with pandas' apply() method, the run time fell to 0.01 seconds.

Final Output Preview:

If you followed this tutorial closely, your JSON output should look like this, whether you used the pandas method or the pure Python approach:

Expected JSON output after the schema change

How to validate the cleaned JSON

Validating your output ensures that the cleaned data follows the expected structure before it is used or integrated. This step helps you quickly catch formatting errors, missing fields, and wrong data types.

Below are the steps to validate your cleaned JSON file:

Step 1: Install and import jsonschema

jsonschema is a third-party validation library for Python. It helps you define the expected structure of your JSON data and automatically checks whether your output conforms to that structure.

In your terminal, run:

pip install jsonschema

Import the required libraries:

import json
from jsonschema import validate, ValidationError

validate() checks whether your JSON data conforms to the rules described in your schema. If the data is valid, nothing happens. But if there is an error, such as a missing field or an incorrect data type, it raises a ValidationError.

Step 2: Define a schema

As you know, the JSON schema changes with each file's structure. If your JSON data differs from what we're working with here, learn how to build schemas first. Otherwise, the schema below describes the structure we expect for our cleaned JSON:

schema = {
    "type": "object",
    "properties": {
        "customers": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "full_name": {"type": "string"},
                    "email_address": {"type": "string"},
                    "mobile": {"type": "string"},
                    "tier": {"type": "string"}
                },
                "required": ["full_name", "email_address", "mobile", "tier"]
            }
        }
    },
    "required": ["customers"]
}
  • The data is an object that must contain one key: "customers".

  • "customers" must be an array (a list), with each item representing a customer entry.

  • Each customer entry must have four fields:

    • "full_name"

    • "email_address"

    • "mobile"

    • "tier"

  • The "required" lists ensure that none of these fields are missing from any customer record.

Step 3: Load the cleaned JSON file

with open("transformed_data.json") as f:
    data = json.load(f)

Step 4: Validate the data

In this step, we will use a try...except block so the process finishes safely and displays a helpful message if the code raises a ValidationError.

try:
    validate(instance=data, schema=schema)
    print("JSON is valid.")
except ValidationError as e:
    print("JSON is invalid:", e.message)

Pandas vs pure Python for data cleaning

From this tutorial, you can probably tell that cleaning and restructuring JSON with pure Python is the more straightforward approach. It is fast and ideal for handling small datasets or simple transformations.

But as the data grows and becomes more complex, you may need advanced data-cleaning methods that pure Python alone doesn't provide. In such cases, pandas becomes the better choice. It efficiently handles large, complex datasets and offers built-in methods for handling missing data and removing duplicates.

You can study the pandas cheat sheet to learn more data manipulation methods.
