Python

 

 

 

 

JSON

 

JSON, short for JavaScript Object Notation, is like the universal language for data on the web. It's used all over the place because it's easy for both humans and machines to read and write.

 

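There are two modules (packages) commonly used for JSON parsing in Python: json, which is part of Python's standard library, and ijson, a third-party package that you install separately (for example with pip install ijson).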

 

Python's json module is the go-to tool for working with JSON data. If your JSON is a string, you can use json.loads() to turn it into a Python object that you can work with, like a dictionary or a list. If it's in a file, json.load() does the same thing straight from the open file object. And if you're going the other way, turning Python objects back into a JSON string, you can use json.dumps(). Super useful stuff!
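A quick round trip makes the type mapping concrete. The sample string below is made up for illustration (its "active" and "manager" fields are not in the test file used later); it shows how JSON types become Python types and back again.

```python
import json

# Hypothetical sample: a JSON object held in a Python string
text = '{"id": "001", "skills": ["Programming", "Design"], "active": true, "manager": null}'

obj = json.loads(text)                 # JSON object -> Python dict
print(type(obj).__name__)              # dict
print(obj["skills"][0])                # Programming (JSON array -> Python list)
print(obj["active"], obj["manager"])   # True None (true -> True, null -> None)

# Going the other way: Python object -> JSON string
back = json.dumps(obj)
print(back == text)                    # True, because the default separators match this sample's spacing

# A bare JSON value decodes to the matching Python scalar
print(json.loads('42'), json.loads('"hello"'))  # 42 hello
```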

 

But what happens when your JSON data is bigger than the Titanic? That's where ijson steps in.

 

ijson is like the marathon runner of JSON modules. Instead of trying to gulp down the whole thing in one go, ijson takes it step by step. This approach is called iterative parsing, and it's great for dealing with huge JSON files because you don't need to load all of the data into memory at once.

 

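The ijson.items() function is a workhorse. Give it a file-like object (a file opened in binary mode, 'rb', is the usual choice) and a prefix, and it'll give you back an iterator that chucks out the JSON objects found at that prefix, one at a time. The prefix is just a string of property names with dots in between, showing the path to the goodies you're after. Array elements are represented by the special name item, so for the test file below, the prefix 'employees.item' yields each employee in turn.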

 

And if you need even more control, ijson.parse() returns an iterator of low-level parsing events from a JSON document. Each event is a (prefix, event, value) tuple, where event is a name like 'start_map', 'map_key', 'end_map', 'start_array', 'end_array', or a scalar event such as 'string' or 'boolean' that carries the associated value.

 

So, json or ijson? It's all about the size of your data. If you've got a petite JSON dataset that won't give your memory a hard time, json is a neat and simple choice. But if your data is more like Godzilla, or you're receiving it as a stream and can't wait for all of it to arrive, ijson has got your back!

 

 

 

Test JSON file

 

This is the JSON file I will use to test the Python example shown in the next section.

JsonTest_01.txt

{
  "employees": [
    {
      "id": "001",
      "firstName": "John",
      "lastName": "Doe",
      "email": "john.doe@example.com",
      "phoneNumbers": [
        {
          "type": "home",
          "number": "123-456-7890"
        },
        {
          "type": "work",
          "number": "555-555-1212"
        }
      ],
      "address": {
        "street": "123 Main Street",
        "city": "Anytown",
        "state": "CA",
        "zip": "12345",
        "country": "USA"
      },
      "skills": [
        {
          "name": "Programming",
          "level": "Expert"
        },
        {
          "name": "Project Management",
          "level": "Intermediate"
        }
      ]
    },
    {
      "id": "002",
      "firstName": "Jane",
      "lastName": "Smith",
      "email": "jane.smith@example.com",
      "phoneNumbers": [
        {
          "type": "home",
          "number": "987-654-3210"
        },
        {
          "type": "work",
          "number": "555-555-1212"
        }
      ],
      "address": {
        "street": "456 High Street",
        "city": "Anytown",
        "state": "CA",
        "zip": "12345",
        "country": "USA"
      },
      "skills": [
        {
          "name": "Design",
          "level": "Expert"
        },
        {
          "name": "Copywriting",
          "level": "Intermediate"
        }
      ]
    }
  ]
}

 

 

 

JSON Parsing with json module

 

This is an example Python script that parses the test JSON file, prints it, and extracts specific items from the resulting object. In this example, I am going to use only the json module.

 

Json_Test_01.py

import json


# read_json_file() reads a JSON file and returns its whole contents. It takes just two lines of code, but they need
# a longer explanation to give you the full details.
# The with statement is used here to handle the opening and closing of the file. This is good practice because it
# makes sure the file gets closed properly even if something goes wrong while reading it.

# Then, the json.load function is called with file as its argument. This function reads the whole JSON file and turns
# it into a Python object.

# If the JSON file contains a JSON object (i.e., what you'd write in JavaScript as {...}), json.load returns a
# dictionary. If the JSON file contains a JSON array (i.e., [...] in JavaScript), it returns a list. If the JSON file just
# contains a single JSON value (like "hello" or 42), then it'll return a string or an integer, or whatever the appropriate
# Python equivalent is.
#
# For the test JSON file shown in the previous section, this is what happens when json.load() is executed.
# The outermost element of the JSON data is a JSON object (i.e., {...}), so json.load() returns a Python
# dictionary when you use it to load this JSON file.
#
# Breaking it down further:
#
# The returned dictionary has a single key: "employees". The value associated with this key is a JSON array, which
# is translated into a Python list.
# Each element of this list is a JSON object, corresponding to an individual employee. These objects are translated
# into dictionaries in Python.
# Each of these dictionaries has several keys (like "id", "firstName", "lastName", and so on). The values for these
# keys are either simple data types (strings in this case), JSON arrays (for "phoneNumbers" and "skills"), or JSON
# objects (for "address").
# These inner JSON arrays and objects are also translated into Python lists and dictionaries, respectively.

def read_json_file(file_name):
    # encoding='utf-8' avoids depending on the platform's default text encoding
    with open(file_name, 'r', encoding='utf-8') as file:
        return json.load(file)


# json_print() prints the contents of a JSON structure in a readable, nicely formatted way. It takes only one
# argument, json_content:
#
#        json_content is the Python object that you want to print. It should be JSON-serializable; in other
#              words, it could be a Python dictionary, list, string, number, bool, None, etc.
#
# Now let's dive into the function body:
#
# The function uses the json.dumps() function from Python's json module. The name dumps() is short for "dump
# string," and its job is to convert a Python object into a JSON string.
#
# The first argument it takes is the Python object you want to convert. In this case, it's the json_content that you
#  passed to the json_print() function.
# The second argument, indent=4, is an optional parameter that tells dumps() to format the output string in a pretty
#  way. Specifically, it puts each member of an object ({}) and each element of an array ([]) on its own line, and
#  indents the contents of those objects or arrays by 4 spaces per nesting level. This makes the JSON structure much
#  easier to read, especially if it's nested!
# Once json.dumps() has created this pretty string, the function simply prints it to the console with the print()
# function. And that's all there is to it!
#
def json_print(json_content):
    print(json.dumps(json_content, indent=4))


# json_print_item() accesses a specific item in a JSON structure and prints it in a pretty way. In short, it extracts
# a specific item from the given JSON data.
#
# It takes two arguments:
#
#     json_content is a Python object representing the JSON structure you want to work with. This can be a Python
#           dictionary, list, string, number, None, etc., anything that's a valid JSON structure.
#
#     json_item_path is a string describing the path to the item within the JSON structure you're interested in. It
#           uses dot notation to represent the hierarchy of keys in the structure. For example, in a JSON structure
#           representing a person, the path "address.city" would correspond to the city part of the address.
#           List elements are addressed by their zero-based index, so "employees.1.firstName" is the first name
#           of the second employee.
#
# Now, let's break down the function body:
#
# First, it splits json_item_path into a list of its components with json_item_path.split('.'). This is done because the
# path could represent several levels of nesting, e.g., "address.city" would be split into ['address', 'city'].
#
# It then initializes item to point to the whole JSON content.
#
# Next, it enters a loop that iterates over each part of the item path. For each part:
#
# If the part is a string of digits (which is checked with part.isdigit()), it's assumed to be an index into a list.
# So, the function converts it to an integer with int(part) and uses that to index into item.
# If the part is not a string of digits, it's assumed to be a key into a dictionary. So, the function uses it directly to
# index into item.
# In either case, item is updated to point to the next level of the JSON structure as specified by the current part
# of the item path.
# Once all parts of the item path have been processed, item should be pointing to the desired item in the JSON
# structure.
#
# Finally, the function prints the item path and the item itself (pretty-printed with json.dumps()) using print().
#
def json_print_item(json_content, json_item_path):
    item_path_parts = json_item_path.split('.')
    item = json_content
    for part in item_path_parts:
        if part.isdigit():
            item = item[int(part)]
        else:
            item = item[part]
    print(json_item_path, ' = ', json.dumps(item, indent=4))


# json_print_item_recursive() does exactly the same thing as json_print_item(), but in a different way: it uses
# recursion instead of a for loop. I won't describe it in detail here; you may skip this part unless you are
# really interested.
#
def json_print_item_recursive(json_content, json_item_path, full_path=''):
    item_path_parts = json_item_path.split('.')
    part = item_path_parts[0]

    # full_path collects the parts already consumed, so the complete path can be printed at the end
    new_full_path = full_path + ('.' + part if full_path else part)

    if len(item_path_parts) == 1:
        # Base case: the last part of the path selects the item to print
        if part.isdigit():
            item = json_content[int(part)]
        else:
            item = json_content[part]
        print(new_full_path, ' = ', json.dumps(item, indent=4))
    else:
        # Recursive case: step one level down and pass along the rest of the path
        new_path = '.'.join(item_path_parts[1:])
        if part.isdigit():
            json_print_item_recursive(json_content[int(part)], new_path, new_full_path)
        else:
            json_print_item_recursive(json_content[part], new_path, new_full_path)


# This is the test code for the functions defined above

json_content = read_json_file('JsonTest_01.txt')

json_print(json_content)
json_print_item(json_content, 'employees.1.firstName')
json_print_item(json_content, 'employees.0.phoneNumbers.0.type')
json_print_item(json_content, 'employees.0.phoneNumbers.0.number')
json_print_item_recursive(json_content, 'employees.1.firstName')
json_print_item_recursive(json_content, 'employees.0.phoneNumbers.0.type')
json_print_item_recursive(json_content, 'employees.0.phoneNumbers.0.number')

 

Result

{
    "employees": [
        {
            "id": "001",
            "firstName": "John",
            "lastName": "Doe",
            "email": "john.doe@example.com",
            "phoneNumbers": [
                {
                    "type": "home",
                    "number": "123-456-7890"
                },
                {
                    "type": "work",
                    "number": "555-555-1212"
                }
            ],
            "address": {
                "street": "123 Main Street",
                "city": "Anytown",
                "state": "CA",
                "zip": "12345",
                "country": "USA"
            },
            "skills": [
                {
                    "name": "Programming",
                    "level": "Expert"
                },
                {
                    "name": "Project Management",
                    "level": "Intermediate"
                }
            ]
        },
        {
            "id": "002",
            "firstName": "Jane",
            "lastName": "Smith",
            "email": "jane.smith@example.com",
            "phoneNumbers": [
                {
                    "type": "home",
                    "number": "987-654-3210"
                },
                {
                    "type": "work",
                    "number": "555-555-1212"
                }
            ],
            "address": {
                "street": "456 High Street",
                "city": "Anytown",
                "state": "CA",
                "zip": "12345",
                "country": "USA"
            },
            "skills": [
                {
                    "name": "Design",
                    "level": "Expert"
                },
                {
                    "name": "Copywriting",
                    "level": "Intermediate"
                }
            ]
        }
    ]
}
employees.1.firstName  =  "Jane"
employees.0.phoneNumbers.0.type  =  "home"
employees.0.phoneNumbers.0.number  =  "123-456-7890"
employees.1.firstName  =  "Jane"
employees.0.phoneNumbers.0.type  =  "home"
employees.0.phoneNumbers.0.number  =  "123-456-7890"
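json_print_item() prints the item but does not hand it back. If you want to reuse the value, a small variation can return it instead. The sketch below is a hypothetical helper, get_item(), which is not part of the script above. It makes one deliberate change: it decides between list index and dictionary key by looking at the container type rather than with part.isdigit(), so a dictionary key made only of digits (such as "001") still works, and it turns a bad path into a KeyError that names the failing part.

```python
import json
from functools import reduce


def get_item(json_content, json_item_path):
    """Return the value at a dotted path such as 'employees.1.firstName'.

    Hypothetical helper: list elements are addressed by zero-based index,
    everything else by dictionary key.
    """
    def step(item, part):
        try:
            if isinstance(item, list):
                return item[int(part)]   # list: the part must be an integer index
            return item[part]            # otherwise treat the part as a dictionary key
        except (KeyError, IndexError, ValueError, TypeError) as exc:
            raise KeyError(f'cannot resolve {part!r} in path {json_item_path!r}') from exc

    # Apply step() to each path part in turn, starting from the whole document
    return reduce(step, json_item_path.split('.'), json_content)


# Hypothetical, cut-down version of the test data
data = {"employees": [
    {"firstName": "John", "phoneNumbers": [{"type": "home", "number": "123-456-7890"}]},
    {"firstName": "Jane", "phoneNumbers": [{"type": "home", "number": "987-654-3210"}]},
]}

print(get_item(data, 'employees.1.firstName'))  # Jane
print(json.dumps(get_item(data, 'employees.0.phoneNumbers.0'), indent=4))

try:
    get_item(data, 'employees.5.firstName')
except KeyError as err:
    print('lookup failed:', err)
```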

 

 

 

 

 

ex.py

 

 

Result