Using JSON

[from https://www.json.org/json-en.html] JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate.

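JSON is built on two structures: an object, which is a collection of name/value pairs, and an array, which is an ordered list of values.

Here is a JSON object:

{ "name": "Zoe", "salary": 56000, "married": true }

Here is a JSON array:

["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]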

Here is a JSON array of two objects:

[  
    {"name":"Ram", "email":"Ram@gmail.com"},  
    {"name":"Bob", "email":"bob32@gmail.com"}  
]

We will be using a JSON file of movie information that is organized as a list of JSON objects (I got it from https://oracleofbacon.org/how.php). Each line contains the information for one movie, formatted as a JSON object such as:

{"title":"Return of the Jedi","cast":["Mark Hamill","Harrison Ford","Carrie Fisher","Billy Dee Williams","Anthony Daniels","David Prowse","Kenny Baker","Peter Mayhew","Frank Oz","Ian McDiarmid","Alec Guinness"],"directors":["Richard Marquand"],"producers":["Howard Kazanjian"],"companies":["Lucasfilm Ltd.","20th Century Fox"],"year":1983}

Typically, a JSON file is read into a program and converted to a dictionary, i.e., an abstract data type that allows efficient storage of and access to key-value pairs. For instance, if we stored the previous JSON object in a dictionary called D, we could get the value of the key "year" by using D["year"].

// ... after some magic instructions to read the JSON object from 
// ... the file into an object called D in your program

cout << D["year"];     // would print 1983
cout << D["title"];    // would print "Return of the Jedi"
cout << D["cast"][0];  // would print "Mark Hamill"
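To make the dictionary idea concrete without any JSON library, here is a minimal sketch using the STL's std::map, which stores key-value pairs and supports the same square-bracket syntax. The names makeMovieRecord and lookup, and the choice to store every value as a string, are assumptions made only for illustration; a real JSON object can also hold numbers, arrays, and nested objects.

```cpp
#include <map>
#include <string>

// Hypothetical helper: builds a small dictionary for one movie.
// Every value is stored as a string to keep the sketch simple.
std::map<std::string, std::string> makeMovieRecord() {
    std::map<std::string, std::string> D;
    D["title"] = "Return of the Jedi";
    D["year"] = "1983";
    return D;
}

// Returns the value stored under key, or fallback when the key is absent.
// find() is used instead of operator[] because operator[] would insert
// an empty entry for a missing key.
std::string lookup(const std::map<std::string, std::string>& D,
                   const std::string& key,
                   const std::string& fallback) {
    auto it = D.find(key);
    return it == D.end() ? fallback : it->second;
}
```

With this sketch, lookup(makeMovieRecord(), "year", "?") yields the string 1983, while asking for a key that was never stored, such as "cast", yields the fallback.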

Download the file at: https://oracleofbacon.org/data.txt.bz2 and decompress it.

As far as I know, the C++ STL does not provide functions for reading and manipulating JSON files. Therefore, we will use an external library called "JSON for Modern C++" (https://github.com/nlohmann/json). To install it, you must first install git and cmake on your computer. If you are using Windows, I strongly advise that you do your software development for this course in a VM running some flavor of Linux (I prefer Ubuntu).

Once you have git and cmake, here is how to install the "JSON for Modern C++" library:

git clone https://github.com/nlohmann/json
cd json
mkdir build
cd build
cmake ..
make

To test if the JSON library is working properly you can try compiling this program:

#include <nlohmann/json.hpp>
#include <iostream>
#include <fstream>
using nlohmann::json;
using namespace std;

// This program reads the JSON file and prints the titles of all the movies.

int main(int argc, char **argv) {

  if (argc < 2) {
    cout << "Usage: " << argv[0] << " JSONFileName" << endl;
    return 1;
  }

  ifstream f(argv[1]);
  if (!f) {
    cerr << "Could not open " << argv[1] << endl;
    return 1;
  }

  json jsonObj;

  try {
    // Each successful extraction parses one JSON object from the file.
    while (f >> jsonObj) {
      cout << jsonObj["title"] << endl;
    }
  }
  catch (...) {
    // The library throws an exception when it reaches the end of the file.
    // This catch-all ends the loop quietly, but it also hides other errors.
  }
  return 0;
}

To compile it use:

g++ -o nameofexec nameofyourprogram.cpp -I[path of the json dir/include] -std=c++11

For example, if your program is named test.cpp and the json library is in /Users/rarce/code/json, then:

g++ -o test test.cpp -I/Users/rarce/code/json/include -std=c++11

When you run ./test data.txt you will see a long list of movie names (400K+).

"Actrius"
"Army of Darkness"
"The Birth of a Nation"
"Blade Runner"
....

Exercises:

  1. Modify the C++ program to print the years of the movies. You can check whether a key has a value by comparing the result of accessing it against nullptr, for example: if (jsonObj["year"] != nullptr) cout << jsonObj["year"];

  2. Modify the C++ program to print the quantity of movies per year. Use a Direct Address Table to perform the counting. You can safely assume that the years of the movies are in the range [1800-2025] (yes, 2025).

  3. Create a C++ program to read each movie's title and year into an array of objects of a class such as this:

class titleYear {
  string title;
  unsigned short year;
};

Then, sort the array according to the year. You do not need to implement a sorting algorithm. Read this to learn how to use STL's sort algorithm on objects. Print the results.
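One way to sort objects with the STL is sketched below, under some assumptions: a struct is used so the members are public (the class above keeps them private by default), a std::vector stands in for the array, and the names MovieEntry and sortByYear are hypothetical. std::sort accepts a comparison function as its third argument; here a lambda says that one entry comes before another when its year is smaller.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for titleYear, with public members so the
// comparison below can read them.
struct MovieEntry {
    std::string title;
    unsigned short year;
};

// Sorts the entries in place by ascending year.
// std::sort does not promise to keep the original order of movies
// that share a year; std::stable_sort would.
void sortByYear(std::vector<MovieEntry>& movies) {
    std::sort(movies.begin(), movies.end(),
              [](const MovieEntry& a, const MovieEntry& b) {
                  return a.year < b.year;
              });
}
```

Lambdas require C++11, which matches the -std=c++11 flag used earlier. Reversing the comparison to a.year > b.year would sort from newest to oldest.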

  4. Find the movie(s) with the largest cast.

  5. Find the actor(s) cast in the most movies.

  6. Find the director who has directed the most (unique) actors. "Unique" means that even though Quentin Tarantino has worked many times with Uma Thurman, she only counts once.

  7. Find the actor(s) who spent the longest time between two movies. For example, Carrie Fisher spent 5 years between "White Lightnin'" (2009) and "Maps to the Stars" (2014), but she is not the actor who has spent the most years between movies.
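
The Direct Address Table mentioned in the counting exercise can be sketched as follows. The idea is that the year itself, shifted by the lowest allowed year, serves as the array index, so counting one movie takes constant time with no searching. The names kMinYear, kMaxYear, and countByYear are assumptions for illustration, and the function takes an already extracted list of years rather than reading the JSON file.

```cpp
#include <cstddef>
#include <vector>

// Assumed bounds, taken from the range stated in the exercise.
const int kMinYear = 1800;
const int kMaxYear = 2025;

// Returns a table where slot (year - kMinYear) holds the number of
// movies released in that year. Years outside the range are ignored.
std::vector<int> countByYear(const std::vector<int>& years) {
    std::vector<int> table(kMaxYear - kMinYear + 1, 0);
    for (std::size_t i = 0; i < years.size(); ++i) {
        int y = years[i];
        if (y >= kMinYear && y <= kMaxYear) {
            ++table[y - kMinYear];
        }
    }
    return table;
}
```

The table costs one slot per possible year whether or not any movie was released then, which is a reasonable trade here because the range is small and known in advance.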