TL;DR What is JSONSchema?

JSONSchema is a grammar for JSON documents that is itself written in JSON. Validators exist for nearly every major programming language, so schemas are quite portable.
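To make "a grammar for JSON, stored as JSON" concrete, here is a minimal Python sketch. The schema text and the `check` helper are illustrative assumptions: `check` hand-rolls only the `type`, `required`, and `properties` keywords and is nowhere near a complete validator.

```python
import json

# A hypothetical schema: plain JSON describing an object with two fields.
SCHEMA_TEXT = """
{
  "type": "object",
  "required": ["id", "name"],
  "properties": {
    "id": {"type": "integer"},
    "name": {"type": "string"}
  }
}
"""

# Map JSONSchema type names to Python types (small subset only).
TYPES = {"object": dict, "array": list, "string": str,
         "integer": int, "number": (int, float), "boolean": bool}


def check(instance, schema):
    """Toy check covering only the type/required/properties keywords."""
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so reject it for other types.
        if isinstance(instance, bool) and expected != "boolean":
            return False
        if not isinstance(instance, TYPES[expected]):
            return False
    if isinstance(instance, dict):
        if any(key not in instance for key in schema.get("required", [])):
            return False
        for key, subschema in schema.get("properties", {}).items():
            if key in instance and not check(instance[key], subschema):
                return False
    return True


schema = json.loads(SCHEMA_TEXT)  # the schema parses like any other JSON
print(check({"id": 1, "name": "Bulbasaur"}, schema))      # True
print(check({"id": "one", "name": "Bulbasaur"}, schema))  # False
```

For real projects, use an actual validator such as the ones installed later in this tutorial.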

Next Steps

  • Come up with a more specific use case, and finish this tutorial

Source Code

TODO

Question Engine and JSONSchema

The grand vision of Dentropy Daemon is to make all the data a person has ever generated accessible via a single API, and then find interesting things to do with it. Making this vision a reality requires supporting a variety of data formats that can be converted into one another. For example, no two messaging apps store messages the same way, even though they all hold basically the same content. Now imagine all of these message formats being transformable into one another. Just as there are many ways to skin a chicken, there are many ways to parse the same raw data. And just as a skinned chicken is a skinned chicken, once data is available in a supported JSONSchema format it can fit into any supported ddaemon application.

Help I don't know how to Code

Goals of This Tutorial

  • Get some real world JSON from the web
  • Use NodeJS to programmatically infer JSONSchema from JSON
    • Add JSON data that is valid with JSONSchema
    • Add JSON data that is NOT valid with JSONSchema
  • Read JSONSchema and write compatible JSON
  • Edit raw JSONSchema and write compatible JSON
  • Use JSONSchema with Python
  • Write your own JSONSchema from scratch

Results of This Tutorial

  • You will know how to use JSONSchema in your NodeJS and Python projects

Requirements

Setup


git clone
cd JSONSchema-tutorial

Steps

Download JSON From web


cd JSONSchema-tutorial
mkdir JSON-data
cd JSON-data

curl -o pokedex.json https://raw.githubusercontent.com/fanzeyi/pokemon.json/master/pokedex.json

curl -o ev-data.json "https://data.wa.gov/api/views/f6w7-q2d2/rows.json?accessType=DOWNLOAD"

curl "https://en.wikipedia.org/w/api.php?origin=*&action=query&format=json&formatversion=2&redirects&prop=revisions&rvprop=content&titles=Albert+Einstein" | jq . > wikipedia-Albert-Einstein.json


Install Requirements


git clone ......
npm init -y
npm install jsonschema
npm install -g ajv # JSONSchema Validator
npm install -g ajv-cli # JSONSchema Validator CLI
npm install -g quicktype # JSONSchema Generator


pip install check-jsonschema

Playing with ev-data.json

Infer the JSONSchema

Get JSONSchema from ./JSON-data/ev-data.json


quicktype -l schema -o ev-data-schema.json ./JSON-data/ev-data.json

The generated JSONSchema contains a lot of gibberish. Before we look into why, let's test the schema that was generated.


ajv -s ev-data-schema.json -d ./JSON-data/ev-data.json

And we get.....

(base) ➜  JSONSchema-tutorial ajv -s ev-data-schema.json -d ./JSON-data/ev-data.json
schema ev-data-schema.json is invalid
error: strict mode: unknown keyword: "qt-uri-protocols"
(base) ➜  JSONSchema-tutorial 

Wow, okay, that failed. Let's examine that later and try another JSONSchema validator.

(base) ➜  JSONSchema-tutorial check-jsonschema --schemafile ./ev-data-schema.json ./JSON-data/ev-data.json 
 ok -- validation done

Alright that took a long time but it did complete successfully.

Let's try another one:

// test.js
const fs = require('fs');
var Validator = require('jsonschema').Validator;
var v = new Validator();
let schema = JSON.parse(fs.readFileSync('./ev-data-schema.json'));
let instance = JSON.parse(fs.readFileSync('./JSON-data/ev-data.json'));
var res = v.validate(instance, schema);
console.log(res.valid) // true

Result:

(base) ➜  JSONSchema-tutorial node test.js
true

Nice. Not all of the JSONSchema validators worked, though, so let's get in there and understand why.
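The ajv error above came from strict mode, which rejects schema keywords that ajv does not recognise; `qt-uri-protocols` appears to be an annotation added by quicktype. One possible workaround, sketched below in Python, is to strip such keys before handing the schema to a strict validator. The `strip_unknown` helper, the `qt-` prefix rule, and the sample schema fragment are all assumptions made for illustration.

```python
import json


def strip_unknown(node, prefix="qt-"):
    """Return a copy of a schema with every key starting with `prefix` removed.

    Caveat: this would also drop a genuine data property whose name happens
    to start with the prefix, so treat it as a sketch rather than a tool.
    """
    if isinstance(node, dict):
        return {key: strip_unknown(value, prefix)
                for key, value in node.items()
                if not key.startswith(prefix)}
    if isinstance(node, list):
        return [strip_unknown(item, prefix) for item in node]
    return node


# Hypothetical fragment shaped like the kind of output that triggers the error.
schema = {
    "type": "object",
    "properties": {
        "url": {"type": "string", "format": "uri",
                "qt-uri-protocols": ["https"]},
    },
}

cleaned = strip_unknown(schema)
print(json.dumps(cleaned, indent=2))
```

The other two validators accepted the schema as generated, which suggests they simply ignore keywords they do not know.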

There is so much complexity inside the JSONSchema because the file has no single regular pattern. For example, the data under the .meta and .data keys have completely different structures. Let's try pulling out those pieces of data specifically.



cat ./JSON-data/ev-data.json | jq .meta > ./JSON-data/ev-data-meta.json

cat ./JSON-data/ev-data.json | jq .data > ./JSON-data/ev-data-data.json
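For readers who would rather do this split from Python than from jq, the following is a rough equivalent. The `extract` helper and the tiny stand-in document are hypothetical; `extract` handles only simple dotted object paths, not jq's full filter language.

```python
import json


def extract(document, path):
    """Follow a dotted path such as '.meta' or '.data' into parsed JSON."""
    node = document
    for key in path.strip(".").split("."):
        if key:  # an empty path ('.') returns the whole document
            node = node[key]
    return node


# A tiny stand-in for ev-data.json with the same two top-level keys.
document = json.loads('{"meta": {"view": {"id": "abc"}}, "data": [[1, "a"], [2, "b"]]}')

meta = extract(document, ".meta")
data = extract(document, ".data")
print(json.dumps(meta))
print(json.dumps(data))
```

With the real file you would load it with `json.load` and write each part back out with `json.dump`.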


Now let's generate the schemas from the subset of JSON.


quicktype -l schema -o ev-data-data-schema.json ./JSON-data/ev-data-data.json

quicktype -l schema -o ev-data-meta-schema.json ./JSON-data/ev-data-meta.json

When we take a look inside ev-data-data-schema.json we see a nice, concise type description. When we take a look at ev-data-meta-schema.json we see a whole lot of gibberish. Let's check out `./JSON-data/ev-data-meta.json` to understand why.

When you take a look inside ./JSON-data/ev-data-meta.json you will not see any regular patterns, except under the .view.columns key. The JSONSchema has to describe every unique key in the JSON, which is why ./ev-data-meta-schema.json is so complex.
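The effect is easy to reproduce with a toy schema inferrer. The sketch below is not quicktype's algorithm, just a simplified stand-in written for this explanation: a regular array of look-alike records collapses to one small item schema, while an object full of unique keys needs one schema entry per key.

```python
import json


def infer(value):
    """Infer a toy schema for a parsed JSON value (illustrative only)."""
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        # Collapse identical item schemas so regular arrays stay concise.
        item_schemas = []
        for item in value:
            item_schema = infer(item)
            if item_schema not in item_schemas:
                item_schemas.append(item_schema)
        if not item_schemas:
            return {"type": "array"}
        if len(item_schemas) == 1:
            return {"type": "array", "items": item_schemas[0]}
        return {"type": "array", "items": {"anyOf": item_schemas}}
    # Objects: one entry per key, so every unique key grows the schema.
    return {"type": "object",
            "properties": {key: infer(val) for key, val in value.items()},
            "required": sorted(value)}


def size(schema):
    """Length of the serialized schema, as a crude complexity measure."""
    return len(json.dumps(schema))


regular = [{"id": i, "name": "x"} for i in range(100)]          # like .data
irregular = {f"key{i}": {f"inner{i}": i} for i in range(100)}   # like .meta

print(size(infer(regular)), size(infer(irregular)))
```

The first number stays the same however many records the array has, while the second grows with every additional unique key.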

Let's now try to generate a JSONSchema for .view.columns


cat ./JSON-data/ev-data-meta.json | jq .view.columns > ./JSON-data/ev-data-meta-columns.json

quicktype -l schema -o ev-data-meta-columns-schema.json ./JSON-data/ev-data-meta-columns.json

Now we can take a look in ev-data-meta-columns-schema.json and see we have a relatively concise JSONSchema.

Now, since only .data and .meta.view.columns contain regular patterns of information, how can we create a JSONSchema that only checks those JSON paths?

.data and .meta.view.columns are the regular data structures we want to validate. It is possible to write a JSONSchema that validates the entire ev-data.json file, but it is easier to jq our way to victory. Take a look.
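For reference, the single-schema alternative could look roughly like the Python sketch below. It nests each sub-schema under the object keys that lead to it and combines the results with `allOf`; because `additionalProperties` is left unset, every other key in the file stays unconstrained. The `wrap` helper and the two placeholder sub-schemas are assumptions, and the sketch presumes the sub-schemas are self-contained (no `$ref` pointing at definitions elsewhere), which may not hold for generated schemas.

```python
import json


def wrap(path, subschema):
    """Nest `subschema` under the object keys named in a dotted path."""
    schema = subschema
    for key in reversed(path.strip(".").split(".")):
        schema = {"type": "object",
                  "required": [key],
                  "properties": {key: schema}}
    return schema


# Placeholder sub-schemas; in practice these would come from the generated files.
data_schema = {"type": "array", "items": {"type": "array"}}
columns_schema = {"type": "array", "items": {"type": "object"}}

# The columns path matches the jq filter used earlier (.view.columns under .meta).
combined = {"allOf": [wrap(".data", data_schema),
                      wrap(".meta.view.columns", columns_schema)]}
print(json.dumps(combined, indent=2))
```

The rest of this section sticks with the simpler jq approach.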

Run Commands:


cat ./JSON-data/ev-data.json | jq .data | ajv -s ./ev-data-data-schema.json
# Failed, dammit I can't pipe into ajv
ajv -s ./ev-data-data-schema.json -d ./JSON-data/ev-data-data.json

Result:

(base) ➜  JSONSchema-tutorial ajv -s ./ev-data-data-schema.json -d ./JSON-data/ev-data-data.json
./JSON-data/ev-data-data.json valid

Nice, now let's try check-jsonschema.

Run Commands:


cat ./JSON-data/ev-data.json | jq .data | check-jsonschema --schemafile ./ev-data-data-schema.json
# Failed, dammit I can't pipe into check-jsonschema either
check-jsonschema --schemafile ./ev-data-data-schema.json ./JSON-data/ev-data-data.json

Result:

(base) ➜  JSONSchema-tutorial check-jsonschema --schemafile ./ev-data-data-schema.json ./JSON-data/ev-data-data.json
ok -- validation done

check-jsonschema took a long time but it was still successful. Now let's also try the jsonschema npm package.

Code:

// test2.js
const fs = require('fs');
var Validator = require('jsonschema').Validator;
var v = new Validator();
let schema = JSON.parse(fs.readFileSync('./ev-data-data-schema.json'));
let instance = JSON.parse(fs.readFileSync('./JSON-data/ev-data.json'));
var res = v.validate(instance.data, schema);
console.log(res.valid) // true

Run Commands:

node test2.js 

Result:

(base) ➜  JSONSchema-tutorial# node test2.js 
true

Playing with pokedex.json


quicktype -l schema -o pokedex-schema.json ./JSON-data/pokedex.json

You should now have pokedex-schema.json. It is only 119 lines long and gets straight to the point. Now let's validate the data against it.

Run Command:


ajv -s ./pokedex-schema.json -d ./JSON-data/pokedex.json

Result:

(base) ➜  JSONSchema-tutorial ajv -s ./pokedex-schema.json -d ./JSON-data/pokedex.json
./JSON-data/pokedex.json valid

Nice, that worked. Now let's try the other validator:

Run Command:


check-jsonschema --schemafile ./pokedex-schema.json ./JSON-data/pokedex.json

Result:

(base) ➜  JSONSchema-tutorial check-jsonschema --schemafile ./pokedex-schema.json ./JSON-data/pokedex.json
ok -- validation done

Nice, that was easy. Now let's try the jsonschema npm package.

Code:

// jsonschema.js
const fs = require('fs');
var Validator = require('jsonschema').Validator;
var v = new Validator();
let schema = JSON.parse(fs.readFileSync(process.argv[2]));
let instance = JSON.parse(fs.readFileSync(process.argv[3]));
var res = v.validate(instance, schema);
console.log(res.valid) // true

Run Commands:


node jsonschema.js ./pokedex-schema.json ./JSON-data/pokedex.json

Result:


(base) ➜  JSONSchema-tutorial node jsonschema.js ./pokedex-schema.json ./JSON-data/pokedex.json
true

Nice that worked.

Now let's invent our own Pokemon and test whether they are compatible with the JSONSchema.

Valid Pokemon Test


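export new_valid_pokemon="$(cat ./JSON-data/valid-new-pokemon.json)"
echo "$new_valid_pokemon"

# Pass the new entry as a JSON argument instead of splicing it into the filter text
jq --argjson pokemon "$new_valid_pokemon" '. += [$pokemon]' ./JSON-data/pokedex.json > ./JSON-data/pokedex-valid.json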

ajv -s ./pokedex-schema.json -d ./JSON-data/pokedex-valid.json

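Invalid Pokemon Test

export new_invalid_pokemon="$(cat ./JSON-data/invalid-new-pokemon.json)"
echo "$new_invalid_pokemon"

# Pass the new entry as a JSON argument instead of splicing it into the filter text
jq --argjson pokemon "$new_invalid_pokemon" '. += [$pokemon]' ./JSON-data/pokedex.json > ./JSON-data/pokedex-invalid.json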

ajv -s ./pokedex-schema.json -d ./JSON-data/pokedex-invalid.json
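The same append-then-validate idea can be sketched in Python. Everything specific here is an assumption made for illustration: the contents of the two new-Pokemon files are not shown in this tutorial, so the entries below are invented, and `looks_valid` checks only a guessed set of top-level field types rather than the real pokedex-schema.json.

```python
# Assumed top-level shape of a pokedex entry; the generated
# pokedex-schema.json is the real authority on what is required.
REQUIRED = {"id": int, "name": dict, "type": list, "base": dict}


def looks_valid(entry):
    """Toy stand-in for a validator: check required keys and their types."""
    return all(isinstance(entry.get(key), expected)
               for key, expected in REQUIRED.items())


pokedex = [
    {"id": 1, "name": {"english": "Bulbasaur"},
     "type": ["Grass", "Poison"], "base": {"HP": 45}},
]

# Hypothetical new entries, one well-formed and one deliberately broken.
valid_new = {"id": 9001, "name": {"english": "Schemasaur"},
             "type": ["Steel"], "base": {"HP": 50}}
invalid_new = {"id": "nine thousand", "name": "Schemasaur"}

results = {}
for label, entry in [("valid", valid_new), ("invalid", invalid_new)]:
    candidate = pokedex + [entry]  # same idea as jq's `. += [...]`
    results[label] = all(looks_valid(item) for item in candidate)
    print(label, results[label])
```

With a real validator, the invalid run would also report which entry and which field broke the schema.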

Logs