Using the json
type¶
⚠️ Note the following before using the
json
type
- Supported only for MySQL >= 8.0 when JSON_VALUE introduced.
- Equivalent Percona is fully-compatible.
- MariaDB is not supported since JSON_VALUE does not allow type specification like MySQL's.
- Not yet supported in DataJoint MATLAB
First you will need to install and connect to a DataJoint data pipeline.
Now let's start by importing the datajoint
client.
import datajoint as dj
Table Definition¶
For this exercise, let's imagine we work for an awesome company that is organizing a fun RC car race across various teams in the company. Let's see which team has the fastest car! 🏎️
This establishes 2 important entities: a Team
and a Car
. Normally we'd map this to their own dedicated table, however, let's assume that Team
is well-structured but Car
is less structured then we'd prefer. In other words, the structure for what makes up a car is varing too much between entries (perhaps because users of the pipeline haven't agreed yet on the definition? 🤷).
This would make it a good use-case to keep Team
as a table but make Car
actually a json
type defined within the Team
table.
Let's begin.
schema = dj.Schema(f"{dj.config['database.user']}_json")
[2023-02-12 00:14:33,027][INFO]: Connecting root@fakeservices.datajoint.io:3306 [2023-02-12 00:14:33,039][INFO]: Connected root@fakeservices.datajoint.io:3306
@schema
class Team(dj.Lookup):
definition = """
# A team within a company
name: varchar(40) # team name
---
car=null: json # A car belonging to a team (null to allow registering first but specifying car later)
unique index(car.length:decimal(4, 1)) # Add an index if this key is frequently accessed
"""
Insert¶
Let's suppose that engineering is first up to register their car.
Team.insert1(
{
"name": "engineering",
"car": {
"name": "Rever",
"length": 20.5,
"inspected": True,
"tire_pressure": [32, 31, 33, 34],
"headlights": [
{
"side": "left",
"hyper_white": None,
},
{
"side": "right",
"hyper_white": None,
},
],
},
}
)
Next, business and marketing teams are up and register their cars.
A few points to notice below:
- The person signing up on behalf of marketing does not know the specifics of the car during registration but another team member will be updating this soon before the race.
- Notice how the
business
andengineering
teams appear to specify the same property but refer to it assafety_inspected
andinspected
respectfully.
Team.insert(
[
{
"name": "marketing",
"car": None,
},
{
"name": "business",
"car": {
"name": "Chaching",
"length": 100,
"safety_inspected": False,
"tire_pressure": [34, 30, 27, 32],
"headlights": [
{
"side": "left",
"hyper_white": True,
},
{
"side": "right",
"hyper_white": True,
},
],
},
},
]
)
We can preview the table data much like normal but notice how the value of car
behaves like other BLOB-like attributes.
Team()
name team name | car A car belonging to a team (null to allow registering first but specifying car later) |
---|---|
marketing | =BLOB= |
engineering | =BLOB= |
business | =BLOB= |
Total: 3
Restriction¶
Now let's see what kinds of queries we can form to demostrate how we can query this pipeline.
# Which team has a `car` equal to 100 inches long?
Team & {'car.length': 100}
name team name | car A car belonging to a team (null to allow registering first but specifying car later) |
---|---|
business | =BLOB= |
Total: 1
# Which team has a `car` less than 50 inches long?
Team & "car->>'$.length' < 50"
name team name | car A car belonging to a team (null to allow registering first but specifying car later) |
---|---|
engineering | =BLOB= |
Total: 1
# Any team that has had their car inspected?
Team & [{'car.inspected:unsigned': True}, {'car.safety_inspected:unsigned': True}]
name team name | car A car belonging to a team (null to allow registering first but specifying car later) |
---|---|
engineering | =BLOB= |
Total: 1
# Which teams do not have hyper white lights for their first head light?
Team & {"car.headlights[0].hyper_white": None}
name team name | car A car belonging to a team (null to allow registering first but specifying car later) |
---|---|
engineering | =BLOB= |
marketing | =BLOB= |
Total: 2
Notice that the previous query will satisfy the None
check if it experiences any of the following scenarious:
- if entire record missing (
marketing
satisfies this) - JSON key is missing
- JSON value is set to JSON
null
(engineering
satisfies this)
Projection¶
Projections can be quite useful with the json
type since we can extract out just what we need. This allows greater query flexibility but more importantly, for us to be able to fetch only what is pertinent.
# Only interested in the car names and the length but let the type be inferred
q_untyped = Team.proj(
car_name='car.name',
car_length="car.length",
)
q_untyped
name team name | car_name calculated attribute | car_length calculated attribute |
---|---|---|
business | Chaching | 100 |
engineering | Rever | 20.5 |
marketing | None | None |
Total: 3
q_untyped.fetch(as_dict=True)
[{'name': 'business', 'car_name': 'Chaching', 'car_length': '100'}, {'name': 'engineering', 'car_name': 'Rever', 'car_length': '20.5'}, {'name': 'marketing', 'car_name': None, 'car_length': None}]
# Nevermind, I'll specify the type explicitly
q_typed = Team.proj(
car_name='car.name',
car_length="car.length:float",
)
q_typed
name team name | car_name calculated attribute | car_length calculated attribute |
---|---|---|
business | Chaching | 100.0 |
engineering | Rever | 20.5 |
marketing | None | None |
Total: 3
q_typed.fetch(as_dict=True)
[{'name': 'business', 'car_name': 'Chaching', 'car_length': 100.0}, {'name': 'engineering', 'car_name': 'Rever', 'car_length': 20.5}, {'name': 'marketing', 'car_name': None, 'car_length': None}]
Describe¶
Lastly, the .describe()
function on the Team
table can help us generate the table's definition. This is useful if we are connected directly to the pipeline without the original source.
rebuilt_definition = Team.describe()
print(rebuilt_definition)
# A team within a company name : varchar(40) # team name --- car=null : json # A car belonging to a team (null to allow registering first but specifying car later) UNIQUE INDEX ((json_value(`car`, _utf8mb4'$.length' returning decimal(4, 1))))
Cleanup¶
Finally, let's clean up what we created in this tutorial.
schema.drop()