Declaration Syntax¶
Creating Tables¶
Classes represent tables¶
To make it easy to work with tables in MATLAB and Python, DataJoint programs create a
separate class for each table.
Computer programmers refer to this concept as
object-relational mapping.
For example, the class experiment.Subject
in the DataJoint client language may
correspond to the table called subject
on the database server.
Users never need to see the database directly; they only interact with data in the
database by creating and interacting with DataJoint classes.
Data tiers¶
The table class must inherit from one of the following superclasses to indicate its
data tier: dj.Lookup
, dj.Manual
, dj.Imported
, dj.Computed
, or dj.Part
.
See tiers and master-part.
Defining a table¶
To define a DataJoint table in Python:
-
Define a class inheriting from the appropriate DataJoint class:
dj.Lookup
,dj.Manual
,dj.Imported
ordj.Computed
. -
Decorate the class with the schema object (see schema)
-
Define the class property
definition
to define the table heading.
For example, the following code defines the table Person
:
import datajoint as dj
schema = dj.Schema('alice_experiment')
@schema
class Person(dj.Manual):
definition = '''
username : varchar(20) # unique user name
---
first_name : varchar(30)
last_name : varchar(30)
'''
The @schema
decorator uses the class name and the data tier to check whether an
appropriate table exists on the database.
If a table does not already exist, the decorator creates one on the database using the
definition property.
The decorator attaches the information about the table to the class, and then returns
the class.
The class will become usable after you define the definition
property as described in
Table definition.
DataJoint classes in Python¶
DataJoint for Python is implemented through the use of classes providing access to the
actual tables stored on the database.
Since only a single table exists on the database for any class, interactions with all
instances of the class are equivalent.
As such, most methods can be called on the classes themselves rather than on an object,
for convenience.
Whether calling a DataJoint method on a class or on an instance, the result will only
depend on or apply to the corresponding table.
All of the basic functionality of DataJoint is built to operate on the classes
themselves, even when called on an instance.
For example, calling Person.insert(...)
(on the class) and Person.insert(...)
(on
an instance) both have the identical effect of inserting data into the table on the
database server.
DataJoint does not prevent a user from working with instances, but the workflow is
complete without the need for instantiation.
It is up to the user whether to implement additional functionality as class methods or
methods called on instances.
Valid class names¶
Note that in both MATLAB and Python, the class names must follow the CamelCase compound word notation:
- start with a capital letter and
- contain only alphanumerical characters (no underscores).
Examples of valid class names:
TwoPhotonScan
, Scan2P
, Ephys
, MembraneVoltage
Invalid class names:
Two_photon_Scan
, twoPhotonScan
, 2PhotonScan
, membranePotential
, membrane_potential
Table Definition¶
DataJoint models data as sets of entities with shared attributes, often visualized as tables with rows and columns. Each row represents a single entity and the values of all of its attributes. Each column represents a single attribute with a name and a datatype, applicable to entity in the table. Unlike rows in a spreadsheet, entities in DataJoint don't have names or numbers: they can only be identified by the values of their attributes. Defining a table means defining the names and datatypes of the attributes as well as the constraints to be applied to those attributes. Both MATLAB and Python use the same syntax define tables.
For example, the following code in defines the table User
, that contains users of the
database:
The table definition is contained in the definition
property of the class.
@schema
class User(dj.Manual):
definition = """
# database users
username : varchar(20) # unique user name
---
first_name : varchar(30)
last_name : varchar(30)
role : enum('admin', 'contributor', 'viewer')
"""
This defines the class User
that creates the table in the database and provides all
its data manipulation functionality.
Table creation on the database server¶
Users do not need to do anything special to have a table created in the database.
Tables are created at the time of class definition.
In fact, table creation on the database is one of the jobs performed by the decorator
@schema
of the class.
Changing the definition of an existing table¶
Once the table is created in the database, the definition string has no further effect. In other words, changing the definition string in the class of an existing table will not actually update the table definition. To change the table definition, one must first drop the existing table. This means that all the data will be lost, and the new definition will be applied to create the new empty table.
Therefore, in the initial phases of designing a DataJoint pipeline, it is common to experiment with variations of the design before populating it with substantial amounts of data.
It is possible to modify a table without dropping it. This topic is covered separately.
Reverse-engineering the table definition¶
DataJoint objects provide the describe
method, which displays the table definition
used to define the table when it was created in the database.
This definition may differ from the definition string of the class if the definition
string has been edited after creation of the table.
Examples
s = lab.User.describe()
Definition Syntax¶
The table definition consists of one or more lines. Each line can be one of the following:
- The optional first line starting with a
#
provides a description of the table's purpose. It may also be thought of as the table's long title. - A new attribute definition in any of the following forms (see
Attributes for valid datatypes):
name : datatype
name : datatype # comment
name = default : datatype
name = default : datatype # comment
- The divider
---
(at least three hyphens) separating primary key attributes above from secondary attributes below. - A foreign key in the format
-> ReferencedTable
. (See Dependencies.)
For example, the table for Persons may have the following definition:
# Persons in the lab
username : varchar(16) # username in the database
---
full_name : varchar(255)
start_date : date # date when joined the lab
This will define the table with attributes username
, full_name
, and start_date
,
in which username
is the primary key.
Attribute names¶
Attribute names must be in lowercase and must start with a letter. They can only contain alphanumerical characters and underscores. The attribute name cannot exceed 64 characters.
Valid attribute names
first_name
, two_photon_scan
, scan_2p
, two_photon_scan
Invalid attribute names
firstName
, first name
, 2photon_scan
, two-photon_scan
, TwoPhotonScan
Ideally, attribute names should be unique across all tables that are likely to be used
in queries together.
For example, tables often have attributes representing the start times of sessions,
recordings, etc.
Such attributes must be uniquely named in each table, such as session_start_time
or
recording_start_time
.
Default values¶
Secondary attributes can be given default values.
A default value will be used for an attribute if no other value is given at the time
the entity is inserted into the table.
Generally, default values are numerical values or character strings.
Default values for dates must be given as strings as well, contained within quotes
(with the exception of CURRENT_TIMESTAMP
).
Note that default values can only be used when inserting as a mapping.
Primary key attributes cannot have default values (with the exceptions of
auto_increment
and CURRENT_TIMESTAMP
attributes; see primary-key).
An attribute with a default value of NULL
is called a nullable attribute.
A nullable attribute can be thought of as applying to all entities in a table but
having an optional value that may be absent in some entities.
Nullable attributes should not be used to indicate that an attribute is inapplicable
to some entities in a table (see normalization).
Nullable attributes should be used sparingly to indicate optional rather than
inapplicable attributes that still apply to all entities in the table.
NULL
is a special literal value and does not need to be enclosed in quotes.
Here are some examples of attributes with default values:
failures = 0 : int
due_date = "2020-05-31" : date
additional_comments = NULL : varchar(256)