Indexes¶
Table indexes are data structures that allow fast lookups by an indexed attribute or combination of attributes.
In DataJoint, indexes are created by one of the three mechanisms:
- Primary key
- Foreign key
- Explicitly defined indexes
The first two mechanisms are obligatory. Every table has a primary key, which serves as an unique index. Therefore, restrictions by a primary key are very fast. Foreign keys create additional indexes unless a suitable index already exists.
Indexes for single primary key tables¶
Let’s say a mouse in the lab has a lab-specific ID but it also has a separate id issued by the animal facility.
@schema
class Mouse(dj.Manual):
definition = """
mouse_id : int # lab-specific ID
---
tag_id : int # animal facility ID
"""
In this case, searching for a mouse by mouse_id is much faster than by tag_id
because mouse_id is a primary key, and is therefore indexed.
To make searches faster on fields other than the primary key or a foreign key, you can add a secondary index explicitly.
Regular indexes are declared as index(attr1, ..., attrN) on a separate line anywhere in
the table declaration (below the primary key divide).
Indexes can be declared with unique constraint as unique index (attr1, ..., attrN).
Let’s redeclare the table with a unique index on tag_id.
@schema
class Mouse(dj.Manual):
definition = """
mouse_id : int # lab-specific ID
---
tag_id : int # animal facility ID
unique index (tag_id)
"""
mouse_id and tag_id are similarly fast.
Indexes for tables with multiple primary keys¶
Let’s now imagine that rats in a lab are identified by the combination of lab_name and
rat_id in a table Rat.
@schema
class Rat(dj.Manual):
definition = """
lab_name : char(16)
rat_id : int unsigned # lab-specific ID
---
date_of_birth = null : date
"""
rat_id is in the index, searches by rat_id alone are not
helped by the index because it is not first in the index. This is similar to searching for
a word in a dictionary that orders words alphabetically. Searching by the first letters
of a word is easy but searching by the last few letters of a word requires scanning the
whole dictionary.
In this table, the primary key is a unique index on the combination (lab_name, rat_id).
Therefore searches on these attributes or on lab_name alone are fast. But this index
cannot help searches on rat_id alone. Similarly, searing by date_of_birth requires a
full-table scan and is inefficient.
To speed up searches by the rat_id and date_of_birth, we can explicit indexes to
Rat:
@schema
class Rat2(dj.Manual):
definition = """
lab_name : char(16)
rat_id : int unsigned # lab-specific ID
---
date_of_birth = null : date
index(rat_id)
index(date_of_birth)
"""