Read Schema Diagrams¶
DataJoint diagrams visualize schema structure as directed acyclic graphs (DAGs). This guide teaches you to:
- Interpret line styles and their semantic meaning
- Recognize dimensions (underlined vs non-underlined tables)
- Use diagram operations to explore large schemas
- Compare DataJoint notation to traditional ER diagrams
import datajoint as dj
schema = dj.Schema('howto_diagrams')
schema.drop(prompt=False)
schema = dj.Schema('howto_diagrams')
[2026-02-19 18:32:14] DataJoint 2.1.1 connected to postgres@postgres:5432
Quick Reference¶
| Line Style | Relationship | Child's Primary Key |
|---|---|---|
| Thick Solid ━━━ | Extension | Parent PK only (one-to-one) |
| Thin Solid ─── | Containment | Parent PK + own fields (one-to-many) |
| Dashed ┄┄┄ | Reference | Own independent PK (one-to-many) |
Key principle: Solid lines mean the parent's identity becomes part of the child's identity. Dashed lines mean the child maintains independent identity.
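The table above can be read as a small decision rule. The following is a minimal sketch of that rule in plain Python, not DataJoint's own rendering code; the function name `line_style` and the use of attribute sets are assumptions made for illustration.

```python
from __future__ import annotations


def line_style(fk_attrs: set[str], child_pk: set[str]) -> str:
    """Classify a foreign key by how its attributes relate to the child's primary key."""
    if not fk_attrs <= child_pk:
        return "dashed"       # reference: the FK sits below `---`
    if fk_attrs == child_pk:
        return "thick solid"  # extension: the FK is the entire primary key
    return "thin solid"       # containment: the FK plus additional PK fields


# Attribute sets taken from the examples later in this guide
print(line_style({"customer_id"}, {"customer_id"}))                 # CustomerPreferences
print(line_style({"customer_id"}, {"customer_id", "account_num"}))  # Account
print(line_style({"dept_id"}, {"employee_id"}))                     # Employee
```

The three calls correspond to the three rows of the table: extension, containment, and reference.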
Thick Solid Line: Extension (One-to-One)¶
The foreign key is the entire primary key. The child extends the parent.
@schema
class Customer(dj.Manual):
    definition = """
    customer_id : int64
    ---
    name : varchar(60)
    """

@schema
class CustomerPreferences(dj.Manual):
    definition = """
    -> Customer # FK is entire PK
    ---
    theme : varchar(20)
    notifications : bool
    """
dj.Diagram(Customer) + dj.Diagram(CustomerPreferences)
Equivalent ER Diagram:
DataJoint vs ER: The thick solid line immediately shows this is one-to-one. In ER notation, you must read the crow's foot symbols (||--o|).
Note: CustomerPreferences is not underlined — it exists in the Customer dimension space.
Thin Solid Line: Containment (One-to-Many)¶
The foreign key is part of the primary key, with additional fields.
@schema
class Account(dj.Manual):
    definition = """
    -> Customer # Part of PK
    account_num : int32 # Additional PK field
    ---
    balance : decimal(10,2)
    """
dj.Diagram(Customer) + dj.Diagram(Account)
Equivalent ER Diagram:
DataJoint vs ER: The thin solid line shows containment — accounts belong to customers. In ER, you see ||--o{ (one-to-many).
Note: Account is underlined — it introduces the Account dimension.
Dashed Line: Reference (One-to-Many)¶
The foreign key is a secondary attribute (below the --- line).
@schema
class Department(dj.Manual):
    definition = """
    dept_id : int32
    ---
    dept_name : varchar(60)
    """

@schema
class Employee(dj.Manual):
    definition = """
    employee_id : int64 # Own independent PK
    ---
    -> Department # Secondary attribute
    employee_name : varchar(60)
    """
dj.Diagram(Department) + dj.Diagram(Employee)
Equivalent ER Diagram:
DataJoint vs ER: Both show one-to-many, but DataJoint's dashed line tells you immediately that Employee has independent identity. In ER, you must examine whether the FK is part of the PK.
Note: Both tables are underlined — each introduces its own dimension.
Dimensions and Underlined Names¶
A dimension is a new entity type introduced by a table that defines new primary key attributes. Each underlined table introduces exactly one dimension—even if it has multiple new PK attributes, together they identify one new entity type.
| Visual | Meaning |
|---|---|
| Underlined | Introduces a new dimension (new entity type) |
| Not underlined | Exists in the space defined by dimensions from referenced tables |
Key rules:
- Computed tables never introduce dimensions (always non-underlined)
- Part tables can introduce dimensions (may be underlined)
@schema
class Subject(dj.Manual):
    definition = """
    subject_id : varchar(16) # NEW dimension
    ---
    species : varchar(50)
    """

@schema
class Session(dj.Manual):
    definition = """
    -> Subject # Inherits subject_id
    session_idx : int32 # NEW dimension
    ---
    session_date : date
    """

@schema
class SessionQC(dj.Computed):
    definition = """
    -> Session # Inherits both, adds nothing
    ---
    passed : bool
    """

    def make(self, key):
        self.insert1({**key, 'passed': True})
dj.Diagram(schema)
In this diagram:
- Subject is underlined: it introduces the Subject dimension
- Session is underlined: it introduces the Session dimension (within each Subject)
- SessionQC is not underlined: it exists in the Session dimension space and adds no new dimension
Why this matters: Dimensions determine attribute lineage. Primary key attributes trace back to the dimension where they originated, enabling semantic matching for safe joins.
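To make the underlining rule concrete, here is a rough sketch that inspects a definition string and reports which primary key attributes the table declares itself rather than inheriting through `->`. It is a simplified text scan written for this guide, not DataJoint's parser; the helper names are hypothetical, and it does not model the rule that Computed tables are never underlined.

```python
from __future__ import annotations


def new_primary_key_attributes(definition: str) -> list[str]:
    """List primary key attributes declared directly in the definition (not via `->`)."""
    new_attrs = []
    for raw in definition.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if line.startswith("---"):
            break                            # secondary attributes start here
        if not line or line.startswith("->"):
            continue                         # blank line, or inherited via foreign key
        new_attrs.append(line.split(":", 1)[0].strip())
    return new_attrs


def introduces_dimension(definition: str) -> bool:
    """True when the definition adds at least one primary key attribute of its own."""
    return bool(new_primary_key_attributes(definition))


SUBJECT = """
subject_id : varchar(16)
---
species : varchar(50)
"""
SESSION = """
-> Subject
session_idx : int32
---
session_date : date
"""
SESSION_QC = """
-> Session
---
passed : bool
"""

for name, definition in [("Subject", SUBJECT), ("Session", SESSION), ("SessionQC", SESSION_QC)]:
    print(name, introduces_dimension(definition), new_primary_key_attributes(definition))
```

Subject and Session each report one new attribute, while SessionQC reports none, which matches the underlining described above.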
Many-to-Many: Converging Lines¶
Many-to-many relationships appear as tables with multiple solid lines converging.
@schema
class Student(dj.Manual):
    definition = """
    student_id : int64
    ---
    name : varchar(60)
    """

@schema
class Course(dj.Manual):
    definition = """
    course_code : char(8)
    ---
    title : varchar(100)
    """

@schema
class Enrollment(dj.Manual):
    definition = """
    -> Student
    -> Course
    ---
    grade : enum('A','B','C','D','F')
    """
dj.Diagram(Student) + dj.Diagram(Course) + dj.Diagram(Enrollment)
Equivalent ER Diagram:
DataJoint vs ER: Both show the association table pattern. DataJoint's converging solid lines immediately indicate the composite primary key.
Note: Enrollment is not underlined — it exists in the space defined by Student × Course dimensions.
Orange Dots: Renamed Foreign Keys¶
When a table references the same parent more than once, rename the foreign key attributes with .proj() so they do not collide. Orange dots mark these renamed FKs.
@schema
class Person(dj.Manual):
    definition = """
    person_id : int64
    ---
    name : varchar(60)
    """

@schema
class Marriage(dj.Manual):
    definition = """
    marriage_id : int64
    ---
    -> Person.proj(spouse1='person_id')
    -> Person.proj(spouse2='person_id')
    marriage_date : date
    """
dj.Diagram(Person) + dj.Diagram(Marriage)
The orange dots between Person and Marriage indicate that projections renamed the foreign key attributes (spouse1 and spouse2 both reference person_id).
Tip: In Jupyter, hover over orange dots to see the projection expression.
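The reason for the rename can be shown without a database. The sketch below only models attribute names: `child_attribute_names` is a hypothetical helper, and its `renames` argument mirrors the `new_name='old_name'` form used by `.proj()` in the definition above.

```python
from __future__ import annotations


def child_attribute_names(parent_pk: list[str], renames: dict[str, str] | None = None) -> list[str]:
    """Return the names a foreign key contributes to the child, after optional renaming."""
    new_name_for = {old: new for new, old in (renames or {}).items()}
    return [new_name_for.get(attr, attr) for attr in parent_pk]


# Two plain references to Person would both contribute `person_id`
plain = child_attribute_names(["person_id"]) + child_attribute_names(["person_id"])

# Renaming each reference gives the child two distinct attributes
renamed = (
    child_attribute_names(["person_id"], {"spouse1": "person_id"})
    + child_attribute_names(["person_id"], {"spouse2": "person_id"})
)

print(plain)
print(renamed)
```

Without the rename, the child would receive the same attribute name twice; with it, each reference gets its own name.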
Diagram Operations¶
Filter and combine diagrams to explore large schemas:
# Entire schema
dj.Diagram(schema)
# Session and 1 level upstream (dependencies)
dj.Diagram(Session) - 1
# Subject and 2 levels downstream (dependents)
dj.Diagram(Subject) + 2
Operation Reference:
| Operation | Meaning |
|---|---|
| dj.Diagram(schema) | Entire schema |
| dj.Diagram(Table) - N | Table + N levels upstream |
| dj.Diagram(Table) + N | Table + N levels downstream |
| D1 + D2 | Union of two diagrams |
| D1 * D2 | Intersection (common nodes) |
| D.prune() | Remove tables with zero matching rows (New in 2.2) |
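Finding paths: Intersect everything downstream of one table with everything upstream of another to keep only the tables on the paths between them. Here upstream and downstream stand for the two tables, and 100 is simply a depth large enough to reach every level: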
(dj.Diagram(upstream) + 100) * (dj.Diagram(downstream) - 100)
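The set logic behind these operations can be sketched on a plain edge list. This is an illustration of the semantics described in the table, not DataJoint's implementation; `expand` is a hypothetical helper, and the edges are the foreign keys between tables defined elsewhere in this guide.

```python
from __future__ import annotations

# (parent, child) pairs taken from table definitions in this guide
EDGES = [
    ("Subject", "Session"),
    ("Session", "SessionQC"),
    ("Subject", "Experiment"),
    ("Experimenter", "Experiment"),
    ("Experiment", "Analysis"),
]


def expand(start: str, edges: list[tuple[str, str]], levels: int, downstream: bool) -> set[str]:
    """Collect `start` plus every table within `levels` steps in one direction."""
    nodes = {start}
    frontier = {start}
    for _ in range(levels):
        if downstream:
            frontier = {child for parent, child in edges if parent in frontier} - nodes
        else:
            frontier = {parent for parent, child in edges if child in frontier} - nodes
        if not frontier:
            break  # nothing further to reach
        nodes |= frontier
    return nodes


one_down = expand("Subject", EDGES, 1, downstream=True)   # like Diagram(Subject) + 1
one_up = expand("Session", EDGES, 1, downstream=False)    # like Diagram(Session) - 1

# Path finding: downstream of Subject intersected with upstream of Analysis
path = expand("Subject", EDGES, 100, downstream=True) & expand("Analysis", EDGES, 100, downstream=False)

print(sorted(one_down))
print(sorted(one_up))
print(sorted(path))
```

The intersection drops Session, SessionQC, and Experimenter because none of them lies on a route from Subject to Analysis.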
Layout Direction¶
!!! version-added "New in 2.1"
    Configurable layout direction was added in DataJoint 2.1.
Control the flow direction of diagrams via configuration:
| Direction | Description |
|---|---|
| "TB" | Top to bottom (default) |
| "LR" | Left to right |
# Horizontal layout using config override
with dj.config.override(display__diagram_direction="LR"):
    display(dj.Diagram(schema))
# Generate Mermaid syntax
print((dj.Diagram(Subject) + 2).make_mermaid())
flowchart LR
classDef manual fill:#90EE90,stroke:#006400
classDef lookup fill:#D3D3D3,stroke:#696969
classDef computed fill:#FFB6C1,stroke:#8B0000
classDef imported fill:#ADD8E6,stroke:#00008B
classDef part fill:#FFFFFF,stroke:#000000
classDef collapsed fill:#808080,stroke:#404040
subgraph __main__
Session[Session]:::manual
Subject[Subject]:::manual
SessionQC([SessionQC]):::computed
end
Session --> SessionQC
Subject --> Session
Copy this output into any Mermaid-compatible viewer (GitHub Markdown, MkDocs with mermaid plugin, https://mermaid.live) to render the diagram.
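To see how the pieces of that output fit together, the following sketch assembles similar flowchart text by hand from a node list and an edge list. It is not how `make_mermaid()` works internally; the `flowchart` function is hypothetical, and only the two node shapes and styles that appear in the output above are covered.

```python
from __future__ import annotations

# Styles copied from the output above (two tiers only, to keep the sketch short)
CLASS_DEFS = {
    "manual": "fill:#90EE90,stroke:#006400",
    "computed": "fill:#FFB6C1,stroke:#8B0000",
}
# Node shapes as they appear in the output above
SHAPES = {"manual": "{0}[{0}]", "computed": "{0}([{0}])"}


def flowchart(tiers: dict[str, str], edges: list[tuple[str, str]], direction: str = "TB") -> str:
    """Build Mermaid flowchart text from table tiers and (parent, child) edges."""
    lines = [f"flowchart {direction}"]
    for tier in sorted(set(tiers.values())):
        lines.append(f"    classDef {tier} {CLASS_DEFS[tier]}")
    for name, tier in tiers.items():
        lines.append(f"    {SHAPES[tier].format(name)}:::{tier}")
    for parent, child in edges:
        lines.append(f"    {parent} --> {child}")
    return "\n".join(lines)


text = flowchart(
    {"Subject": "manual", "Session": "manual", "SessionQC": "computed"},
    [("Subject", "Session"), ("Session", "SessionQC")],
    direction="LR",
)
print(text)
```

The first line carries the layout direction, which is why switching between "TB" and "LR" changes only that header.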
Saving to file:
dj.Diagram(schema).save("pipeline.mmd") # .mmd or .mermaid extension
Multi-Schema Pipelines¶
Real-world pipelines often span multiple schemas (modules).
!!! version-added "New in 2.1"
    Automatic schema grouping was added in DataJoint 2.1.
    Tables are automatically grouped into visual clusters by schema, with the Python module name shown as the group label.
# Create a second schema for analysis
howto_analysis = dj.Schema('howto_analysis')
howto_analysis.drop(prompt=False)
howto_analysis = dj.Schema('howto_analysis')
# Reference tables from the first schema
@howto_analysis
class Experimenter(dj.Manual):
    definition = """
    experimenter : varchar(32)
    ---
    email : varchar(100)
    """

@howto_analysis
class Experiment(dj.Manual):
    definition = """
    -> Subject # Cross-schema reference
    -> Experimenter
    experiment_date : date
    ---
    notes : varchar(1000)
    """

@howto_analysis
class Analysis(dj.Computed):
    definition = """
    -> Experiment
    ---
    result : float64
    """

    def make(self, key):
        self.insert1({**key, 'result': 0.0})
# Combine both schemas - tables are automatically grouped
multi_schema_diagram = dj.Diagram(schema) + dj.Diagram(howto_analysis)
multi_schema_diagram
Tables are grouped by their database schema automatically. The group label shows the Python module name when available (following the DataJoint convention of one module per schema).
Multi-schema diagrams are useful for:
- Visualizing pipelines spanning multiple schemas
- Understanding which tables belong to which module
- Documentation for multi-module architectures
Collapsing Schemas¶
!!! version-added "New in 2.1"
    The collapse() method was added in DataJoint 2.1.
For high-level pipeline views, collapse entire schemas into single nodes using .collapse(). This is useful for showing relationships between modules without the detail of individual tables.
# Show `schema` expanded and `howto_analysis` collapsed into a single node
dj.Diagram(schema) + dj.Diagram(howto_analysis).collapse()
The collapsed node shows the module name and table count. Edges from the expanded schema connect to the collapsed node.
"Expanded wins" rule: If a table appears in both a collapsed and non-collapsed diagram, it stays expanded. This applies even when expanding a single table from a collapsed schema:
# "Expanded wins": Experimenter is expanded even though howto_analysis is collapsed
dj.Diagram(Subject) + dj.Diagram(Experimenter) + dj.Diagram(howto_analysis).collapse()
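The "expanded wins" rule amounts to a precedence order when deciding which node represents each table. A minimal sketch, assuming a hypothetical `display_nodes` helper and a bracketed label for collapsed nodes (the real label format is not reproduced here):

```python
from __future__ import annotations


def display_nodes(expanded: set[str], collapsed: dict[str, set[str]]) -> dict[str, str]:
    """Map each table to the node that represents it; expanded tables take precedence."""
    shown = {table: table for table in expanded}
    for schema_name, tables in collapsed.items():
        for table in tables:
            # setdefault leaves tables that are already expanded untouched
            shown.setdefault(table, f"[{schema_name}]")
    return shown


nodes = display_nodes(
    expanded={"Subject", "Experimenter"},
    collapsed={"howto_analysis": {"Experimenter", "Experiment", "Analysis"}},
)
for table in sorted(nodes):
    print(table, "->", nodes[table])
```

Experimenter belongs to the collapsed schema but was also requested on its own, so it keeps its own node while Experiment and Analysis fold into the collapsed one.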
Preserving directionality: Collapsing middle layers of a pipeline preserves the DAG structure. The collapsed node sits between expanded tables, maintaining edge directions:
# Create a separate schema for this example
sandwich = dj.Schema('howto_sandwich')
sandwich.drop(prompt=False)
sandwich = dj.Schema('howto_sandwich')
# Linear pipeline: RawData -> Filtered -> Normalized -> FinalResult
@sandwich
class RawData(dj.Manual):
    definition = """
    data_id : int32
    """

@sandwich
class Filtered(dj.Computed):
    definition = """
    -> RawData
    ---
    filtered_value : float32
    """

    def make(self, key):
        pass

@sandwich
class Normalized(dj.Computed):
    definition = """
    -> Filtered
    ---
    normalized_value : float32
    """

    def make(self, key):
        pass

@sandwich
class FinalResult(dj.Computed):
    definition = """
    -> Normalized
    ---
    result : float32
    """

    def make(self, key):
        pass
# Sandwich collapse: expand top and bottom, collapse middle processing steps
dj.Diagram(RawData) + dj.Diagram(FinalResult) + dj.Diagram(sandwich).collapse()
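One way to picture why edge directions survive the collapse: each table-level edge is redrawn between the nodes that represent its endpoints, and edges that start and end inside the same collapsed node disappear. The sketch below illustrates that idea only; `collapse_edges` and the `<collapsed>` label are assumptions, not DataJoint internals.

```python
from __future__ import annotations

# Table-level edges of the linear pipeline above
EDGES = [
    ("RawData", "Filtered"),
    ("Filtered", "Normalized"),
    ("Normalized", "FinalResult"),
]

# Tables represented by the collapsed node; RawData and FinalResult stay expanded
NODE_OF = {"Filtered": "<collapsed>", "Normalized": "<collapsed>"}


def collapse_edges(edges: list[tuple[str, str]], node_of: dict[str, str]) -> list[tuple[str, str]]:
    """Redraw edges between display nodes, dropping duplicates and internal edges."""
    result: list[tuple[str, str]] = []
    for parent, child in edges:
        a = node_of.get(parent, parent)
        b = node_of.get(child, child)
        if a != b and (a, b) not in result:
            result.append((a, b))
    return result


collapsed_edges = collapse_edges(EDGES, NODE_OF)
print(collapsed_edges)
```

The Filtered to Normalized edge is internal to the collapsed node and vanishes, leaving one edge into the node and one edge out of it.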
Extended Example: Multi-Module Pipeline¶
Here's a realistic example with three modules that have cross-schema dependencies:
demo_modules/acquisition.py - Core data acquisition:
@schema
class Lab(dj.Manual):
    definition = """lab : varchar(32) ..."""

@schema
class Subject(dj.Manual):
    definition = """subject_id : varchar(16) --- -> Lab ..."""

@schema
class Session(dj.Manual):
    definition = """-> Subject session_date : date ..."""
demo_modules/processing.py - Data processing (references acquisition):
@schema
class ProcessingParams(dj.Lookup):
    definition = """params_id : int16 ..."""

@schema
class ProcessedSession(dj.Computed):
    definition = """-> acquisition.Session -> ProcessingParams ..."""

@schema
class EventDetection(dj.Computed):
    definition = """-> ProcessedSession event_id : int32 ..."""
demo_modules/analysis.py - Analysis (references both modules):
@schema
class AnalysisParams(dj.Lookup):
    definition = """analysis_id : int16 ..."""

@schema
class SubjectAnalysis(dj.Computed):
    definition = """-> acquisition.Subject -> AnalysisParams ..."""

@schema
class CrossSessionAnalysis(dj.Computed):
    definition = """-> acquisition.Subject -> processing.ProcessingParams ..."""
# Import the demo modules
from demo_modules import acquisition, processing, analysis
# Activate schemas (creates tables on first run)
acquisition.schema.activate('demo_acquisition')
processing.schema.activate('demo_processing')
analysis.schema.activate('demo_analysis')
# Drop and recreate for clean state
for s in [analysis.schema, processing.schema, acquisition.schema]:
    s.drop(prompt=False)
acquisition.schema.activate('demo_acquisition')
processing.schema.activate('demo_processing')
analysis.schema.activate('demo_analysis')
# Full pipeline diagram - all modules expanded
# Note: dj.Diagram(module) works when the module has a `schema` attribute
full_pipeline = dj.Diagram(acquisition) + dj.Diagram(processing) + dj.Diagram(analysis)
full_pipeline
The full diagram shows all three modules with cross-schema references:
- acquisition provides core tables (Lab, Subject, Session)
- processing references Session from acquisition
- analysis references Subject from acquisition AND ProcessingParams from processing
Now let's see collapse in action:
# Two schemas collapsed: acquisition expanded, downstream modules collapsed
# This shows acquisition's internal structure while abstracting processing & analysis
dj.Diagram(acquisition) + dj.Diagram(processing).collapse() + dj.Diagram(analysis).collapse()
# "Expanded wins": Subject stays expanded even though analysis references it
dj.Diagram(acquisition.Subject) + dj.Diagram(analysis).collapse()
# Schema-level DAG: all modules collapsed
# Shows the dependency structure between modules at a glance
dj.Diagram(acquisition).collapse() + dj.Diagram(processing).collapse() + dj.Diagram(analysis).collapse()
Key observations:
- Collapsed nodes show table count: e.g., "processing (3 tables)"
- Cross-schema edges preserved: dependencies between modules are shown as edges between collapsed nodes
- "Expanded wins": if you explicitly include a table (like Subject), it stays expanded even if a collapsed schema references it
- Schema-level DAG: collapsing all schemas reveals the high-level module dependency graph, useful for understanding pipeline architecture
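The schema-level DAG can also be derived by hand from the table-level references listed in the three modules. The sketch below does this with the standard-library `graphlib` module; the helper names are hypothetical, and the edge list is transcribed from the abbreviated definitions above rather than read from a live schema.

```python
from __future__ import annotations

from graphlib import TopologicalSorter

# ((schema, table), (schema, table)) foreign keys from the three demo modules
TABLE_EDGES = [
    (("acquisition", "Lab"), ("acquisition", "Subject")),
    (("acquisition", "Subject"), ("acquisition", "Session")),
    (("acquisition", "Session"), ("processing", "ProcessedSession")),
    (("processing", "ProcessingParams"), ("processing", "ProcessedSession")),
    (("processing", "ProcessedSession"), ("processing", "EventDetection")),
    (("acquisition", "Subject"), ("analysis", "SubjectAnalysis")),
    (("analysis", "AnalysisParams"), ("analysis", "SubjectAnalysis")),
    (("acquisition", "Subject"), ("analysis", "CrossSessionAnalysis")),
    (("processing", "ProcessingParams"), ("analysis", "CrossSessionAnalysis")),
]


def schema_edges(table_edges):
    """Keep only cross-schema references, reduced to (parent_schema, child_schema)."""
    return sorted({(p[0], c[0]) for p, c in table_edges if p[0] != c[0]})


def build_order(edges):
    """Order schemas so that every schema comes after the schemas it depends on."""
    depends_on: dict[str, set[str]] = {}
    for parent, child in edges:
        depends_on.setdefault(parent, set())
        depends_on.setdefault(child, set()).add(parent)
    return list(TopologicalSorter(depends_on).static_order())


module_edges = schema_edges(TABLE_EDGES)
order = build_order(module_edges)
print(module_edges)
print(order)
print(list(reversed(order)))  # dependents first, matching the drop loop in this guide
```

Reversing the build order gives analysis, processing, acquisition, which is the order used when the demo schemas are dropped.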
DataJoint vs Traditional ER Notation¶
| Feature | Chen's ER | Crow's Foot | DataJoint |
|---|---|---|---|
| Cardinality | Numbers | Line symbols | Line style |
| Direction | None | None | Top-to-bottom (default) |
| Cycles | Allowed | Allowed | Not allowed |
| PK cascade | Not shown | Not shown | Solid lines |
| Identity sharing | Not indicated | Not indicated | Thick solid |
| New dimensions | Not indicated | Not indicated | Underlined |
Why DataJoint differs:
- DAG structure — No cycles means schemas read as workflows (top-to-bottom)
- Line semantics — Immediately reveals relationship type
- Executable — Diagram is generated from schema, cannot drift out of sync
Summary¶
| Visual | Meaning |
|---|---|
| Thick solid | One-to-one extension |
| Thin solid | One-to-many containment |
| Dashed | Reference (independent identity) |
| Underlined | Introduces new dimension |
| Orange dots | Renamed FK via .proj() |
| Colors | Green=Manual, Gray=Lookup, Red=Computed, Blue=Imported |
| Grouped boxes | Tables grouped by schema/module |
| 3D box (gray) | Collapsed schema (New in 2.1) |
| Feature | Method |
|---|---|
| Layout direction | dj.config.display.diagram_direction |
| Mermaid output | .make_mermaid() |
| Collapse schema | .collapse() (New in 2.1) |
| Prune empty tables | .prune() (New in 2.2) |
# Cleanup all schemas
# Demo modules
for s in [analysis.schema, processing.schema, acquisition.schema]:
    if s.is_activated():
        s.drop(prompt=False)
# Earlier examples
sandwich.drop(prompt=False)
howto_analysis.drop(prompt=False)
schema.drop(prompt=False)