Diagram Specification
Schema visualization as directed acyclic graphs.
Overview
dj.Diagram visualizes DataJoint schemas as directed graphs showing tables and their foreign key relationships. It provides multiple output formats including SVG, PNG, and Mermaid syntax.
Design Principles
- Multiple output formats: Graphviz (SVG/PNG) and Mermaid for different use cases
- Graph algebra: Combine and filter diagrams with set operators
- Visual encoding: Table tiers distinguished by shape and color
- Flexible layout: Configurable direction and schema grouping
API Reference
Constructor
dj.Diagram(source, context=None)
| Parameter | Type | Default | Description |
|---|---|---|---|
| source | Table, Schema, module | (required) | Source to visualize |
| context | dict | None | Namespace for class name resolution |
Layout Direction
New in 2.1
Configurable layout direction was added in DataJoint 2.1.
Layout direction is controlled via configuration:
# Check current direction
dj.config.display.diagram_direction # "TB" or "LR"
# Set globally
dj.config.display.diagram_direction = "LR"
# Override temporarily
with dj.config.override(display__diagram_direction="LR"):
    dj.Diagram(schema).draw()
| Value | Description |
|---|---|
| "TB" | Top to bottom (default) |
| "LR" | Left to right |
Class Method
dj.Diagram.from_sequence(sequence)
Create a combined diagram from multiple sources. Equivalent to Diagram(a) + Diagram(b) + ...
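As a mental model only, you can picture a diagram as a set of visible table names. Under that simplification, from_sequence is a left fold of `+` over its inputs. `ToyDiagram` below is a hypothetical stand-in. It does not model integer expansion, collapsing, or the real dependency graph.

```python
from functools import reduce
from operator import add


class ToyDiagram:
    """Hypothetical stand-in: a diagram reduced to its set of visible nodes."""

    def __init__(self, nodes):
        self.nodes = frozenset(nodes)

    def __add__(self, other):
        # Union of two diagrams
        return ToyDiagram(self.nodes | other.nodes)

    @classmethod
    def from_sequence(cls, sequence):
        # Same result as cls(a) + cls(b) + ... for the items in `sequence`
        return reduce(add, (cls(item) for item in sequence))


schema1 = {"lab.mouse", "lab.session"}
schema2 = {"ephys.neuron"}
schema3 = {"lab.session", "behavior.trial"}
combined = ToyDiagram.from_sequence([schema1, schema2, schema3])
print(sorted(combined.nodes))
# ['behavior.trial', 'ephys.neuron', 'lab.mouse', 'lab.session']
```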
Operators
Diagrams support set algebra for combining and filtering:
| Operator | Description | Example |
|---|---|---|
| diag + n | Expand n levels downstream (children) | dj.Diagram(Mouse) + 2 |
| diag - n | Expand n levels upstream (parents) | dj.Diagram(Neuron) - 2 |
| diag1 + diag2 | Union of two diagrams | dj.Diagram(Mouse) + dj.Diagram(Session) |
| diag1 - diag2 | Difference (remove nodes) | dj.Diagram(schema) - dj.Diagram(Lookup) |
| diag1 * diag2 | Intersection | dj.Diagram(schema1) * dj.Diagram(schema2) |
Common Patterns
# Show table with immediate children and parents (the - 1 step also adds the children's other parents)
dj.Diagram(MyTable) + 1 - 1
# Show entire schema
dj.Diagram(schema)
# Show all tables downstream of a source
dj.Diagram(SourceTable) + 10
# Show ancestry of a computed table
dj.Diagram(ComputedTable) - 10
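Note: Order matters, because each + n or - n step expands from every node already in the diagram. In diagram + 1 - 1, the upstream step also adds the other parents of the newly added children. In diagram - 1 + 1, the downstream step also adds the other children of the newly added parents.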
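The sketch below shows why order matters. It models a diagram as a set of node names over a hypothetical five-table graph. It assumes that each + n or - n step expands from every node currently in the diagram, and it does not use DataJoint.

```python
# Hypothetical toy graph: parent -> set of children
CHILDREN = {
    "Subject": {"Session", "Surgery"},
    "Rig": {"Session"},
    "Session": {"Trial"},
    "Surgery": set(),
    "Trial": set(),
}
# Invert the mapping: child -> set of parents
PARENTS = {node: {p for p, kids in CHILDREN.items() if node in kids} for node in CHILDREN}


def expand(nodes, n, neighbors):
    """Grow `nodes` by up to n levels, expanding from every node already shown."""
    shown = set(nodes)
    for _ in range(n):
        new = set().union(*(neighbors[x] for x in shown)) - shown
        if not new:
            break
        shown |= new
    return shown


def plus(nodes, n):   # models `diag + n` (downstream)
    return expand(nodes, n, CHILDREN)


def minus(nodes, n):  # models `diag - n` (upstream)
    return expand(nodes, n, PARENTS)


a = minus(plus({"Session"}, 1), 1)  # diag + 1 - 1
b = plus(minus({"Session"}, 1), 1)  # diag - 1 + 1
print(sorted(a))  # ['Rig', 'Session', 'Subject', 'Trial']
print(sorted(b))  # ['Rig', 'Session', 'Subject', 'Surgery', 'Trial']
```

In the second ordering, Surgery appears because it is another child of Subject, and Subject was added by the upstream step first.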
Collapsing Schemas
New in 2.1
The collapse() method was added in DataJoint 2.1.
diag.collapse()
Mark a diagram for collapsing when combined with other diagrams. Collapsed schemas appear as single nodes showing the table count.
# Show schema1 expanded, schema2 as a single collapsed node
dj.Diagram(schema1) + dj.Diagram(schema2).collapse()
"Expanded wins" rule: If a node appears in both a collapsed and non-collapsed diagram, it stays expanded. This allows you to show specific tables from a schema while collapsing the rest.
# Subject is expanded, rest of analysis schema is collapsed
dj.Diagram(Subject) + dj.Diagram(analysis).collapse()
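The toy resolution step below makes the "expanded wins" rule concrete using plain sets. It is a hypothetical sketch, not DataJoint's implementation. Tables are named "schema.Table", and the count on a collapsed node is assumed here to be the number of tables that node hides.

```python
from collections import Counter


def schema_of(table):
    # Hypothetical naming convention for this sketch: "schema.Table"
    return table.split(".", 1)[0]


def resolve(diagrams):
    """Toy 'expanded wins' resolution.

    diagrams: list of (set_of_tables, collapsed_flag) pairs being combined.
    Returns (expanded tables, {schema: number of tables hidden in its node}).
    """
    expanded = set().union(*(nodes for nodes, collapsed in diagrams if not collapsed))
    hidden = set().union(*(nodes for nodes, collapsed in diagrams if collapsed)) - expanded
    # Each schema with hidden tables is drawn as one node with a table count
    return expanded, dict(Counter(schema_of(t) for t in hidden))


analysis = {"analysis.Subject", "analysis.Spikes", "analysis.Tuning", "analysis.Summary"}
expanded, collapsed = resolve([({"analysis.Subject"}, False), (analysis, True)])
print(expanded)   # {'analysis.Subject'}
print(collapsed)  # {'analysis': 3}
```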
Operational Methods
New in 2.2
Operational methods (Diagram.cascade(), restrict(), counts(), prune()) were added in DataJoint 2.2.
Diagrams can propagate restrictions through the dependency graph and inspect affected data using the graph structure. These methods turn Diagram from a visualization tool into a graph computation and inspection component. All mutation operations (delete, drop) are executed by Table.delete() and Table.drop(), which use Diagram internally.
Diagram.cascade() (class method)
dj.Diagram.cascade(table_expr, part_integrity="enforce")
Create a cascade diagram for delete. Starting from the table expression, it builds the complete dependency graph, includes all descendants across all loaded schemas, and propagates the restriction downstream using OR semantics (a descendant row is marked for deletion if any ancestor path reaches it). The result is trimmed to the cascade subgraph.
| Parameter | Type | Default | Description |
|---|---|---|---|
| table_expr | QueryExpression | (required) | A restricted table expression (e.g., Session & 'subject_id=1') |
| part_integrity | str | "enforce" | Master-part integrity policy |
Returns: New Diagram containing only the seed table and its descendants, with cascade restrictions applied.
part_integrity values:
| Value | Behavior |
|---|---|
| "enforce" | Error if parts would be deleted before masters |
| "ignore" | Allow deleting parts without masters |
| "cascade" | Propagate restriction upward from part to master, then re-propagate downstream to all sibling parts |
With "cascade", the restriction flows upward from a part table to its master: the restricted part rows identify which master rows are affected, those masters receive a restriction, and that restriction propagates back downstream through the normal cascade — deleting the entire compositional unit (master + all parts), not just the originally matched part rows.
# Preview cascade impact across all loaded schemas
dj.Diagram.cascade(Session & {'subject_id': 'M001'}).counts()
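The upward-then-downstream flow of part_integrity="cascade" can be pictured with in-memory rows. The sketch is hypothetical throughout. The Scan master, its Field and ROI parts, and the helper cascade_from_part exist only for illustration. Real cascades operate on query expressions, not Python lists.

```python
# Hypothetical master "Scan" with two part tables, Scan.Field and Scan.ROI
scan = [{"scan_id": s} for s in (1, 2, 3)]
scan_field = [{"scan_id": s, "field": f} for s in (1, 2, 3) for f in (1, 2)]
scan_roi = [{"scan_id": s, "roi": r} for s in (1, 2, 3) for r in (1, 2, 3)]


def cascade_from_part(part_rows, predicate, master_rows, parts):
    """Toy model of part_integrity="cascade" for one master and its parts."""
    # 1. Upward: matched part rows identify the affected master keys
    master_keys = {row["scan_id"] for row in part_rows if predicate(row)}
    # 2. The master is restricted to those keys
    masters = [row for row in master_rows if row["scan_id"] in master_keys]
    # 3. Downstream again: every part row of those masters is affected
    affected = {name: [row for row in rows if row["scan_id"] in master_keys]
                for name, rows in parts.items()}
    return masters, affected


# Only one Scan.Field row matches the original restriction...
masters, parts = cascade_from_part(
    scan_field, lambda row: row["scan_id"] == 2 and row["field"] == 1,
    scan, {"Scan.Field": scan_field, "Scan.ROI": scan_roi})
# ...but the whole compositional unit for scan 2 is selected
print(len(masters), {name: len(rows) for name, rows in parts.items()})
# 1 {'Scan.Field': 2, 'Scan.ROI': 3}
```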
restrict()
diag.restrict(table_expr)
Select a subset of data for export or inspection. Starting from a restricted table expression, propagate the restriction downstream through all descendants using AND semantics — a descendant row is included only if all restricted ancestors match. The full diagram is preserved (ancestors, unrelated tables) so that restrict() can be called again from a different seed table, building up a multi-condition subset incrementally.
| Parameter | Type | Default | Description |
|---|---|---|---|
| table_expr | QueryExpression | (required) | A restricted table expression |
Returns: New Diagram with restrict conditions applied. The graph is not trimmed.
Constraints:
- Chainable: call multiple times to add conditions from different seed tables
- Cannot be called on a Diagram produced by Diagram.cascade()
- table_expr.full_table_name must be a node in the diagram
# Chain multiple restrictions (AND semantics)
diag = dj.Diagram(schema)
restricted = (diag
.restrict(Subject & {'species': 'mouse'})
.restrict(Session & 'session_date > "2024-01-01"'))
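The constraints on restrict() can be modeled with an immutable object that returns a new copy on every call. RestrictSketch is a hypothetical stand-in that records (table, condition) pairs instead of real restrictions. It only illustrates three things: chaining, the untrimmed graph, and the two error conditions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RestrictSketch:
    """Hypothetical immutable stand-in used only to illustrate restrict() rules."""
    nodes: frozenset
    conditions: tuple = ()   # (table_name, condition) pairs, combined with AND
    is_cascade: bool = False

    def restrict(self, table_name, condition):
        if self.is_cascade:
            raise ValueError("restrict() cannot be called on a cascade diagram")
        if table_name not in self.nodes:
            raise ValueError(f"{table_name} is not a node in the diagram")
        # Return a new object; the node set is carried over untrimmed
        return replace(self, conditions=self.conditions + ((table_name, condition),))


diag = RestrictSketch(frozenset({"lab.subject", "lab.session", "lab.trial"}))
restricted = (diag
              .restrict("lab.subject", "species = 'mouse'")
              .restrict("lab.session", "session_date > '2024-01-01'"))
print(len(restricted.conditions), len(diag.conditions))  # 2 0
print(restricted.nodes == diag.nodes)                     # True

try:
    diag.restrict("lab.unknown", "x = 1")
except ValueError as err:
    print(err)  # lab.unknown is not a node in the diagram
```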
counts()
diag.counts()
Return affected row counts per table without modifying data. Works with both cascade() and restrict() restrictions.
Returns: dict[str, int] — mapping of full table names to affected row counts.
Requires: Diagram.cascade() or restrict() must be called first.
counts = dj.Diagram.cascade(Session & {'subject_id': 'M001'}).counts()
# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45}
prune()
diag.prune()
Remove tables with zero matching rows from the diagram view. This only affects the diagram object — no tables or data are modified in the database. Without prior restrictions, removes physically empty tables from the diagram. After restrict(), removes tables where the restricted query yields zero rows.
Returns: New Diagram with empty tables removed.
Constraints: Cannot be used on a Diagram produced by Diagram.cascade(). Cascade diagrams must retain all descendant tables because a table empty at cascade time could have rows by the time delete() executes.
Note: Queries the database to determine row counts. The underlying graph structure is preserved — subsequent restrict() calls can still seed at any table in the schema.
# Export workflow: restrict, prune, visualize
export = (dj.Diagram(schema)
.restrict(Subject & {'species': 'mouse'})
.restrict(Session & 'session_date > "2024-01-01"')
.prune())
export.counts() # only tables with matching rows
export # visualize the export subgraph
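Conceptually, prune() filters the per-table counts and hides the entries that are zero. The function below is a hypothetical sketch over a plain dict and does not query a database. It also encodes the rule that cascade diagrams cannot be pruned.

```python
def prune(counts, is_cascade=False):
    """Toy prune(): hide tables whose (restricted) row count is zero.

    Only the returned view changes; the input mapping is left untouched,
    loosely mirroring how prune() leaves the underlying graph intact.
    """
    if is_cascade:
        # A table that is empty now could gain rows before delete() runs
        raise ValueError("prune() cannot be used on a cascade diagram")
    return {table: n for table, n in counts.items() if n > 0}


# Hypothetical counts, in the dict[str, int] shape that counts() returns
counts = {"`lab`.`subject`": 12, "`lab`.`session`": 30, "`lab`.`surgery`": 0}
visible = prune(counts)
print(visible)      # {'`lab`.`subject`': 12, '`lab`.`session`': 30}
print(len(counts))  # 3, the full mapping is unchanged
```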
Iteration
Diagrams support iteration in topological order:
| Method | Order | Use Case |
|---|---|---|
| for ft in diagram | Parents first | Data export, inspection |
| for ft in reversed(diagram) | Leaves first | Cascade delete, drop |
Each iteration yields a FreeTable with any cascade or restrict conditions applied. Alias nodes are skipped. Only nodes in the diagram's visible set (nodes_to_show) are yielded.
Table.delete() and Table.drop() use reversed(diagram) internally to execute mutations in safe dependency order.
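The two iteration orders correspond to a topological sort and its reverse. As an illustration only, the standard-library graphlib.TopologicalSorter (Python 3.9+) produces the same kind of ordering for a hypothetical four-table pipeline. DataJoint derives its own ordering from its dependency graph, not from this code.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each table maps to the set of its parents
parents = {
    "lab.subject": set(),
    "lab.session": {"lab.subject"},
    "lab.trial": {"lab.session"},
    "lab.spikes": {"lab.trial", "lab.session"},
}

order = list(TopologicalSorter(parents).static_order())
print(order)        # parents first: ['lab.subject', 'lab.session', 'lab.trial', 'lab.spikes']
print(order[::-1])  # leaves first: the safe order for delete / drop
```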
Restriction Propagation
When cascade() or restrict() propagates a restriction from a parent table to a child table, one of three rules applies depending on the foreign key relationship:
Rule 1 — Direct copy: When the foreign key is non-aliased and the restriction attributes are a subset of the child's primary key, the restriction is copied directly to the child.
Rule 2 — Aliased projection: When the foreign key uses attribute renaming (e.g., subject_id → animal_id), the parent is projected with the attribute mapping to match the child's column names.
Rule 3 — Full projection: When the foreign key is non-aliased but the restriction uses attributes not in the child's primary key, the parent is projected (all attributes) and used as a restriction on the child.
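The choice among the three rules depends on two things: whether the foreign key renames attributes, and which attributes the restriction mentions. propagation_rule is a hypothetical helper that expresses that decision. It returns the rule number rather than building a query.

```python
def propagation_rule(attr_map, restriction_attrs, child_pk):
    """Return 1, 2, or 3 for pushing a parent restriction to a child (sketch).

    attr_map maps each child foreign-key attribute to the parent attribute
    it references; restriction_attrs are the attributes the parent's
    restriction mentions.
    """
    if any(child != parent for child, parent in attr_map.items()):
        return 2  # aliased: project parent with renaming to child names
    if set(restriction_attrs) <= set(child_pk):
        return 1  # copy the restriction directly to the child
    return 3      # project parent (all attributes) and restrict child by it


trial_pk = ["subject_id", "session_id", "trial_id"]
session_fk = {"subject_id": "subject_id", "session_id": "session_id"}
print(propagation_rule(session_fk, ["subject_id"], trial_pk))                        # 1
print(propagation_rule({"animal_id": "subject_id"}, ["subject_id"], ["animal_id"]))  # 2
print(propagation_rule(session_fk, ["session_date"], trial_pk))                      # 3

# Rule 2 in miniature: rename parent key values to the child's column names
parent_keys = [{"subject_id": "M001"}, {"subject_id": "M002"}]
renamed = [{child: row[parent] for child, parent in {"animal_id": "subject_id"}.items()}
           for row in parent_keys]
print(renamed)  # [{'animal_id': 'M001'}, {'animal_id': 'M002'}]
```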
Convergence behavior:
When a child table has multiple restricted ancestors, the convergence rule depends on the mode:
- cascade() (OR): A child row is affected if any path from a restricted ancestor reaches it. This is appropriate for delete: if any reason exists to delete a row, it should be deleted.
- restrict() (AND): A child row is included only if all restricted ancestors match. This is appropriate for export: only rows satisfying every condition are selected.
Multiple foreign keys to the same parent:
When a child table references the same parent through multiple foreign keys (e.g., source_mouse and target_mouse both referencing Mouse), these paths always combine with OR regardless of the propagation mode. Each foreign key path is an independent reason for the child row to be affected — this is structural, not operation-dependent.
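If each path is represented by the set of child rows it reaches, the convergence rules reduce to set operations. The row IDs and the Experimenter ancestor below are invented for illustration. Real propagation composes SQL restrictions rather than Python sets.

```python
# Child rows reached through each restricted ancestor (hypothetical row IDs)
reached_via_subject = {1, 2, 3, 4}
reached_via_session = {3, 4, 5}

# cascade(): OR across ancestors, a row is affected if any path reaches it
cascade_rows = reached_via_subject | reached_via_session
# restrict(): AND across ancestors, a row is kept only if every ancestor matches
restrict_rows = reached_via_subject & reached_via_session
print(sorted(cascade_rows))   # [1, 2, 3, 4, 5]
print(sorted(restrict_rows))  # [3, 4]

# Two foreign keys to the same parent (source_mouse, target_mouse) are
# combined with OR in either mode, before meeting other ancestors.
via_source_mouse = {10, 11}
via_target_mouse = {11, 12}
via_experimenter = {11, 12, 13}
mouse_contribution = via_source_mouse | via_target_mouse
restrict_result = mouse_contribution & via_experimenter  # restrict() mode
print(sorted(restrict_result))  # [11, 12]
```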
Unloaded schemas:
If a descendant table lives in a schema that hasn't been activated (loaded into the dependency graph), the graph-driven delete won't know about it. The final DELETE on the parent will fail with a foreign key error. DataJoint catches this and produces an actionable error message identifying which schema needs to be activated.
Output Methods
Graphviz Output
| Method | Returns | Description |
|---|---|---|
| make_svg() | IPython.display.SVG | SVG for Jupyter display |
| make_png() | BytesIO | PNG image bytes |
| make_image() | ndarray | Image as a NumPy array (via matplotlib) |
| make_dot() | pydot.Dot | Graphviz DOT object |
Mermaid Output
New in 2.1
Mermaid output was added in DataJoint 2.1.
make_mermaid() -> str
Generates Mermaid flowchart syntax for embedding in Markdown, GitHub, or web documentation. Tables are grouped into subgraphs by schema.
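A minimal generator for the same shape of output (a flowchart with one subgraph per schema) might look like the following. This is a hypothetical sketch, not make_mermaid() itself. The tier-to-shape mapping is an assumption, and the classDef styling lines are omitted for brevity.

```python
from collections import defaultdict

# Hypothetical tier -> Mermaid node template ({0} is the table name)
SHAPES = {"manual": "{0}[{0}]", "lookup": "{0}[{0}]", "computed": "{0}([{0}])",
          "imported": "{0}([{0}])", "part": "{0}[{0}]"}


def to_mermaid(tables, edges, direction="TB"):
    """tables: {name: (schema, tier)}; edges: [(parent, child)]."""
    lines = [f"flowchart {direction}"]
    by_schema = defaultdict(list)
    for name, (schema, tier) in tables.items():
        by_schema[schema].append((name, tier))
    # One subgraph per schema, nodes tagged with their tier class
    for schema, members in by_schema.items():
        lines.append(f"    subgraph {schema}")
        for name, tier in members:
            lines.append(f"        {SHAPES[tier].format(name)}:::{tier}")
        lines.append("    end")
    lines += [f"    {parent} --> {child}" for parent, child in edges]
    return "\n".join(lines)


tables = {"Mouse": ("my_pipeline", "manual"), "Session": ("my_pipeline", "manual"),
          "Neuron": ("my_pipeline", "computed")}
print(to_mermaid(tables, [("Mouse", "Session"), ("Session", "Neuron")]))
```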
Display Methods
| Method | Description |
|---|---|
| draw() | Display with matplotlib |
| _repr_svg_() | Jupyter notebook auto-display |
File Output
save(filename, format=None)
| Parameter | Type | Description |
|---|---|---|
| filename | str | Output file path |
| format | str | "png", "svg", or "mermaid". Inferred from the file extension if None. |
Supported extensions: .png, .svg, .mmd, .mermaid
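The extension rule for save() can be sketched as a lookup table. infer_format is a hypothetical helper. Two behaviors are assumptions not covered by the spec: lowercasing the extension, and raising ValueError for an unknown extension.

```python
from pathlib import Path

# Extension -> format, following the supported extensions listed above
EXTENSIONS = {".png": "png", ".svg": "svg", ".mmd": "mermaid", ".mermaid": "mermaid"}


def infer_format(filename, format=None):
    """Toy format resolution for save(): an explicit format takes precedence."""
    if format is not None:
        return format
    suffix = Path(filename).suffix.lower()  # lowercasing is an assumption
    if suffix not in EXTENSIONS:
        raise ValueError(f"cannot infer format from extension {suffix!r}")
    return EXTENSIONS[suffix]


print(infer_format("pipeline.mmd"))                  # mermaid
print(infer_format("output.txt", format="mermaid"))  # mermaid
```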
Visual Encoding
Table Tiers
Each table tier has a distinct visual style:
| Tier | Shape | Fill Color | Font Color |
|---|---|---|---|
| Manual | rectangle | green | dark green |
| Lookup | plain text | gray | black |
| Computed | ellipse | red | dark red |
| Imported | ellipse | blue | dark blue |
| Part | plain text | transparent | black |
Edge Styles
| Style | Meaning |
|---|---|
| Solid line | Primary foreign key |
| Dashed line | Non-primary foreign key |
| Thick line | Master-Part relationship |
| Thin line | Multi-valued foreign key |
Node Labels
- Underlined: Table introduces new primary key attributes
- Plain: Table inherits all primary key attributes from parents
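The underline rule compares a table's primary key with the attributes it inherits through foreign keys. introduces_new_pk is a hypothetical helper that expresses this reading of the rule. It is not part of DataJoint's API.

```python
def introduces_new_pk(table_pk, inherited_attrs):
    """True if the primary key has attributes not inherited via foreign keys."""
    return bool(set(table_pk) - set(inherited_attrs))


# Session adds session_id on top of subject_id -> underlined label
print(introduces_new_pk(["subject_id", "session_id"], ["subject_id"]))                # True
# A table whose primary key is exactly Session's -> plain label
print(introduces_new_pk(["subject_id", "session_id"], ["subject_id", "session_id"]))  # False
```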
Schema Grouping
New in 2.1
Automatic schema grouping was added in DataJoint 2.1.
Tables are automatically grouped into visual clusters by their database schema. The cluster label shows the Python module name when available (following the DataJoint convention of one module per schema), otherwise the database schema name.
# Multi-schema diagram - tables automatically grouped
combined = dj.Diagram(schema1) + dj.Diagram(schema2)
combined.draw()
# Save with grouping
combined.save("pipeline.svg")
This is useful when visualizing multi-schema pipelines to see which tables belong to which module.
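The cluster-labeling rule can be sketched as grouping full table names by schema, preferring a known module name when one exists. group_by_schema and the schema-to-module mapping are hypothetical. DataJoint determines module names on its own.

```python
from collections import defaultdict


def group_by_schema(tables, module_names):
    """tables: full names like "`lab`.`session`"; module_names: schema -> module."""
    clusters = defaultdict(list)
    for full_name in tables:
        schema = full_name.split(".", 1)[0].strip("`")
        # Prefer the Python module name; fall back to the database schema name
        label = module_names.get(schema, schema)
        clusters[label].append(full_name)
    return dict(clusters)


tables = ["`lab`.`subject`", "`lab`.`session`", "`ephys_v2`.`unit`"]
print(group_by_schema(tables, {"lab": "pipeline.lab"}))
# {'pipeline.lab': ['`lab`.`subject`', '`lab`.`session`'], 'ephys_v2': ['`ephys_v2`.`unit`']}
```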
Examples
Basic Usage
import datajoint as dj
# Diagram from a single table
dj.Diagram(Mouse)
# Diagram from entire schema
dj.Diagram(schema)
# Diagram from module
dj.Diagram(my_pipeline_module)
Layout Direction
# Horizontal layout using config override
with dj.config.override(display__diagram_direction="LR"):
    dj.Diagram(schema).draw()
# Or set globally
dj.config.display.diagram_direction = "LR"
dj.Diagram(schema).save("pipeline.svg")
Saving Diagrams
diag = dj.Diagram(schema)
# Save as SVG
diag.save("pipeline.svg")
# Save as PNG
diag.save("pipeline.png")
# Save as Mermaid
diag.save("pipeline.mmd")
# Explicit format
diag.save("output.txt", format="mermaid")
Mermaid Output
print(dj.Diagram(schema).make_mermaid())
Output:
flowchart TB
    classDef manual fill:#90EE90,stroke:#006400
    classDef lookup fill:#D3D3D3,stroke:#696969
    classDef computed fill:#FFB6C1,stroke:#8B0000
    classDef imported fill:#ADD8E6,stroke:#00008B
    classDef part fill:#FFFFFF,stroke:#000000
    subgraph my_pipeline
        Mouse[Mouse]:::manual
        Session[Session]:::manual
        Neuron([Neuron]):::computed
    end
    Mouse --> Session
    Session --> Neuron
Combining Diagrams
# Union of schemas
combined = dj.Diagram(schema1) + dj.Diagram(schema2)
# Intersection
common = dj.Diagram(schema1) * dj.Diagram(schema2)
# From sequence
combined = dj.Diagram.from_sequence([schema1, schema2, schema3])
Dependencies
Operational methods (cascade, restrict, counts, prune) use networkx, which is always installed as a core dependency.
Diagram visualization requires optional dependencies:
pip install matplotlib pygraphviz
If visualization dependencies are missing, dj.Diagram displays a warning and provides a stub class. Operational methods remain available regardless.
See Also
- How to Read Diagrams
- Delete Data — Cascade inspection and delete workflow
- What's New in 2.2 — Motivation and design
- Data Manipulation — Insert, update, delete specification
- Query Algebra
- Table Declaration