Diagram Specification¶

Schema visualization as directed acyclic graphs.

Overview¶

dj.Diagram visualizes DataJoint schemas as directed graphs showing tables and their foreign key relationships. It provides multiple output formats including SVG, PNG, and Mermaid syntax.

Design Principles¶

Multiple output formats: Graphviz (SVG/PNG) and Mermaid for different use cases
Graph algebra: Combine and filter diagrams with set operators
Visual encoding: Table tiers distinguished by shape and color
Flexible layout: Configurable direction and schema grouping

API Reference¶

Constructor¶

dj.Diagram(source, context=None)

Parameter	Type	Default	Description
`source`	Table, Schema, module	—	Source to visualize
`context`	dict	None	Namespace for class name resolution

Layout Direction¶

New in 2.1

Configurable layout direction was added in DataJoint 2.1.

Layout direction is controlled via configuration:

# Check current direction
dj.config.display.diagram_direction  # "TB" or "LR"

# Set globally
dj.config.display.diagram_direction = "LR"

# Override temporarily
with dj.config.override(display__diagram_direction="LR"):
    dj.Diagram(schema).draw()

Value	Description
`"TB"`	Top to bottom (default)
`"LR"`	Left to right

Class Method¶

dj.Diagram.from_sequence(sequence)

Create a combined diagram from multiple sources. Equivalent to Diagram(a) + Diagram(b) + ....

Operators¶

Diagrams support set algebra for combining and filtering:

Operator	Description	Example
`diag + n`	Expand n levels downstream (children)	`dj.Diagram(Mouse) + 2`
`diag - n`	Expand n levels upstream (parents)	`dj.Diagram(Neuron) - 2`
`diag1 + diag2`	Union of two diagrams	`dj.Diagram(Mouse) + dj.Diagram(Session)`
`diag1 - diag2`	Difference (remove nodes)	`dj.Diagram(schema) - dj.Diagram(Lookup)`
`diag1 * diag2`	Intersection	`dj.Diagram(schema1) * dj.Diagram(schema2)`

Common Patterns¶

# Show table with immediate parents and children
dj.Diagram(MyTable) + 1 - 1

# Show entire schema
dj.Diagram(schema)

# Show all tables downstream of a source
dj.Diagram(SourceTable) + 10

# Show ancestry of a computed table
dj.Diagram(ComputedTable) - 10

Note: Order matters. diagram + 1 - 1 may differ from diagram - 1 + 1.
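
To see why order can matter, the sketch below models expansion on a toy graph with plain Python sets. It does not call DataJoint: `CHILDREN`, `expand_down`, `expand_up`, and the table names are hypothetical, and it assumes that each step expands from every node currently in the diagram.

```python
# Toy dependency graph: parent -> list of children (hypothetical tables).
CHILDREN = {
    "Subject": ["Session"],
    "Session": ["Trial"],
    "Stimulus": ["Trial"],
    "Trial": [],
}


def expand_down(shown, levels=1):
    """Add the children of every shown node, `levels` times (models `diag + n`)."""
    shown = set(shown)
    for _ in range(levels):
        shown |= {child for parent in shown for child in CHILDREN.get(parent, [])}
    return shown


def expand_up(shown, levels=1):
    """Add the parents of every shown node, `levels` times (models `diag - n`)."""
    shown = set(shown)
    for _ in range(levels):
        shown |= {parent for parent, kids in CHILDREN.items() if shown & set(kids)}
    return shown


down_then_up = expand_up(expand_down({"Session"}))   # models Diagram(Session) + 1 - 1
up_then_down = expand_down(expand_up({"Session"}))   # models Diagram(Session) - 1 + 1

print(sorted(down_then_up))  # ['Session', 'Stimulus', 'Subject', 'Trial']
print(sorted(up_then_down))  # ['Session', 'Subject', 'Trial']
```

Expanding downstream first pulls in Trial, so the upstream step also picks up Trial's other parent, Stimulus. In the opposite order, Stimulus is never reached.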

Collapsing Schemas¶

New in 2.1

The collapse() method was added in DataJoint 2.1.

diag.collapse()

Mark a diagram for collapsing when combined with other diagrams. Collapsed schemas appear as single nodes showing the table count.

# Show schema1 expanded, schema2 as a single collapsed node
dj.Diagram(schema1) + dj.Diagram(schema2).collapse()

"Expanded wins" rule: If a node appears in both a collapsed and non-collapsed diagram, it stays expanded. This allows you to show specific tables from a schema while collapsing the rest.

# Subject is expanded, rest of analysis schema is collapsed
dj.Diagram(Subject) + dj.Diagram(analysis).collapse()
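
The "expanded wins" rule can be pictured as a set difference. The following is a minimal sketch on plain sets, not the library's implementation; `combine_with_collapse` and the `"schema.Table"` naming are hypothetical.

```python
def combine_with_collapse(expanded_tables, collapsed_schemas):
    """Apply the "expanded wins" rule to toy inputs.

    expanded_tables: set of "schema.Table" names from non-collapsed diagrams.
    collapsed_schemas: dict mapping schema name -> set of its table names.
    Returns (tables drawn individually, {schema: tables folded into one node}).
    """
    folded = {}
    for schema, tables in collapsed_schemas.items():
        # A table that also appears in an expanded diagram stays expanded.
        hidden = set(tables) - set(expanded_tables)
        if hidden:
            folded[schema] = hidden
    return set(expanded_tables), folded


shown, folded = combine_with_collapse(
    expanded_tables={"analysis.Subject"},
    collapsed_schemas={
        "analysis": {"analysis.Subject", "analysis.Session", "analysis.Trial"}
    },
)
print(sorted(shown))                                    # ['analysis.Subject']
print({s: sorted(t) for s, t in folded.items()})
# {'analysis': ['analysis.Session', 'analysis.Trial']}
```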

Operational Methods¶

New in 2.2

Operational methods (Diagram.cascade(), restrict(), counts(), prune()) were added in DataJoint 2.2.

Diagrams can propagate restrictions through the dependency graph and inspect affected data using the graph structure. These methods turn Diagram from a visualization tool into a graph computation and inspection component. All mutation operations (delete, drop) are executed by Table.delete() and Table.drop(), which use Diagram internally.

`Diagram.cascade()` (class method)¶

dj.Diagram.cascade(table_expr, part_integrity="enforce")

Create a cascade diagram for delete. The method builds a complete dependency graph from the table expression, includes all descendants across all loaded schemas, and propagates the restriction downstream using OR semantics: a descendant row is marked for deletion if any ancestor path reaches it. The result is then trimmed to the cascade subgraph.

Parameter	Type	Default	Description
`table_expr`	QueryExpression	—	A restricted table expression (e.g., `Session & 'subject_id=1'`)
`part_integrity`	str	`"enforce"`	Master-part integrity policy

Returns: New Diagram containing only the seed table and its descendants, with cascade restrictions applied.

part_integrity values:

Value	Behavior
`"enforce"`	Error if parts would be deleted before masters
`"ignore"`	Allow deleting parts without masters
`"cascade"`	Propagate restriction upward from part to master, then re-propagate downstream to all sibling parts

With "cascade", the restriction flows upward from a part table to its master: the restricted part rows identify which master rows are affected, those masters receive a restriction, and that restriction propagates back downstream through the normal cascade — deleting the entire compositional unit (master + all parts), not just the originally matched part rows.
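
The upward-then-downward flow described for "cascade" can be traced on toy rows. This sketch uses plain lists of dicts rather than DataJoint tables; the table contents and `cascade_from_part` are hypothetical.

```python
# Toy rows: one master table and two part tables, all keyed by `session`.
master_rows = [{"session": 1}, {"session": 2}]
part_a_rows = [
    {"session": 1, "a": 1},
    {"session": 1, "a": 2},
    {"session": 2, "a": 1},
]
part_b_rows = [{"session": 1, "b": 1}, {"session": 2, "b": 1}]


def cascade_from_part(matched_part_rows):
    """Model part_integrity="cascade" starting from restricted part rows."""
    # Step 1 (upward): the matched part rows identify the affected masters.
    master_keys = {row["session"] for row in matched_part_rows}

    def select(rows):
        return [row for row in rows if row["session"] in master_keys]

    # Step 2 (downward): the master restriction reaches every part table.
    return {
        "master": select(master_rows),
        "part_a": select(part_a_rows),
        "part_b": select(part_b_rows),
    }


affected = cascade_from_part([{"session": 1, "a": 2}])
print({name: len(rows) for name, rows in affected.items()})
# {'master': 1, 'part_a': 2, 'part_b': 1}
```

Only one part row matched originally, yet the whole unit for session 1 is selected: the master row, both `part_a` rows, and the sibling `part_b` row.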

# Preview cascade impact across all loaded schemas
dj.Diagram.cascade(Session & {'subject_id': 'M001'}).counts()

`restrict()`¶

diag.restrict(table_expr)

Select a subset of data for export or inspection. Starting from a restricted table expression, propagate the restriction downstream through all descendants using AND semantics — a descendant row is included only if all restricted ancestors match. The full diagram is preserved (ancestors, unrelated tables) so that restrict() can be called again from a different seed table, building up a multi-condition subset incrementally.

Parameter	Type	Default	Description
`table_expr`	QueryExpression	—	A restricted table expression

Returns: New Diagram with restrict conditions applied. The graph is not trimmed.

Constraints:

Chainable — call multiple times to add conditions from different seed tables
Cannot be called on a Diagram produced by Diagram.cascade()
table_expr.full_table_name must be a node in the diagram

# Chain multiple restrictions (AND semantics)
diag = dj.Diagram(schema)
restricted = (diag
    .restrict(Subject & {'species': 'mouse'})
    .restrict(Session & 'session_date > "2024-01-01"'))

`counts()`¶

diag.counts()

Return affected row counts per table without modifying data. Works with both cascade() and restrict() restrictions.

Returns: dict[str, int] — mapping of full table names to affected row counts.

Requires: Diagram.cascade() or restrict() must be called first.

counts = dj.Diagram.cascade(Session & {'subject_id': 'M001'}).counts()
# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45}

`prune()`¶

diag.prune()

Remove tables with zero matching rows from the diagram view. This only affects the diagram object — no tables or data are modified in the database. Without prior restrictions, removes physically empty tables from the diagram. After restrict(), removes tables where the restricted query yields zero rows.

Returns: New Diagram with empty tables removed.

Constraints: Cannot be used on a Diagram produced by Diagram.cascade(). Cascade diagrams must retain all descendant tables because a table empty at cascade time could have rows by the time delete() executes.

Note: Queries the database to determine row counts. The underlying graph structure is preserved — subsequent restrict() calls can still seed at any table in the schema.
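
As a rough mental model, pruning narrows the set of visible nodes while leaving the graph itself alone. The sketch below assumes per-table counts are already known; `prune_view`, the table names, and the counts are hypothetical.

```python
# Hypothetical per-table row counts after restrictions were applied.
restricted_counts = {"subject": 2, "session": 5, "trial": 0, "stimulus": 0}

# The full graph is kept so that later restrictions can seed at any table.
graph_nodes = set(restricted_counts)


def prune_view(nodes_to_show, counts):
    """Return only the nodes whose restricted query yields at least one row."""
    return {name for name in nodes_to_show if counts.get(name, 0) > 0}


visible = prune_view(graph_nodes, restricted_counts)
print(sorted(visible))      # ['session', 'subject']
print(sorted(graph_nodes))  # ['session', 'stimulus', 'subject', 'trial']
```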

# Export workflow: restrict, prune, visualize
export = (dj.Diagram(schema)
    .restrict(Subject & {'species': 'mouse'})
    .restrict(Session & 'session_date > "2024-01-01"')
    .prune())

export.counts()    # only tables with matching rows
export             # visualize the export subgraph

Iteration¶

Diagrams support iteration in topological order:

Method	Order	Use Case
`for ft in diagram`	Parents first	Data export, inspection
`for ft in reversed(diagram)`	Leaves first	Cascade delete, drop

Each iteration yields a FreeTable with any cascade or restrict conditions applied. Alias nodes are skipped. Only nodes in the diagram's visible set (nodes_to_show) are yielded.

Table.delete() and Table.drop() use reversed(diagram) internally to execute mutations in safe dependency order.
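
The two iteration orders can be illustrated with the standard library's `graphlib` (Python 3.9+). This is a sketch of topological ordering in general, not of Diagram's internals; the `depends_on` mapping is a hypothetical pipeline.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each table maps to the set of tables it depends on.
depends_on = {
    "Subject": set(),
    "Session": {"Subject"},
    "Trial": {"Session"},
    "Neuron": {"Session"},
}

# Parents first: safe order for export or inspection.
parents_first = list(TopologicalSorter(depends_on).static_order())

# Leaves first: safe order for delete or drop, since children go before parents.
leaves_first = list(reversed(parents_first))

print(parents_first)
print(leaves_first)
```

Every table appears after all of its parents in `parents_first` and before all of them in `leaves_first`. The relative order of unrelated tables such as Trial and Neuron is not significant.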

Restriction Propagation¶

When cascade() or restrict() propagates a restriction from a parent table to a child table, one of three rules applies depending on the foreign key relationship:

Rule 1 — Direct copy: When the foreign key is non-aliased and the restriction attributes are a subset of the child's primary key, the restriction is copied directly to the child.

Rule 2 — Aliased projection: When the foreign key uses attribute renaming (e.g., subject_id → animal_id), the parent is projected with the attribute mapping to match the child's column names.

Rule 3 — Full projection: When the foreign key is non-aliased but the restriction uses attributes not in the child's primary key, the parent is projected (all attributes) and used as a restriction on the child.
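
The choice among the three rules can be written as a small decision function. This is a sketch of the rule selection as described above, not DataJoint code; `propagation_rule` and its arguments are hypothetical names.

```python
def propagation_rule(fk_is_aliased, restriction_attrs, child_primary_key):
    """Return a label for the rule that applies to one parent-to-child edge."""
    if fk_is_aliased:
        # Rule 2: the foreign key renames attributes, so project with the mapping.
        return "aliased projection"
    if set(restriction_attrs) <= set(child_primary_key):
        # Rule 1: every restricted attribute exists in the child's primary key.
        return "direct copy"
    # Rule 3: the restriction uses attributes the child's primary key lacks.
    return "full projection"


child_pk = {"subject_id", "session_id"}
print(propagation_rule(False, {"subject_id"}, child_pk))                # direct copy
print(propagation_rule(True, {"subject_id"}, {"animal_id", "session_id"}))  # aliased projection
print(propagation_rule(False, {"species"}, child_pk))                   # full projection
```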

Convergence behavior:

When a child table has multiple restricted ancestors, the convergence rule depends on the mode:

cascade() (OR): A child row is affected if any path from a restricted ancestor reaches it. This is appropriate for delete — if any reason exists to delete a row, it should be deleted.
restrict() (AND): A child row is included only if all restricted ancestors match. This is appropriate for export — only rows satisfying every condition are selected.
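
The difference between the two convergence modes shows up clearly on a handful of toy rows. In this sketch the rows, the restricted key sets, and `converge` are hypothetical, and how each ancestor came to be restricted is left out.

```python
# Hypothetical child rows, each referencing two different parent tables.
sessions = [
    {"id": 1, "subject": "M001", "rig": "A"},
    {"id": 2, "subject": "M001", "rig": "B"},
    {"id": 3, "subject": "M002", "rig": "A"},
    {"id": 4, "subject": "M002", "rig": "B"},
]
restricted_subjects = {"M001"}  # parent keys selected by a restriction on Subject
restricted_rigs = {"A"}         # parent keys selected by a restriction on Rig


def converge(rows, mode):
    """Combine the two ancestor conditions with OR (cascade) or AND (restrict)."""
    combine = any if mode == "cascade" else all
    return [
        row["id"]
        for row in rows
        if combine([row["subject"] in restricted_subjects,
                    row["rig"] in restricted_rigs])
    ]


print(converge(sessions, "cascade"))   # [1, 2, 3]
print(converge(sessions, "restrict"))  # [1]
```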

Multiple foreign keys to the same parent:

When a child table references the same parent through multiple foreign keys (e.g., source_mouse and target_mouse both referencing Mouse), these paths always combine with OR regardless of the propagation mode. Each foreign key path is an independent reason for the child row to be affected — this is structural, not operation-dependent.
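
A toy version of the `source_mouse` / `target_mouse` case is sketched below. The `transfers` rows and the restricted set are hypothetical; the point is only that either foreign key is enough to select a child row.

```python
# Hypothetical child rows with two foreign keys into the same parent table, Mouse.
transfers = [
    {"id": 1, "source_mouse": "M001", "target_mouse": "M002"},
    {"id": 2, "source_mouse": "M003", "target_mouse": "M001"},
    {"id": 3, "source_mouse": "M002", "target_mouse": "M003"},
]
restricted_mice = {"M001"}  # parent keys selected by the restriction on Mouse

# Each foreign key is an independent path, so the paths combine with OR.
affected = [
    row["id"]
    for row in transfers
    if row["source_mouse"] in restricted_mice
    or row["target_mouse"] in restricted_mice
]
print(affected)  # [1, 2]
```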

Unloaded schemas:

If a descendant table lives in a schema that hasn't been activated (loaded into the dependency graph), the graph-driven delete won't know about it. The final DELETE on the parent will fail with a foreign key error. DataJoint catches this and produces an actionable error message identifying which schema needs to be activated.

Output Methods¶

Graphviz Output¶

Method	Returns	Description
`make_svg()`	`IPython.SVG`	SVG for Jupyter display
`make_png()`	`BytesIO`	PNG image bytes
`make_image()`	`ndarray`	NumPy array (matplotlib)
`make_dot()`	`pydot.Dot`	Graphviz DOT object

Mermaid Output¶

New in 2.1

Mermaid output was added in DataJoint 2.1.

make_mermaid() -> str

Generates Mermaid flowchart syntax for embedding in Markdown, GitHub, or web documentation. Tables are grouped into subgraphs by schema.
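
For readers unfamiliar with Mermaid, the sketch below builds a comparable flowchart string by hand. It is not how `make_mermaid()` is implemented: `to_mermaid` is a hypothetical helper, the node shapes are an assumption modeled on the example output in the Examples section, and the `classDef` style lines are omitted.

```python
def to_mermaid(schema_name, tables, edges, direction="TB"):
    """Build Mermaid flowchart text.

    tables: dict mapping table name -> tier (used as the Mermaid class name).
    edges: list of (parent, child) pairs.
    """
    lines = [f"flowchart {direction}", f"    subgraph {schema_name}"]
    for name, tier in tables.items():
        # Assumption: rounded nodes for computed/imported tiers, rectangles otherwise.
        if tier in ("computed", "imported"):
            shape = f"([{name}])"
        else:
            shape = f"[{name}]"
        lines.append(f"        {name}{shape}:::{tier}")
    lines.append("    end")
    lines += [f"    {parent} --> {child}" for parent, child in edges]
    return "\n".join(lines)


mermaid_text = to_mermaid(
    "my_pipeline",
    {"Mouse": "manual", "Session": "manual", "Neuron": "computed"},
    [("Mouse", "Session"), ("Session", "Neuron")],
)
print(mermaid_text)
```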

Display Methods¶

Method	Description
`draw()`	Display with matplotlib
`_repr_svg_()`	Jupyter notebook auto-display

File Output¶

save(filename, format=None)

Parameter	Type	Description
`filename`	str	Output file path
`format`	str	`"png"`, `"svg"`, or `"mermaid"`. Inferred from extension if None.

Supported extensions: .png, .svg, .mmd, .mermaid
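
Format inference from the file extension can be sketched as a lookup table. This is an illustration, not the code behind `save()`: `resolve_format` and `EXTENSION_FORMATS` are hypothetical, and both the case-insensitive match and the `ValueError` for unknown extensions are assumptions.

```python
from pathlib import Path

# Mapping taken from the supported extensions listed above.
EXTENSION_FORMATS = {
    ".png": "png",
    ".svg": "svg",
    ".mmd": "mermaid",
    ".mermaid": "mermaid",
}


def resolve_format(filename, format=None):
    """Return the explicit format if given, else infer it from the extension."""
    if format is not None:
        return format
    suffix = Path(filename).suffix.lower()
    if suffix not in EXTENSION_FORMATS:
        raise ValueError(f"Cannot infer format from extension {suffix!r}")
    return EXTENSION_FORMATS[suffix]


print(resolve_format("pipeline.svg"))                  # svg
print(resolve_format("pipeline.mmd"))                  # mermaid
print(resolve_format("output.txt", format="mermaid"))  # mermaid
```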

Visual Encoding¶

Table Tiers¶

Each table tier has a distinct visual style:

Tier	Shape	Fill Color	Font Color
Manual	rectangle	green	dark green
Lookup	plain text	gray	black
Computed	ellipse	red	dark red
Imported	ellipse	blue	dark blue
Part	plain text	transparent	black

Edge Styles¶

Style	Meaning
Solid line	Primary foreign key
Dashed line	Non-primary foreign key
Thick line	Master-Part relationship
Thin line	Multi-valued foreign key

Node Labels¶

Underlined: Table introduces new primary key attributes
Plain: Table inherits all primary key attributes from parents

Schema Grouping¶

New in 2.1

Automatic schema grouping was added in DataJoint 2.1.

Tables are automatically grouped into visual clusters by their database schema. The cluster label shows the Python module name when available (following the DataJoint convention of one module per schema), otherwise the database schema name.

# Multi-schema diagram - tables automatically grouped
combined = dj.Diagram(schema1) + dj.Diagram(schema2)
combined.draw()

# Save with grouping
combined.save("pipeline.svg")

This is useful when visualizing multi-schema pipelines to see which tables belong to which module.

Examples¶

Basic Usage¶

import datajoint as dj

# Diagram from a single table
dj.Diagram(Mouse)

# Diagram from entire schema
dj.Diagram(schema)

# Diagram from module
dj.Diagram(my_pipeline_module)

Layout Direction¶

# Horizontal layout using config override
with dj.config.override(display__diagram_direction="LR"):
    dj.Diagram(schema).draw()

# Or set globally
dj.config.display.diagram_direction = "LR"
dj.Diagram(schema).save("pipeline.svg")

Saving Diagrams¶

diag = dj.Diagram(schema)

# Save as SVG
diag.save("pipeline.svg")

# Save as PNG
diag.save("pipeline.png")

# Save as Mermaid
diag.save("pipeline.mmd")

# Explicit format
diag.save("output.txt", format="mermaid")

Mermaid Output¶

print(dj.Diagram(schema).make_mermaid())

Output:

flowchart TB
    classDef manual fill:#90EE90,stroke:#006400
    classDef lookup fill:#D3D3D3,stroke:#696969
    classDef computed fill:#FFB6C1,stroke:#8B0000
    classDef imported fill:#ADD8E6,stroke:#00008B
    classDef part fill:#FFFFFF,stroke:#000000

    subgraph my_pipeline
        Mouse[Mouse]:::manual
        Session[Session]:::manual
        Neuron([Neuron]):::computed
    end
    Mouse --> Session
    Session --> Neuron

Combining Diagrams¶

# Union of schemas
combined = dj.Diagram(schema1) + dj.Diagram(schema2)

# Intersection
common = dj.Diagram(schema1) * dj.Diagram(schema2)

# From sequence
combined = dj.Diagram.from_sequence([schema1, schema2, schema3])

Dependencies¶

Operational methods (cascade, restrict, counts, prune) use networkx, which is always installed as a core dependency.

Diagram visualization requires optional dependencies:

pip install matplotlib pygraphviz

If visualization dependencies are missing, dj.Diagram displays a warning and provides a stub class. Operational methods remain available regardless.
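
To check ahead of time whether the optional packages are importable, something like the following can be used. It relies only on the standard library; `missing_visualization_deps` is a hypothetical helper and is not part of DataJoint.

```python
from importlib.util import find_spec


def missing_visualization_deps(packages=("matplotlib", "pygraphviz")):
    """Return the names of packages that cannot be found for import."""
    return [name for name in packages if find_spec(name) is None]


missing = missing_visualization_deps()
if missing:
    print("Install for diagram rendering: pip install " + " ".join(missing))
else:
    print("Visualization dependencies found.")
```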
