Work with Existing Pipelines¶
Loading Classes¶
This section describes how to work with database schemas without access to the original code that generated the schema. These situations often arise when the database is created by another user who has not shared the generating code yet or when the database schema is created from a programming language other than Python.
import datajoint as dj
Working with schemas and their modules¶
Typically a DataJoint schema is created as a dedicated Python module. This module defines a schema object that is used to link classes declared in the module to tables in the database schema. As an example, examine the university module: university.py.
You may then import the module to interact with its tables:
import university as uni
dj.Diagram(uni)
Note that dj.Diagram can extract the diagram from a schema object or from a Python module containing its schema object, lending further support to the convention of one-to-one correspondence between database schemas and Python modules in a DataJoint project:
dj.Diagram(uni)
is equivalent to
dj.Diagram(uni.schema)
# students without majors
uni.Student - uni.StudentMajor
Spawning missing classes¶
Now imagine that you do not have access to university.py
or you do not have
its latest version. You can still connect to the database schema but you will
not have classes declared to interact with it.
So let's start over in this scenario.
You may use the dj.list_schemas
function (new in DataJoint 0.12.0) to
list the names of database schemas available to you.
import datajoint as dj
dj.list_schemas()
*['dimitri_alter','dimitri_attach','dimitri_blob','dimitri_blobs',
'dimitri_nphoton','dimitri_schema','dimitri_university','dimitri_uuid',
'university']*
Just as with a new schema, we start by creating a schema object to connect to the chosen database schema:
schema = dj.Schema('dimitri_university')
If the schema already exists, dj.Schema
is initialized as usual and you may plot
the schema diagram. But instead of seeing class names, you will see the raw
table names as they appear in the database.
# let's plot its diagram
dj.Diagram(schema)
You may view the diagram but, at this point, there is no way to interact with these tables. A similar situation arises when another developer has added new tables to the schema but has not yet shared the updated module code with you. Then the diagram will show a mixture of class names and database table names.
Now you may use the spawn_missing_classes
method to spawn classes into
the local namespace for any tables missing their classes:
schema.spawn_missing_classes()
dj.Diagram(schema)
Now you may interact with these tables as if they were declared right here in this namespace:
# students without majors
Student - StudentMajor
Creating a virtual module¶
Virtual modules provide a way to access the classes corresponding to tables in a DataJoint schema without having to create local files.
spawn_missing_classes
creates the new classes in the local namespace.
However, it is often more convenient to import a schema with its Python module,
equivalent to the Python command:
import university as uni
We can mimic this import without having access to university.py
using the
VirtualModule
class object:
import datajoint as dj
uni = dj.VirtualModule(module_name='university.py', schema_name='dimitri_university')
Now uni
behaves as an imported module complete with the schema object and all
the table classes.
dj.Diagram(uni)
uni.Student - uni.StudentMajor
dj.VirtualModule
takes required arguments
module_name
: displayed module name.
schema_name
: name of the database in MySQL.
And dj.VirtualModule
takes optional arguments.
First, create_schema=False
assures that an error is raised when the schema
does not already exist. Set it to True
if you want to create an empty schema.
dj.VirtualModule('what', 'nonexistent')
Returns
---------------------------------------------------------------------------
DataJointError Traceback (most recent call last)
.
.
.
DataJointError: Database named `nonexistent` was not defined. Set argument create_schema=True to create it.
The other optional argument, create_tables=False
is passed to the schema
object. It prevents the use of the schema object of the virtual module for
creating new tables in the existing schema. This is a precautionary measure
since virtual modules are often used for completed schemas. You may set this
argument to True
if you wish to add new tables to the existing schema. A
more common approach in this scenario would be to create a new schema object and
to use the spawn_missing_classes
function to make the classes available.
However, you if do decide to create new tables in an existing tables using the virtual module, you may do so by using the schema object from the module as the decorator for declaring new tables:
uni = dj.VirtualModule('university.py', 'dimitri_university', create_tables=True)
@uni.schema
class Example(dj.Manual):
definition = """
-> uni.Student
---
example : varchar(255)
"""
dj.Diagram(uni)