Insert¶
The insert
method of DataJoint table objects inserts entities into the table.
In Python there is a separate method insert1
to insert one entity at a time.
The entity may have the form of a Python dictionary with key names matching the
attribute names in the table.
lab.Person.insert1(
dict(username='alice',
first_name='Alice',
last_name='Cooper'))
The entity also may take the form of a sequence of values in the same order as the attributes in the table.
lab.Person.insert1(['alice', 'Alice', 'Cooper'])
Additionally, the entity may be inserted as a NumPy record array or Pandas DataFrame.
The insert
method accepts a sequence or a generator of multiple entities and is used
to insert multiple entities at once.
lab.Person.insert([
['alice', 'Alice', 'Cooper'],
['bob', 'Bob', 'Dylan'],
['carol', 'Carol', 'Douglas']])
Several optional parameters can be used with insert
:
replace
If True
, replaces the existing entity.
(Default False
.)
skip_duplicates
If True
, silently skip duplicate inserts.
(Default False
.)
ignore_extra_fields
If False
, fields that are not in the heading raise an error.
(Default False
.)
allow_direct_insert
If True
, allows inserts outside of populate calls.
Applies only in auto-populated tables.
(Default None
.)
Batched inserts¶
Inserting a set of entities in a single insert
differs from inserting the same set of
entities one-by-one in a for
loop in two ways:
- Network overhead is reduced.
Network overhead can be tens of milliseconds per query.
Inserting 1000 entities in a single
insert
call may save a few seconds over inserting them individually. - The insert is performed as an all-or-nothing transaction. If even one insert fails because it violates any constraint, then none of the entities in the set are inserted.
However, inserting too many entities in a single query may run against buffer size or packet size limits of the database server. Due to these limitations, performing inserts of very large numbers of entities should be broken up into moderately sized batches, such as a few hundred at a time.
Server-side inserts¶
Data inserted into a table often come from other tables already present on the database server. In such cases, data can be fetched from the first table and then inserted into another table, but this results in transfers back and forth between the database and the local system. Instead, data can be inserted from one table into another without transfers between the database and the local system using queries.
In the example below, a new schema has been created in preparation for phase two of a
project.
Experimental protocols from the first phase of the project will be reused in the second
phase.
Since the entities are already present on the database in the Protocol
table of the
phase_one
schema, we can perform a server-side insert into phase_two.Protocol
without fetching a local copy.
# Server-side inserts are faster...
phase_two.Protocol.insert(phase_one.Protocol)
# ...than fetching before inserting
protocols = phase_one.Protocol.fetch()
phase_two.Protocol.insert(protocols)