Transactions

Databases are not merely storage systems; they should accurately represent an enterprise’s current state. This means that all users, irrespective of their interactions, should view and engage with the same data simultaneously, seeing the results of each other’s interactions without breaking data integrity. This principle is known as data consistency.

Data Consistency

Data Consistency: A database’s capability to present a singular, valid, and current version of its data to all users, even during concurrent access and modifications. Successful read queries should reflect the database’s most recent state, and successful writes should be immediately visible to all subsequent reads.

Understanding data consistency becomes clearer when examining its breaches. For instance, during early morning hours, I’ve observed my bank’s website displaying the previous day’s pending transactions, but the account balance doesn’t reflect these changes until a couple of hours later. This discrepancy between transaction views and account balances exemplifies data inconsistency. Fortunately, such inconsistencies, in this case, seem to be confined to the web interface, as the system eventually reaches a consistent state.

Ensuring data consistency is straightforward in certain scenarios. By avoiding conditions that might compromise it, consistency is preserved. For example, if only one party generates data and the rest merely access it, the likelihood of conflicts leading to inconsistency is minimal. Delayed queries still provide a consistent, albeit older, state. This is typical in scientific projects, where one lab produces data while others analyze it.

Complexities arise when multiple entities, be they human or digital, access and modify data simultaneously. Maintaining consistency amidst such concurrent interactions becomes challenging. To achieve this, databases might temporarily limit access for some users during another’s transaction or force users to resolve discrepancies before data integration.

The ACID Model

Modern relational databases adhere to the ACID model to maintain consistency:

ACID Model for Database Transactions
  • Atomic — A transaction is all-or-nothing. Either all operations complete successfully, or none of them take effect. If any operation fails, the entire transaction is rolled back.

  • Consistent — A transaction moves the database from one valid state to another. All data integrity constraints (foreign keys, unique constraints, etc.) are maintained.

  • Isolated — Concurrent transactions don’t interfere with each other. Each transaction sees a consistent snapshot of the database as if it were running alone.

  • Durable — Once a transaction is committed, the changes persist even in the event of system failures.
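These guarantees are easy to observe in practice. As a minimal sketch (not DataJoint; any ACID-compliant engine behaves the same way), Python’s built-in sqlite3 module shows atomic rollback of a failed two-step update:

```python
import sqlite3

# In-memory database; the table and values here are purely illustrative
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
db.execute("INSERT INTO account VALUES (1, 100.0), (2, 50.0)")
db.commit()

try:
    with db:  # the sqlite3 connection as a context manager = one transaction
        db.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
        raise RuntimeError("simulated crash between debit and credit")
        db.execute("UPDATE account SET balance = balance + 30 WHERE id = 2")  # never reached
except RuntimeError:
    pass

# Atomicity: the partial debit was rolled back along with everything else
balances = dict(db.execute("SELECT id, balance FROM account"))
print(balances)  # {1: 100.0, 2: 50.0}
```

Because the failed transaction rolled back, the partial debit never persisted.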

Ensuring consistency becomes notably challenging in geographically dispersed systems with distributed data storage, especially when faced with slow or intermittent network connections. Historically, it was believed that data systems spanning vast areas couldn’t maintain consistency. The CAP Theorem formalized this intuition: when a network partition occurs, a distributed system must sacrifice either system responsiveness (availability) or data consistency.

Traditional relational database systems, like Oracle, MySQL, and others, maintained strong consistency but weren’t tailored for distributed setups. This limitation spurred the rise of NoSQL in the 2000s and 2010s, emphasizing responsiveness in distributed systems, albeit with weaker consistency.

However, recent advancements have bridged this gap. Modern distributed systems, like Spanner and CockroachDB, leverage data replication and consensus algorithms (e.g., Paxos, Raft) to offer high availability while maintaining strict consistency.

DataJoint adheres to the classic ACID consistency model, leveraging serializable transactions for operations that must be atomic.

Example: Bank Transfers

The classic example for understanding transactions is a bank transfer. When transferring money between accounts, two operations must happen together:

  1. Debit the source account

  2. Credit the destination account

If only one operation completes (due to an error, network failure, or system crash), money is either lost or created out of thin air—both unacceptable outcomes.

Let’s set up a simple bank account schema to demonstrate:

import datajoint as dj

schema = dj.Schema("bank")
@schema
class Account(dj.Manual):
    definition = """
    account_number : int
    ---
    customer_name : varchar(60) 
    balance : decimal(9, 2)
    """

from faker import Faker

fake = Faker()

# Populate with 100 sample accounts
Account.insert(
    dict(
        account_number=fake.random.randint(10_000_000, 99_999_999),
        customer_name=fake.name(),
        balance=fake.random.randint(0, 100_000_00) / 100,  # up to $100,000.00
    )
    for _ in range(100)
)

Account()

# Select two distinct random accounts for our transfer example
keys = Account.fetch("KEY")
account1, account2 = fake.random.sample(keys, 2)

print(f"Source account: {account1}")
print(f"Destination account: {account2}")

The Problem: Transfer Without Transactions

Here’s a naive transfer function that doesn’t use transactions. If an error occurs between the debit and credit operations, money disappears:

def transfer_without_transaction(source, destination, amount):
    """Transfer money WITHOUT transaction protection - DANGEROUS!"""
    # Check sufficient funds
    source_balance = (Account & source).fetch1("balance")
    if source_balance < amount:
        raise RuntimeError("Insufficient funds")
    
    # Debit source account
    Account.update1(dict(source, balance=float(source_balance) - amount))
    
    # Simulate an error occurring here!
    raise RuntimeError("Network error - connection lost")
    
    # Credit destination account - THIS NEVER EXECUTES
    dest_balance = (Account & destination).fetch1("balance")
    Account.update1(dict(destination, balance=float(dest_balance) + amount))

# Check balances before the failed transfer
print("Before failed transfer:")
Account & [account1, account2]

# This transfer will fail midway, leaving the database in an inconsistent state
try:
    transfer_without_transaction(account1, account2, 100.00)
except RuntimeError as e:
    print(f"Transfer failed: {e}")

# Check balances after - money has vanished!
print("After failed transfer (inconsistent state):")
Account & [account1, account2]

The Solution: DataJoint Transactions

DataJoint provides a transaction context manager through the connection object. Operations inside the with conn.transaction: block are atomic—if any operation fails, all changes are rolled back automatically.

def transfer_with_transaction(source, destination, amount):
    """Transfer money WITH transaction protection - SAFE!"""
    conn = dj.conn()
    
    with conn.transaction:
        # Check sufficient funds
        source_balance = (Account & source).fetch1("balance")
        if source_balance < amount:
            raise RuntimeError("Insufficient funds")
        
        # Debit source account
        Account.update1(dict(source, balance=float(source_balance) - amount))
        
        # Simulate an error occurring here!
        raise RuntimeError("Network error - connection lost")
        
        # Credit destination account
        dest_balance = (Account & destination).fetch1("balance")
        Account.update1(dict(destination, balance=float(dest_balance) + amount))

# First, let's reset the source account balance for a fair comparison
source_balance = (Account & account1).fetch1("balance")
Account.update1(dict(account1, balance=float(source_balance) + 100))  # restore the $100

print("Balances restored:")
Account & [account1, account2]

# This transfer will also fail, but the transaction will roll back
try:
    transfer_with_transaction(account1, account2, 100.00)
except RuntimeError as e:
    print(f"Transfer failed: {e}")
    print("But the transaction was rolled back!")

# Check balances after - no money was lost!
print("After failed transfer (consistent state preserved):")
Account & [account1, account2]

The Transaction Context Manager

DataJoint’s transaction context manager (conn.transaction) provides:

conn = dj.conn()

with conn.transaction:
    # All operations here are part of one atomic transaction
    Table1.insert(...)
    Table2.insert(...)
    Table3.delete(...)
# If we reach here, all operations are committed

Behavior:

  • On success: When the with block completes normally, all changes are committed to the database.

  • On exception: If any exception is raised inside the block, all changes are rolled back and the exception is re-raised.
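
The commit-on-success, rollback-on-exception behavior is the standard Python context-manager protocol: __exit__ receives any exception raised inside the block and can undo pending changes before re-raising it. A toy stand-in (not DataJoint’s implementation, with a plain dict in place of a database) makes the mechanics concrete:

```python
class ToyTransaction:
    """Minimal sketch of a transaction context manager (illustration only)."""

    def __init__(self, store):
        self.store = store          # the "database": a plain dict
        self._snapshot = None

    def __enter__(self):
        self._snapshot = dict(self.store)   # begin: remember the prior state
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:            # exception raised: roll back
            self.store.clear()
            self.store.update(self._snapshot)
        return False                        # propagate the exception, if any

store = {"a": 1}
try:
    with ToyTransaction(store):
        store["b"] = 2                      # pending change
        raise RuntimeError("failure mid-transaction")
except RuntimeError:
    pass

print(store)  # {'a': 1}  - the write was rolled back
```

Returning False from __exit__ is what re-raises the exception after the rollback, matching the behavior described above.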

A Working Transfer Function

Here’s a complete, working transfer function:

def transfer(source, destination, amount):
    """Safely transfer money between accounts."""
    if amount <= 0:
        raise ValueError("Transfer amount must be positive")
    
    conn = dj.conn()
    
    with conn.transaction:
        # Check sufficient funds
        source_balance = (Account & source).fetch1("balance")
        if source_balance < amount:
            raise RuntimeError("Insufficient funds")
        
        # Debit source account
        Account.update1(dict(source, balance=float(source_balance) - amount))
        
        # Credit destination account
        dest_balance = (Account & destination).fetch1("balance")
        Account.update1(dict(destination, balance=float(dest_balance) + amount))
    
    print(f"Successfully transferred ${amount:.2f}")

print("Before transfer:")
print(Account & [account1, account2])

transfer(account1, account2, 50.00)

print("\nAfter transfer:")
print(Account & [account1, account2])

Transactions in Computed Tables

An important feature of DataJoint is that make() methods in Computed and Imported tables automatically run inside a transaction. This means:

  1. All inserts within a single make() call are atomic

  2. If make() raises an exception, no partial results are inserted

  3. The computation either fully succeeds or fully fails

@schema
class ProcessedData(dj.Computed):
    definition = """
    -> RawData
    ---
    result : float
    """
    
    def make(self, key):
        # Everything here runs inside a transaction
        data = (RawData & key).fetch1("data")
        result = expensive_computation(data)
        
        self.insert1(dict(key, result=result))
        # If we crash here, the insert is rolled back
        
        AuxiliaryTable.insert1(...)  # Also part of the same transaction

This automatic transaction wrapping is why DataJoint pipelines maintain computational validity—downstream results are always consistent with their upstream inputs.

Transactions and Master-Part Relationships

Master-part tables (see Master-Part) leverage transactions to ensure that a master entry and all its parts are inserted atomically.

When inserting into a master table with parts:

@schema 
class Experiment(dj.Manual):
    definition = """
    experiment_id : int
    ---
    experiment_date : date
    """
    
    class Trial(dj.Part):
        definition = """
        -> master
        trial_num : int
        ---
        outcome : varchar(20)
        """

# Insert master and parts together in a transaction
conn = dj.conn()
with conn.transaction:
    Experiment.insert1(dict(experiment_id=1, experiment_date='2024-01-15'))
    Experiment.Trial.insert([
        dict(experiment_id=1, trial_num=1, outcome='success'),
        dict(experiment_id=1, trial_num=2, outcome='failure'),
        dict(experiment_id=1, trial_num=3, outcome='success'),
    ])

This ensures you never have an experiment without its trials, or orphaned trials without an experiment.

Best Practices

When to Use Explicit Transactions

A single insert() or update1() call is already atomic—it either fully succeeds or fully fails. Explicit transactions using conn.transaction are only needed when multiple operations must succeed or fail together:

# No need for explicit transaction - single insert is already atomic
Table.insert(data)

# Explicit transaction needed - multiple operations must be atomic
with dj.conn().transaction:
    Table1.insert(data1)
    Table2.insert(data2)
    Table3.update1(data3)

Transactions and Long Computations

When computations take a long time, there’s a tension between keeping transactions short (to avoid blocking) and maintaining data integrity (ensuring inputs haven’t changed).

DataJoint’s make() method handles this through a three-part pattern that refetches and verifies input data inside the transaction. See The make Method for details on implementing safe computations.
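
The pattern can be sketched without any database: run the expensive computation with no transaction open, then, inside a short transaction, verify that the inputs have not changed before writing the result. Everything below (short_transaction, the dictionary store) is an illustrative stand-in, not DataJoint API:

```python
from contextlib import contextmanager

@contextmanager
def short_transaction():
    # Stand-in for conn.transaction; a real one would commit or roll back
    yield

def safe_compute(store, key, compute):
    """Fetch-compute-verify-insert pattern, kept schematic (names are illustrative)."""
    # 1. Fetch a copy of the inputs and run the long computation
    #    with NO transaction open, so no locks are held
    inputs = list(store["raw"][key])
    result = compute(inputs)

    # 2. Open a SHORT transaction: verify the inputs are unchanged, then insert
    with short_transaction():
        if store["raw"][key] != inputs:
            raise RuntimeError("inputs changed during computation; aborting")
        store["results"][key] = result
    return result

store = {"raw": {1: [2.0, 3.0]}, "results": {}}
safe_compute(store, 1, sum)  # the expensive step runs outside the transaction
print(store["results"])      # {1: 5.0}
```

DataJoint’s make() machinery performs this verification for you; the sketch only shows why the refetch-and-compare step makes the final short transaction safe.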

Avoid User Interaction Inside Transactions

Never wait for user input while holding a transaction open:

# Bad: User could take coffee break!
with conn.transaction:
    Table.insert(data)
    confirm = input("Confirm? (y/n)")  # DON'T DO THIS
    if confirm != 'y':
        raise Exception("Cancelled")

Handle Deadlocks Gracefully

When multiple transactions compete for the same resources, deadlocks can occur. MySQL automatically detects and resolves deadlocks by rolling back one transaction.

import time

def safe_operation(max_retries=3):
    """Execute operations with deadlock retry logic."""
    for attempt in range(max_retries):
        try:
            with dj.conn().transaction:
                Table1.insert(data1)
                Table2.insert(data2)
            return  # Success
        except Exception as e:
            if "Deadlock" in str(e) and attempt < max_retries - 1:
                time.sleep(0.1 * (2 ** attempt))  # Exponential backoff
                continue
            raise

Summary

Transactions are fundamental to maintaining data integrity in DataJoint:

  • Single operations — already atomic; no explicit transaction needed

  • conn.transaction — context manager for grouping multiple operations into one atomic unit

  • Automatic rollback — on exception, all changes are undone

  • make() wrapping — Computed-table computations are automatically transactional

  • Master-part atomicity — insert a master entry and its parts together safely

Key takeaways:

  1. Use explicit transactions only when multiple operations must succeed or fail together

  2. Single inserts and updates are already atomic

  3. DataJoint’s populate() automatically handles transactions for you

  4. Master-part relationships provide built-in transactional semantics

  5. For long computations, see The make Method for safe patterns