Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

UUIDs

UUIDs: Universally Unique Identifiers

This chapter demonstrates how to use Universally Unique Identifiers (UUIDs) in DataJoint tables as surrogate keys.

For the conceptual foundation on primary keys and when to use surrogate keys like UUIDs, see Primary Keys. That chapter covers:

This chapter focuses on practical implementation of UUIDs and related unique identifier systems.

When to Use UUIDs

UUIDs are appropriate when you need surrogate keys that are:

UUID Types and Characteristics

UUIDs are standardized by RFC 9562. The most commonly used types are:

UUID1 (Time-based)

Generated from: Timestamp + MAC address (or random node ID)

Characteristics:

Best for: Primary keys where chronological ordering matters, audit logs, distributed systems needing time-ordered IDs.

UUID4 (Random)

Generated from: Cryptographically secure random numbers

Characteristics:

Best for: Security tokens, session IDs, cases where predictability is a concern, privacy-sensitive contexts.

UUID3/UUID5 (Deterministic)

Generated from: Namespace UUID + name string (MD5 for UUID3, SHA-1 for UUID5)

Characteristics:

Best for: Content-addressable storage, deduplication, creating stable IDs from existing identifiers, hierarchical categorization systems.

Alternative Unique Identifier Systems

While UUIDs are the most common standardized unique identifier system, alternatives may be better suited for specific use cases:

ULID (Universally Unique Lexicographically Sortable Identifier)

ULID provides a lexicographically sortable unique identifier that combines timestamp and randomness. Unlike UUIDs, ULIDs are guaranteed to be sortable by creation time.

Characteristics:

Best for: Database primary keys where chronological sorting is important, log entries, distributed systems requiring sortable IDs without coordination.

Example: 01ARZ3NDEKTSV4RRFFQ69G5FAV

Resources:

NANOID

NANOID is a tiny, URL-safe, unique string ID generator using a cryptographically strong random generator.

Characteristics:

Best for: Short URLs or slugs, compact identifiers, high-performance applications, user-facing identifiers where brevity improves UX.

Example: V1StGXR8_Z5jdHi6B-myT (21 characters)

Resources:

Comparison Table

FeatureUUIDULIDNANOID
StandardRFC 9562Community specCommunity spec
Length36 chars26 chars21 chars (default)
SortableUUID1 onlyAlwaysNo
Time-orderedUUID1 onlyAlwaysNo
URL-safeYesYesYes
Database supportNative in manyLimitedLimited
Best forStandard complianceSortable IDsCompact IDs

Python’s UUID Module

Python provides the uuid module as part of its standard library:

import uuid
# Generate different types of UUIDs
uuid1_value = uuid.uuid1()  # Time-based, sortable
uuid4_value = uuid.uuid4()  # Random
print(f"UUID1: {uuid1_value}")
print(f"UUID4: {uuid4_value}")
UUID1: 3bb5d574-b8f1-11f0-966b-42531a4c8f38
UUID4: 70a060fa-99e2-4d37-bfa7-c4ffc799e2a1
uuid.uuid1()
UUID('3c315866-b8f1-11f0-966b-42531a4c8f38')
# UUID1 generates sequential UUIDs (notice the pattern)
[uuid.uuid1() for _ in range(5)]
[UUID('3ca03ede-b8f1-11f0-966b-42531a4c8f38'), UUID('3ca03f92-b8f1-11f0-966b-42531a4c8f38'), UUID('3ca03fb0-b8f1-11f0-966b-42531a4c8f38'), UUID('3ca03fc4-b8f1-11f0-966b-42531a4c8f38'), UUID('3ca03fd8-b8f1-11f0-966b-42531a4c8f38')]
# UUID4 generates random UUIDs (no pattern)
[uuid.uuid4() for _ in range(5)]
[UUID('7937d5cf-5568-4869-868d-6b0d40364360'), UUID('cdca9e61-e18c-46e5-966c-481a20f7fb53'), UUID('0154cafa-9200-406f-b03d-e5e8bf7f9023'), UUID('b2db8ca1-dcf3-4949-b433-f106b8fc32e2'), UUID('090380f6-a471-4251-85f9-75a3a907eaca')]
# UUID5 generates deterministic UUIDs from namespace + name
namespace = uuid.UUID('00000000-0000-0000-0000-000000000000')
topic = uuid.uuid5(namespace, 'Neuroscience')
subject = uuid.uuid5(topic, 'Habenula')
print(f"Topic UUID: {topic}")
print(f"Subject UUID: {subject}")
# Same inputs always produce same UUIDs
assert uuid.uuid5(namespace, 'Neuroscience') == topic
Topic UUID: 913e0174-a390-5c08-b50a-623690546dd5
Subject UUID: b5804c3f-57b1-54e3-8176-3b45aa443a97

Using UUIDs in DataJoint Tables

DataJoint natively supports UUIDs as a data type. When you declare an attribute as uuid, DataJoint automatically stores it as BINARY(16) in MySQL for efficient storage and indexing.

Let’s create examples demonstrating UUID usage in DataJoint:

import datajoint as dj
schema = dj.Schema('uuid')
@schema
class Message(dj.Manual):
    definition = """
    message_id : uuid  # internal message id
    ---
    message_body : varchar(1000) 
    """
print(Message.describe())
message_id           : uuid                         # internal message id
---
message_body         : varchar(1000)                

# For the curious: Internally, DataJoint represents uuids as BINARY(16) 
Message.heading['message_id'].sql
'`message_id` binary(16) NOT NULL COMMENT ":uuid:internal message id"'
Message.insert1((uuid.uuid1(), 'Hello, world!'))
Message.insert1((uuid.uuid1(), 'Cogito ergo sum'))
Message()
Loading...
Message.insert1((uuid.uuid1(), 'I will be back'))
Message.insert1((uuid.uuid1(), 'Must destroy humans.'))
b = uuid.uuid4().bytes
b
b'\x8c~\xa5\x83\x9a5C\x90\xaf6%\x87+{#\x1e'
uuid.UUID(bytes=b)
UUID('8c7ea583-9a35-4390-af36-25872b7b231e')
Message()
Loading...
Message.insert1((uuid.uuid4(), 'Hasta la vista baby'))
Message()
Loading...
@schema
class Comment(dj.Manual):
    definition = """
    comment_id : uuid
    --- 
    -> Message
    comment_body : varchar(1000)
    """
# For the curious: This is how the table was declared in SQL
print(schema.connection.query('show create table `uuid`.`comment`').fetchall()[0][1])
CREATE TABLE `comment` (
  `comment_id` binary(16) NOT NULL COMMENT ':uuid:',
  `message_id` binary(16) NOT NULL COMMENT ':uuid:internal message id',
  `comment_body` varchar(1000) NOT NULL,
  PRIMARY KEY (`comment_id`),
  KEY `message_id` (`message_id`),
  CONSTRAINT `comment_ibfk_1` FOREIGN KEY (`message_id`) REFERENCES `message` (`message_id`) ON DELETE RESTRICT ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
dj.Diagram(schema)
Loading...
keys = Message.fetch('KEY')
keys
[{'message_id': UUID('5d4e0c56-b8f1-11f0-966b-42531a4c8f38')}, {'message_id': UUID('5d4eba5c-b8f1-11f0-966b-42531a4c8f38')}, {'message_id': UUID('5dce3d18-b8f1-11f0-966b-42531a4c8f38')}, {'message_id': UUID('5dced91c-b8f1-11f0-966b-42531a4c8f38')}, {'message_id': UUID('9ca5cb3d-c989-4f0f-adf8-8eb0193f4bcc')}]
Comment.insert1(dict(keys[0], comment_id=uuid.uuid1(), comment_body="thank you"))
Message * Comment
Loading...
Message * Comment & keys[0]
Loading...
Message & keys[1:4]
Loading...
Comment.insert1(dict(keys[1], comment_id=uuid.uuid1(), comment_body="thank you"))
Comment()
Loading...
Message & Comment
Loading...
Message - Comment
Loading...
Message * Comment
Loading...

Summary

UUIDs provide a robust solution for generating surrogate keys in DataJoint:

AspectKey Points
UUID1Time-based, sortable, good for primary keys needing order
UUID4Random, privacy-preserving, good for security tokens
UUID5Deterministic, good for content-addressable systems
StorageDataJoint stores UUIDs as BINARY(16) for efficiency
AlternativesULID (sortable, compact) and NANOID (very compact) available