UUIDs: Universally Unique Identifiers¶
This chapter demonstrates how to use Universally Unique Identifiers (UUIDs) in DataJoint tables as surrogate keys.
For the conceptual foundation on primary keys and when to use surrogate keys like UUIDs, see Primary Keys. That chapter covers:
What primary keys are and why they matter
Natural keys vs. surrogate keys
Why DataJoint requires explicit key values (no auto-increment)
When surrogate keys like UUIDs are appropriate
This chapter focuses on practical implementation of UUIDs and related unique identifier systems.
When to Use UUIDs¶
UUIDs are appropriate when you need surrogate keys that are:
Globally unique without coordination between systems
Generated client-side before database insertion (required by DataJoint)
Collision-resistant even across distributed systems
Not exposed to users (internal identifiers only)
UUID Types and Characteristics¶
UUIDs are standardized by RFC 9562. The most commonly used types are:
UUID1 (Time-based)¶
Generated from: Timestamp + MAC address (or random node ID)
Characteristics:
Sortable by creation time — UUIDs generated later are lexicographically greater
Contains temporal information — Timestamp can be extracted
Contains hardware identifier — May expose MAC address (privacy concern)
Sequential when generated rapidly — Reduces index fragmentation
Best for: Primary keys where chronological ordering matters, audit logs, distributed systems needing time-ordered IDs.
UUID4 (Random)¶
Generated from: Cryptographically secure random numbers
Characteristics:
No temporal information — Cannot determine when generated
No hardware identifier — Privacy-preserving
Uniformly distributed — May cause index fragmentation in large tables
Highest entropy — Most unpredictable
Best for: Security tokens, session IDs, cases where predictability is a concern, privacy-sensitive contexts.
UUID3/UUID5 (Deterministic)¶
Generated from: Namespace UUID + name string (MD5 for UUID3, SHA-1 for UUID5)
Characteristics:
Deterministic — Same inputs always produce the same UUID
Hierarchical — Can create nested namespaces
Content-addressable — UUID identifies the content
Reproducible — No need to store generated UUIDs
Best for: Content-addressable storage, deduplication, creating stable IDs from existing identifiers, hierarchical categorization systems.
Alternative Unique Identifier Systems¶
While UUIDs are the most common standardized unique identifier system, alternatives may be better suited for specific use cases:
ULID (Universally Unique Lexicographically Sortable Identifier)¶
ULID provides a lexicographically sortable unique identifier that combines timestamp and randomness. Unlike UUIDs, ULIDs are guaranteed to be sortable by creation time.
Characteristics:
48-bit timestamp (millisecond precision) + 80-bit randomness
Lexicographically sortable: Can be sorted as strings to maintain chronological order
URL-safe: Uses Crockford’s Base32 encoding (no special characters)
Case-insensitive: Designed for human readability
26 characters: More compact than UUIDs (36 characters)
Best for: Database primary keys where chronological sorting is important, log entries, distributed systems requiring sortable IDs without coordination.
Example: 01ARZ3NDEKTSV4RRFFQ69G5FAV
Resources:
NANOID¶
NANOID is a tiny, URL-safe, unique string ID generator using a cryptographically strong random generator.
Characteristics:
Configurable length: Default is 21 characters (adjustable)
URL-safe: Uses URL-safe characters (A-Za-z0-9_-)
Fast: 4x faster than UUID
Smaller size: 21 characters vs UUID’s 36 characters
Collision-resistant: Uses cryptographically strong random generator
Best for: Short URLs or slugs, compact identifiers, high-performance applications, user-facing identifiers where brevity improves UX.
Example: V1StGXR8_Z5jdHi6B-myT (21 characters)
Resources:
Comparison Table¶
| Feature | UUID | ULID | NANOID |
|---|---|---|---|
| Standard | RFC 9562 | Community spec | Community spec |
| Length | 36 chars | 26 chars | 21 chars (default) |
| Sortable | UUID1 only | Always | No |
| Time-ordered | UUID1 only | Always | No |
| URL-safe | Yes | Yes | Yes |
| Database support | Native in many | Limited | Limited |
| Best for | Standard compliance | Sortable IDs | Compact IDs |
Python’s UUID Module¶
Python provides the uuid module as part of its standard library:
import uuid# Generate different types of UUIDs
uuid1_value = uuid.uuid1() # Time-based, sortable
uuid4_value = uuid.uuid4() # Random
print(f"UUID1: {uuid1_value}")
print(f"UUID4: {uuid4_value}")UUID1: 3bb5d574-b8f1-11f0-966b-42531a4c8f38
UUID4: 70a060fa-99e2-4d37-bfa7-c4ffc799e2a1
uuid.uuid1()UUID('3c315866-b8f1-11f0-966b-42531a4c8f38')# UUID1 generates sequential UUIDs (notice the pattern)
[uuid.uuid1() for _ in range(5)][UUID('3ca03ede-b8f1-11f0-966b-42531a4c8f38'),
UUID('3ca03f92-b8f1-11f0-966b-42531a4c8f38'),
UUID('3ca03fb0-b8f1-11f0-966b-42531a4c8f38'),
UUID('3ca03fc4-b8f1-11f0-966b-42531a4c8f38'),
UUID('3ca03fd8-b8f1-11f0-966b-42531a4c8f38')]# UUID4 generates random UUIDs (no pattern)
[uuid.uuid4() for _ in range(5)][UUID('7937d5cf-5568-4869-868d-6b0d40364360'),
UUID('cdca9e61-e18c-46e5-966c-481a20f7fb53'),
UUID('0154cafa-9200-406f-b03d-e5e8bf7f9023'),
UUID('b2db8ca1-dcf3-4949-b433-f106b8fc32e2'),
UUID('090380f6-a471-4251-85f9-75a3a907eaca')]# UUID5 generates deterministic UUIDs from namespace + name
namespace = uuid.UUID('00000000-0000-0000-0000-000000000000')
topic = uuid.uuid5(namespace, 'Neuroscience')
subject = uuid.uuid5(topic, 'Habenula')
print(f"Topic UUID: {topic}")
print(f"Subject UUID: {subject}")
# Same inputs always produce same UUIDs
assert uuid.uuid5(namespace, 'Neuroscience') == topicTopic UUID: 913e0174-a390-5c08-b50a-623690546dd5
Subject UUID: b5804c3f-57b1-54e3-8176-3b45aa443a97
Using UUIDs in DataJoint Tables¶
DataJoint natively supports UUIDs as a data type. When you declare an attribute as uuid, DataJoint automatically stores it as BINARY(16) in MySQL for efficient storage and indexing.
Let’s create examples demonstrating UUID usage in DataJoint:
import datajoint as djschema = dj.Schema('uuid')@schema
class Message(dj.Manual):
definition = """
message_id : uuid # internal message id
---
message_body : varchar(1000)
"""print(Message.describe())message_id : uuid # internal message id
---
message_body : varchar(1000)
# For the curious: Internally, DataJoint represents uuids as BINARY(16)
Message.heading['message_id'].sql'`message_id` binary(16) NOT NULL COMMENT ":uuid:internal message id"'Message.insert1((uuid.uuid1(), 'Hello, world!'))
Message.insert1((uuid.uuid1(), 'Cogito ergo sum'))Message()Message.insert1((uuid.uuid1(), 'I will be back'))
Message.insert1((uuid.uuid1(), 'Must destroy humans.'))b = uuid.uuid4().bytes
bb'\x8c~\xa5\x83\x9a5C\x90\xaf6%\x87+{#\x1e'uuid.UUID(bytes=b)UUID('8c7ea583-9a35-4390-af36-25872b7b231e')Message()Message.insert1((uuid.uuid4(), 'Hasta la vista baby'))Message()@schema
class Comment(dj.Manual):
definition = """
comment_id : uuid
---
-> Message
comment_body : varchar(1000)
"""# For the curious: This is how the table was declared in SQL
print(schema.connection.query('show create table `uuid`.`comment`').fetchall()[0][1])CREATE TABLE `comment` (
`comment_id` binary(16) NOT NULL COMMENT ':uuid:',
`message_id` binary(16) NOT NULL COMMENT ':uuid:internal message id',
`comment_body` varchar(1000) NOT NULL,
PRIMARY KEY (`comment_id`),
KEY `message_id` (`message_id`),
CONSTRAINT `comment_ibfk_1` FOREIGN KEY (`message_id`) REFERENCES `message` (`message_id`) ON DELETE RESTRICT ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
dj.Diagram(schema)keys = Message.fetch('KEY')keys[{'message_id': UUID('5d4e0c56-b8f1-11f0-966b-42531a4c8f38')},
{'message_id': UUID('5d4eba5c-b8f1-11f0-966b-42531a4c8f38')},
{'message_id': UUID('5dce3d18-b8f1-11f0-966b-42531a4c8f38')},
{'message_id': UUID('5dced91c-b8f1-11f0-966b-42531a4c8f38')},
{'message_id': UUID('9ca5cb3d-c989-4f0f-adf8-8eb0193f4bcc')}]Comment.insert1(dict(keys[0], comment_id=uuid.uuid1(), comment_body="thank you"))Message * CommentMessage * Comment & keys[0]Message & keys[1:4]Comment.insert1(dict(keys[1], comment_id=uuid.uuid1(), comment_body="thank you"))Comment()Message & CommentMessage - CommentMessage * CommentSummary¶
UUIDs provide a robust solution for generating surrogate keys in DataJoint:
| Aspect | Key Points |
|---|---|
| UUID1 | Time-based, sortable, good for primary keys needing order |
| UUID4 | Random, privacy-preserving, good for security tokens |
| UUID5 | Deterministic, good for content-addressable systems |
| Storage | DataJoint stores UUIDs as BINARY(16) for efficiency |
| Alternatives | ULID (sortable, compact) and NANOID (very compact) available |