Universal Sets¶
All query operators are designed to preserve the entity types of their inputs. However, some queries require creating a new entity type that is not represented by any stored tables. This means that a new entity type must be explicitly defined as part of the query. Universal sets fulfill this role.
Universal sets are used in DataJoint to define virtual tables with arbitrary primary key structures for use in query expressions.
A universal set, defined using class dj.U
, denotes the set of all possible entities with given attributes of any possible datatype.
Universal sets allow query expressions using virtual tables when no suitable base table exists.
Attributes of universal sets are allowed to be matched to any namesake attributes, even those that do not come from the same initial source.
For example, you may like to query the university database for the complete list of students’ home cities, along with the number of students from each city. The schema for the university database does not have a table for cities and states. A virtual table can fill the role of the nonexistent base table, allowing queries that would not be possible otherwise.
Note
dj.U
is not yet implemented in MATLAB.
The feature will be added in an upcoming release: https://github.com/datajoint/datajoint-matlab/issues/144
% All home cities of students
dj.U('home_city', 'home_state') & university.Student
% Total number of students from each city
aggr(dj.U('home_city', 'home_state'), university.Student, 'count(*)->n')
% Total number of students from each state
aggr(U('home_state'), university.Student, 'count(*)->n')
% Total number of students in the database
aggr(U(), university.Student, 'count(*)->n')
The result of aggregation on a universal set is restricted to the entities with matches in the aggregated table, such as Student
in the example above.
In other words, X.aggr(A, ...)
is interpreted as (X & A).aggr(A, ...)
for universal set X
.
All attributes of a universal set are considered primary.
Universal sets should be used sparingly when no suitable base tables already exist. In some cases, defining a new base table can make queries clearer and more semantically constrained.