Skip to content

expression.py

QueryExpression

QueryExpression implements query operators to derive new entity set from its input. A QueryExpression object generates a SELECT statement in SQL. QueryExpression operators are restrict, join, proj, aggr, and union.

A QueryExpression object has a support, a restriction (an AndList), and heading. Property heading (type dj.Heading) contains information about the attributes. It is loaded from the database and updated by proj.

Property support is the list of table names or other QueryExpressions to be joined.

The restriction is applied first without having access to the attributes generated by the projection. Then projection is applied by selecting modifying the heading attribute.

Application of operators does not always lead to the creation of a subquery. A subquery is generated when: 1. A restriction is applied on any computed or renamed attributes 2. A projection is applied remapping remapped attributes 3. Subclasses: Join, Aggregation, and Union have additional specific rules.

Source code in datajoint/expression.py
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
class QueryExpression:
    """
    QueryExpression implements query operators to derive new entity set from its input.
    A QueryExpression object generates a SELECT statement in SQL.
    QueryExpression operators are restrict, join, proj, aggr, and union.

    A QueryExpression object has a support, a restriction (an AndList), and heading.
    Property `heading` (type dj.Heading) contains information about the attributes.
    It is loaded from the database and updated by proj.

    Property `support` is the list of table names or other QueryExpressions to be joined.

    The restriction is applied first without having access to the attributes generated by the projection.
    Then projection is applied by selecting modifying the heading attribute.

    Application of operators does not always lead to the creation of a subquery.
    A subquery is generated when:
        1. A restriction is applied on any computed or renamed attributes
        2. A projection is applied remapping remapped attributes
        3. Subclasses: Join, Aggregation, and Union have additional specific rules.
    """

    _restriction = None
    _restriction_attributes = None
    _left = []  # list of booleans True for left joins, False for inner joins
    _original_heading = None  # heading before projections

    # subclasses or instantiators must provide values
    _connection = None
    _heading = None
    _support = None
    _top = None

    # If the query will be using distinct
    _distinct = False

    @property
    def connection(self):
        """a dj.Connection object"""
        assert self._connection is not None
        return self._connection

    @property
    def support(self):
        """A list of table names or subqueries to from the FROM clause"""
        assert self._support is not None
        return self._support

    @property
    def heading(self):
        """a dj.Heading object, reflects the effects of the projection operator .proj"""
        return self._heading

    @property
    def original_heading(self):
        """a dj.Heading object reflecting the attributes before projection"""
        return self._original_heading or self.heading

    @property
    def restriction(self):
        """a AndList object of restrictions applied to input to produce the result"""
        if self._restriction is None:
            self._restriction = AndList()
        return self._restriction

    @property
    def restriction_attributes(self):
        """the set of attribute names invoked in the WHERE clause"""
        if self._restriction_attributes is None:
            self._restriction_attributes = set()
        return self._restriction_attributes

    @property
    def primary_key(self):
        return self.heading.primary_key

    _subquery_alias_count = count()  # count for alias names used in the FROM clause

    def from_clause(self):
        support = (
            (
                "(" + src.make_sql() + ") as `$%x`" % next(self._subquery_alias_count)
                if isinstance(src, QueryExpression)
                else src
            )
            for src in self.support
        )
        clause = next(support)
        for s, left in zip(support, self._left):
            clause += " NATURAL{left} JOIN {clause}".format(
                left=" LEFT" if left else "", clause=s
            )
        return clause

    def where_clause(self):
        return (
            ""
            if not self.restriction
            else " WHERE (%s)" % ")AND(".join(str(s) for s in self.restriction)
        )

    def sorting_clauses(self):
        if not self._top:
            return ""
        clause = ", ".join(
            _wrap_attributes(
                _flatten_attribute_list(self.primary_key, self._top.order_by)
            )
        )
        if clause:
            clause = f" ORDER BY {clause}"
        if self._top.limit is not None:
            clause += f" LIMIT {self._top.limit}{f' OFFSET {self._top.offset}' if self._top.offset else ''}"

        return clause

    def make_sql(self, fields=None):
        """
        Make the SQL SELECT statement.

        :param fields: used to explicitly set the select attributes
        """
        return "SELECT {distinct}{fields} FROM {from_}{where}{sorting}".format(
            distinct="DISTINCT " if self._distinct else "",
            fields=self.heading.as_sql(fields or self.heading.names),
            from_=self.from_clause(),
            where=self.where_clause(),
            sorting=self.sorting_clauses(),
        )

    # --------- query operators -----------
    def make_subquery(self):
        """create a new SELECT statement where self is the FROM clause"""
        result = QueryExpression()
        result._connection = self.connection
        result._support = [self]
        result._heading = self.heading.make_subquery_heading()
        return result

    def restrict(self, restriction):
        """
        Produces a new expression with the new restriction applied.
        rel.restrict(restriction)  is equivalent to  rel & restriction.
        rel.restrict(Not(restriction))  is equivalent to  rel - restriction
        The primary key of the result is unaffected.
        Successive restrictions are combined as logical AND:   r & a & b  is equivalent to r & AndList((a, b))
        Any QueryExpression, collection, or sequence other than an AndList are treated as OrLists
        (logical disjunction of conditions)
        Inverse restriction is accomplished by either using the subtraction operator or the Not class.

        The expressions in each row equivalent:

        rel & True                          rel
        rel & False                         the empty entity set
        rel & 'TRUE'                        rel
        rel & 'FALSE'                       the empty entity set
        rel - cond                          rel & Not(cond)
        rel - 'TRUE'                        rel & False
        rel - 'FALSE'                       rel
        rel & AndList((cond1,cond2))        rel & cond1 & cond2
        rel & AndList()                     rel
        rel & [cond1, cond2]                rel & OrList((cond1, cond2))
        rel & []                            rel & False
        rel & None                          rel & False
        rel & any_empty_entity_set          rel & False
        rel - AndList((cond1,cond2))        rel & [Not(cond1), Not(cond2)]
        rel - [cond1, cond2]                rel & Not(cond1) & Not(cond2)
        rel - AndList()                     rel & False
        rel - []                            rel
        rel - None                          rel
        rel - any_empty_entity_set          rel

        When arg is another QueryExpression, the restriction  rel & arg  restricts rel to elements that match at least
        one element in arg (hence arg is treated as an OrList).
        Conversely,  rel - arg  restricts rel to elements that do not match any elements in arg.
        Two elements match when their common attributes have equal values or when they have no common attributes.
        All shared attributes must be in the primary key of either rel or arg or both or an error will be raised.

        QueryExpression.restrict is the only access point that modifies restrictions. All other operators must
        ultimately call restrict()

        :param restriction: a sequence or an array (treated as OR list), another QueryExpression, an SQL condition
        string, or an AndList.
        """
        attributes = set()
        if isinstance(restriction, Top):
            result = (
                self.make_subquery()
                if self._top and not self._top.__eq__(restriction)
                else copy.copy(self)
            )  # make subquery to avoid overwriting existing Top
            result._top = restriction
            return result
        new_condition = make_condition(self, restriction, attributes)
        if new_condition is True:
            return self  # restriction has no effect, return the same object
        # check that all attributes in condition are present in the query
        try:
            raise DataJointError(
                "Attribute `%s` is not found in query."
                % next(attr for attr in attributes if attr not in self.heading.names)
            )
        except StopIteration:
            pass  # all ok
        # If the new condition uses any new attributes, a subquery is required.
        # However, Aggregation's HAVING statement works fine with aliased attributes.
        need_subquery = (
            isinstance(self, Union)
            or (not isinstance(self, Aggregation) and self.heading.new_attributes)
            or self._top
        )
        if need_subquery:
            result = self.make_subquery()
        else:
            result = copy.copy(self)
            result._restriction = AndList(
                self.restriction
            )  # copy to preserve the original
        result.restriction.append(new_condition)
        result.restriction_attributes.update(attributes)
        return result

    def restrict_in_place(self, restriction):
        self.__dict__.update(self.restrict(restriction).__dict__)

    def __and__(self, restriction):
        """
        Restriction operator e.g. ``q1 & q2``.
        :return: a restricted copy of the input argument
        See QueryExpression.restrict for more detail.
        """
        return self.restrict(restriction)

    def __xor__(self, restriction):
        """
        Permissive restriction operator ignoring compatibility check  e.g. ``q1 ^ q2``.
        """
        if inspect.isclass(restriction) and issubclass(restriction, QueryExpression):
            restriction = restriction()
        if isinstance(restriction, Not):
            return self.restrict(Not(PromiscuousOperand(restriction.restriction)))
        return self.restrict(PromiscuousOperand(restriction))

    def __sub__(self, restriction):
        """
        Inverted restriction e.g. ``q1 - q2``.
        :return: a restricted copy of the input argument
        See QueryExpression.restrict for more detail.
        """
        return self.restrict(Not(restriction))

    def __neg__(self):
        """
        Convert between restriction and inverted restriction e.g. ``-q1``.
        :return: target restriction
        See QueryExpression.restrict for more detail.
        """
        if isinstance(self, Not):
            return self.restriction
        return Not(self)

    def __mul__(self, other):
        """
        join of query expressions `self` and `other` e.g. ``q1 * q2``.
        """
        return self.join(other)

    def __matmul__(self, other):
        """
        Permissive join of query expressions `self` and `other` ignoring compatibility check
            e.g. ``q1 @ q2``.
        """
        if inspect.isclass(other) and issubclass(other, QueryExpression):
            other = other()  # instantiate
        return self.join(other, semantic_check=False)

    def join(self, other, semantic_check=True, left=False):
        """
        create the joined QueryExpression.
        a * b  is short for A.join(B)
        a @ b  is short for A.join(B, semantic_check=False)
        Additionally, left=True will retain the rows of self, effectively performing a left join.
        """
        # trigger subqueries if joining on renamed attributes
        if isinstance(other, U):
            return other * self
        if inspect.isclass(other) and issubclass(other, QueryExpression):
            other = other()  # instantiate
        if not isinstance(other, QueryExpression):
            raise DataJointError("The argument of join must be a QueryExpression")
        if semantic_check:
            assert_join_compatibility(self, other)
        join_attributes = set(n for n in self.heading.names if n in other.heading.names)
        # needs subquery if self's FROM clause has common attributes with other's FROM clause
        need_subquery1 = need_subquery2 = bool(
            (set(self.original_heading.names) & set(other.original_heading.names))
            - join_attributes
        )
        # need subquery if any of the join attributes are derived
        need_subquery1 = (
            need_subquery1
            or isinstance(self, Aggregation)
            or any(n in self.heading.new_attributes for n in join_attributes)
            or isinstance(self, Union)
        )
        need_subquery2 = (
            need_subquery2
            or isinstance(other, Aggregation)
            or any(n in other.heading.new_attributes for n in join_attributes)
            or isinstance(self, Union)
        )
        if need_subquery1:
            self = self.make_subquery()
        if need_subquery2:
            other = other.make_subquery()
        result = QueryExpression()
        result._connection = self.connection
        result._support = self.support + other.support
        result._left = self._left + [left] + other._left
        result._heading = self.heading.join(other.heading)
        result._restriction = AndList(self.restriction)
        result._restriction.append(other.restriction)
        result._original_heading = self.original_heading.join(other.original_heading)
        assert len(result.support) == len(result._left) + 1
        return result

    def __add__(self, other):
        """union e.g. ``q1 + q2``."""
        return Union.create(self, other)

    def proj(self, *attributes, **named_attributes):
        """
        Projection operator.

        :param attributes:  attributes to be included in the result. (The primary key is already included).
        :param named_attributes: new attributes computed or renamed from existing attributes.
        :return: the projected expression.
        Primary key attributes cannot be excluded but may be renamed.
        If the attribute list contains an Ellipsis ..., then all secondary attributes are included too
        Prefixing an attribute name with a dash '-attr' removes the attribute from the list if present.
        Keyword arguments can be used to rename attributes as in name='attr', duplicate them as in name='(attr)', or
        self.proj(...) or self.proj(Ellipsis) -- include all attributes (return self)
        self.proj() -- include only primary key
        self.proj('attr1', 'attr2')  -- include primary key and attributes attr1 and attr2
        self.proj(..., '-attr1', '-attr2')  -- include all attributes except attr1 and attr2
        self.proj(name1='attr1') -- include primary key and 'attr1' renamed as name1
        self.proj('attr1', dup='(attr1)') -- include primary key and attribute attr1 twice, with the duplicate 'dup'
        self.proj(k='abs(attr1)') adds the new attribute k with the value computed as an expression (SQL syntax)
        from other attributes available before the projection.
        Each attribute name can only be used once.
        """
        named_attributes = {
            k: translate_attribute(v)[1] for k, v in named_attributes.items()
        }
        # new attributes in parentheses are included again with the new name without removing original
        duplication_pattern = re.compile(
            rf'^\s*\(\s*(?!{"|".join(CONSTANT_LITERALS)})(?P<name>[a-zA-Z_]\w*)\s*\)\s*$'
        )
        # attributes without parentheses renamed
        rename_pattern = re.compile(
            rf'^\s*(?!{"|".join(CONSTANT_LITERALS)})(?P<name>[a-zA-Z_]\w*)\s*$'
        )
        replicate_map = {
            k: m.group("name")
            for k, m in (
                (k, duplication_pattern.match(v)) for k, v in named_attributes.items()
            )
            if m
        }
        rename_map = {
            k: m.group("name")
            for k, m in (
                (k, rename_pattern.match(v)) for k, v in named_attributes.items()
            )
            if m
        }
        compute_map = {
            k: v
            for k, v in named_attributes.items()
            if not duplication_pattern.match(v) and not rename_pattern.match(v)
        }
        attributes = set(attributes)
        # include primary key
        attributes.update((k for k in self.primary_key if k not in rename_map.values()))
        # include all secondary attributes with Ellipsis
        if Ellipsis in attributes:
            attributes.discard(Ellipsis)
            attributes.update(
                (
                    a
                    for a in self.heading.secondary_attributes
                    if a not in attributes and a not in rename_map.values()
                )
            )
        try:
            raise DataJointError(
                "%s is not a valid data type for an attribute in .proj"
                % next(a for a in attributes if not isinstance(a, str))
            )
        except StopIteration:
            pass  # normal case
        # remove excluded attributes, specified as `-attr'
        excluded = set(a for a in attributes if a.strip().startswith("-"))
        attributes.difference_update(excluded)
        excluded = set(a.lstrip("-").strip() for a in excluded)
        attributes.difference_update(excluded)
        try:
            raise DataJointError(
                "Cannot exclude primary key attribute %s",
                next(a for a in excluded if a in self.primary_key),
            )
        except StopIteration:
            pass  # all ok
        # check that all attributes exist in heading
        try:
            raise DataJointError(
                "Attribute `%s` not found."
                % next(a for a in attributes if a not in self.heading.names)
            )
        except StopIteration:
            pass  # all ok

        # check that all mentioned names are present in heading
        mentions = attributes.union(replicate_map.values()).union(rename_map.values())
        try:
            raise DataJointError(
                "Attribute '%s' not found."
                % next(a for a in mentions if not self.heading.names)
            )
        except StopIteration:
            pass  # all ok

        # check that newly created attributes do not clash with any other selected attributes
        try:
            raise DataJointError(
                "Attribute `%s` already exists"
                % next(
                    a
                    for a in rename_map
                    if a in attributes.union(compute_map).union(replicate_map)
                )
            )
        except StopIteration:
            pass  # all ok
        try:
            raise DataJointError(
                "Attribute `%s` already exists"
                % next(
                    a
                    for a in compute_map
                    if a in attributes.union(rename_map).union(replicate_map)
                )
            )
        except StopIteration:
            pass  # all ok
        try:
            raise DataJointError(
                "Attribute `%s` already exists"
                % next(
                    a
                    for a in replicate_map
                    if a in attributes.union(rename_map).union(compute_map)
                )
            )
        except StopIteration:
            pass  # all ok

        # need a subquery if the projection remaps any remapped attributes
        used = set(q for v in compute_map.values() for q in extract_column_names(v))
        used.update(rename_map.values())
        used.update(replicate_map.values())
        used.intersection_update(self.heading.names)
        need_subquery = isinstance(self, Union) or any(
            self.heading[name].attribute_expression is not None for name in used
        )
        if not need_subquery and self.restriction:
            # need a subquery if the restriction applies to attributes that have been renamed
            need_subquery = any(
                name in self.restriction_attributes
                for name in self.heading.new_attributes
            )

        result = self.make_subquery() if need_subquery else copy.copy(self)
        result._original_heading = result.original_heading
        result._heading = result.heading.select(
            attributes,
            rename_map=dict(**rename_map, **replicate_map),
            compute_map=compute_map,
        )
        return result

    def aggr(self, group, *attributes, keep_all_rows=False, **named_attributes):
        """
        Aggregation of the type U('attr1','attr2').aggr(group, computation="QueryExpression")
        has the primary key ('attr1','attr2') and performs aggregation computations for all matching elements of `group`.

        :param group:  The query expression to be aggregated.
        :param keep_all_rows: True=keep all the rows from self. False=keep only rows that match entries in group.
        :param named_attributes: computations of the form new_attribute="sql expression on attributes of group"
        :return: The derived query expression
        """
        if Ellipsis in attributes:
            # expand ellipsis to include only attributes from the left table
            attributes = set(attributes)
            attributes.discard(Ellipsis)
            attributes.update(self.heading.secondary_attributes)
        return Aggregation.create(self, group=group, keep_all_rows=keep_all_rows).proj(
            *attributes, **named_attributes
        )

    aggregate = aggr  # alias for aggr

    # ---------- Fetch operators --------------------
    @property
    def fetch1(self):
        return Fetch1(self)

    @property
    def fetch(self):
        return Fetch(self)

    def head(self, limit=25, **fetch_kwargs):
        """
        shortcut to fetch the first few entries from query expression.
        Equivalent to fetch(order_by="KEY", limit=25)

        :param limit:  number of entries
        :param fetch_kwargs: kwargs for fetch
        :return: query result
        """
        return self.fetch(order_by="KEY", limit=limit, **fetch_kwargs)

    def tail(self, limit=25, **fetch_kwargs):
        """
        shortcut to fetch the last few entries from query expression.
        Equivalent to fetch(order_by="KEY DESC", limit=25)[::-1]

        :param limit:  number of entries
        :param fetch_kwargs: kwargs for fetch
        :return: query result
        """
        return self.fetch(order_by="KEY DESC", limit=limit, **fetch_kwargs)[::-1]

    def __len__(self):
        """:return: number of elements in the result set e.g. ``len(q1)``."""
        result = self.make_subquery() if self._top else copy.copy(self)
        return result.connection.query(
            "SELECT {select_} FROM {from_}{where}".format(
                select_=(
                    "count(*)"
                    if any(result._left)
                    else "count(DISTINCT {fields})".format(
                        fields=result.heading.as_sql(
                            result.primary_key, include_aliases=False
                        )
                    )
                ),
                from_=result.from_clause(),
                where=result.where_clause(),
            )
        ).fetchone()[0]

    def __bool__(self):
        """
        :return: True if the result is not empty. Equivalent to len(self) > 0 but often
            faster e.g. ``bool(q1)``.
        """
        return bool(
            self.connection.query(
                "SELECT EXISTS(SELECT 1 FROM {from_}{where})".format(
                    from_=self.from_clause(), where=self.where_clause()
                )
            ).fetchone()[0]
        )

    def __contains__(self, item):
        """
        returns True if the restriction in item matches any entries in self
            e.g. ``restriction in q1``.

        :param item: any restriction
        (item in query_expression) is equivalent to bool(query_expression & item) but may be
        executed more efficiently.
        """
        return bool(self & item)  # May be optimized e.g. using an EXISTS query

    def __iter__(self):
        """
        returns an iterator-compatible QueryExpression object e.g. ``iter(q1)``.

        :param self: iterator-compatible QueryExpression object
        """
        self._iter_only_key = all(v.in_key for v in self.heading.attributes.values())
        self._iter_keys = self.fetch("KEY")
        return self

    def __next__(self):
        """
        returns the next record on an iterator-compatible QueryExpression object
            e.g. ``next(q1)``.

        :param self: A query expression
        :type self: :class:`QueryExpression`
        :rtype: dict
        """
        try:
            key = self._iter_keys.pop(0)
        except AttributeError:
            # self._iter_keys is missing because __iter__ has not been called.
            raise TypeError(
                "A QueryExpression object is not an iterator. "
                "Use iter(obj) to create an iterator."
            )
        except IndexError:
            raise StopIteration
        else:
            if self._iter_only_key:
                return key
            else:
                try:
                    return (self & key).fetch1()
                except DataJointError:
                    # The data may have been deleted since the moment the keys were fetched
                    # -- move on to next entry.
                    return next(self)

    def cursor(self, as_dict=False):
        """
        See expression.fetch() for input description.
        :return: query cursor
        """
        sql = self.make_sql()
        logger.debug(sql)
        return self.connection.query(sql, as_dict=as_dict)

    def __repr__(self):
        """
        returns the string representation of a QueryExpression object e.g. ``str(q1)``.

        :param self: A query expression
        :type self: :class:`QueryExpression`
        :rtype: str
        """
        return (
            super().__repr__()
            if config["loglevel"].lower() == "debug"
            else self.preview()
        )

    def preview(self, limit=None, width=None):
        """:return: a string of preview of the contents of the query."""
        return preview(self, limit, width)

    def _repr_html_(self):
        """:return: HTML to display table in Jupyter notebook."""
        return repr_html(self)

connection property

a dj.Connection object

support property

A list of table names or subqueries to from the FROM clause

heading property

a dj.Heading object, reflects the effects of the projection operator .proj

original_heading property

a dj.Heading object reflecting the attributes before projection

restriction property

a AndList object of restrictions applied to input to produce the result

restriction_attributes property

the set of attribute names invoked in the WHERE clause

make_sql(fields=None)

Make the SQL SELECT statement.

Parameters:

Name Type Description Default
fields

used to explicitly set the select attributes

None
Source code in datajoint/expression.py
141
142
143
144
145
146
147
148
149
150
151
152
153
def make_sql(self, fields=None):
    """
    Make the SQL SELECT statement.

    :param fields: used to explicitly set the select attributes
    """
    return "SELECT {distinct}{fields} FROM {from_}{where}{sorting}".format(
        distinct="DISTINCT " if self._distinct else "",
        fields=self.heading.as_sql(fields or self.heading.names),
        from_=self.from_clause(),
        where=self.where_clause(),
        sorting=self.sorting_clauses(),
    )

make_subquery()

create a new SELECT statement where self is the FROM clause

Source code in datajoint/expression.py
156
157
158
159
160
161
162
def make_subquery(self):
    """create a new SELECT statement where self is the FROM clause"""
    result = QueryExpression()
    result._connection = self.connection
    result._support = [self]
    result._heading = self.heading.make_subquery_heading()
    return result

restrict(restriction)

Produces a new expression with the new restriction applied. rel.restrict(restriction) is equivalent to rel & restriction. rel.restrict(Not(restriction)) is equivalent to rel - restriction The primary key of the result is unaffected. Successive restrictions are combined as logical AND: r & a & b is equivalent to r & AndList((a, b)) Any QueryExpression, collection, or sequence other than an AndList are treated as OrLists (logical disjunction of conditions) Inverse restriction is accomplished by either using the subtraction operator or the Not class.

The expressions in each row equivalent:

rel & True rel rel & False the empty entity set rel & 'TRUE' rel rel & 'FALSE' the empty entity set rel - cond rel & Not(cond) rel - 'TRUE' rel & False rel - 'FALSE' rel rel & AndList((cond1,cond2)) rel & cond1 & cond2 rel & AndList() rel rel & [cond1, cond2] rel & OrList((cond1, cond2)) rel & [] rel & False rel & None rel & False rel & any_empty_entity_set rel & False rel - AndList((cond1,cond2)) rel & [Not(cond1), Not(cond2)] rel - [cond1, cond2] rel & Not(cond1) & Not(cond2) rel - AndList() rel & False rel - [] rel rel - None rel rel - any_empty_entity_set rel

When arg is another QueryExpression, the restriction rel & arg restricts rel to elements that match at least one element in arg (hence arg is treated as an OrList). Conversely, rel - arg restricts rel to elements that do not match any elements in arg. Two elements match when their common attributes have equal values or when they have no common attributes. All shared attributes must be in the primary key of either rel or arg or both or an error will be raised.

QueryExpression.restrict is the only access point that modifies restrictions. All other operators must ultimately call restrict()

Parameters:

Name Type Description Default
restriction

a sequence or an array (treated as OR list), another QueryExpression, an SQL condition string, or an AndList.

required
Source code in datajoint/expression.py
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
def restrict(self, restriction):
    """
    Produces a new expression with the new restriction applied.
    rel.restrict(restriction)  is equivalent to  rel & restriction.
    rel.restrict(Not(restriction))  is equivalent to  rel - restriction
    The primary key of the result is unaffected.
    Successive restrictions are combined as logical AND:   r & a & b  is equivalent to r & AndList((a, b))
    Any QueryExpression, collection, or sequence other than an AndList are treated as OrLists
    (logical disjunction of conditions)
    Inverse restriction is accomplished by either using the subtraction operator or the Not class.

    The expressions in each row equivalent:

    rel & True                          rel
    rel & False                         the empty entity set
    rel & 'TRUE'                        rel
    rel & 'FALSE'                       the empty entity set
    rel - cond                          rel & Not(cond)
    rel - 'TRUE'                        rel & False
    rel - 'FALSE'                       rel
    rel & AndList((cond1,cond2))        rel & cond1 & cond2
    rel & AndList()                     rel
    rel & [cond1, cond2]                rel & OrList((cond1, cond2))
    rel & []                            rel & False
    rel & None                          rel & False
    rel & any_empty_entity_set          rel & False
    rel - AndList((cond1,cond2))        rel & [Not(cond1), Not(cond2)]
    rel - [cond1, cond2]                rel & Not(cond1) & Not(cond2)
    rel - AndList()                     rel & False
    rel - []                            rel
    rel - None                          rel
    rel - any_empty_entity_set          rel

    When arg is another QueryExpression, the restriction  rel & arg  restricts rel to elements that match at least
    one element in arg (hence arg is treated as an OrList).
    Conversely,  rel - arg  restricts rel to elements that do not match any elements in arg.
    Two elements match when their common attributes have equal values or when they have no common attributes.
    All shared attributes must be in the primary key of either rel or arg or both or an error will be raised.

    QueryExpression.restrict is the only access point that modifies restrictions. All other operators must
    ultimately call restrict()

    :param restriction: a sequence or an array (treated as OR list), another QueryExpression, an SQL condition
    string, or an AndList.
    """
    attributes = set()
    if isinstance(restriction, Top):
        result = (
            self.make_subquery()
            if self._top and not self._top.__eq__(restriction)
            else copy.copy(self)
        )  # make subquery to avoid overwriting existing Top
        result._top = restriction
        return result
    new_condition = make_condition(self, restriction, attributes)
    if new_condition is True:
        return self  # restriction has no effect, return the same object
    # check that all attributes in condition are present in the query
    try:
        raise DataJointError(
            "Attribute `%s` is not found in query."
            % next(attr for attr in attributes if attr not in self.heading.names)
        )
    except StopIteration:
        pass  # all ok
    # If the new condition uses any new attributes, a subquery is required.
    # However, Aggregation's HAVING statement works fine with aliased attributes.
    need_subquery = (
        isinstance(self, Union)
        or (not isinstance(self, Aggregation) and self.heading.new_attributes)
        or self._top
    )
    if need_subquery:
        result = self.make_subquery()
    else:
        result = copy.copy(self)
        result._restriction = AndList(
            self.restriction
        )  # copy to preserve the original
    result.restriction.append(new_condition)
    result.restriction_attributes.update(attributes)
    return result

join(other, semantic_check=True, left=False)

create the joined QueryExpression. a * b is short for A.join(B) a @ b is short for A.join(B, semantic_check=False) Additionally, left=True will retain the rows of self, effectively performing a left join.

Source code in datajoint/expression.py
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
def join(self, other, semantic_check=True, left=False):
    """
    create the joined QueryExpression.
    a * b  is short for A.join(B)
    a @ b  is short for A.join(B, semantic_check=False)
    Additionally, left=True will retain the rows of self, effectively performing a left join.
    """
    # trigger subqueries if joining on renamed attributes
    if isinstance(other, U):
        return other * self
    if inspect.isclass(other) and issubclass(other, QueryExpression):
        other = other()  # instantiate
    if not isinstance(other, QueryExpression):
        raise DataJointError("The argument of join must be a QueryExpression")
    if semantic_check:
        assert_join_compatibility(self, other)
    join_attributes = set(n for n in self.heading.names if n in other.heading.names)
    # needs subquery if self's FROM clause has common attributes with other's FROM clause
    need_subquery1 = need_subquery2 = bool(
        (set(self.original_heading.names) & set(other.original_heading.names))
        - join_attributes
    )
    # need subquery if any of the join attributes are derived
    need_subquery1 = (
        need_subquery1
        or isinstance(self, Aggregation)
        or any(n in self.heading.new_attributes for n in join_attributes)
        or isinstance(self, Union)
    )
    need_subquery2 = (
        need_subquery2
        or isinstance(other, Aggregation)
        or any(n in other.heading.new_attributes for n in join_attributes)
        or isinstance(self, Union)
    )
    if need_subquery1:
        self = self.make_subquery()
    if need_subquery2:
        other = other.make_subquery()
    result = QueryExpression()
    result._connection = self.connection
    result._support = self.support + other.support
    result._left = self._left + [left] + other._left
    result._heading = self.heading.join(other.heading)
    result._restriction = AndList(self.restriction)
    result._restriction.append(other.restriction)
    result._original_heading = self.original_heading.join(other.original_heading)
    assert len(result.support) == len(result._left) + 1
    return result

proj(*attributes, **named_attributes)

Projection operator.

Parameters:

Name Type Description Default
attributes

attributes to be included in the result. (The primary key is already included).

()
named_attributes

new attributes computed or renamed from existing attributes.

{}

Returns:

Type Description

the projected expression. Primary key attributes cannot be excluded but may be renamed. If the attribute list contains an Ellipsis ..., then all secondary attributes are included too Prefixing an attribute name with a dash '-attr' removes the attribute from the list if present. Keyword arguments can be used to rename attributes as in name='attr', duplicate them as in name='(attr)', or self.proj(...) or self.proj(Ellipsis) -- include all attributes (return self) self.proj() -- include only primary key self.proj('attr1', 'attr2') -- include primary key and attributes attr1 and attr2 self.proj(..., '-attr1', '-attr2') -- include all attributes except attr1 and attr2 self.proj(name1='attr1') -- include primary key and 'attr1' renamed as name1 self.proj('attr1', dup='(attr1)') -- include primary key and attribute attr1 twice, with the duplicate 'dup' self.proj(k='abs(attr1)') adds the new attribute k with the value computed as an expression (SQL syntax) from other attributes available before the projection. Each attribute name can only be used once.

Source code in datajoint/expression.py
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
def proj(self, *attributes, **named_attributes):
    """
    Projection operator.

    :param attributes:  attributes to be included in the result. (The primary key is already included).
    :param named_attributes: new attributes computed or renamed from existing attributes.
    :return: the projected expression.
    Primary key attributes cannot be excluded but may be renamed.
    If the attribute list contains an Ellipsis ..., then all secondary attributes are included too
    Prefixing an attribute name with a dash '-attr' removes the attribute from the list if present.
    Keyword arguments can be used to rename attributes as in name='attr', duplicate them as in name='(attr)', or
    self.proj(...) or self.proj(Ellipsis) -- include all attributes (return self)
    self.proj() -- include only primary key
    self.proj('attr1', 'attr2')  -- include primary key and attributes attr1 and attr2
    self.proj(..., '-attr1', '-attr2')  -- include all attributes except attr1 and attr2
    self.proj(name1='attr1') -- include primary key and 'attr1' renamed as name1
    self.proj('attr1', dup='(attr1)') -- include primary key and attribute attr1 twice, with the duplicate 'dup'
    self.proj(k='abs(attr1)') adds the new attribute k with the value computed as an expression (SQL syntax)
    from other attributes available before the projection.
    Each attribute name can only be used once.
    """
    named_attributes = {
        k: translate_attribute(v)[1] for k, v in named_attributes.items()
    }
    # new attributes in parentheses are included again with the new name without removing original
    duplication_pattern = re.compile(
        rf'^\s*\(\s*(?!{"|".join(CONSTANT_LITERALS)})(?P<name>[a-zA-Z_]\w*)\s*\)\s*$'
    )
    # attributes without parentheses renamed
    rename_pattern = re.compile(
        rf'^\s*(?!{"|".join(CONSTANT_LITERALS)})(?P<name>[a-zA-Z_]\w*)\s*$'
    )
    replicate_map = {
        k: m.group("name")
        for k, m in (
            (k, duplication_pattern.match(v)) for k, v in named_attributes.items()
        )
        if m
    }
    rename_map = {
        k: m.group("name")
        for k, m in (
            (k, rename_pattern.match(v)) for k, v in named_attributes.items()
        )
        if m
    }
    compute_map = {
        k: v
        for k, v in named_attributes.items()
        if not duplication_pattern.match(v) and not rename_pattern.match(v)
    }
    attributes = set(attributes)
    # include primary key
    attributes.update((k for k in self.primary_key if k not in rename_map.values()))
    # include all secondary attributes with Ellipsis
    if Ellipsis in attributes:
        attributes.discard(Ellipsis)
        attributes.update(
            (
                a
                for a in self.heading.secondary_attributes
                if a not in attributes and a not in rename_map.values()
            )
        )
    try:
        raise DataJointError(
            "%s is not a valid data type for an attribute in .proj"
            % next(a for a in attributes if not isinstance(a, str))
        )
    except StopIteration:
        pass  # normal case
    # remove excluded attributes, specified as `-attr'
    excluded = set(a for a in attributes if a.strip().startswith("-"))
    attributes.difference_update(excluded)
    excluded = set(a.lstrip("-").strip() for a in excluded)
    attributes.difference_update(excluded)
    try:
        raise DataJointError(
            "Cannot exclude primary key attribute %s",
            next(a for a in excluded if a in self.primary_key),
        )
    except StopIteration:
        pass  # all ok
    # check that all attributes exist in heading
    try:
        raise DataJointError(
            "Attribute `%s` not found."
            % next(a for a in attributes if a not in self.heading.names)
        )
    except StopIteration:
        pass  # all ok

    # check that all mentioned names are present in heading
    mentions = attributes.union(replicate_map.values()).union(rename_map.values())
    try:
        raise DataJointError(
            "Attribute '%s' not found."
            % next(a for a in mentions if not self.heading.names)
        )
    except StopIteration:
        pass  # all ok

    # check that newly created attributes do not clash with any other selected attributes
    try:
        raise DataJointError(
            "Attribute `%s` already exists"
            % next(
                a
                for a in rename_map
                if a in attributes.union(compute_map).union(replicate_map)
            )
        )
    except StopIteration:
        pass  # all ok
    try:
        raise DataJointError(
            "Attribute `%s` already exists"
            % next(
                a
                for a in compute_map
                if a in attributes.union(rename_map).union(replicate_map)
            )
        )
    except StopIteration:
        pass  # all ok
    try:
        raise DataJointError(
            "Attribute `%s` already exists"
            % next(
                a
                for a in replicate_map
                if a in attributes.union(rename_map).union(compute_map)
            )
        )
    except StopIteration:
        pass  # all ok

    # need a subquery if the projection remaps any remapped attributes
    used = set(q for v in compute_map.values() for q in extract_column_names(v))
    used.update(rename_map.values())
    used.update(replicate_map.values())
    used.intersection_update(self.heading.names)
    need_subquery = isinstance(self, Union) or any(
        self.heading[name].attribute_expression is not None for name in used
    )
    if not need_subquery and self.restriction:
        # need a subquery if the restriction applies to attributes that have been renamed
        need_subquery = any(
            name in self.restriction_attributes
            for name in self.heading.new_attributes
        )

    result = self.make_subquery() if need_subquery else copy.copy(self)
    result._original_heading = result.original_heading
    result._heading = result.heading.select(
        attributes,
        rename_map=dict(**rename_map, **replicate_map),
        compute_map=compute_map,
    )
    return result

aggr(group, *attributes, keep_all_rows=False, **named_attributes)

Aggregation of the type U('attr1','attr2').aggr(group, computation="QueryExpression") has the primary key ('attr1','attr2') and performs aggregation computations for all matching elements of group.

Parameters:

Name Type Description Default
group

The query expression to be aggregated.

required
keep_all_rows

True=keep all the rows from self. False=keep only rows that match entries in group.

False
named_attributes

computations of the form new_attribute="sql expression on attributes of group"

{}

Returns:

Type Description

The derived query expression

Source code in datajoint/expression.py
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
def aggr(self, group, *attributes, keep_all_rows=False, **named_attributes):
    """
    Aggregation of the type U('attr1','attr2').aggr(group, computation="QueryExpression")
    has the primary key ('attr1','attr2') and performs aggregation computations for all matching elements of `group`.

    :param group:  The query expression to be aggregated.
    :param keep_all_rows: True=keep all the rows from self. False=keep only rows that match entries in group.
    :param named_attributes: computations of the form new_attribute="sql expression on attributes of group"
    :return: The derived query expression
    """
    if Ellipsis in attributes:
        # expand ellipsis to include only attributes from the left table
        attributes = set(attributes)
        attributes.discard(Ellipsis)
        attributes.update(self.heading.secondary_attributes)
    return Aggregation.create(self, group=group, keep_all_rows=keep_all_rows).proj(
        *attributes, **named_attributes
    )

head(limit=25, **fetch_kwargs)

shortcut to fetch the first few entries from query expression. Equivalent to fetch(order_by="KEY", limit=25)

Parameters:

Name Type Description Default
limit

number of entries

25
fetch_kwargs

kwargs for fetch

{}

Returns:

Type Description

query result

Source code in datajoint/expression.py
546
547
548
549
550
551
552
553
554
555
def head(self, limit=25, **fetch_kwargs):
    """
    shortcut to fetch the first few entries from query expression.
    Equivalent to fetch(order_by="KEY", limit=25)

    :param limit:  number of entries
    :param fetch_kwargs: kwargs for fetch
    :return: query result
    """
    return self.fetch(order_by="KEY", limit=limit, **fetch_kwargs)

tail(limit=25, **fetch_kwargs)

shortcut to fetch the last few entries from query expression. Equivalent to fetch(order_by="KEY DESC", limit=25)[::-1]

Parameters:

Name Type Description Default
limit

number of entries

25
fetch_kwargs

kwargs for fetch

{}

Returns:

Type Description

query result

Source code in datajoint/expression.py
557
558
559
560
561
562
563
564
565
566
def tail(self, limit=25, **fetch_kwargs):
    """
    shortcut to fetch the last few entries from query expression.
    Equivalent to fetch(order_by="KEY DESC", limit=25)[::-1]

    :param limit:  number of entries
    :param fetch_kwargs: kwargs for fetch
    :return: query result
    """
    return self.fetch(order_by="KEY DESC", limit=limit, **fetch_kwargs)[::-1]

cursor(as_dict=False)

See expression.fetch() for input description.

Returns:

Type Description

query cursor

Source code in datajoint/expression.py
651
652
653
654
655
656
657
658
def cursor(self, as_dict=False):
    """
    See expression.fetch() for input description.
    :return: query cursor
    """
    sql = self.make_sql()
    logger.debug(sql)
    return self.connection.query(sql, as_dict=as_dict)

preview(limit=None, width=None)

Returns:

Type Description

a string of preview of the contents of the query.

Source code in datajoint/expression.py
674
675
676
def preview(self, limit=None, width=None):
    """:return: a string of preview of the contents of the query."""
    return preview(self, limit, width)

Aggregation

Bases: QueryExpression

Aggregation.create(arg, group, comp1='calc1', ..., compn='calcn') yields an entity set with primary key from arg. The computed arguments comp1, ..., compn use aggregation calculations on the attributes of group or simple projections and calculations on the attributes of arg. Aggregation is used QueryExpression.aggr and U.aggr. Aggregation is a private class in DataJoint, not exposed to users.

Source code in datajoint/expression.py
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
class Aggregation(QueryExpression):
    """
    Aggregation.create(arg, group, comp1='calc1', ..., compn='calcn')  yields an entity set
    with primary key from arg.
    The computed arguments comp1, ..., compn use aggregation calculations on the attributes of
    group or simple projections and calculations on the attributes of arg.
    Aggregation is used QueryExpression.aggr and U.aggr.
    Aggregation is a private class in DataJoint, not exposed to users.
    """

    _left_restrict = None  # the pre-GROUP BY conditions for the WHERE clause
    _subquery_alias_count = count()

    @classmethod
    def create(cls, arg, group, keep_all_rows=False):
        if inspect.isclass(group) and issubclass(group, QueryExpression):
            group = group()  # instantiate if a class
        assert isinstance(group, QueryExpression)
        if keep_all_rows and len(group.support) > 1 or group.heading.new_attributes:
            group = group.make_subquery()  # subquery if left joining a join
        join = arg.join(group, left=keep_all_rows)  # reuse the join logic
        result = cls()
        result._connection = join.connection
        result._heading = join.heading.set_primary_key(
            arg.primary_key
        )  # use left operand's primary key
        result._support = join.support
        result._left = join._left
        result._left_restrict = join.restriction  # WHERE clause applied before GROUP BY
        result._grouping_attributes = result.primary_key

        return result

    def where_clause(self):
        return (
            ""
            if not self._left_restrict
            else " WHERE (%s)" % ")AND(".join(str(s) for s in self._left_restrict)
        )

    def make_sql(self, fields=None):
        fields = self.heading.as_sql(fields or self.heading.names)
        assert self._grouping_attributes or not self.restriction
        distinct = set(self.heading.names) == set(self.primary_key)
        return (
            "SELECT {distinct}{fields} FROM {from_}{where}{group_by}{sorting}".format(
                distinct="DISTINCT " if distinct else "",
                fields=fields,
                from_=self.from_clause(),
                where=self.where_clause(),
                group_by=(
                    ""
                    if not self.primary_key
                    else (
                        " GROUP BY `%s`" % "`,`".join(self._grouping_attributes)
                        + (
                            ""
                            if not self.restriction
                            else " HAVING (%s)" % ")AND(".join(self.restriction)
                        )
                    )
                ),
                sorting=self.sorting_clauses(),
            )
        )

    def __len__(self):
        return self.connection.query(
            "SELECT count(1) FROM ({subquery}) `${alias:x}`".format(
                subquery=self.make_sql(), alias=next(self._subquery_alias_count)
            )
        ).fetchone()[0]

    def __bool__(self):
        return bool(
            self.connection.query("SELECT EXISTS({sql})".format(sql=self.make_sql()))
        )

Union

Bases: QueryExpression

Union is the private DataJoint class that implements the union operator.

Source code in datajoint/expression.py
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
class Union(QueryExpression):
    """
    Union is the private DataJoint class that implements the union operator.
    """

    __count = count()

    @classmethod
    def create(cls, arg1, arg2):
        if inspect.isclass(arg2) and issubclass(arg2, QueryExpression):
            arg2 = arg2()  # instantiate if a class
        if not isinstance(arg2, QueryExpression):
            raise DataJointError(
                "A QueryExpression can only be unioned with another QueryExpression"
            )
        if arg1.connection != arg2.connection:
            raise DataJointError(
                "Cannot operate on QueryExpressions originating from different connections."
            )
        if set(arg1.primary_key) != set(arg2.primary_key):
            raise DataJointError(
                "The operands of a union must share the same primary key."
            )
        if set(arg1.heading.secondary_attributes) & set(
            arg2.heading.secondary_attributes
        ):
            raise DataJointError(
                "The operands of a union must not share any secondary attributes."
            )
        result = cls()
        result._connection = arg1.connection
        result._heading = arg1.heading.join(arg2.heading)
        result._support = [arg1, arg2]
        return result

    def make_sql(self):
        arg1, arg2 = self._support
        if (
            not arg1.heading.secondary_attributes
            and not arg2.heading.secondary_attributes
        ):
            # no secondary attributes: use UNION DISTINCT
            fields = arg1.primary_key
            return "SELECT * FROM (({sql1}) UNION ({sql2})) as `_u{alias}{sorting}`".format(
                sql1=(
                    arg1.make_sql()
                    if isinstance(arg1, Union)
                    else arg1.make_sql(fields)
                ),
                sql2=(
                    arg2.make_sql()
                    if isinstance(arg2, Union)
                    else arg2.make_sql(fields)
                ),
                alias=next(self.__count),
                sorting=self.sorting_clauses(),
            )
        # with secondary attributes, use union of left join with antijoin
        fields = self.heading.names
        sql1 = arg1.join(arg2, left=True).make_sql(fields)
        sql2 = (
            (arg2 - arg1)
            .proj(..., **{k: "NULL" for k in arg1.heading.secondary_attributes})
            .make_sql(fields)
        )
        return "({sql1})  UNION ({sql2})".format(sql1=sql1, sql2=sql2)

    def from_clause(self):
        """The union does not use a FROM clause"""
        assert False

    def where_clause(self):
        """The union does not use a WHERE clause"""
        assert False

    def __len__(self):
        return self.connection.query(
            "SELECT count(1) FROM ({subquery}) `${alias:x}`".format(
                subquery=self.make_sql(),
                alias=next(QueryExpression._subquery_alias_count),
            )
        ).fetchone()[0]

    def __bool__(self):
        return bool(
            self.connection.query("SELECT EXISTS({sql})".format(sql=self.make_sql()))
        )

from_clause()

The union does not use a FROM clause

Source code in datajoint/expression.py
829
830
831
def from_clause(self):
    """The union does not use a FROM clause"""
    assert False

where_clause()

The union does not use a WHERE clause

Source code in datajoint/expression.py
833
834
835
def where_clause(self):
    """The union does not use a WHERE clause"""
    assert False

U

dj.U objects are the universal sets representing all possible values of their attributes. dj.U objects cannot be queried on their own but are useful for forming some queries. dj.U('attr1', ..., 'attrn') represents the universal set with the primary key attributes attr1 ... attrn. The universal set is the set of all possible combinations of values of the attributes. Without any attributes, dj.U() represents the set with one element that has no attributes.

Restriction:

dj.U can be used to enumerate unique combinations of values of attributes from other expressions.

The following expression yields all unique combinations of contrast and brightness found in the stimulus set:

dj.U('contrast', 'brightness') & stimulus

Aggregation:

In aggregation, dj.U is used for summary calculation over an entire set:

The following expression yields one element with one attribute s containing the total number of elements in query expression expr:

dj.U().aggr(expr, n='count(*)')

The following expressions both yield one element containing the number n of distinct values of attribute attr in query expression expr.

dj.U().aggr(expr, n='count(distinct attr)') dj.U().aggr(dj.U('attr').aggr(expr), 'n=count(*)')

The following expression yields one element and one attribute s containing the sum of values of attribute attr over entire result set of expression expr:

dj.U().aggr(expr, s='sum(attr)')

The following expression yields the set of all unique combinations of attributes attr1, attr2 and the number of their occurrences in the result set of query expression expr.

dj.U(attr1,attr2).aggr(expr, n='count(*)')

Joins:

If expression expr has attributes 'attr1' and 'attr2', then expr * dj.U('attr1','attr2') yields the same result as expr but attr1 and attr2 are promoted to the the primary key. This is useful for producing a join on non-primary key attributes. For example, if attr is in both expr1 and expr2 but not in their primary keys, then expr1 * expr2 will throw an error because in most cases, it does not make sense to join on non-primary key attributes and users must first rename attr in one of the operands. The expression dj.U('attr') * rel1 * rel2 overrides this constraint.

Source code in datajoint/expression.py
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
class U:
    """
    dj.U objects are the universal sets representing all possible values of their attributes.
    dj.U objects cannot be queried on their own but are useful for forming some queries.
    dj.U('attr1', ..., 'attrn') represents the universal set with the primary key attributes attr1 ... attrn.
    The universal set is the set of all possible combinations of values of the attributes.
    Without any attributes, dj.U() represents the set with one element that has no attributes.

    Restriction:

    dj.U can be used to enumerate unique combinations of values of attributes from other expressions.

    The following expression yields all unique combinations of contrast and brightness found in the `stimulus` set:

    >>> dj.U('contrast', 'brightness') & stimulus

    Aggregation:

    In aggregation, dj.U is used for summary calculation over an entire set:

    The following expression yields one element with one attribute `s` containing the total number of elements in
    query expression `expr`:

    >>> dj.U().aggr(expr, n='count(*)')

    The following expressions both yield one element containing the number `n` of distinct values of attribute `attr` in
    query expression `expr`.

    >>> dj.U().aggr(expr, n='count(distinct attr)')
    >>> dj.U().aggr(dj.U('attr').aggr(expr), 'n=count(*)')

    The following expression yields one element and one attribute `s` containing the sum of values of attribute `attr`
    over entire result set of expression `expr`:

    >>> dj.U().aggr(expr, s='sum(attr)')

    The following expression yields the set of all unique combinations of attributes `attr1`, `attr2` and the number of
    their occurrences in the result set of query expression `expr`.

    >>> dj.U(attr1,attr2).aggr(expr, n='count(*)')

    Joins:

    If expression `expr` has attributes 'attr1' and 'attr2', then expr * dj.U('attr1','attr2') yields the same result
    as `expr` but `attr1` and `attr2` are promoted to the the primary key.  This is useful for producing a join on
    non-primary key attributes.
    For example, if `attr` is in both expr1 and expr2 but not in their primary keys, then expr1 * expr2 will throw
    an error because in most cases, it does not make sense to join on non-primary key attributes and users must first
    rename `attr` in one of the operands.  The expression dj.U('attr') * rel1 * rel2 overrides this constraint.
    """

    def __init__(self, *primary_key):
        self._primary_key = primary_key

    @property
    def primary_key(self):
        return self._primary_key

    def __and__(self, other):
        if inspect.isclass(other) and issubclass(other, QueryExpression):
            other = other()  # instantiate if a class
        if not isinstance(other, QueryExpression):
            raise DataJointError("Set U can only be restricted with a QueryExpression.")
        result = copy.copy(other)
        result._distinct = True
        result._heading = result.heading.set_primary_key(self.primary_key)
        result = result.proj()
        return result

    def join(self, other, left=False):
        """
        Joining U with a query expression has the effect of promoting the attributes of U to
        the primary key of the other query expression.

        :param other: the other query expression to join with.
        :param left: ignored. dj.U always acts as if left=False
        :return: a copy of the other query expression with the primary key extended.
        """
        if inspect.isclass(other) and issubclass(other, QueryExpression):
            other = other()  # instantiate if a class
        if not isinstance(other, QueryExpression):
            raise DataJointError("Set U can only be joined with a QueryExpression.")
        try:
            raise DataJointError(
                "Attribute `%s` not found"
                % next(k for k in self.primary_key if k not in other.heading.names)
            )
        except StopIteration:
            pass  # all ok
        result = copy.copy(other)
        result._heading = result.heading.set_primary_key(
            other.primary_key
            + [k for k in self.primary_key if k not in other.primary_key]
        )
        return result

    def __mul__(self, other):
        """shorthand for join"""
        return self.join(other)

    def aggr(self, group, **named_attributes):
        """
        Aggregation of the type U('attr1','attr2').aggr(group, computation="QueryExpression")
        has the primary key ('attr1','attr2') and performs aggregation computations for all matching elements of `group`.

        :param group:  The query expression to be aggregated.
        :param named_attributes: computations of the form new_attribute="sql expression on attributes of group"
        :return: The derived query expression
        """
        if named_attributes.get("keep_all_rows", False):
            raise DataJointError(
                "Cannot set keep_all_rows=True when aggregating on a universal set."
            )
        return Aggregation.create(self, group=group, keep_all_rows=False).proj(
            **named_attributes
        )

    aggregate = aggr  # alias for aggr

join(other, left=False)

Joining U with a query expression has the effect of promoting the attributes of U to the primary key of the other query expression.

Parameters:

Name Type Description Default
other

the other query expression to join with.

required
left

ignored. dj.U always acts as if left=False

False

Returns:

Type Description

a copy of the other query expression with the primary key extended.

Source code in datajoint/expression.py
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
def join(self, other, left=False):
    """
    Joining U with a query expression has the effect of promoting the attributes of U to
    the primary key of the other query expression.

    :param other: the other query expression to join with.
    :param left: ignored. dj.U always acts as if left=False
    :return: a copy of the other query expression with the primary key extended.
    """
    if inspect.isclass(other) and issubclass(other, QueryExpression):
        other = other()  # instantiate if a class
    if not isinstance(other, QueryExpression):
        raise DataJointError("Set U can only be joined with a QueryExpression.")
    try:
        raise DataJointError(
            "Attribute `%s` not found"
            % next(k for k in self.primary_key if k not in other.heading.names)
        )
    except StopIteration:
        pass  # all ok
    result = copy.copy(other)
    result._heading = result.heading.set_primary_key(
        other.primary_key
        + [k for k in self.primary_key if k not in other.primary_key]
    )
    return result

aggr(group, **named_attributes)

Aggregation of the type U('attr1','attr2').aggr(group, computation="QueryExpression") has the primary key ('attr1','attr2') and performs aggregation computations for all matching elements of group.

Parameters:

Name Type Description Default
group

The query expression to be aggregated.

required
named_attributes

computations of the form new_attribute="sql expression on attributes of group"

{}

Returns:

Type Description

The derived query expression

Source code in datajoint/expression.py
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
def aggr(self, group, **named_attributes):
    """
    Aggregation of the type U('attr1','attr2').aggr(group, computation="QueryExpression")
    has the primary key ('attr1','attr2') and performs aggregation computations for all matching elements of `group`.

    :param group:  The query expression to be aggregated.
    :param named_attributes: computations of the form new_attribute="sql expression on attributes of group"
    :return: The derived query expression
    """
    if named_attributes.get("keep_all_rows", False):
        raise DataJointError(
            "Cannot set keep_all_rows=True when aggregating on a universal set."
        )
    return Aggregation.create(self, group=group, keep_all_rows=False).proj(
        **named_attributes
    )