MySQL 9.1.0
Source Code Documentation
aggregate_check.h
Go to the documentation of this file.
1#ifndef AGGREGATE_CHECK_INCLUDED
2#define AGGREGATE_CHECK_INCLUDED
3
4/* Copyright (c) 2014, 2024, Oracle and/or its affiliates.
5
6 This program is free software; you can redistribute it and/or modify
7 it under the terms of the GNU General Public License, version 2.0,
8 as published by the Free Software Foundation.
9
10 This program is designed to work with certain software (including
11 but not limited to OpenSSL) that is licensed under separate terms,
12 as designated in a particular file or component or in included license
13 documentation. The authors of MySQL hereby grant you an additional
14 permission to link the program and your derivative works with the
15 separately licensed software that they have either included with
16 the program or referenced in the documentation.
17
18 This program is distributed in the hope that it will be useful,
19 but WITHOUT ANY WARRANTY; without even the implied warranty of
20 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
21 GNU General Public License, version 2.0, for more details.
22
23 You should have received a copy of the GNU General Public License
24 along with this program; if not, write to the Free Software
25 Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */
26
27/**
28 @file
29 Checks for some semantic constraints on queries using GROUP
30 BY, or aggregate functions, or DISTINCT. Enforced if
31 sql_mode contains 'only_full_group_by'.
32*/
33
34/**
35 @defgroup AGGREGATE_CHECKS Aggregate checks of ONLY_FULL_GROUP_BY
36
37Checks for some semantic constraints on queries using GROUP BY, or aggregate
38functions, or DISTINCT (ONLY_FULL_GROUP_BY).
39
40We call "aggregation" the operation of taking a group of rows and replacing
41it with a single row. There are three types of aggregation: DISTINCT,
42implicit grouping, explicit grouping.
43
44This text describes MySQL's checks (why certain queries are rejected) and the
45rationale behind them.
46
47References:
48- WL#2489 "better only_full_group_by".
49- if you have access to the SQL standard, we recommend the following parts of
50"SQL2011 Foundation": query expression Syntax rule 28; column reference Syntax
51rule 7 and Conformance rule 2; 4.19 functional dependencies.
52
53@section DISTINCT
54
55DISTINCT: all identical rows in the result of SELECT are "aggregated" to
56one single row - duplicates are eliminated.
57If the result of the SELECT without DISTINCT is
58@verbatim
591 2
603 4
611 2
62@endverbatim
63then the result with DISTINCT is
64@verbatim
651 2
663 4
67@endverbatim
68Here is a problematic query. Assume we have a table T which contains three
69columns C1,C2,C3 and has the following rows:
70@verbatim
71C1 C2 C3
721 2 A
733 4 B
741 2 C
75@endverbatim
76and we do "SELECT DISTINCT C1, C2 FROM T ORDER BY C3".
77Because we want the result to be ordered, we have to first eliminate
78duplicates then order the result.
79When we eliminate duplicates, should we keep the first or the third row? This
80arbitrary choice will influence the retained value of C3, which will influence
81ordering. In the end, we have arbitrary ordering, which is problematic.
82To prevent this, if, in a query block 'sl', an ORDER BY expression
83is not the same expression as one in the SELECT list of 'sl'
84and contains a column which:
85- is of a table whose qualifying query block is 'sl'
86- and is not in the SELECT list of 'sl'
87
88then 'sl' should not have DISTINCT.
89This rule makes the query above invalid.
90Class Check_distinct implements this rule.
91
92Class Check_distinct implements a second rule, specific of set functions in
93ORDER BY (a non-standard feature).
94Consider these queries labelled (1), (2), (3), with DISTINCT and a set
95function in ORDER BY:
96@verbatim
97SELECT DISTINCT MIN(C2) FROM T GROUP BY C1 ORDER BY MIN(C3);
98SELECT DISTINCT MIN(C2) FROM T GROUP BY C1 ORDER BY MIN(C2);
99SELECT DISTINCT C1, MIN(C2) FROM T GROUP BY C1 ORDER BY MIN(C3);
100@endverbatim
101(1) is a random-order query, (2) and (3) are not.
102MySQL has traditionally been permissive, we want to allow (2) and (3).
103
104So the second rule is that Check_distinct rejects a query if it has DISTINCT
105and a set function in ORDER BY which is not the same expression as one in the
106SELECT list. This rejects (1) and allows (2).
107It would reject (3); luckily, before Check_distinct checks, DISTINCT is
108removed (it is redundant with GROUP BY) and thus the query is not processed by
109Check_distinct; the second rule is thus by-passed and (3) is correctly
110accepted.
111
112The implementation of Check_distinct works like this: if the query has
113DISTINCT, examine each element of the ORDER BY list: if the element is not the
114same expression as one in the SELECT list, then examine parts of the element
115(using Item::walk()), to spot offending set functions or columns.
116
117@section IMPLICIT_GROUPING Implicit grouping
118
119A query with set functions without GROUP BY can be seen as having GROUP BY()
120i.e. the set of grouping expressions is empty, all rows are part of a
121single group and are replaced with a single row. So it is just a sub-case of
122the next section.
123
124@section EXPLICIT_GROUPING Explicit grouping (GROUP BY)
125
126MySQL is more permissive than the standard, it allows to group by any
127expression; it does not require that every element of the GROUP BY clause
128should be a column of one table of the FROM clause.
129
130Here is a problematic query, using a table T with columns C1, C2, C3:
131@verbatim
132C1 C2 C3
1331 2 A
1343 4 B
1351 2 C
136
137SELECT C1, C3 FROM T GROUP BY C1;
138@endverbatim
139we first form groups, in each group the value of C1 must be the same for all
140rows. Consider the group made of the first and third row. We have to produce
141one row out of it, and this row must have a value for C3 because C3 is in the
142SELECT list. Among those two rows, which value of C3 should we choose? This is
143arbitrary, and will give a random result.
144
145To prevent this, in a query with GROUP BY or aggregates (known as "a grouped
146query"), any column referenced by an expression in the SELECT list or HAVING
147condition and belonging to one table of the FROM clause, should be one of the
148group columns (enforced by only_full_group_by in MySQL 5.6 and older) or, if
149functional dependencies are supported (as in MySQL 5.7), should be
150functionally dependent on the group columns.
151In the table T above, C3 is obviously not functionally dependent on {C1,C2}:
152the values of C1 and C2 for a row do not uniquely determine the value of
153C3. In other words, if two rows have the same value in C1 and the same value
154in C2 they do not necessarily have the same value in C3.
155So this rule correctly rejects the query above.
156
157Note that NULL is treated as one value: two NULLs are considered equal.
158
159In WL#2489, we have implemented the optional standard feature T301
160"functional dependencies" almost entirely.
161
162Here are the functional dependencies which we recognize.
163
164@subsection KEYFD Key-based, in a base table
165
166A key in this text is a unique constraint made of only non-NULLable
167columns. For example, a primary key.
168Considering a base table T, if two rows have the same values of all columns of
169a key of T they are actually one single row, so:
170{ all columns of key} -> { T.* } (notation: what is on the right of the arrow
171is functionally dependent on what is on the left).
172
173@subsection GCOLFD Generated-column-based, in a base table
174
175Considering a base table T, a generated column is functionally dependent on
176the set of columns it references (the so-called parametric
177columns). Note that the SQL standard doesn't allow a parametric column
178to be a generated column, but MySQL does.
179
180@subsection INNEREQ Equality-based, in the result of an inner join
181
182Considering two tables T1 and T2, a condition C, and the result R of an inner
183join T1 JOIN T2 ON C.
184Note that T1/T2 are not necessarily base tables. For example, they can also be
185views, or derived tables, or the results of some joins; in the rest of text,
186we use the vague term "table" for those various objects.
187
188Note that C may only refer to columns of T1 or T2 (outer references
189are forbidden by MySQL in join conditions).
190
191Assuming that C is a conjunction (i.e. is made of one or more conditions,
192"conjuncts", chained together with AND):
193- If one conjunct is of the form T1.A=constant, then {} -> {A} holds in R (the
194value of A is "the constant" in all rows of R).
195- If one conjunct is of the form T1.A=T2.B, then {T1.A} -> {T2.B} (and vice
196versa) holds in R (the value of T2.B is that of T1.A in all rows of R).
197
198@subsection OUTEREQ Equality-based, in the result of an outer join
199
200Considering the result R of TS LEFT JOIN TW ON C.
201Assuming that C is a conjunction. TS is said to be the strong table, TW is
202said to be the weak table (the one which might be NULL-complemented in the
203result of this join).
204To make this really clear, note that, if we have
205"t1 left join (t2 left join t3 on C) on D":
206- in the t2-t3 join, t2 is strong and t3 is weak.
207- in the t1-(t2-t3) join, t1 is strong, t2 is weak, t3 is weak.
208
209If C is deterministic and one conjunct is of the form TW.A=constant or
210TW.A=TS.B, then DJS -> {TW.A} holds in R, where DJS is the set of all columns
211of TS referenced by C.
212Proof: consider in R two rows r1 and r2 which have the same values of DJS
213columns. Consider r1. There are two possibilities when r1 was built from a row
214of TS:
215- no row in TW matched the row of TS (for no row of TW has C been true): so,
216r1 is NULL-complemented for the columns of TW. Given that r2 has the same
217values of DJS columns as r1, and given that C is deterministic, it is sure
218that no row in TW matched when r2 was built. So r2 is also NULL-complemented
219for the columns of TW. So r1 and r2 have the same value of TW.A (NULL).
220- At least one row in TW matched: so, r1 contains real values from TW (not
221NULL-complemented), matching C, and thus TW.A in r1 is equal to the constant
222or to TS.B. Following the same reasoning as above, it is sure that it is also
223the case for r2.
224- In conclusion, we can see that r1 and r2 always have the same value of
225TW.A.
226
227If one conjunct is of the form TW.A=TW.B then {TW.A} -> {TW.B} holds in R
228Proof: consider r1 and r2 two rows in R having the same value of TW.A. Two
229cases are possible:
230- this value is NULL. Then both rows are NULL-complemented (if not, the
231value of TW.A in TW would be NULL, which cannot match in an equality, so C is
232not true, which is absurd). Thus, in r1 and r2 TW.B is NULL.
233- This value is not NULL. Then both rows are not NULL-complemented, C
234matched for both, so TW.B in r1/r2 is equal to TW.A in r1/r2.
235- In conclusion, r1 and r2 have the same value of TW.B.
236
237If any conjunct references column T1.A and is not true if T1.A is NULL
238(e.g. conjunct is T1.A > 3), and T1 is part of TW, we can replace T1 with
239 (SELECT * FROM T1 WHERE T1.A IS NOT NULL) AS T1.
240We do not do such query rewriting, but a consequence is that we can and do
241treat T1.A as "known not nullable in T1".
242Proof: if we do this replacement, it means we remove from T1 some rows where
243T1.A is NULL. For the sake of the proof, let's broaden this to "we remove from
244and/or add to T1 rows where T1.A is NULL". By this operation,
245- the result of T2 JOIN T1 ON C is affected in the same manner (i.e. it's like
246if we remove from and/or add rows to the join's result, rows where T1.A is
247NULL).
248- the same is true for the result of T1 LEFT JOIN T2 ON C,
249- regarding the result of T2 LEFT JOIN T1 ON C, consider a row r2 from T2:
250 - if r2 had a match with a row of T1 where T1.A is not NULL, this match is
251 preserved after replacement of T1
252 - due to removed rows, matches with rows of T1 where T1.A is NULL may be
253 lost, and so NULL-complemented rows may appear
254 - due to added rows, matches with rows of T1 where T1.A is NULL may appear
255 - so in the result of the left join, it's like if we remove from
256 and/or add rows which all have T1.A is NULL.
257So, the net effect in all situations, on the join nest simply containing T1,
258is that "we remove from and/or add to the nest's result rows where T1.A is
259NULL".
260By induction we can go up the nest tree, for every nest it's like if we remove
261from and/or add to the join nest's result rows where T1.A is NULL.
262Going up, finally we come to the nest TS LEFT JOIN TW ON C where our
263NULL-rejecting conjunct is.
264If T1 belonged to TS, we could not say anything. But we have assumed T1
265belongs to TW: then the C condition eliminates added rows, and would have
266eliminated removed rows anyway. Thus the result of TS LEFT JOIN TW ON C is
267unchanged by our operation on T1. We can thus safely treat T1.A as not
268nullable _in_T1_. Thus, if T1.A is part of a unique constraint, we may treat
269this constraint as a primary key of T1 (@see WHEREEQ for an example).
270
271@subsection WHEREEQ Equality-based, in the result of a WHERE clause
272
273Same rule as the result of an inner join.
274Additionally, the rule is extended to T1.A=outer_reference, because an outer
275reference is a constant during one execution of this query.
276
277Moreoever, if any conjunct references column T1.A and is not true if T1.A is
278NULL (e.g. conjunct is T1.A > 3), we can treat T1.A as not nullable in T1.
279Proof: same as in previous section OUTEREQ.
280We can then derive FD information from a unique index on a nullable column;
281consider:
282 create table t1(a int null, b int null, unique(a));
283While
284 select a,b from t1 group by a;
285is invalid (because two null values of 'a' go in same group),
286 select a,b from t1 where 'a' is not null group by a;
287is valid: 'a' can be treated as non-nullable in t1, so the unique index is
288like "unique not null", so 'a' determines 'b'.
289
290Below we examine how functional dependencies in a table propagate to its
291containing join nest.
292
293@subsection PROPAGOUTER Considering the result R of TS LEFT JOIN TW ON C.
294
295All functional dependencies in TS are also functional dependencies in R.
296Proof: trivial.
297The same is not necessarily true for TW.
298Let's define a "NULL-friendly functional dependency" (NFFD) as a functional
299dependency between two sets A and B, which has two properties:
300- A is not empty
301- if, in a row, all values of A are NULL, then all values of B are NULL.
302
303All NFFDs in TW are also NFFDs in R.
304Proof: consider an NFFD A -> B in TW, and r1 and r2 two rows in R having the
305same values of A columns. Two cases are possible:
306- In r1 and r2, at least one value of A is not NULL. Then r1 is not
307NULL-complemented. Its values for A and B come from TW. By application of the
308functional dependency in TW, because values in A are equal in TW, values in B
309are equal in TW and thus in r1/r2.
310- In r1 and r2, all values of A are NULL. Two cases are possible:
311i) r1 is not NULL-complemented. Its values for A and B come from TW. In the
312row of TW values of A are all NULL. Because the functional dependency in
313NULL-friendly, all values of B are NULL in the row of TW and thus in r1.
314ii) r1 is NULL-complemented. Then all values of B in r1 are NULL.
315iii) In conclusion, all values of B in r1 are NULL. The same reasoning applies
316to r2. So, values of B are identical (NULL) in r1 and r2.
317- In conclusion, values of B are identical in r1/r2, we have proved that
318this is a functional dependency in R, and a NULL-friendly one.
319
320The concept of an NFFD is Guilhem's invention. It was felt it was necessary, to
321have easy propagation of FDs from TW to R. It was preferred to the
322alternative, simpler rule which says that a functional dependency A-> B in TW
323is also a functional dependency in R if A contains a non-nullable
324column. There are two points to note:
325- the functional dependency of the simpler rule is an NFFD, so our rule is not
326more restrictive than the simpler one
327- this simpler rule prevents free propagation of functional dependencies
328through join nests, which complicates implementation and leads to rejecting
329queries which could be accepted. An example follows:
330@verbatim
331SELECT T3.A
332FROM T1 LEFT JOIN (T2 LEFT JOIN T3 ON TRUE) ON TRUE
333GROUP BY T3.PK;
334@endverbatim
335This is what the simpler rule says for this query:
336* In T3, T3.PK->T3.A holds.
337* Let R1 be the result of "(T2 LEFT JOIN T3 ON TRUE)", in R1 T3.PK->T3.A
338holds, by application of: there is a functional dependency in the
339weak side T3, and T3.PK is not nullable in T3.
340* Let R2 be the result of "T1 LEFT JOIN (T2 LEFT JOIN T3 ON TRUE) ON TRUE",
341in R2 T3.PK->T3.A doesn't hold anymore, because: it's a dependency in the
342weak side (weak side is R1 here), and T3.PK is nullable _when
343seen as a column of R1_ (in R1 T3.PK can be NULL, if the row of T3
344is actually a NULL-complemented one).
345
346@subsection PROPAGINNER Considering the result R of T1 JOIN T2 ON C.
347
348All [NULL-friendly] functional dependencies in T1 are also [NULL-friendly]
349functional dependencies in R. the same is true for T2.
350Proof: trivial.
351
352@subsection PROPAGSUMMARY Summary rules for propagating FDs
353
354All NULL-friendly functional dependencies propagate freely through join nests
355all the way up to the result of the WHERE clause.
356The same is true for ordinary functional dependencies except if there are weak
357tables along the propagation path between the table where the dependency holds
358and the result of the WHERE clause; in other words, except if the table where
359the dependency holds belongs to some embedding join nest which is on the weak
360side of an outer join.
361
362@subsection NFFDS Which functional dependencies are NULL-friendly
363
364A key-based functional dependency A -> B in the base table is NULL-friendly,
365because, by definition, there can never be a NULL value in any column of A.
366
367A functional dependency A -> B in a base table, between parametric columns A
368and a generated column B, is not NULL-friendly; for more details, see
369@ref FDVIEW .
370
371A functional dependency A->B in the result of T1 JOIN T2 ON C, if based on
372equality of two columns, is NULL-friendly. Indeed, A is not empty and if there
373was some NULL in A, it would not match the equality in C and thus it would not
374exist in the result, absurd. If the equality is rather column=constant, A is
375empty, the dependency is not NULL-friendly. However, in our implementation,
376function @c simplify_joins() moves inner-join conditions to the embedding
377outer-join nest's join condition, or to the WHERE clause. Because our analysis
378of functional dependencies happens after simplify_joins(), when we analyze T1
379JOIN T2 it is guaranteed to have no condition, and this paragraph is
380irrelevant.
381
382A functional dependency in the result of TS LEFT JOIN TW ON C, if based on
383equality of two columns, is NULL-friendly.
384Proof: let's consider, in turn, the two types of equality-based functional
385dependencies which exist in this result R. Let r1 be a row of R.
386- If C is deterministic and one conjunct is of the form TW.A=constant or
387TW.A=TS.B, then DJS -> {TW.A} holds in R, where DJS is the set of all columns
388of TS referenced by C. For NULL-friendliness, we need DJS to not be
389empty. Thus, we exclude the form TW.A=constant and consider only
390TW.A=TS.B. We suppose that in r1 DJS contains all NULLs. Conjunct is TW.A=TS.B
391then this equality is not true, so r1 is NULL-complemented: TW.A is NULL in
392r1.
393- If one conjunct is of the form TW.A=TW.B then {TW.A} -> {TW.B} holds in
394R. If in r1 TW.A is NULL, again the equality in C is not true, and TW.B is
395NULL in R1.
396- In conclusion, this is NULL-friendly.
397
398A functional dependency in the result of a WHERE clause, if based on equality
399of two columns, is NULL-friendly. If based on T1.A=constant, it is not, as it
400has an empty set of source columns.
401
402Summary: all functional dependencies which we have seen so far are
403NULL-friendly, except those inferred from TW.A=constant in an outer join
404condition or in a WHERE clause, and those about generated columns.
405
406Thus, in the query with T1-T2-T3 previously seen, T3.PK->T3.A is
407NULL-friendly and propagates, query is accepted.
408
409In our implementation, we take special care of TW.A=constant in an outer join
410condition: we infer a functional dependency DJS->TW.A from such equality only
411if one of these requirements are met:
412- the join nest "TS LEFT JOIN TW ON TW.A=constant [AND...]" is not on the
413weak side of any embedding join nest - in that case, propagation will not meet
414any weak tables so we do not need the dependency to be NULL-friendly, it will
415propagate anyway.
416- DJS contains at least one column from a strong-side table which, if NULL,
417makes the join condition not evaluate to TRUE - in that case, DJS->TW.A is
418NULL-friendly.
419
420Either of these two conditions is satisfied in most practical cases. For
421example, it's very common to have an equality between a strong-side column and
422a weak-side column as a conjunct in the outer join condition (like, "ON
423strong.pk = weak.foreign_key AND ..." or "ON strong.foreign_key = weak.pk AND
424..."); this satisfies the second condition. It's also common to have outer
425joins only left-deep ("SELECT ... T1 LEFT JOIN T2 ON ... LEFT JOIN T3 ON ..."
426is left-deep); this satisfies the first condition.
427Note that the dependency found from TW.A=TS.B in an outer join condition
428always satisfies the second condition.
429
430T1.A=constant in a WHERE clause is exterior to any join nest so does not need
431to propagate, so does not need to be NULL-friendly.
432
433@subsection FDVIEW Functional dependencies in a view or a derived table
434
435In the rest of this text, we will use the term "view" for "view or derived
436table". A view can be merged or materialized, in MySQL.
437Consider a view V defined by a query expression.
438If the query expression contains UNION or ROLLUP (which is theoretically based
439on UNION) there are no functional dependencies in this view.
440So let's assume that the query expression is a query specification (let's note
441it VS):
442@verbatim
443CREATE VIEW V AS SELECT [DISTINCT] VE1 AS C1, VE2 AS C2, ... FROM ... WHERE ...
444[GROUP BY ...] [HAVING ...] [ORDER BY ...]
445@endverbatim
446
447If {VE1, VE2, VE3} are columns of tables of the FROM clause, and {VE1,
448VE2} -> {VE3} has been deduced from rules in the previous sections [and is
449NULL-friendly], then {C1, C2} -> { C3 } holds in the view [and is
450NULL-friendly].
451
452If {VE1, VE2} are columns of tables of the FROM clause, and VE3 is a
453deterministic expression depending only on VE1 and VE2, then {C1, C2} ->
454{ C3 } in the view.
455It is not always NULL-friendly, for example: VE3 could be COALESCE(VE1,VE2,3):
456if VE1 (C1) and VE2 (C2) are NULL, VE3 (C3) is 3: not NULL. Another example:
457VE3 could be a literal; {}->{C3}, the left set is empty.
458The same examples apply to a generated column in a base table - it is like a
459merged view's expression. For example, in a base table T which has a generated
460column C3 AS COALESCE(C1,C2,3): {C1, C2} -> { C3 } holds in T but is not
461NULL-friendly.
462
463If VS is a grouped query (which, in MySQL, implies that the view is
464materialized), then in the result of the grouping there is a functional
465dependency from the set of all group expressions to the set of all selected
466expressions (otherwise, this query expression would not have passed its own
467only_full_group_by validation - in the implementation we validate each query
468expression separately). Thus, if all group expressions of VS are in the select
469list of VS, for example they are VE1 and VE2, then {C1, C2} -> {V.*}.
470It is not NULL-friendly, for example: VE3 is COUNT(1): if the result of the
471WHERE clause contains a row with group expressions VE1 and VE2 equal to NULL,
472VE3 is not NULL.
473
474It's possible to infer functional dependencies from equality conditions in
475HAVING, but we have not implemented it yet.
476
477Because some functional dependencies above are not NULL-friendly, they
478exist in the view, but may not freely propagate in the result of join nests
479containing the view. This includes examples just given in paragraphs above,
480and the case of T1.A=constant in the WHERE clause of VS.
481
482Thus, when we know of a functional dependency A -> B in the query expression
483of a view, we deduce from it a functional dependency in the view only if:
484- this view is not on the weak side of any embedding join nest (so
485NULL-friendliness is not required for propagation).
486- or A contains at least one non-nullable expression, which makes A -> B
487NULL-friendly.
488
489The above is fine for materialized views. For merged views, we cannot study the
490query expression of the view, it has been merged (and scattered), so we use a
491different rule:
492- a merged view is similar to a join nest inserted in the parent query, so
493for dependencies based on keys or join conditions, we simply follow
494propagation rules of the non-view sections.
495- For expression-based dependencies (VE3 depends on VE1 and VE2, VE3
496belonging to the view SELECT list), which may not be NULL-friendly, we require
497 - the same non-weak-side criterion as above
498 - or that the left set of the dependency be non-empty and that if VE1 and
499VE2 are NULL then VE3 must be NULL, which makes the dependency NULL-friendly.
500- The same solution is used for generated columns in a base table.
501
502 @{
503
504*/
505
506#include <assert.h>
507#include <sys/types.h>
508
509#include "my_alloc.h"
510
511#include "my_inttypes.h"
512#include "my_table_map.h"
513#include "sql/item.h"
514#include "sql/item_cmpfunc.h" // Item_func_any_value
515#include "sql/item_sum.h" // Item_sum
516#include "sql/mem_root_array.h" // Mem_root_array
517
519class Opt_trace_object;
520class Query_block;
521class THD;
522class Table_ref;
523
524template <class T>
525class mem_root_deque;
526
527/**
528 Checks for queries which have DISTINCT.
529*/
531 public:
533 : select(select_arg), failed_ident(nullptr) {}
536
537 bool check_query(THD *thd);
538
539 private:
540 /// Query block which we are validating
542 /// Identifier which triggered an error
544
545 /**
546 Just because we need to go through Item::walk() to reach all items to
547 validate, some work must be delegated to "Item processors" (members of
548 Item); this work conceptually belongs to Distinct_check, and needs
549 privileged access to it.
550 */
555};
556
557/**
558 Checks for queries which have GROUP BY or aggregate functions.
559*/
561 public:
562 Group_check(Query_block *select_arg, MEM_ROOT *root)
563 : select(select_arg),
565 non_null_in_source(false),
566 table(nullptr),
567 group_in_fd(~0ULL),
568 m_root(root),
569 fd(root),
572 mat_tables(root),
574 if (select->m_no_of_added_exprs > 0) {
575 // Temporarily shorten GROUP BY list for full group by checking of
576 // non-field expressions stemming from transform of correlated scalar
577 // subquery to join with derived table, cf.
578 // Query_block::m_no_of_added_exprs
579 select->group_list.split_after(
582 }
583 }
584
586 std::destroy_n(mat_tables.data(), mat_tables.size());
587 if (select->m_no_of_added_exprs > 0) {
588 // restore GROUP BY list
590 m_added_by_transform.clear();
591 }
592 }
593 Group_check(const Group_check &) = delete;
595
596 bool check_query(THD *thd);
597 void to_opt_trace(THD *thd);
598
599 private:
600 /// Query block which we are validating
602
603 /**
604 "Underlying" == expressions which are underlying in an identifier.
605 The identifier can be a column of a view or derived table, both merged or
606 materialized, or a generated column: all of those have an underlying
607 expression.
608 "Materialized table (mat table)" == view or derived table, materialized.
609 If this is true, is_in_fd() will look for FDs in underlying expressions
610 of columns.
611 */
613
614 /**
615 This member is readable only if this is a child Group_check. Is true if
616 one expression of the SELECT list of this->select is non-nullable.
617 */
619
620 /**
621 The Group_check employed to validate one query block, the one on which
622 check_query() runs, is named the "master".
623 If the query block references a materialized table, the master may create
624 a child Group_check, whose job is to discover FDs in the query expression
625 of the mat table (with the ultimate goal of deducing from them some FDs
626 in the mat table and thus in the parent Group_check). A child may have
627 children itself. If this Group_check is a child, 'table' points to the
628 materialized table, otherwise it is NULL.
629 */
631
632 /**
633 Bit N is set if the N-th expression of GROUP BY is functionally dependent
634 on source columns.
635 It serves to find FDs in the query expression of a materialized table
636 having GROUP BY.
637 For a non-child Group-check, all bits are turned on from the start.
638 Currently limited to 64 bits => max 64 GROUP expressions; should probably
639 be MY_BITMAP.
640 */
642
643 /// Memory for allocations (like of 'fd')
645
646 /**
647 Columns which are local to 'select' and functionally dependent on an
648 initial set of "source" columns defined like this:
649 - if !is_child(), the GROUP BY columns
650 - if is_child(), columns of the result of the query expression under
651 'table' which are themselves part of 'fd' of the parent Group_check.
652 */
654 /// Map of tables for which all columns can be considered part of 'fd'.
656 /// Map of tables for which we discovered known-not-nullable columns.
658 /// Children Group_checks of 'this'
660 /// Identifier which triggered an error
662 /// GROUP BY list non single column expressions entries added by transform
664
665 bool is_fd_on_source(Item *item);
666 bool is_child() const { return table != nullptr; }
667
668 /// Private ctor, for a Group_check to build a child Group_check
669 Group_check(Query_block *select_arg, MEM_ROOT *root, Table_ref *table_arg)
670 : select(select_arg),
672 non_null_in_source(false),
673 table(table_arg),
674 group_in_fd(0ULL),
675 m_root(root),
676 fd(root),
679 mat_tables(root) {
680 assert(table);
681 }
682 bool check_expression(THD *thd, Item *expr, bool in_select_list);
683 /// Shortcut for common use of Item::local_column()
684 bool local_column(Item *item) const {
685 return item->local_column(select).is_true();
686 }
687 void add_to_fd(Item *item, bool local_column, bool add_to_mat_table = true);
689 whole_tables_fd |= m;
690 find_group_in_fd(nullptr);
691 }
692 void add_to_source_of_mat_table(Item_field *item_field, Table_ref *tl);
693 bool is_in_fd(Item *item);
695 Item *get_fd_equal(Item *item);
696 void analyze_conjunct(Item *cond, Item *conjunct, table_map weak_tables,
697 bool weak_side_upwards);
698 void analyze_scalar_eq(Item *cond, Item *left_item, Item *right_item,
699 table_map weak_tables, bool weak_side_upwards);
700 void find_fd_in_cond(Item *cond, table_map weak_tables,
701 bool weak_side_upwards);
704 void find_group_in_fd(Item *item);
705 Item *select_expression(uint idx);
706
707 /// Enum for argument of do_ident_check()
710
711 /**
712 Just because we need to go through Item::walk() to reach all items to
713 validate, some work must be delegated to "Item processors" (members of
714 Item); this work conceptually belongs to Group_check, and needs
715 privileged access to it.
716 */
723};
724
725#endif
726/// @} (end of group AGGREGATE_CHECKS ONLY_FULL_GROUP_BY)
Kerberos Client Authentication nullptr
Definition: auth_kerberos_client_plugin.cc:251
bool is_true() const
Definition: item.h:601
Checks for queries which have DISTINCT.
Definition: aggregate_check.h:530
Distinct_check(const Distinct_check &)=delete
Query_block *const select
Query block which we are validating.
Definition: aggregate_check.h:541
Item_ident * failed_ident
Identifier which triggered an error.
Definition: aggregate_check.h:543
Distinct_check & operator=(const Distinct_check &)=delete
Distinct_check(Query_block *select_arg)
Definition: aggregate_check.h:532
Checks for queries which have GROUP BY or aggregate functions.
Definition: aggregate_check.h:560
SQL_I_List< ORDER > m_added_by_transform
GROUP BY list non single column expressions entries added by transform.
Definition: aggregate_check.h:663
MEM_ROOT *const m_root
Memory for allocations (like of 'fd')
Definition: aggregate_check.h:644
Group_check(Query_block *select_arg, MEM_ROOT *root, Table_ref *table_arg)
Private ctor, for a Group_check to build a child Group_check.
Definition: aggregate_check.h:669
Item_ident * failed_ident
Identifier which triggered an error.
Definition: aggregate_check.h:661
void add_to_fd(table_map m)
Definition: aggregate_check.h:688
Query_block *const select
Query block which we are validating.
Definition: aggregate_check.h:601
Group_check & operator=(const Group_check &)=delete
enum_ident_check
Enum for argument of do_ident_check()
Definition: aggregate_check.h:708
@ CHECK_COLUMN
Definition: aggregate_check.h:708
@ CHECK_STRONG_SIDE_COLUMN
Definition: aggregate_check.h:708
@ CHECK_GROUP
Definition: aggregate_check.h:708
Group_check(Query_block *select_arg, MEM_ROOT *root)
Definition: aggregate_check.h:562
Group_check(const Group_check &)=delete
Table_ref *const table
The Group_check employed to validate one query block, the one on which check_query() runs,...
Definition: aggregate_check.h:630
~Group_check()
Definition: aggregate_check.h:585
bool search_in_underlying
"Underlying" == expressions which are underlying in an identifier.
Definition: aggregate_check.h:612
bool is_child() const
Definition: aggregate_check.h:666
Mem_root_array< Group_check * > mat_tables
Children Group_checks of 'this'.
Definition: aggregate_check.h:659
table_map recheck_nullable_keys
Map of tables for which we discovered known-not-nullable columns.
Definition: aggregate_check.h:657
bool local_column(Item *item) const
Shortcut for common use of Item::local_column()
Definition: aggregate_check.h:684
bool non_null_in_source
This member is readable only if this is a child Group_check.
Definition: aggregate_check.h:618
ulonglong group_in_fd
Bit N is set if the N-th expression of GROUP BY is functionally dependent on source columns.
Definition: aggregate_check.h:641
Mem_root_array< Item_ident * > fd
Columns which are local to 'select' and functionally dependent on an initial set of "source" columns ...
Definition: aggregate_check.h:653
table_map whole_tables_fd
Map of tables for which all columns can be considered part of 'fd'.
Definition: aggregate_check.h:655
Definition: item.h:4385
bool aggregate_check_group(uchar *arg) override
Definition: item_cmpfunc.cc:7599
bool aggregate_check_distinct(uchar *arg) override
Definition: item_cmpfunc.cc:7606
bool aggregate_check_group(uchar *arg) override
This function is expected to check if GROUPING function with its arguments is "group-invariant".
Definition: item_sum.cc:6370
bool aggregate_check_distinct(uchar *arg) override
Used by Distinct_check::check_query to determine whether an error should be returned if the GROUPING ...
Definition: item_sum.cc:6340
Definition: item.h:4123
bool is_strong_side_column_not_in_fd(uchar *arg) override
Definition: item.cc:10908
bool aggregate_check_group(uchar *arg) override
Definition: item.cc:10903
bool is_column_not_in_fd(uchar *arg) override
Definition: item.cc:10916
bool aggregate_check_distinct(uchar *arg) override
Definition: item.cc:10855
bool aggregate_check_distinct(uchar *arg) override
Definition: item_sum.cc:641
bool aggregate_check_group(uchar *arg) override
Definition: item_sum.cc:665
Utility mixin class to be able to walk() only parts of item trees.
Definition: item.h:738
Base class that is used to represent any kind of expression in a relational query.
Definition: item.h:930
virtual Bool3 local_column(const Query_block *) const
Definition: item.h:2934
A typesafe replacement for DYNAMIC_ARRAY.
Definition: mem_root_array.h:426
A per-session context which is always available at any point of execution, because in practice it's a...
Definition: opt_trace_context.h:94
A JSON object (unordered set of key/value pairs).
Definition: opt_trace.h:802
This class represents a query block, aka a query specification, which is a query consisting of a SELE...
Definition: sql_lex.h:1159
SQL_I_List< ORDER > group_list
GROUP BY clause.
Definition: sql_lex.h:1967
decltype(SQL_I_List< ORDER >::elements) m_no_of_added_exprs
For an explicitly grouped, correlated, scalar subquery which is transformed to join with derived tabl...
Definition: sql_lex.h:1981
Simple intrusive linked list.
Definition: sql_list.h:48
For each client connection we create a separate thread with THD serving as a thread/connection descri...
Definition: sql_lexer_thd.h:36
Definition: table.h:2900
A (partial) implementation of std::deque allocating its blocks on a MEM_ROOT.
Definition: mem_root_deque.h:111
void add_to_fd(Item *item, bool local_column, bool add_to_mat_table=true)
Definition: aggregate_check.cc:498
bool do_ident_check(Item_ident *i, table_map tm, enum enum_ident_check type)
Does one low-level check on one Item_ident.
Definition: aggregate_check.cc:1203
Item * select_expression(uint idx)
Definition: aggregate_check.cc:592
void find_fd_in_cond(Item *cond, table_map weak_tables, bool weak_side_upwards)
Searches for equality-based functional dependencies in a condition.
Definition: aggregate_check.cc:1083
void analyze_scalar_eq(Item *cond, Item *left_item, Item *right_item, table_map weak_tables, bool weak_side_upwards)
Helper function.
Definition: aggregate_check.cc:994
void to_opt_trace2(Opt_trace_context *ctx, Opt_trace_object *parent)
Utility function for to_opt_trace(), as we need recursion in children Group_checks.
Definition: aggregate_check.cc:1166
void find_fd_in_joined_table(mem_root_deque< Table_ref * > *join_list)
Searches for equality-based functional dependencies in the condition of a join nest,...
Definition: aggregate_check.cc:1107
bool is_fd_on_source(Item *item)
Tells if 'item' is functionally dependent ("FD") on source columns.
Definition: aggregate_check.cc:343
Item * get_fd_equal(Item *item)
Definition: aggregate_check.cc:873
bool check_query(THD *thd)
Rejects the query if it has a combination of DISTINCT and ORDER BY which could lead to randomly order...
Definition: aggregate_check.cc:111
bool check_query(THD *thd)
Rejects the query if it does aggregation or grouping, and expressions in its SELECT list,...
Definition: aggregate_check.cc:180
bool check_expression(THD *thd, Item *expr, bool in_select_list)
Validates one expression (this forms one step of check_query()).
Definition: aggregate_check.cc:275
void to_opt_trace(THD *thd)
Writes "check information" to the optimizer trace.
Definition: aggregate_check.cc:1153
bool is_in_fd(Item *item)
is_in_fd() is low-level compared to is_fd_on_source().
Definition: aggregate_check.cc:685
bool is_in_fd_of_underlying(Item_ident *item)
See if we can derive a FD from a column which has an underlying expression.
Definition: aggregate_check.cc:745
void analyze_conjunct(Item *cond, Item *conjunct, table_map weak_tables, bool weak_side_upwards)
Searches for equality-based functional dependences in an AND-ed part of a condition (a conjunct).
Definition: aggregate_check.cc:896
void add_to_source_of_mat_table(Item_field *item_field, Table_ref *tl)
If we just added a column of a materialized table to 'fd', we record this fact in a new Group_check (...
Definition: aggregate_check.cc:620
void find_group_in_fd(Item *item)
This function must be called every time we discover an item which is FD on source columns,...
Definition: aggregate_check.cc:559
This file follows Google coding style, except for the name MEM_ROOT (which is kept for historical rea...
Some integer typedefs for easier portability.
unsigned long long int ulonglong
Definition: my_inttypes.h:56
unsigned char uchar
Definition: my_inttypes.h:52
uint64_t table_map
Definition: my_table_map.h:30
required string type
Definition: replication_group_member_actions.proto:34
The MEM_ROOT is a simple arena, where allocations are carved out of larger blocks.
Definition: my_alloc.h:83