MySQL 8.3.0
Source Code Documentation
aggregate_check.h
Go to the documentation of this file.
1#ifndef AGGREGATE_CHECK_INCLUDED
2#define AGGREGATE_CHECK_INCLUDED
3
4/* Copyright (c) 2014, 2023, Oracle and/or its affiliates.
5
6 This program is free software; you can redistribute it and/or modify
7 it under the terms of the GNU General Public License, version 2.0,
8 as published by the Free Software Foundation.
9
10 This program is also distributed with certain software (including
11 but not limited to OpenSSL) that is licensed under separate terms,
12 as designated in a particular file or component or in included license
13 documentation. The authors of MySQL hereby grant you an additional
14 permission to link the program and your derivative works with the
15 separately licensed software that they have included with MySQL.
16
17 This program is distributed in the hope that it will be useful,
18 but WITHOUT ANY WARRANTY; without even the implied warranty of
19 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
20 GNU General Public License, version 2.0, for more details.
21
22 You should have received a copy of the GNU General Public License
23 along with this program; if not, write to the Free Software
24 Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */
25
26/**
27 @file
28 Checks for some semantic constraints on queries using GROUP
29 BY, or aggregate functions, or DISTINCT. Enforced if
30 sql_mode contains 'only_full_group_by'.
31*/
32
33/**
34 @defgroup AGGREGATE_CHECKS Aggregate checks of ONLY_FULL_GROUP_BY
35
36Checks for some semantic constraints on queries using GROUP BY, or aggregate
37functions, or DISTINCT (ONLY_FULL_GROUP_BY).
38
39We call "aggregation" the operation of taking a group of rows and replacing
40it with a single row. There are three types of aggregation: DISTINCT,
41implicit grouping, explicit grouping.
42
43This text describes MySQL's checks (why certain queries are rejected) and the
44rationale behind them.
45
46References:
47- WL#2489 "better only_full_group_by".
48- if you have access to the SQL standard, we recommend the following parts of
49"SQL2011 Foundation": query expression Syntax rule 28; column reference Syntax
50rule 7 and Conformance rule 2; 4.19 functional dependencies.
51
52@section DISTINCT
53
54DISTINCT: all identical rows in the result of SELECT are "aggregated" to
55one single row - duplicates are eliminated.
56If the result of the SELECT without DISTINCT is
57@verbatim
581 2
593 4
601 2
61@endverbatim
62then the result with DISTINCT is
63@verbatim
641 2
653 4
66@endverbatim
67Here is a problematic query. Assume we have a table T which contains three
68columns C1,C2,C3 and has the following rows:
69@verbatim
70C1 C2 C3
711 2 A
723 4 B
731 2 C
74@endverbatim
75and we do "SELECT DISTINCT C1, C2 FROM T ORDER BY C3".
76Because we want the result to be ordered, we have to first eliminate
77duplicates then order the result.
78When we eliminate duplicates, should we keep the first or the third row? This
79arbitrary choice will influence the retained value of C3, which will influence
80ordering. In the end, we have arbitrary ordering, which is problematic.
81To prevent this, if, in a query block 'sl', an ORDER BY expression
82is not the same expression as one in the SELECT list of 'sl'
83and contains a column which:
84- is of a table whose qualifying query block is 'sl'
85- and is not in the SELECT list of 'sl'
86
87then 'sl' should not have DISTINCT.
88This rule makes the query above invalid.
89Class Check_distinct implements this rule.
90
91Class Check_distinct implements a second rule, specific of set functions in
92ORDER BY (a non-standard feature).
93Consider these queries labelled (1), (2), (3), with DISTINCT and a set
94function in ORDER BY:
95@verbatim
96SELECT DISTINCT MIN(C2) FROM T GROUP BY C1 ORDER BY MIN(C3);
97SELECT DISTINCT MIN(C2) FROM T GROUP BY C1 ORDER BY MIN(C2);
98SELECT DISTINCT C1, MIN(C2) FROM T GROUP BY C1 ORDER BY MIN(C3);
99@endverbatim
100(1) is a random-order query, (2) and (3) are not.
101MySQL has traditionally been permissive, we want to allow (2) and (3).
102
103So the second rule is that Check_distinct rejects a query if it has DISTINCT
104and a set function in ORDER BY which is not the same expression as one in the
105SELECT list. This rejects (1) and allows (2).
106It would reject (3); luckily, before Check_distinct checks, DISTINCT is
107removed (it is redundant with GROUP BY) and thus the query is not processed by
108Check_distinct; the second rule is thus by-passed and (3) is correctly
109accepted.
110
111The implementation of Check_distinct works like this: if the query has
112DISTINCT, examine each element of the ORDER BY list: if the element is not the
113same expression as one in the SELECT list, then examine parts of the element
114(using Item::walk()), to spot offending set functions or columns.
115
116@section IMPLICIT_GROUPING Implicit grouping
117
118A query with set functions without GROUP BY can be seen as having GROUP BY()
119i.e. the set of grouping expressions is empty, all rows are part of a
120single group and are replaced with a single row. So it is just a sub-case of
121the next section.
122
123@section EXPLICIT_GROUPING Explicit grouping (GROUP BY)
124
125MySQL is more permissive than the standard, it allows to group by any
126expression; it does not require that every element of the GROUP BY clause
127should be a column of one table of the FROM clause.
128
129Here is a problematic query, using a table T with columns C1, C2, C3:
130@verbatim
131C1 C2 C3
1321 2 A
1333 4 B
1341 2 C
135
136SELECT C1, C3 FROM T GROUP BY C1;
137@endverbatim
138we first form groups, in each group the value of C1 must be the same for all
139rows. Consider the group made of the first and third row. We have to produce
140one row out of it, and this row must have a value for C3 because C3 is in the
141SELECT list. Among those two rows, which value of C3 should we choose? This is
142arbitrary, and will give a random result.
143
144To prevent this, in a query with GROUP BY or aggregates (known as "a grouped
145query"), any column referenced by an expression in the SELECT list or HAVING
146condition and belonging to one table of the FROM clause, should be one of the
147group columns (enforced by only_full_group_by in MySQL 5.6 and older) or, if
148functional dependencies are supported (as in MySQL 5.7), should be
149functionally dependent on the group columns.
150In the table T above, C3 is obviously not functionally dependent on {C1,C2}:
151the values of C1 and C2 for a row do not uniquely determine the value of
152C3. In other words, if two rows have the same value in C1 and the same value
153in C2 they do not necessarily have the same value in C3.
154So this rule correctly rejects the query above.
155
156Note that NULL is treated as one value: two NULLs are considered equal.
157
158In WL#2489, we have implemented the optional standard feature T301
159"functional dependencies" almost entirely.
160
161Here are the functional dependencies which we recognize.
162
163@subsection KEYFD Key-based, in a base table
164
165A key in this text is a unique constraint made of only non-NULLable
166columns. For example, a primary key.
167Considering a base table T, if two rows have the same values of all columns of
168a key of T they are actually one single row, so:
169{ all columns of key} -> { T.* } (notation: what is on the right of the arrow
170is functionally dependent on what is on the left).
171
172@subsection GCOLFD Generated-column-based, in a base table
173
174Considering a base table T, a generated column is functionally dependent on
175the set of columns it references (the so-called parametric
176columns). Note that the SQL standard doesn't allow a parametric column
177to be a generated column, but MySQL does.
178
179@subsection INNEREQ Equality-based, in the result of an inner join
180
181Considering two tables T1 and T2, a condition C, and the result R of an inner
182join T1 JOIN T2 ON C.
183Note that T1/T2 are not necessarily base tables. For example, they can also be
184views, or derived tables, or the results of some joins; in the rest of text,
185we use the vague term "table" for those various objects.
186
187Note that C may only refer to columns of T1 or T2 (outer references
188are forbidden by MySQL in join conditions).
189
190Assuming that C is a conjunction (i.e. is made of one or more conditions,
191"conjuncts", chained together with AND):
192- If one conjunct is of the form T1.A=constant, then {} -> {A} holds in R (the
193value of A is "the constant" in all rows of R).
194- If one conjunct is of the form T1.A=T2.B, then {T1.A} -> {T2.B} (and vice
195versa) holds in R (the value of T2.B is that of T1.A in all rows of R).
196
197@subsection OUTEREQ Equality-based, in the result of an outer join
198
199Considering the result R of TS LEFT JOIN TW ON C.
200Assuming that C is a conjunction. TS is said to be the strong table, TW is
201said to be the weak table (the one which might be NULL-complemented in the
202result of this join).
203To make this really clear, note that, if we have
204"t1 left join (t2 left join t3 on C) on D":
205- in the t2-t3 join, t2 is strong and t3 is weak.
206- in the t1-(t2-t3) join, t1 is strong, t2 is weak, t3 is weak.
207
208If C is deterministic and one conjunct is of the form TW.A=constant or
209TW.A=TS.B, then DJS -> {TW.A} holds in R, where DJS is the set of all columns
210of TS referenced by C.
211Proof: consider in R two rows r1 and r2 which have the same values of DJS
212columns. Consider r1. There are two possibilities when r1 was built from a row
213of TS:
214- no row in TW matched the row of TS (for no row of TW has C been true): so,
215r1 is NULL-complemented for the columns of TW. Given that r2 has the same
216values of DJS columns as r1, and given that C is deterministic, it is sure
217that no row in TW matched when r2 was built. So r2 is also NULL-complemented
218for the columns of TW. So r1 and r2 have the same value of TW.A (NULL).
219- At least one row in TW matched: so, r1 contains real values from TW (not
220NULL-complemented), matching C, and thus TW.A in r1 is equal to the constant
221or to TS.B. Following the same reasoning as above, it is sure that it is also
222the case for r2.
223- In conclusion, we can see that r1 and r2 always have the same value of
224TW.A.
225
226If one conjunct is of the form TW.A=TW.B then {TW.A} -> {TW.B} holds in R
227Proof: consider r1 and r2 two rows in R having the same value of TW.A. Two
228cases are possible:
229- this value is NULL. Then both rows are NULL-complemented (if not, the
230value of TW.A in TW would be NULL, which cannot match in an equality, so C is
231not true, which is absurd). Thus, in r1 and r2 TW.B is NULL.
232- This value is not NULL. Then both rows are not NULL-complemented, C
233matched for both, so TW.B in r1/r2 is equal to TW.A in r1/r2.
234- In conclusion, r1 and r2 have the same value of TW.B.
235
236If any conjunct references column T1.A and is not true if T1.A is NULL
237(e.g. conjunct is T1.A > 3), and T1 is part of TW, we can replace T1 with
238 (SELECT * FROM T1 WHERE T1.A IS NOT NULL) AS T1.
239We do not do such query rewriting, but a consequence is that we can and do
240treat T1.A as "known not nullable in T1".
241Proof: if we do this replacement, it means we remove from T1 some rows where
242T1.A is NULL. For the sake of the proof, let's broaden this to "we remove from
243and/or add to T1 rows where T1.A is NULL". By this operation,
244- the result of T2 JOIN T1 ON C is affected in the same manner (i.e. it's like
245if we remove from and/or add rows to the join's result, rows where T1.A is
246NULL).
247- the same is true for the result of T1 LEFT JOIN T2 ON C,
248- regarding the result of T2 LEFT JOIN T1 ON C, consider a row r2 from T2:
249 - if r2 had a match with a row of T1 where T1.A is not NULL, this match is
250 preserved after replacement of T1
251 - due to removed rows, matches with rows of T1 where T1.A is NULL may be
252 lost, and so NULL-complemented rows may appear
253 - due to added rows, matches with rows of T1 where T1.A is NULL may appear
254 - so in the result of the left join, it's like if we remove from
255 and/or add rows which all have T1.A is NULL.
256So, the net effect in all situations, on the join nest simply containing T1,
257is that "we remove from and/or add to the nest's result rows where T1.A is
258NULL".
259By induction we can go up the nest tree, for every nest it's like if we remove
260from and/or add to the join nest's result rows where T1.A is NULL.
261Going up, finally we come to the nest TS LEFT JOIN TW ON C where our
262NULL-rejecting conjunct is.
263If T1 belonged to TS, we could not say anything. But we have assumed T1
264belongs to TW: then the C condition eliminates added rows, and would have
265eliminated removed rows anyway. Thus the result of TS LEFT JOIN TW ON C is
266unchanged by our operation on T1. We can thus safely treat T1.A as not
267nullable _in_T1_. Thus, if T1.A is part of a unique constraint, we may treat
268this constraint as a primary key of T1 (@see WHEREEQ for an example).
269
270@subsection WHEREEQ Equality-based, in the result of a WHERE clause
271
272Same rule as the result of an inner join.
273Additionally, the rule is extended to T1.A=outer_reference, because an outer
274reference is a constant during one execution of this query.
275
276Moreoever, if any conjunct references column T1.A and is not true if T1.A is
277NULL (e.g. conjunct is T1.A > 3), we can treat T1.A as not nullable in T1.
278Proof: same as in previous section OUTEREQ.
279We can then derive FD information from a unique index on a nullable column;
280consider:
281 create table t1(a int null, b int null, unique(a));
282While
283 select a,b from t1 group by a;
284is invalid (because two null values of 'a' go in same group),
285 select a,b from t1 where 'a' is not null group by a;
286is valid: 'a' can be treated as non-nullable in t1, so the unique index is
287like "unique not null", so 'a' determines 'b'.
288
289Below we examine how functional dependencies in a table propagate to its
290containing join nest.
291
292@subsection PROPAGOUTER Considering the result R of TS LEFT JOIN TW ON C.
293
294All functional dependencies in TS are also functional dependencies in R.
295Proof: trivial.
296The same is not necessarily true for TW.
297Let's define a "NULL-friendly functional dependency" (NFFD) as a functional
298dependency between two sets A and B, which has two properties:
299- A is not empty
300- if, in a row, all values of A are NULL, then all values of B are NULL.
301
302All NFFDs in TW are also NFFDs in R.
303Proof: consider an NFFD A -> B in TW, and r1 and r2 two rows in R having the
304same values of A columns. Two cases are possible:
305- In r1 and r2, at least one value of A is not NULL. Then r1 is not
306NULL-complemented. Its values for A and B come from TW. By application of the
307functional dependency in TW, because values in A are equal in TW, values in B
308are equal in TW and thus in r1/r2.
309- In r1 and r2, all values of A are NULL. Two cases are possible:
310i) r1 is not NULL-complemented. Its values for A and B come from TW. In the
311row of TW values of A are all NULL. Because the functional dependency in
312NULL-friendly, all values of B are NULL in the row of TW and thus in r1.
313ii) r1 is NULL-complemented. Then all values of B in r1 are NULL.
314iii) In conclusion, all values of B in r1 are NULL. The same reasoning applies
315to r2. So, values of B are identical (NULL) in r1 and r2.
316- In conclusion, values of B are identical in r1/r2, we have proved that
317this is a functional dependency in R, and a NULL-friendly one.
318
319The concept of an NFFD is Guilhem's invention. It was felt it was necessary, to
320have easy propagation of FDs from TW to R. It was preferred to the
321alternative, simpler rule which says that a functional dependency A-> B in TW
322is also a functional dependency in R if A contains a non-nullable
323column. There are two points to note:
324- the functional dependency of the simpler rule is an NFFD, so our rule is not
325more restrictive than the simpler one
326- this simpler rule prevents free propagation of functional dependencies
327through join nests, which complicates implementation and leads to rejecting
328queries which could be accepted. An example follows:
329@verbatim
330SELECT T3.A
331FROM T1 LEFT JOIN (T2 LEFT JOIN T3 ON TRUE) ON TRUE
332GROUP BY T3.PK;
333@endverbatim
334This is what the simpler rule says for this query:
335* In T3, T3.PK->T3.A holds.
336* Let R1 be the result of "(T2 LEFT JOIN T3 ON TRUE)", in R1 T3.PK->T3.A
337holds, by application of: there is a functional dependency in the
338weak side T3, and T3.PK is not nullable in T3.
339* Let R2 be the result of "T1 LEFT JOIN (T2 LEFT JOIN T3 ON TRUE) ON TRUE",
340in R2 T3.PK->T3.A doesn't hold anymore, because: it's a dependency in the
341weak side (weak side is R1 here), and T3.PK is nullable _when
342seen as a column of R1_ (in R1 T3.PK can be NULL, if the row of T3
343is actually a NULL-complemented one).
344
345@subsection PROPAGINNER Considering the result R of T1 JOIN T2 ON C.
346
347All [NULL-friendly] functional dependencies in T1 are also [NULL-friendly]
348functional dependencies in R. the same is true for T2.
349Proof: trivial.
350
351@subsection PROPAGSUMMARY Summary rules for propagating FDs
352
353All NULL-friendly functional dependencies propagate freely through join nests
354all the way up to the result of the WHERE clause.
355The same is true for ordinary functional dependencies except if there are weak
356tables along the propagation path between the table where the dependency holds
357and the result of the WHERE clause; in other words, except if the table where
358the dependency holds belongs to some embedding join nest which is on the weak
359side of an outer join.
360
361@subsection NFFDS Which functional dependencies are NULL-friendly
362
363A key-based functional dependency A -> B in the base table is NULL-friendly,
364because, by definition, there can never be a NULL value in any column of A.
365
366A functional dependency A -> B in a base table, between parametric columns A
367and a generated column B, is not NULL-friendly; for more details, see
368@ref FDVIEW .
369
370A functional dependency A->B in the result of T1 JOIN T2 ON C, if based on
371equality of two columns, is NULL-friendly. Indeed, A is not empty and if there
372was some NULL in A, it would not match the equality in C and thus it would not
373exist in the result, absurd. If the equality is rather column=constant, A is
374empty, the dependency is not NULL-friendly. However, in our implementation,
375function @c simplify_joins() moves inner-join conditions to the embedding
376outer-join nest's join condition, or to the WHERE clause. Because our analysis
377of functional dependencies happens after simplify_joins(), when we analyze T1
378JOIN T2 it is guaranteed to have no condition, and this paragraph is
379irrelevant.
380
381A functional dependency in the result of TS LEFT JOIN TW ON C, if based on
382equality of two columns, is NULL-friendly.
383Proof: let's consider, in turn, the two types of equality-based functional
384dependencies which exist in this result R. Let r1 be a row of R.
385- If C is deterministic and one conjunct is of the form TW.A=constant or
386TW.A=TS.B, then DJS -> {TW.A} holds in R, where DJS is the set of all columns
387of TS referenced by C. For NULL-friendliness, we need DJS to not be
388empty. Thus, we exclude the form TW.A=constant and consider only
389TW.A=TS.B. We suppose that in r1 DJS contains all NULLs. Conjunct is TW.A=TS.B
390then this equality is not true, so r1 is NULL-complemented: TW.A is NULL in
391r1.
392- If one conjunct is of the form TW.A=TW.B then {TW.A} -> {TW.B} holds in
393R. If in r1 TW.A is NULL, again the equality in C is not true, and TW.B is
394NULL in R1.
395- In conclusion, this is NULL-friendly.
396
397A functional dependency in the result of a WHERE clause, if based on equality
398of two columns, is NULL-friendly. If based on T1.A=constant, it is not, as it
399has an empty set of source columns.
400
401Summary: all functional dependencies which we have seen so far are
402NULL-friendly, except those inferred from TW.A=constant in an outer join
403condition or in a WHERE clause, and those about generated columns.
404
405Thus, in the query with T1-T2-T3 previously seen, T3.PK->T3.A is
406NULL-friendly and propagates, query is accepted.
407
408In our implementation, we take special care of TW.A=constant in an outer join
409condition: we infer a functional dependency DJS->TW.A from such equality only
410if one of these requirements are met:
411- the join nest "TS LEFT JOIN TW ON TW.A=constant [AND...]" is not on the
412weak side of any embedding join nest - in that case, propagation will not meet
413any weak tables so we do not need the dependency to be NULL-friendly, it will
414propagate anyway.
415- DJS contains at least one column from a strong-side table which, if NULL,
416makes the join condition not evaluate to TRUE - in that case, DJS->TW.A is
417NULL-friendly.
418
419Either of these two conditions is satisfied in most practical cases. For
420example, it's very common to have an equality between a strong-side column and
421a weak-side column as a conjunct in the outer join condition (like, "ON
422strong.pk = weak.foreign_key AND ..." or "ON strong.foreign_key = weak.pk AND
423..."); this satisfies the second condition. It's also common to have outer
424joins only left-deep ("SELECT ... T1 LEFT JOIN T2 ON ... LEFT JOIN T3 ON ..."
425is left-deep); this satisfies the first condition.
426Note that the dependency found from TW.A=TS.B in an outer join condition
427always satisfies the second condition.
428
429T1.A=constant in a WHERE clause is exterior to any join nest so does not need
430to propagate, so does not need to be NULL-friendly.
431
432@subsection FDVIEW Functional dependencies in a view or a derived table
433
434In the rest of this text, we will use the term "view" for "view or derived
435table". A view can be merged or materialized, in MySQL.
436Consider a view V defined by a query expression.
437If the query expression contains UNION or ROLLUP (which is theoretically based
438on UNION) there are no functional dependencies in this view.
439So let's assume that the query expression is a query specification (let's note
440it VS):
441@verbatim
442CREATE VIEW V AS SELECT [DISTINCT] VE1 AS C1, VE2 AS C2, ... FROM ... WHERE ...
443[GROUP BY ...] [HAVING ...] [ORDER BY ...]
444@endverbatim
445
446If {VE1, VE2, VE3} are columns of tables of the FROM clause, and {VE1,
447VE2} -> {VE3} has been deduced from rules in the previous sections [and is
448NULL-friendly], then {C1, C2} -> { C3 } holds in the view [and is
449NULL-friendly].
450
451If {VE1, VE2} are columns of tables of the FROM clause, and VE3 is a
452deterministic expression depending only on VE1 and VE2, then {C1, C2} ->
453{ C3 } in the view.
454It is not always NULL-friendly, for example: VE3 could be COALESCE(VE1,VE2,3):
455if VE1 (C1) and VE2 (C2) are NULL, VE3 (C3) is 3: not NULL. Another example:
456VE3 could be a literal; {}->{C3}, the left set is empty.
457The same examples apply to a generated column in a base table - it is like a
458merged view's expression. For example, in a base table T which has a generated
459column C3 AS COALESCE(C1,C2,3): {C1, C2} -> { C3 } holds in T but is not
460NULL-friendly.
461
462If VS is a grouped query (which, in MySQL, implies that the view is
463materialized), then in the result of the grouping there is a functional
464dependency from the set of all group expressions to the set of all selected
465expressions (otherwise, this query expression would not have passed its own
466only_full_group_by validation - in the implementation we validate each query
467expression separately). Thus, if all group expressions of VS are in the select
468list of VS, for example they are VE1 and VE2, then {C1, C2} -> {V.*}.
469It is not NULL-friendly, for example: VE3 is COUNT(1): if the result of the
470WHERE clause contains a row with group expressions VE1 and VE2 equal to NULL,
471VE3 is not NULL.
472
473It's possible to infer functional dependencies from equality conditions in
474HAVING, but we have not implemented it yet.
475
476Because some functional dependencies above are not NULL-friendly, they
477exist in the view, but may not freely propagate in the result of join nests
478containing the view. This includes examples just given in paragraphs above,
479and the case of T1.A=constant in the WHERE clause of VS.
480
481Thus, when we know of a functional dependency A -> B in the query expression
482of a view, we deduce from it a functional dependency in the view only if:
483- this view is not on the weak side of any embedding join nest (so
484NULL-friendliness is not required for propagation).
485- or A contains at least one non-nullable expression, which makes A -> B
486NULL-friendly.
487
488The above is fine for materialized views. For merged views, we cannot study the
489query expression of the view, it has been merged (and scattered), so we use a
490different rule:
491- a merged view is similar to a join nest inserted in the parent query, so
492for dependencies based on keys or join conditions, we simply follow
493propagation rules of the non-view sections.
494- For expression-based dependencies (VE3 depends on VE1 and VE2, VE3
495belonging to the view SELECT list), which may not be NULL-friendly, we require
496 - the same non-weak-side criterion as above
497 - or that the left set of the dependency be non-empty and that if VE1 and
498VE2 are NULL then VE3 must be NULL, which makes the dependency NULL-friendly.
499- The same solution is used for generated columns in a base table.
500
501 @{
502
503*/
504
505#include <assert.h>
506#include <sys/types.h>
507
508#include "my_alloc.h"
509
510#include "my_inttypes.h"
511#include "my_table_map.h"
512#include "sql/item.h"
513#include "sql/item_cmpfunc.h" // Item_func_any_value
514#include "sql/item_sum.h" // Item_sum
515#include "sql/mem_root_array.h" // Mem_root_array
516
518class Opt_trace_object;
519class Query_block;
520class THD;
521class Table_ref;
522
523template <class T>
524class mem_root_deque;
525
526/**
527 Checks for queries which have DISTINCT.
528*/
530 public:
532 : select(select_arg), failed_ident(nullptr) {}
535
536 bool check_query(THD *thd);
537
538 private:
539 /// Query block which we are validating
541 /// Identifier which triggered an error
543
544 /**
545 Just because we need to go through Item::walk() to reach all items to
546 validate, some work must be delegated to "Item processors" (members of
547 Item); this work conceptually belongs to Distinct_check, and needs
548 privileged access to it.
549 */
554};
555
556/**
557 Checks for queries which have GROUP BY or aggregate functions.
558*/
560 public:
561 Group_check(Query_block *select_arg, MEM_ROOT *root)
562 : select(select_arg),
564 non_null_in_source(false),
565 table(nullptr),
566 group_in_fd(~0ULL),
567 m_root(root),
568 fd(root),
571 mat_tables(root),
573
574 ~Group_check() { std::destroy_n(mat_tables.data(), mat_tables.size()); }
575 Group_check(const Group_check &) = delete;
577
578 bool check_query(THD *thd);
579 void to_opt_trace(THD *thd);
580
581 private:
582 /// Query block which we are validating
584
585 /**
586 "Underlying" == expressions which are underlying in an identifier.
587 The identifier can be a column of a view or derived table, both merged or
588 materialized, or a generated column: all of those have an underlying
589 expression.
590 "Materialized table (mat table)" == view or derived table, materialized.
591 If this is true, is_in_fd() will look for FDs in underlying expressions
592 of columns.
593 */
595
596 /**
597 This member is readable only if this is a child Group_check. Is true if
598 one expression of the SELECT list of this->select is non-nullable.
599 */
601
602 /**
603 The Group_check employed to validate one query block, the one on which
604 check_query() runs, is named the "master".
605 If the query block references a materialized table, the master may create
606 a child Group_check, whose job is to discover FDs in the query expression
607 of the mat table (with the ultimate goal of deducing from them some FDs
608 in the mat table and thus in the parent Group_check). A child may have
609 children itself. If this Group_check is a child, 'table' points to the
610 materialized table, otherwise it is NULL.
611 */
613
614 /**
615 Bit N is set if the N-th expression of GROUP BY is functionally dependent
616 on source columns.
617 It serves to find FDs in the query expression of a materialized table
618 having GROUP BY.
619 For a non-child Group-check, all bits are turned on from the start.
620 Currently limited to 64 bits => max 64 GROUP expressions; should probably
621 be MY_BITMAP.
622 */
624
625 /// Memory for allocations (like of 'fd')
627
628 /**
629 Columns which are local to 'select' and functionally dependent on an
630 initial set of "source" columns defined like this:
631 - if !is_child(), the GROUP BY columns
632 - if is_child(), columns of the result of the query expression under
633 'table' which are themselves part of 'fd' of the parent Group_check.
634 */
636 /// Map of tables for which all columns can be considered part of 'fd'.
638 /// Map of tables for which we discovered known-not-nullable columns.
640 /// Children Group_checks of 'this'
642 /// Identifier which triggered an error
644
645 bool is_fd_on_source(Item *item);
646 bool is_child() const { return table != nullptr; }
647
648 /// Private ctor, for a Group_check to build a child Group_check
649 Group_check(Query_block *select_arg, MEM_ROOT *root, Table_ref *table_arg)
650 : select(select_arg),
652 non_null_in_source(false),
653 table(table_arg),
654 group_in_fd(0ULL),
655 m_root(root),
656 fd(root),
659 mat_tables(root) {
660 assert(table);
661 }
662 bool check_expression(THD *thd, Item *expr, bool in_select_list);
663 /// Shortcut for common use of Item::local_column()
664 bool local_column(Item *item) const {
665 return item->local_column(select).is_true();
666 }
667 void add_to_fd(Item *item, bool local_column, bool add_to_mat_table = true);
669 whole_tables_fd |= m;
670 find_group_in_fd(nullptr);
671 }
672 void add_to_source_of_mat_table(Item_field *item_field, Table_ref *tl);
673 bool is_in_fd(Item *item);
675 Item *get_fd_equal(Item *item);
676 void analyze_conjunct(Item *cond, Item *conjunct, table_map weak_tables,
677 bool weak_side_upwards);
678 void analyze_scalar_eq(Item *cond, Item *left_item, Item *right_item,
679 table_map weak_tables, bool weak_side_upwards);
680 void find_fd_in_cond(Item *cond, table_map weak_tables,
681 bool weak_side_upwards);
684 void find_group_in_fd(Item *item);
685 Item *select_expression(uint idx);
686
687 /// Enum for argument of do_ident_check()
690
691 /**
692 Just because we need to go through Item::walk() to reach all items to
693 validate, some work must be delegated to "Item processors" (members of
694 Item); this work conceptually belongs to Group_check, and needs
695 privileged access to it.
696 */
703};
704
705#endif
706/// @} (end of group AGGREGATE_CHECKS ONLY_FULL_GROUP_BY)
Kerberos Client Authentication nullptr
Definition: auth_kerberos_client_plugin.cc:250
bool is_true() const
Definition: item.h:603
Checks for queries which have DISTINCT.
Definition: aggregate_check.h:529
Distinct_check(const Distinct_check &)=delete
Query_block *const select
Query block which we are validating.
Definition: aggregate_check.h:540
Item_ident * failed_ident
Identifier which triggered an error.
Definition: aggregate_check.h:542
Distinct_check & operator=(const Distinct_check &)=delete
Distinct_check(Query_block *select_arg)
Definition: aggregate_check.h:531
Checks for queries which have GROUP BY or aggregate functions.
Definition: aggregate_check.h:559
MEM_ROOT *const m_root
Memory for allocations (like of 'fd')
Definition: aggregate_check.h:626
Group_check(Query_block *select_arg, MEM_ROOT *root, Table_ref *table_arg)
Private ctor, for a Group_check to build a child Group_check.
Definition: aggregate_check.h:649
Item_ident * failed_ident
Identifier which triggered an error.
Definition: aggregate_check.h:643
void add_to_fd(table_map m)
Definition: aggregate_check.h:668
Query_block *const select
Query block which we are validating.
Definition: aggregate_check.h:583
Group_check & operator=(const Group_check &)=delete
enum_ident_check
Enum for argument of do_ident_check()
Definition: aggregate_check.h:688
@ CHECK_COLUMN
Definition: aggregate_check.h:688
@ CHECK_STRONG_SIDE_COLUMN
Definition: aggregate_check.h:688
@ CHECK_GROUP
Definition: aggregate_check.h:688
Group_check(Query_block *select_arg, MEM_ROOT *root)
Definition: aggregate_check.h:561
Group_check(const Group_check &)=delete
Table_ref *const table
The Group_check employed to validate one query block, the one on which check_query() runs,...
Definition: aggregate_check.h:612
~Group_check()
Definition: aggregate_check.h:574
bool search_in_underlying
"Underlying" == expressions which are underlying in an identifier.
Definition: aggregate_check.h:594
bool is_child() const
Definition: aggregate_check.h:646
Mem_root_array< Group_check * > mat_tables
Children Group_checks of 'this'.
Definition: aggregate_check.h:641
table_map recheck_nullable_keys
Map of tables for which we discovered known-not-nullable columns.
Definition: aggregate_check.h:639
bool local_column(Item *item) const
Shortcut for common use of Item::local_column()
Definition: aggregate_check.h:664
bool non_null_in_source
This member is readable only if this is a child Group_check.
Definition: aggregate_check.h:600
ulonglong group_in_fd
Bit N is set if the N-th expression of GROUP BY is functionally dependent on source columns.
Definition: aggregate_check.h:623
Mem_root_array< Item_ident * > fd
Columns which are local to 'select' and functionally dependent on an initial set of "source" columns ...
Definition: aggregate_check.h:635
table_map whole_tables_fd
Map of tables for which all columns can be considered part of 'fd'.
Definition: aggregate_check.h:637
Definition: item.h:4340
bool aggregate_check_group(uchar *arg) override
Definition: item_cmpfunc.cc:7564
bool aggregate_check_distinct(uchar *arg) override
Definition: item_cmpfunc.cc:7571
bool aggregate_check_group(uchar *arg) override
This function is expected to check if GROUPING function with its arguments is "group-invariant".
Definition: item_sum.cc:6322
bool aggregate_check_distinct(uchar *arg) override
Used by Distinct_check::check_query to determine whether an error should be returned if the GROUPING ...
Definition: item_sum.cc:6292
Definition: item.h:4068
bool is_strong_side_column_not_in_fd(uchar *arg) override
Definition: item.cc:11028
bool aggregate_check_group(uchar *arg) override
Definition: item.cc:11023
bool is_column_not_in_fd(uchar *arg) override
Definition: item.cc:11036
bool aggregate_check_distinct(uchar *arg) override
Definition: item.cc:10975
bool aggregate_check_distinct(uchar *arg) override
Definition: item_sum.cc:632
bool aggregate_check_group(uchar *arg) override
Definition: item_sum.cc:656
Utility mixin class to be able to walk() only parts of item trees.
Definition: item.h:741
Base class that is used to represent any kind of expression in a relational query.
Definition: item.h:933
virtual Bool3 local_column(const Query_block *) const
Definition: item.h:2905
A typesafe replacement for DYNAMIC_ARRAY.
Definition: mem_root_array.h:425
A per-session context which is always available at any point of execution, because in practice it's a...
Definition: opt_trace_context.h:91
A JSON object (unordered set of key/value pairs).
Definition: opt_trace.h:801
This class represents a query block, aka a query specification, which is a query consisting of a SELE...
Definition: sql_lex.h:1161
For each client connection we create a separate thread with THD serving as a thread/connection descri...
Definition: sql_lexer_thd.h:35
Definition: table.h:2853
A (partial) implementation of std::deque allocating its blocks on a MEM_ROOT.
Definition: mem_root_deque.h:110
void add_to_fd(Item *item, bool local_column, bool add_to_mat_table=true)
Definition: aggregate_check.cc:497
bool do_ident_check(Item_ident *i, table_map tm, enum enum_ident_check type)
Does one low-level check on one Item_ident.
Definition: aggregate_check.cc:1206
Item * select_expression(uint idx)
Definition: aggregate_check.cc:591
void find_fd_in_cond(Item *cond, table_map weak_tables, bool weak_side_upwards)
Searches for equality-based functional dependencies in a condition.
Definition: aggregate_check.cc:1086
void analyze_scalar_eq(Item *cond, Item *left_item, Item *right_item, table_map weak_tables, bool weak_side_upwards)
Helper function.
Definition: aggregate_check.cc:997
void to_opt_trace2(Opt_trace_context *ctx, Opt_trace_object *parent)
Utility function for to_opt_trace(), as we need recursion in children Group_checks.
Definition: aggregate_check.cc:1169
void find_fd_in_joined_table(mem_root_deque< Table_ref * > *join_list)
Searches for equality-based functional dependencies in the condition of a join nest,...
Definition: aggregate_check.cc:1110
bool is_fd_on_source(Item *item)
Tells if 'item' is functionally dependent ("FD") on source columns.
Definition: aggregate_check.cc:342
Item * get_fd_equal(Item *item)
Definition: aggregate_check.cc:876
bool check_query(THD *thd)
Rejects the query if it has a combination of DISTINCT and ORDER BY which could lead to randomly order...
Definition: aggregate_check.cc:110
bool check_query(THD *thd)
Rejects the query if it does aggregation or grouping, and expressions in its SELECT list,...
Definition: aggregate_check.cc:179
bool check_expression(THD *thd, Item *expr, bool in_select_list)
Validates one expression (this forms one step of check_query()).
Definition: aggregate_check.cc:274
void to_opt_trace(THD *thd)
Writes "check information" to the optimizer trace.
Definition: aggregate_check.cc:1156
bool is_in_fd(Item *item)
is_in_fd() is low-level compared to is_fd_on_source().
Definition: aggregate_check.cc:684
bool is_in_fd_of_underlying(Item_ident *item)
See if we can derive a FD from a column which has an underlying expression.
Definition: aggregate_check.cc:744
void analyze_conjunct(Item *cond, Item *conjunct, table_map weak_tables, bool weak_side_upwards)
Searches for equality-based functional dependences in an AND-ed part of a condition (a conjunct).
Definition: aggregate_check.cc:899
void add_to_source_of_mat_table(Item_field *item_field, Table_ref *tl)
If we just added a column of a materialized table to 'fd', we record this fact in a new Group_check (...
Definition: aggregate_check.cc:619
void find_group_in_fd(Item *item)
This function must be called every time we discover an item which is FD on source columns,...
Definition: aggregate_check.cc:558
This file follows Google coding style, except for the name MEM_ROOT (which is kept for historical rea...
Some integer typedefs for easier portability.
unsigned long long int ulonglong
Definition: my_inttypes.h:55
unsigned char uchar
Definition: my_inttypes.h:51
uint64_t table_map
Definition: my_table_map.h:29
required string type
Definition: replication_group_member_actions.proto:33
The MEM_ROOT is a simple arena, where allocations are carved out of larger blocks.
Definition: my_alloc.h:82