MySQL  8.0.21
Source Code Documentation
aggregate_check.h
Go to the documentation of this file.
1 #ifndef AGGREGATE_CHECK_INCLUDED
2 #define AGGREGATE_CHECK_INCLUDED
3 
4 /* Copyright (c) 2014, 2020, Oracle and/or its affiliates. All rights reserved.
5 
6  This program is free software; you can redistribute it and/or modify
7  it under the terms of the GNU General Public License, version 2.0,
8  as published by the Free Software Foundation.
9 
10  This program is also distributed with certain software (including
11  but not limited to OpenSSL) that is licensed under separate terms,
12  as designated in a particular file or component or in included license
13  documentation. The authors of MySQL hereby grant you an additional
14  permission to link the program and your derivative works with the
15  separately licensed software that they have included with MySQL.
16 
17  This program is distributed in the hope that it will be useful,
18  but WITHOUT ANY WARRANTY; without even the implied warranty of
19  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
20  GNU General Public License, version 2.0, for more details.
21 
22  You should have received a copy of the GNU General Public License
23  along with this program; if not, write to the Free Software
24  Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */
25 
26 /**
27  @file
28  Checks for some semantic constraints on queries using GROUP
29  BY, or aggregate functions, or DISTINCT. Enforced if
30  sql_mode contains 'only_full_group_by'.
31 */
32 
33 /**
34  @defgroup AGGREGATE_CHECKS Aggregate checks of ONLY_FULL_GROUP_BY
35 
36 Checks for some semantic constraints on queries using GROUP BY, or aggregate
37 functions, or DISTINCT (ONLY_FULL_GROUP_BY).
38 
39 We call "aggregation" the operation of taking a group of rows and replacing
40 it with a single row. There are three types of aggregation: DISTINCT,
41 implicit grouping, explicit grouping.
42 
43 This text describes MySQL's checks (why certain queries are rejected) and the
44 rationale behind them.
45 
46 References:
47 - WL#2489 "better only_full_group_by".
48 - if you have access to the SQL standard, we recommend the following parts of
49 "SQL2011 Foundation": query expression Syntax rule 28; column reference Syntax
50 rule 7 and Conformance rule 2; 4.19 functional dependencies.
51 
52 @section DISTINCT
53 
54 DISTINCT: all identical rows in the result of SELECT are "aggregated" to
55 one single row - duplicates are eliminated.
56 If the result of the SELECT without DISTINCT is
57 @verbatim
58 1 2
59 3 4
60 1 2
61 @endverbatim
62 then the result with DISTINCT is
63 @verbatim
64 1 2
65 3 4
66 @endverbatim
67 Here is a problematic query. Assume we have a table T which contains three
68 columns C1,C2,C3 and has the following rows:
69 @verbatim
70 C1 C2 C3
71 1 2 A
72 3 4 B
73 1 2 C
74 @endverbatim
75 and we do "SELECT DISTINCT C1, C2 FROM T ORDER BY C3".
76 Because we want the result to be ordered, we have to first eliminate
77 duplicates then order the result.
78 When we eliminate duplicates, should we keep the first or the third row? This
79 arbitrary choice will influence the retained value of C3, which will influence
80 ordering. In the end, we have arbitrary ordering, which is problematic.
81 To prevent this, if, in a query block 'sl', an ORDER BY expression
82 is not the same expression as one in the SELECT list of 'sl'
83 and contains a column which:
84 - is of a table whose qualifying query block is 'sl'
85 - and is not in the SELECT list of 'sl'
86 
87 then 'sl' should not have DISTINCT.
88 This rule makes the query above invalid.
89 Class Check_distinct implements this rule.
90 
91 Class Check_distinct implements a second rule, specific of set functions in
92 ORDER BY (a non-standard feature).
93 Consider these queries labelled (1), (2), (3), with DISTINCT and a set
94 function in ORDER BY:
95 @verbatim
96 SELECT DISTINCT MIN(C2) FROM T GROUP BY C1 ORDER BY MIN(C3);
97 SELECT DISTINCT MIN(C2) FROM T GROUP BY C1 ORDER BY MIN(C2);
98 SELECT DISTINCT C1, MIN(C2) FROM T GROUP BY C1 ORDER BY MIN(C3);
99 @endverbatim
100 (1) is a random-order query, (2) and (3) are not.
101 MySQL has traditionally been permissive, we want to allow (2) and (3).
102 
103 So the second rule is that Check_distinct rejects a query if it has DISTINCT
104 and a set function in ORDER BY which is not the same expression as one in the
105 SELECT list. This rejects (1) and allows (2).
106 It would reject (3); luckily, before Check_distinct checks, DISTINCT is
107 removed (it is redundant with GROUP BY) and thus the query is not processed by
108 Check_distinct; the second rule is thus by-passed and (3) is correctly
109 accepted.
110 
111 The implementation of Check_distinct works like this: if the query has
112 DISTINCT, examine each element of the ORDER BY list: if the element is not the
113 same expression as one in the SELECT list, then examine parts of the element
114 (using Item::walk()), to spot offending set functions or columns.
115 
116 @section IMPLICIT_GROUPING Implicit grouping
117 
118 A query with set functions without GROUP BY can be seen as having GROUP BY()
119 i.e. the set of grouping expressions is empty, all rows are part of a
120 single group and are replaced with a single row. So it is just a sub-case of
121 the next section.
122 
123 @section EXPLICIT_GROUPING Explicit grouping (GROUP BY)
124 
125 MySQL is more permissive than the standard, it allows to group by any
126 expression; it does not require that every element of the GROUP BY clause
127 should be a column of one table of the FROM clause.
128 
129 Here is a problematic query, using a table T with columns C1, C2, C3:
130 @verbatim
131 C1 C2 C3
132 1 2 A
133 3 4 B
134 1 2 C
135 
136 SELECT C1, C3 FROM T GROUP BY C1;
137 @endverbatim
138 we first form groups, in each group the value of C1 must be the same for all
139 rows. Consider the group made of the first and third row. We have to produce
140 one row out of it, and this row must have a value for C3 because C3 is in the
141 SELECT list. Among those two rows, which value of C3 should we choose? This is
142 arbitrary, and will give a random result.
143 
144 To prevent this, in a query with GROUP BY or aggregates (known as "a grouped
145 query"), any column referenced by an expression in the SELECT list or HAVING
146 condition and belonging to one table of the FROM clause, should be one of the
147 group columns (enforced by only_full_group_by in MySQL 5.6 and older) or, if
148 functional dependencies are supported (as in MySQL 5.7), should be
149 functionally dependent on the group columns.
150 In the table T above, C3 is obviously not functionally dependent on {C1,C2}:
151 the values of C1 and C2 for a row do not uniquely determine the value of
152 C3. In other words, if two rows have the same value in C1 and the same value
153 in C2 they do not necessarily have the same value in C3.
154 So this rule correctly rejects the query above.
155 
156 Note that NULL is treated as one value: two NULLs are considered equal.
157 
158 In WL#2489, we have implemented the optional standard feature T301
159 "functional dependencies" almost entirely.
160 
161 Here are the functional dependencies which we recognize.
162 
163 @subsection KEYFD Key-based, in a base table
164 
165 A key in this text is a unique constraint made of only non-NULLable
166 columns. For example, a primary key.
167 Considering a base table T, if two rows have the same values of all columns of
168 a key of T they are actually one single row, so:
169 { all columns of key} -> { T.* } (notation: what is on the right of the arrow
170 is functionally dependent on what is on the left).
171 
172 @subsection GCOLFD Generated-column-based, in a base table
173 
174 Considering a base table T, a generated column is functionally dependent on
175 the set of columns it references (the so-called parametric
176 columns). Note that the SQL standard doesn't allow a parametric column
177 to be a generated column, but MySQL does.
178 
179 @subsection INNEREQ Equality-based, in the result of an inner join
180 
181 Considering two tables T1 and T2, a condition C, and the result R of an inner
182 join T1 JOIN T2 ON C.
183 Note that T1/T2 are not necessarily base tables. For example, they can also be
184 views, or derived tables, or the results of some joins; in the rest of text,
185 we use the vague term "table" for those various objects.
186 
187 Note that C may only refer to columns of T1 or T2 (outer references
188 are forbidden by MySQL in join conditions).
189 
190 Assuming that C is a conjunction (i.e. is made of one or more conditions,
191 "conjuncts", chained together with AND):
192 - If one conjunct is of the form T1.A=constant, then {} -> {A} holds in R (the
193 value of A is "the constant" in all rows of R).
194 - If one conjunct is of the form T1.A=T2.B, then {T1.A} -> {T2.B} (and vice
195 versa) holds in R (the value of T2.B is that of T1.A in all rows of R).
196 
197 @subsection OUTEREQ Equality-based, in the result of an outer join
198 
199 Considering the result R of TS LEFT JOIN TW ON C.
200 Assuming that C is a conjunction. TS is said to be the strong table, TW is
201 said to be the weak table (the one which might be NULL-complemented in the
202 result of this join).
203 To make this really clear, note that, if we have
204 "t1 left join (t2 left join t3 on C) on D":
205 - in the t2-t3 join, t2 is strong and t3 is weak.
206 - in the t1-(t2-t3) join, t1 is strong, t2 is weak, t3 is weak.
207 
208 If C is deterministic and one conjunct is of the form TW.A=constant or
209 TW.A=TS.B, then DJS -> {TW.A} holds in R, where DJS is the set of all columns
210 of TS referenced by C.
211 Proof: consider in R two rows r1 and r2 which have the same values of DJS
212 columns. Consider r1. There are two possibilities when r1 was built from a row
213 of TS:
214 - no row in TW matched the row of TS (for no row of TW has C been true): so,
215 r1 is NULL-complemented for the columns of TW. Given that r2 has the same
216 values of DJS columns as r1, and given that C is deterministic, it is sure
217 that no row in TW matched when r2 was built. So r2 is also NULL-complemented
218 for the columns of TW. So r1 and r2 have the same value of TW.A (NULL).
219 - At least one row in TW matched: so, r1 contains real values from TW (not
220 NULL-complemented), matching C, and thus TW.A in r1 is equal to the constant
221 or to TS.B. Following the same reasoning as above, it is sure that it is also
222 the case for r2.
223 - In conclusion, we can see that r1 and r2 always have the same value of
224 TW.A.
225 
226 If one conjunct is of the form TW.A=TW.B then {TW.A} -> {TW.B} holds in R
227 Proof: consider r1 and r2 two rows in R having the same value of TW.A. Two
228 cases are possible:
229 - this value is NULL. Then both rows are NULL-complemented (if not, the
230 value of TW.A in TW would be NULL, which cannot match in an equality, so C is
231 not true, which is absurd). Thus, in r1 and r2 TW.B is NULL.
232 - This value is not NULL. Then both rows are not NULL-complemented, C
233 matched for both, so TW.B in r1/r2 is equal to TW.A in r1/r2.
234 - In conclusion, r1 and r2 have the same value of TW.B.
235 
236 If any conjunct references column T1.A and is not true if T1.A is NULL
237 (e.g. conjunct is T1.A > 3), and T1 is part of TW, we can replace T1 with
238  (SELECT * FROM T1 WHERE T1.A IS NOT NULL) AS T1.
239 We do not do such query rewriting, but a consequence is that we can and do
240 treat T1.A as "known not nullable in T1".
241 Proof: if we do this replacement, it means we remove from T1 some rows where
242 T1.A is NULL. For the sake of the proof, let's broaden this to "we remove from
243 and/or add to T1 rows where T1.A is NULL". By this operation,
244 - the result of T2 JOIN T1 ON C is affected in the same manner (i.e. it's like
245 if we remove from and/or add rows to the join's result, rows where T1.A is
246 NULL).
247 - the same is true for the result of T1 LEFT JOIN T2 ON C,
248 - regarding the result of T2 LEFT JOIN T1 ON C, consider a row r2 from T2:
249  - if r2 had a match with a row of T1 where T1.A is not NULL, this match is
250  preserved after replacement of T1
251  - due to removed rows, matches with rows of T1 where T1.A is NULL may be
252  lost, and so NULL-complemented rows may appear
253  - due to added rows, matches with rows of T1 where T1.A is NULL may appear
254  - so in the result of the left join, it's like if we remove from
255  and/or add rows which all have T1.A is NULL.
256 So, the net effect in all situations, on the join nest simply containing T1,
257 is that "we remove from and/or add to the nest's result rows where T1.A is
258 NULL".
259 By induction we can go up the nest tree, for every nest it's like if we remove
260 from and/or add to the join nest's result rows where T1.A is NULL.
261 Going up, finally we come to the nest TS LEFT JOIN TW ON C where our
262 NULL-rejecting conjunct is.
263 If T1 belonged to TS, we could not say anything. But we have assumed T1
264 belongs to TW: then the C condition eliminates added rows, and would have
265 eliminated removed rows anyway. Thus the result of TS LEFT JOIN TW ON C is
266 unchanged by our operation on T1. We can thus safely treat T1.A as not
267 nullable _in_T1_. Thus, if T1.A is part of a unique constraint, we may treat
268 this constraint as a primary key of T1 (@see WHEREEQ for an example).
269 
270 @subsection WHEREEQ Equality-based, in the result of a WHERE clause
271 
272 Same rule as the result of an inner join.
273 Additionally, the rule is extended to T1.A=outer_reference, because an outer
274 reference is a constant during one execution of this query.
275 
276 Moreoever, if any conjunct references column T1.A and is not true if T1.A is
277 NULL (e.g. conjunct is T1.A > 3), we can treat T1.A as not nullable in T1.
278 Proof: same as in previous section OUTEREQ.
279 We can then derive FD information from a unique index on a nullable column;
280 consider:
281  create table t1(a int null, b int null, unique(a));
282 While
283  select a,b from t1 group by a;
284 is invalid (because two null values of 'a' go in same group),
285  select a,b from t1 where 'a' is not null group by a;
286 is valid: 'a' can be treated as non-nullable in t1, so the unique index is
287 like "unique not null", so 'a' determines 'b'.
288 
289 Below we examine how functional dependencies in a table propagate to its
290 containing join nest.
291 
292 @subsection PROPAGOUTER Considering the result R of TS LEFT JOIN TW ON C.
293 
294 All functional dependencies in TS are also functional dependencies in R.
295 Proof: trivial.
296 The same is not necessarily true for TW.
297 Let's define a "NULL-friendly functional dependency" (NFFD) as a functional
298 dependency between two sets A and B, which has two properties:
299 - A is not empty
300 - if, in a row, all values of A are NULL, then all values of B are NULL.
301 
302 All NFFDs in TW are also NFFDs in R.
303 Proof: consider an NFFD A -> B in TW, and r1 and r2 two rows in R having the
304 same values of A columns. Two cases are possible:
305 - In r1 and r2, at least one value of A is not NULL. Then r1 is not
306 NULL-complemented. Its values for A and B come from TW. By application of the
307 functional dependency in TW, because values in A are equal in TW, values in B
308 are equal in TW and thus in r1/r2.
309 - In r1 and r2, all values of A are NULL. Two cases are possible:
310 i) r1 is not NULL-complemented. Its values for A and B come from TW. In the
311 row of TW values of A are all NULL. Because the functional dependency in
312 NULL-friendly, all values of B are NULL in the row of TW and thus in r1.
313 ii) r1 is NULL-complemented. Then all values of B in r1 are NULL.
314 iii) In conclusion, all values of B in r1 are NULL. The same reasoning applies
315 to r2. So, values of B are identical (NULL) in r1 and r2.
316 - In conclusion, values of B are identical in r1/r2, we have proved that
317 this is a functional dependency in R, and a NULL-friendly one.
318 
319 The concept of an NFFD is Guilhem's invention. It was felt it was necessary, to
320 have easy propagation of FDs from TW to R. It was preferred to the
321 alternative, simpler rule which says that a functional dependency A-> B in TW
322 is also a functional dependency in R if A contains a non-nullable
323 column. There are two points to note:
324 - the functional dependency of the simpler rule is an NFFD, so our rule is not
325 more restrictive than the simpler one
326 - this simpler rule prevents free propagation of functional dependencies
327 through join nests, which complicates implementation and leads to rejecting
328 queries which could be accepted. An example follows:
329 @verbatim
330 SELECT T3.A
331 FROM T1 LEFT JOIN (T2 LEFT JOIN T3 ON TRUE) ON TRUE
332 GROUP BY T3.PK;
333 @endverbatim
334 This is what the simpler rule says for this query:
335 * In T3, T3.PK->T3.A holds.
336 * Let R1 be the result of "(T2 LEFT JOIN T3 ON TRUE)", in R1 T3.PK->T3.A
337 holds, by application of: there is a functional dependency in the
338 weak side T3, and T3.PK is not nullable in T3.
339 * Let R2 be the result of "T1 LEFT JOIN (T2 LEFT JOIN T3 ON TRUE) ON TRUE",
340 in R2 T3.PK->T3.A doesn't hold anymore, because: it's a dependency in the
341 weak side (weak side is R1 here), and T3.PK is nullable _when
342 seen as a column of R1_ (in R1 T3.PK can be NULL, if the row of T3
343 is actually a NULL-complemented one).
344 
345 @subsection PROPAGINNER Considering the result R of T1 JOIN T2 ON C.
346 
347 All [NULL-friendly] functional dependencies in T1 are also [NULL-friendly]
348 functional dependencies in R. the same is true for T2.
349 Proof: trivial.
350 
351 @subsection PROPAGSUMMARY Summary rules for propagating FDs
352 
353 All NULL-friendly functional dependencies propagate freely through join nests
354 all the way up to the result of the WHERE clause.
355 The same is true for ordinary functional dependencies except if there are weak
356 tables along the propagation path between the table where the dependency holds
357 and the result of the WHERE clause; in other words, except if the table where
358 the dependency holds belongs to some embedding join nest which is on the weak
359 side of an outer join.
360 
361 @subsection NFFDS Which functional dependencies are NULL-friendly
362 
363 A key-based functional dependency A -> B in the base table is NULL-friendly,
364 because, by definition, there can never be a NULL value in any column of A.
365 
366 A functional dependency A -> B in a base table, between parametric columns A
367 and a generated column B, is not NULL-friendly; for more details, see
368 @ref FDVIEW .
369 
370 A functional dependency A->B in the result of T1 JOIN T2 ON C, if based on
371 equality of two columns, is NULL-friendly. Indeed, A is not empty and if there
372 was some NULL in A, it would not match the equality in C and thus it would not
373 exist in the result, absurd. If the equality is rather column=constant, A is
374 empty, the dependency is not NULL-friendly. However, in our implementation,
375 function @c simplify_joins() moves inner-join conditions to the embedding
376 outer-join nest's join condition, or to the WHERE clause. Because our analysis
377 of functional dependencies happens after simplify_joins(), when we analyze T1
378 JOIN T2 it is guaranteed to have no condition, and this paragraph is
379 irrelevant.
380 
381 A functional dependency in the result of TS LEFT JOIN TW ON C, if based on
382 equality of two columns, is NULL-friendly.
383 Proof: let's consider, in turn, the two types of equality-based functional
384 dependencies which exist in this result R. Let r1 be a row of R.
385 - If C is deterministic and one conjunct is of the form TW.A=constant or
386 TW.A=TS.B, then DJS -> {TW.A} holds in R, where DJS is the set of all columns
387 of TS referenced by C. For NULL-friendliness, we need DJS to not be
388 empty. Thus, we exclude the form TW.A=constant and consider only
389 TW.A=TS.B. We suppose that in r1 DJS contains all NULLs. Conjunct is TW.A=TS.B
390 then this equality is not true, so r1 is NULL-complemented: TW.A is NULL in
391 r1.
392 - If one conjunct is of the form TW.A=TW.B then {TW.A} -> {TW.B} holds in
393 R. If in r1 TW.A is NULL, again the equality in C is not true, and TW.B is
394 NULL in R1.
395 - In conclusion, this is NULL-friendly.
396 
397 A functional dependency in the result of a WHERE clause, if based on equality
398 of two columns, is NULL-friendly. If based on T1.A=constant, it is not, as it
399 has an empty set of source columns.
400 
401 Summary: all functional dependencies which we have seen so far are
402 NULL-friendly, except those inferred from TW.A=constant in an outer join
403 condition or in a WHERE clause, and those about generated columns.
404 
405 Thus, in the query with T1-T2-T3 previously seen, T3.PK->T3.A is
406 NULL-friendly and propagates, query is accepted.
407 
408 In our implementation, we take special care of TW.A=constant in an outer join
409 condition: we infer a functional dependency DJS->TW.A from such equality only
410 if one of these requirements are met:
411 - the join nest "TS LEFT JOIN TW ON TW.A=constant [AND...]" is not on the
412 weak side of any embedding join nest - in that case, propagation will not meet
413 any weak tables so we do not need the dependency to be NULL-friendly, it will
414 propagate anyway.
415 - DJS contains at least one column from a strong-side table which, if NULL,
416 makes the join condition not evaluate to TRUE - in that case, DJS->TW.A is
417 NULL-friendly.
418 
419 Either of these two conditions is satisfied in most practical cases. For
420 example, it's very common to have an equality between a strong-side column and
421 a weak-side column as a conjunct in the outer join condition (like, "ON
422 strong.pk = weak.foreign_key AND ..." or "ON strong.foreign_key = weak.pk AND
423 ..."); this satisfies the second condition. It's also common to have outer
424 joins only left-deep ("SELECT ... T1 LEFT JOIN T2 ON ... LEFT JOIN T3 ON ..."
425 is left-deep); this satisfies the first condition.
426 Note that the dependency found from TW.A=TS.B in an outer join condition
427 always satisfies the second condition.
428 
429 T1.A=constant in a WHERE clause is exterior to any join nest so does not need
430 to propagate, so does not need to be NULL-friendly.
431 
432 @subsection FDVIEW Functional dependencies in a view or a derived table
433 
434 In the rest of this text, we will use the term "view" for "view or derived
435 table". A view can be merged or materialized, in MySQL.
436 Consider a view V defined by a query expression.
437 If the query expression contains UNION or ROLLUP (which is theoretically based
438 on UNION) there are no functional dependencies in this view.
439 So let's assume that the query expression is a query specification (let's note
440 it VS):
441 @verbatim
442 CREATE VIEW V AS SELECT [DISTINCT] VE1 AS C1, VE2 AS C2, ... FROM ... WHERE ...
443 [GROUP BY ...] [HAVING ...] [ORDER BY ...]
444 @endverbatim
445 
446 If {VE1, VE2, VE3} are columns of tables of the FROM clause, and {VE1,
447 VE2} -> {VE3} has been deduced from rules in the previous sections [and is
448 NULL-friendly], then {C1, C2} -> { C3 } holds in the view [and is
449 NULL-friendly].
450 
451 If {VE1, VE2} are columns of tables of the FROM clause, and VE3 is a
452 deterministic expression depending only on VE1 and VE2, then {C1, C2} ->
453 { C3 } in the view.
454 It is not always NULL-friendly, for example: VE3 could be COALESCE(VE1,VE2,3):
455 if VE1 (C1) and VE2 (C2) are NULL, VE3 (C3) is 3: not NULL. Another example:
456 VE3 could be a literal; {}->{C3}, the left set is empty.
457 The same examples apply to a generated column in a base table - it is like a
458 merged view's expression. For example, in a base table T which has a generated
459 column C3 AS COALESCE(C1,C2,3): {C1, C2} -> { C3 } holds in T but is not
460 NULL-friendly.
461 
462 If VS is a grouped query (which, in MySQL, implies that the view is
463 materialized), then in the result of the grouping there is a functional
464 dependency from the set of all group expressions to the set of all selected
465 expressions (otherwise, this query expression would not have passed its own
466 only_full_group_by validation - in the implementation we validate each query
467 expression separately). Thus, if all group expressions of VS are in the select
468 list of VS, for example they are VE1 and VE2, then {C1, C2} -> {V.*}.
469 It is not NULL-friendly, for example: VE3 is COUNT(1): if the result of the
470 WHERE clause contains a row with group expressions VE1 and VE2 equal to NULL,
471 VE3 is not NULL.
472 
473 It's possible to infer functional dependencies from equality conditions in
474 HAVING, but we have not implemented it yet.
475 
476 Because some functional dependencies above are not NULL-friendly, they
477 exist in the view, but may not freely propagate in the result of join nests
478 containing the view. This includes examples just given in paragraphs above,
479 and the case of T1.A=constant in the WHERE clause of VS.
480 
481 Thus, when we know of a functional dependency A -> B in the query expression
482 of a view, we deduce from it a functional dependency in the view only if:
483 - this view is not on the weak side of any embedding join nest (so
484 NULL-friendliness is not required for propagation).
485 - or A contains at least one non-nullable expression, which makes A -> B
486 NULL-friendly.
487 
488 The above is fine for materialized views. For merged views, we cannot study the
489 query expression of the view, it has been merged (and scattered), so we use a
490 different rule:
491 - a merged view is similar to a join nest inserted in the parent query, so
492 for dependencies based on keys or join conditions, we simply follow
493 propagation rules of the non-view sections.
494 - For expression-based dependencies (VE3 depends on VE1 and VE2, VE3
495 belonging to the view SELECT list), which may not be NULL-friendly, we require
496  - the same non-weak-side criterion as above
497  - or that the left set of the dependency be non-empty and that if VE1 and
498 VE2 are NULL then VE3 must be NULL, which makes the dependency NULL-friendly.
499 - The same solution is used for generated columns in a base table.
500 
501  @{
502 
503 */
504 
505 #include <stddef.h>
506 #include <sys/types.h>
507 
508 #include "my_alloc.h"
509 #include "my_dbug.h"
510 #include "my_inttypes.h"
511 #include "my_table_map.h"
512 #include "sql/item.h"
513 #include "sql/item_cmpfunc.h" // Item_func_any_value
514 #include "sql/item_sum.h" // Item_sum
515 #include "sql/mem_root_array.h" // Mem_root_array
516 #include "sql/sql_lex.h"
517 
518 class Opt_trace_context;
519 class Opt_trace_object;
520 class THD;
521 struct TABLE_LIST;
522 template <class T>
523 class List;
524 
525 /**
526  Checks for queries which have DISTINCT.
527 */
529  public:
531  : select(select_arg), failed_ident(nullptr) {}
532 
533  bool check_query(THD *thd);
534 
535  private:
536  /// Query block which we are validating
538  /// Identifier which triggered an error
540 
541  /**
542  Just because we need to go through Item::walk() to reach all items to
543  validate, some work must be delegated to "Item processors" (members of
544  Item); this work conceptually belongs to Distinct_check, and needs
545  privileged access to it.
546  */
547  friend bool Item_sum::aggregate_check_distinct(uchar *arg);
550 
552 };
553 
554 /**
555  Checks for queries which have GROUP BY or aggregate functions.
556 */
558  public:
559  Group_check(SELECT_LEX *select_arg, MEM_ROOT *root)
560  : select(select_arg),
561  search_in_underlying(false),
562  non_null_in_source(false),
563  table(nullptr),
564  group_in_fd(~0ULL),
565  m_root(root),
566  fd(root),
567  whole_tables_fd(0),
568  recheck_nullable_keys(0),
569  mat_tables(root),
571 
573  for (uint j = 0; j < mat_tables.size(); ++j) destroy(mat_tables.at(j));
574  }
575 
576  bool check_query(THD *thd);
577  void to_opt_trace(THD *thd);
578 
579  private:
580  /// Query block which we are validating
582 
583  /**
584  "Underlying" == expressions which are underlying in an identifier.
585  The identifier can be a column of a view or derived table, both merged or
586  materialized, or a generated column: all of those have an underlying
587  expression.
588  "Materialized table (mat table)" == view or derived table, materialized.
589  If this is true, is_in_fd() will look for FDs in underlying expressions
590  of columns.
591  */
593 
594  /**
595  This member is readable only if this is a child Group_check. Is true if
596  one expression of the SELECT list of this->select is non-nullable.
597  */
599 
600  /**
601  The Group_check employed to validate one query block, the one on which
602  check_query() runs, is named the "master".
603  If the query block references a materialized table, the master may create
604  a child Group_check, whose job is to discover FDs in the query expression
605  of the mat table (with the ultimate goal of deducing from them some FDs
606  in the mat table and thus in the parent Group_check). A child may have
607  children itself. If this Group_check is a child, 'table' points to the
608  materialized table, otherwise it is NULL.
609  */
611 
612  /**
613  Bit N is set if the N-th expression of GROUP BY is functionally dependent
614  on source columns.
615  It serves to find FDs in the query expression of a materialized table
616  having GROUP BY.
617  For a non-child Group-check, all bits are turned on from the start.
618  Currently limited to 64 bits => max 64 GROUP expressions; should probably
619  be MY_BITMAP.
620  */
622 
623  /// Memory for allocations (like of 'fd')
624  MEM_ROOT *const m_root;
625 
626  /**
627  Columns which are local to 'select' and functionally dependent on an
628  initial set of "source" columns defined like this:
629  - if !is_child(), the GROUP BY columns
630  - if is_child(), columns of the result of the query expression under
631  'table' which are themselves part of 'fd' of the parent Group_check.
632  */
634  /// Map of tables for which all columns can be considered part of 'fd'.
636  /// Map of tables for which we discovered known-not-nullable columns.
638  /// Children Group_checks of 'this'
640  /// Identifier which triggered an error
642 
643  bool is_fd_on_source(Item *item);
644  bool is_child() const { return table != nullptr; }
645 
646  /// Private ctor, for a Group_check to build a child Group_check
647  Group_check(SELECT_LEX *select_arg, MEM_ROOT *root, TABLE_LIST *table_arg)
648  : select(select_arg),
649  search_in_underlying(false),
650  non_null_in_source(false),
651  table(table_arg),
652  group_in_fd(0ULL),
653  m_root(root),
654  fd(root),
655  whole_tables_fd(0),
656  recheck_nullable_keys(0),
657  mat_tables(root) {
658  DBUG_ASSERT(table);
659  }
660  bool check_expression(THD *thd, Item *expr, bool in_select_list);
661  /// Shortcut for common use of Item::local_column()
662  bool local_column(Item *item) const {
663  return item->local_column(select).is_true();
664  }
665  void add_to_fd(Item *item, bool local_column, bool add_to_mat_table = true);
667  whole_tables_fd |= m;
668  find_group_in_fd(nullptr);
669  }
670  void add_to_source_of_mat_table(Item_field *item_field, TABLE_LIST *tl);
671  bool is_in_fd(Item *item);
672  bool is_in_fd_of_underlying(Item_ident *item);
673  Item *get_fd_equal(Item *item);
674  void analyze_conjunct(Item *cond, Item *conjunct, table_map weak_tables,
675  bool weak_side_upwards);
676  void analyze_scalar_eq(Item *cond, Item *left_item, Item *right_item,
677  table_map weak_tables, bool weak_side_upwards);
678  void find_fd_in_cond(Item *cond, table_map weak_tables,
679  bool weak_side_upwards);
680  void find_fd_in_joined_table(mem_root_deque<TABLE_LIST *> *join_list);
681  void to_opt_trace2(Opt_trace_context *ctx, Opt_trace_object *parent);
682  void find_group_in_fd(Item *item);
683  Item *select_expression(uint idx);
684 
685  /// Enum for argument of do_ident_check()
686  enum enum_ident_check { CHECK_GROUP, CHECK_STRONG_SIDE_COLUMN, CHECK_COLUMN };
687  bool do_ident_check(Item_ident *i, table_map tm, enum enum_ident_check type);
688 
689  /**
690  Just because we need to go through Item::walk() to reach all items to
691  validate, some work must be delegated to "Item processors" (members of
692  Item); this work conceptually belongs to Group_check, and needs
693  privileged access to it.
694  */
695  friend bool Item_ident::aggregate_check_group(uchar *arg);
696  friend bool Item_sum::aggregate_check_group(uchar *arg);
699  friend bool Item_ident::is_column_not_in_fd(uchar *arg);
701 
703 };
704 
705 #endif
706 /// @} (end of group AGGREGATE_CHECKS ONLY_FULL_GROUP_BY)
Definition: item.h:3593
unsigned long long int ulonglong
Definition: my_inttypes.h:55
TABLE_LIST *const table
The Group_check employed to validate one query block, the one on which check_query() runs...
Definition: aggregate_check.h:610
unsigned char uchar
Definition: my_inttypes.h:51
Definition: item.h:3354
A JSON object (unordered set of key/value pairs).
Definition: opt_trace.h:813
Some integer typedefs for easier portability.
A utility for having an std::deque which stores its elements on a MEM_ROOT.
Definition: mem_root_deque.h:34
bool is_strong_side_column_not_in_fd(uchar *arg) override
Definition: item.cc:9957
Group_check(SELECT_LEX *select_arg, MEM_ROOT *root, TABLE_LIST *table_arg)
Private ctor, for a Group_check to build a child Group_check.
Definition: aggregate_check.h:647
bool local_column(Item *item) const
Shortcut for common use of Item::local_column()
Definition: aggregate_check.h:662
This class represents a query block, aka a query specification, which is a query consisting of a SELE...
Definition: sql_lex.h:1001
ulonglong group_in_fd
Bit N is set if the N-th expression of GROUP BY is functionally dependent on source columns...
Definition: aggregate_check.h:621
Utility mixin class to be able to walk() only parts of item trees.
Definition: item.h:701
bool check_query(THD *thd)
Rejects the query if it has a combination of DISTINCT and ORDER BY which could lead to randomly order...
Definition: aggregate_check.cc:109
FORBID_COPY_CTOR_AND_ASSIGN_OP(Distinct_check)
bool aggregate_check_group(uchar *arg) override
This function is expected to check if GROUPING function with its arguments is "group-invariant".
Definition: item_sum.cc:6055
Item_ident * failed_ident
Identifier which triggered an error.
Definition: aggregate_check.h:539
~Group_check()
Definition: aggregate_check.h:572
enum_ident_check
Enum for argument of do_ident_check()
Definition: aggregate_check.h:686
#define DBUG_ASSERT(A)
Definition: my_dbug.h:199
bool aggregate_check_group(uchar *arg) override
Definition: item_sum.cc:656
bool non_null_in_source
This member is readable only if this is a child Group_check.
Definition: aggregate_check.h:598
virtual Bool3 local_column(const SELECT_LEX *) const
Definition: item.h:2364
Definition: aggregate_check.h:523
SELECT_LEX *const select
Query block which we are validating.
Definition: aggregate_check.h:537
Group_check(SELECT_LEX *select_arg, MEM_ROOT *root)
Definition: aggregate_check.h:559
Mem_root_array< Item_ident * > fd
Columns which are local to &#39;select&#39; and functionally dependent on an initial set of "source" columns ...
Definition: aggregate_check.h:633
Definition: item.h:741
unsigned int uint
Definition: uca-dump.cc:29
Mem_root_array< Group_check * > mat_tables
Children Group_checks of &#39;this&#39;.
Definition: aggregate_check.h:639
bool is_child() const
Definition: aggregate_check.h:644
MEM_ROOT *const m_root
Memory for allocations (like of &#39;fd&#39;)
Definition: aggregate_check.h:624
bool aggregate_check_distinct(uchar *arg) override
Definition: item.cc:9905
A typesafe replacement for DYNAMIC_ARRAY.
Definition: mem_root_array.h:389
SELECT_LEX *const select
Query block which we are validating.
Definition: aggregate_check.h:581
bool is_true() const
Definition: item.h:595
A per-session context which is always available at any point of execution, because in practice it&#39;s a...
Definition: opt_trace_context.h:88
bool is_column_not_in_fd(uchar *arg) override
Definition: item.cc:9965
void destroy(T *ptr)
Definition: my_alloc.h:381
table_map recheck_nullable_keys
Map of tables for which we discovered known-not-nullable columns.
Definition: aggregate_check.h:637
bool search_in_underlying
"Underlying" == expressions which are underlying in an identifier.
Definition: aggregate_check.h:592
uint64_t table_map
Definition: my_table_map.h:30
bool aggregate_check_distinct(uchar *arg) override
Definition: item_sum.cc:632
void add_to_fd(table_map m)
Definition: aggregate_check.h:666
Item_ident * failed_ident
Identifier which triggered an error.
Definition: aggregate_check.h:641
Distinct_check(SELECT_LEX *select_arg)
Definition: aggregate_check.h:530
The MEM_ROOT is a simple arena, where allocations are carved out of larger blocks.
Definition: my_alloc.h:77
Definition: table.h:2494
bool aggregate_check_distinct(uchar *arg) override
Definition: item_cmpfunc.cc:6963
bool aggregate_check_group(uchar *arg) override
Definition: item_cmpfunc.cc:6956
Checks for queries which have GROUP BY or aggregate functions.
Definition: aggregate_check.h:557
table_map whole_tables_fd
Map of tables for which all columns can be considered part of &#39;fd&#39;.
Definition: aggregate_check.h:635
#define false
Definition: config_static.h:43
This file follows Google coding style, except for the name MEM_ROOT (which is kept for historical rea...
Checks for queries which have DISTINCT.
Definition: aggregate_check.h:528
For each client connection we create a separate thread with THD serving as a thread/connection descri...
Definition: sql_class.h:799
bool aggregate_check_group(uchar *arg) override
Definition: item.cc:9952
Dialog Client Authentication nullptr
Definition: dialog.cc:353