MySQL  8.0.18
Source Code Documentation
aggregate_check.h
Go to the documentation of this file.
1 #ifndef AGGREGATE_CHECK_INCLUDED
2 #define AGGREGATE_CHECK_INCLUDED
3 
4 /* Copyright (c) 2014, 2019, Oracle and/or its affiliates. All rights reserved.
5 
6  This program is free software; you can redistribute it and/or modify
7  it under the terms of the GNU General Public License, version 2.0,
8  as published by the Free Software Foundation.
9 
10  This program is also distributed with certain software (including
11  but not limited to OpenSSL) that is licensed under separate terms,
12  as designated in a particular file or component or in included license
13  documentation. The authors of MySQL hereby grant you an additional
14  permission to link the program and your derivative works with the
15  separately licensed software that they have included with MySQL.
16 
17  This program is distributed in the hope that it will be useful,
18  but WITHOUT ANY WARRANTY; without even the implied warranty of
19  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
20  GNU General Public License, version 2.0, for more details.
21 
22  You should have received a copy of the GNU General Public License
23  along with this program; if not, write to the Free Software
24  Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */
25 
26 /**
27  @file
28  Checks for some semantic constraints on queries using GROUP
29  BY, or aggregate functions, or DISTINCT. Enforced if
30  sql_mode contains 'only_full_group_by'.
31 */
32 
33 /**
34  @defgroup AGGREGATE_CHECKS Aggregate checks of ONLY_FULL_GROUP_BY
35 
36 Checks for some semantic constraints on queries using GROUP BY, or aggregate
37 functions, or DISTINCT (ONLY_FULL_GROUP_BY).
38 
39 We call "aggregation" the operation of taking a group of rows and replacing
40 it with a single row. There are three types of aggregation: DISTINCT,
41 implicit grouping, explicit grouping.
42 
43 This text describes MySQL's checks (why certain queries are rejected) and the
44 rationale behind them.
45 
46 References:
47 - WL#2489 "better only_full_group_by".
48 - if you have access to the SQL standard, we recommend the following parts of
49 "SQL2011 Foundation": query expression Syntax rule 28; column reference Syntax
50 rule 7 and Conformance rule 2; 4.19 functional dependencies.
51 
52 @section DISTINCT
53 
54 DISTINCT: all identical rows in the result of SELECT are "aggregated" to
55 one single row - duplicates are eliminated.
56 If the result of the SELECT without DISTINCT is
57 @verbatim
58 1 2
59 3 4
60 1 2
61 @endverbatim
62 then the result with DISTINCT is
63 @verbatim
64 1 2
65 3 4
66 @endverbatim
67 Here is a problematic query. Assume we have a table T which contains three
68 columns C1,C2,C3 and has the following rows:
69 @verbatim
70 C1 C2 C3
71 1 2 A
72 3 4 B
73 1 2 C
74 @endverbatim
75 and we do "SELECT DISTINCT C1, C2 FROM T ORDER BY C3".
76 Because we want the result to be ordered, we have to first eliminate
77 duplicates then order the result.
78 When we eliminate duplicates, should we keep the first or the third row? This
79 arbitrary choice will influence the retained value of C3, which will influence
80 ordering. In the end, we have arbitrary ordering, which is problematic.
81 To prevent this, if, in a query block 'sl', an ORDER BY expression
82 is not the same expression as one in the SELECT list of 'sl'
83 and contains a column which:
84 - is of a table whose qualifying query block is 'sl'
85 - and is not in the SELECT list of 'sl'
86 
87 then 'sl' should not have DISTINCT.
88 This rule makes the query above invalid.
89 Class Check_distinct implements this rule.
90 
91 Class Check_distinct implements a second rule, specific of set functions in
92 ORDER BY (a non-standard feature).
93 Consider these queries labelled (1), (2), (3), with DISTINCT and a set
94 function in ORDER BY:
95 @verbatim
96 SELECT DISTINCT MIN(C2) FROM T GROUP BY C1 ORDER BY MIN(C3);
97 SELECT DISTINCT MIN(C2) FROM T GROUP BY C1 ORDER BY MIN(C2);
98 SELECT DISTINCT C1, MIN(C2) FROM T GROUP BY C1 ORDER BY MIN(C3);
99 @endverbatim
100 (1) is a random-order query, (2) and (3) are not.
101 MySQL has traditionally been permissive, we want to allow (2) and (3).
102 
103 So the second rule is that Check_distinct rejects a query if it has DISTINCT
104 and a set function in ORDER BY which is not the same expression as one in the
105 SELECT list. This rejects (1) and allows (2).
106 It would reject (3); luckily, before Check_distinct checks, DISTINCT is
107 removed (it is redundant with GROUP BY) and thus the query is not processed by
108 Check_distinct; the second rule is thus by-passed and (3) is correctly
109 accepted.
110 
111 The implementation of Check_distinct works like this: if the query has
112 DISTINCT, examine each element of the ORDER BY list: if the element is not the
113 same expression as one in the SELECT list, then examine parts of the element
114 (using Item::walk()), to spot offending set functions or columns.
115 
116 @section IMPLICIT_GROUPING Implicit grouping
117 
118 A query with set functions without GROUP BY can be seen as having GROUP BY()
119 i.e. the set of grouping expressions is empty, all rows are part of a
120 single group and are replaced with a single row. So it is just a sub-case of
121 the next section.
122 
123 @section EXPLICIT_GROUPING Explicit grouping (GROUP BY)
124 
125 MySQL is more permissive than the standard, it allows to group by any
126 expression; it does not require that every element of the GROUP BY clause
127 should be a column of one table of the FROM clause.
128 
129 Here is a problematic query, using a table T with columns C1, C2, C3:
130 @verbatim
131 C1 C2 C3
132 1 2 A
133 3 4 B
134 1 2 C
135 
136 SELECT C1, C3 FROM T GROUP BY C1;
137 @endverbatim
138 we first form groups, in each group the value of C1 must be the same for all
139 rows. Consider the group made of the first and third row. We have to produce
140 one row out of it, and this row must have a value for C3 because C3 is in the
141 SELECT list. Among those two rows, which value of C3 should we choose? This is
142 arbitrary, and will give a random result.
143 
144 To prevent this, in a query with GROUP BY or aggregates (known as "a grouped
145 query"), any column referenced by an expression in the SELECT list or HAVING
146 condition and belonging to one table of the FROM clause, should be one of the
147 group columns (enforced by only_full_group_by in MySQL 5.6 and older) or, if
148 functional dependencies are supported (as in MySQL 5.7), should be
149 functionally dependent on the group columns.
150 In the table T above, C3 is obviously not functionally dependent on {C1,C2}:
151 the values of C1 and C2 for a row do not uniquely determine the value of
152 C3. In other words, if two rows have the same value in C1 and the same value
153 in C2 they do not necessarily have the same value in C3.
154 So this rule correctly rejects the query above.
155 
156 Note that NULL is treated as one value: two NULLs are considered equal.
157 
158 In WL#2489, we have implemented the optional standard feature T301
159 "functional dependencies" almost entirely.
160 
161 Here are the functional dependencies which we recognize.
162 
163 @subsection KEYFD Key-based, in a base table
164 
165 A key in this text is a unique constraint made of only non-NULLable
166 columns. For example, a primary key.
167 Considering a base table T, if two rows have the same values of all columns of
168 a key of T they are actually one single row, so:
169 { all columns of key} -> { T.* } (notation: what is on the right of the arrow
170 is functionally dependent on what is on the left).
171 
172 @subsection GCOLFD Generated-column-based, in a base table
173 
174 Considering a base table T, a generated column is functionally dependent on
175 the set of columns it references (the so-called parametric
176 columns). Note that the SQL standard doesn't allow a parametric column
177 to be a generated column, but MySQL does.
178 
179 @subsection INNEREQ Equality-based, in the result of an inner join
180 
181 Considering two tables T1 and T2, a condition C, and the result R of an inner
182 join T1 JOIN T2 ON C.
183 Note that T1/T2 are not necessarily base tables. For example, they can also be
184 views, or derived tables, or the results of some joins; in the rest of text,
185 we use the vague term "table" for those various objects.
186 
187 Note that C may only refer to columns of T1 or T2 (outer references
188 are forbidden by MySQL in join conditions).
189 
190 Assuming that C is a conjunction (i.e. is made of one or more conditions,
191 "conjuncts", chained together with AND):
192 - If one conjunct is of the form T1.A=constant, then {} -> {A} holds in R (the
193 value of A is "the constant" in all rows of R).
194 - If one conjunct is of the form T1.A=T2.B, then {T1.A} -> {T2.B} (and vice
195 versa) holds in R (the value of T2.B is that of T1.A in all rows of R).
196 
197 @subsection OUTEREQ Equality-based, in the result of an outer join
198 
199 Considering the result R of TS LEFT JOIN TW ON C.
200 Assuming that C is a conjunction. TS is said to be the strong table, TW is
201 said to be the weak table (the one which might be NULL-complemented in the
202 result of this join).
203 To make this really clear, note that, if we have
204 "t1 left join (t2 left join t3 on C) on D":
205 - in the t2-t3 join, t2 is strong and t3 is weak.
206 - in the t1-(t2-t3) join, t1 is strong, t2 is weak, t3 is weak.
207 
208 If C is deterministic and one conjunct is of the form TW.A=constant or
209 TW.A=TS.B, then DJS -> {TW.A} holds in R, where DJS is the set of all columns
210 of TS referenced by C.
211 Proof: consider in R two rows r1 and r2 which have the same values of DJS
212 columns. Consider r1. There are two possibilities when r1 was built from a row
213 of TS:
214 - no row in TW matched the row of TS (for no row of TW has C been true): so,
215 r1 is NULL-complemented for the columns of TW. Given that r2 has the same
216 values of DJS columns as r1, and given that C is deterministic, it is sure
217 that no row in TW matched when r2 was built. So r2 is also NULL-complemented
218 for the columns of TW. So r1 and r2 have the same value of TW.A (NULL).
219 - At least one row in TW matched: so, r1 contains real values from TW (not
220 NULL-complemented), matching C, and thus TW.A in r1 is equal to the constant
221 or to TS.B. Following the same reasoning as above, it is sure that it is also
222 the case for r2.
223 - In conclusion, we can see that r1 and r2 always have the same value of
224 TW.A.
225 
226 If one conjunct is of the form TW.A=TW.B then {TW.A} -> {TW.B} holds in R
227 Proof: consider r1 and r2 two rows in R having the same value of TW.A. Two
228 cases are possible:
229 - this value is NULL. Then both rows are NULL-complemented (if not, the
230 value of TW.A in TW would be NULL, which cannot match in an equality, so C is
231 not true, which is absurd). Thus, in r1 and r2 TW.B is NULL.
232 - This value is not NULL. Then both rows are not NULL-complemented, C
233 matched for both, so TW.B in r1/r2 is equal to TW.A in r1/r2.
234 - In conclusion, r1 and r2 have the same value of TW.B.
235 
236 If any conjunct references column T1.A and is not true if T1.A is NULL
237 (e.g. conjunct is T1.A > 3), and T1 is part of TW, we can replace T1 with
238  (SELECT * FROM T1 WHERE T1.A IS NOT NULL) AS T1.
239 We do not do such query rewriting, but a consequence is that we can and do
240 treat T1.A as "known not nullable in T1".
241 Proof: if we do this replacement, it means we remove from T1 some rows where
242 T1.A is NULL. For the sake of the proof, let's broaden this to "we remove from
243 and/or add to T1 rows where T1.A is NULL". By this operation,
244 - the result of T2 JOIN T1 ON C is affected in the same manner (i.e. it's like
245 if we remove from and/or add rows to the join's result, rows where T1.A is
246 NULL).
247 - the same is true for the result of T1 LEFT JOIN T2 ON C,
248 - regarding the result of T2 LEFT JOIN T1 ON C, consider a row r2 from T2:
249  - if r2 had a match with a row of T1 where T1.A is not NULL, this match is
250  preserved after replacement of T1
251  - due to removed rows, matches with rows of T1 where T1.A is NULL may be
252  lost, and so NULL-complemented rows may appear
253  - due to added rows, matches with rows of T1 where T1.A is NULL may appear
254  - so in the result of the left join, it's like if we remove from
255  and/or add rows which all have T1.A is NULL.
256 So, the net effect in all situations, on the join nest simply containing T1,
257 is that "we remove from and/or add to the nest's result rows where T1.A is
258 NULL".
259 By induction we can go up the nest tree, for every nest it's like if we remove
260 from and/or add to the join nest's result rows where T1.A is NULL.
261 Going up, finally we come to the nest TS LEFT JOIN TW ON C where our
262 NULL-rejecting conjunct is.
263 If T1 belonged to TS, we could not say anything. But we have assumed T1
264 belongs to TW: then the C condition eliminates added rows, and would have
265 eliminated removed rows anyway. Thus the result of TS LEFT JOIN TW ON C is
266 unchanged by our operation on T1. We can thus safely treat T1.A as not
267 nullable _in_T1_. Thus, if T1.A is part of a unique constraint, we may treat
268 this constraint as a primary key of T1 (@see WHEREEQ for an example).
269 
270 @subsection WHEREEQ Equality-based, in the result of a WHERE clause
271 
272 Same rule as the result of an inner join.
273 Additionally, the rule is extended to T1.A=outer_reference, because an outer
274 reference is a constant during one execution of this query.
275 
276 Moreoever, if any conjunct references column T1.A and is not true if T1.A is
277 NULL (e.g. conjunct is T1.A > 3), we can treat T1.A as not nullable in T1.
278 Proof: same as in previous section OUTEREQ.
279 We can then derive FD information from a unique index on a nullable column;
280 consider:
281  create table t1(a int null, b int null, unique(a));
282 While
283  select a,b from t1 group by a;
284 is invalid (because two null values of 'a' go in same group),
285  select a,b from t1 where 'a' is not null group by a;
286 is valid: 'a' can be treated as non-nullable in t1, so the unique index is
287 like "unique not null", so 'a' determines 'b'.
288 
289 Below we examine how functional dependencies in a table propagate to its
290 containing join nest.
291 
292 @subsection PROPAGOUTER Considering the result R of TS LEFT JOIN TW ON C.
293 
294 All functional dependencies in TS are also functional dependencies in R.
295 Proof: trivial.
296 The same is not necessarily true for TW.
297 Let's define a "NULL-friendly functional dependency" (NFFD) as a functional
298 dependency between two sets A and B, which has two properties:
299 - A is not empty
300 - if, in a row, all values of A are NULL, then all values of B are NULL.
301 
302 All NFFDs in TW are also NFFDs in R.
303 Proof: consider an NFFD A -> B in TW, and r1 and r2 two rows in R having the
304 same values of A columns. Two cases are possible:
305 - In r1 and r2, at least one value of A is not NULL. Then r1 is not
306 NULL-complemented. Its values for A and B come from TW. By application of the
307 functional dependency in TW, because values in A are equal in TW, values in B
308 are equal in TW and thus in r1/r2.
309 - In r1 and r2, all values of A are NULL. Two cases are possible:
310 i) r1 is not NULL-complemented. Its values for A and B come from TW. In the
311 row of TW values of A are all NULL. Because the functional dependency in
312 NULL-friendly, all values of B are NULL in the row of TW and thus in r1.
313 ii) r1 is NULL-complemented. Then all values of B in r1 are NULL.
314 iii) In conclusion, all values of B in r1 are NULL. The same reasoning applies
315 to r2. So, values of B are identical (NULL) in r1 and r2.
316 - In conclusion, values of B are identical in r1/r2, we have proved that
317 this is a functional dependency in R, and a NULL-friendly one.
318 
319 The concept of an NFFD is Guilhem's invention. It was felt it was necessary, to
320 have easy propagation of FDs from TW to R. It was preferred to the
321 alternative, simpler rule which says that a functional dependency A-> B in TW
322 is also a functional dependency in R if A contains a non-nullable
323 column. There are two points to note:
324 - the functional dependency of the simpler rule is an NFFD, so our rule is not
325 more restrictive than the simpler one
326 - this simpler rule prevents free propagation of functional dependencies
327 through join nests, which complicates implementation and leads to rejecting
328 queries which could be accepted. An example follows:
329 @verbatim
330 SELECT T3.A
331 FROM T1 LEFT JOIN (T2 LEFT JOIN T3 ON TRUE) ON TRUE
332 GROUP BY T3.PK;
333 @endverbatim
334 This is what the simpler rule says for this query:
335 * In T3, T3.PK->T3.A holds.
336 * Let R1 be the result of "(T2 LEFT JOIN T3 ON TRUE)", in R1 T3.PK->T3.A
337 holds, by application of: there is a functional dependency in the
338 weak side T3, and T3.PK is not nullable in T3.
339 * Let R2 be the result of "T1 LEFT JOIN (T2 LEFT JOIN T3 ON TRUE) ON TRUE",
340 in R2 T3.PK->T3.A doesn't hold anymore, because: it's a dependency in the
341 weak side (weak side is R1 here), and T3.PK is nullable _when
342 seen as a column of R1_ (in R1 T3.PK can be NULL, if the row of T3
343 is actually a NULL-complemented one).
344 
345 @subsection PROPAGINNER Considering the result R of T1 JOIN T2 ON C.
346 
347 All [NULL-friendly] functional dependencies in T1 are also [NULL-friendly]
348 functional dependencies in R. the same is true for T2.
349 Proof: trivial.
350 
351 @subsection PROPAGSUMMARY Summary rules for propagating FDs
352 
353 All NULL-friendly functional dependencies propagate freely through join nests
354 all the way up to the result of the WHERE clause.
355 The same is true for ordinary functional dependencies except if there are weak
356 tables along the propagation path between the table where the dependency holds
357 and the result of the WHERE clause; in other words, except if the table where
358 the dependency holds belongs to some embedding join nest which is on the weak
359 side of an outer join.
360 
361 @subsection NFFDS Which functional dependencies are NULL-friendly
362 
363 A key-based functional dependency A -> B in the base table is NULL-friendly,
364 because, by definition, there can never be a NULL value in any column of A.
365 
366 A functional dependency A -> B in a base table, between parametric columns A
367 and a generated column B, is not NULL-friendly; for more details, see
368 @ref FDVIEW .
369 
370 A functional dependency A->B in the result of T1 JOIN T2 ON C, if based on
371 equality of two columns, is NULL-friendly. Indeed, A is not empty and if there
372 was some NULL in A, it would not match the equality in C and thus it would not
373 exist in the result, absurd. If the equality is rather column=constant, A is
374 empty, the dependency is not NULL-friendly. However, in our implementation,
375 function @c simplify_joins() moves inner-join conditions to the embedding
376 outer-join nest's join condition, or to the WHERE clause. Because our analysis
377 of functional dependencies happens after simplify_joins(), when we analyze T1
378 JOIN T2 it is guaranteed to have no condition, and this paragraph is
379 irrelevant.
380 
381 A functional dependency in the result of TS LEFT JOIN TW ON C, if based on
382 equality of two columns, is NULL-friendly.
383 Proof: let's consider, in turn, the two types of equality-based functional
384 dependencies which exist in this result R. Let r1 be a row of R.
385 - If C is deterministic and one conjunct is of the form TW.A=constant or
386 TW.A=TS.B, then DJS -> {TW.A} holds in R, where DJS is the set of all columns
387 of TS referenced by C. For NULL-friendliness, we need DJS to not be
388 empty. Thus, we exclude the form TW.A=constant and consider only
389 TW.A=TS.B. We suppose that in r1 DJS contains all NULLs. Conjunct is TW.A=TS.B
390 then this equality is not true, so r1 is NULL-complemented: TW.A is NULL in
391 r1.
392 - If one conjunct is of the form TW.A=TW.B then {TW.A} -> {TW.B} holds in
393 R. If in r1 TW.A is NULL, again the equality in C is not true, and TW.B is
394 NULL in R1.
395 - In conclusion, this is NULL-friendly.
396 
397 A functional dependency in the result of a WHERE clause, if based on equality
398 of two columns, is NULL-friendly. If based on T1.A=constant, it is not, as it
399 has an empty set of source columns.
400 
401 Summary: all functional dependencies which we have seen so far are
402 NULL-friendly, except those inferred from TW.A=constant in an outer join
403 condition or in a WHERE clause, and those about generated columns.
404 
405 Thus, in the query with T1-T2-T3 previously seen, T3.PK->T3.A is
406 NULL-friendly and propagates, query is accepted.
407 
408 In our implementation, we take special care of TW.A=constant in an outer join
409 condition: we infer a functional dependency DJS->TW.A from such equality only
410 if one of these requirements are met:
411 - the join nest "TS LEFT JOIN TW ON TW.A=constant [AND...]" is not on the
412 weak side of any embedding join nest - in that case, propagation will not meet
413 any weak tables so we do not need the dependency to be NULL-friendly, it will
414 propagate anyway.
415 - DJS contains at least one column from a strong-side table which, if NULL,
416 makes the join condition not evaluate to TRUE - in that case, DJS->TW.A is
417 NULL-friendly.
418 
419 Either of these two conditions is satisfied in most practical cases. For
420 example, it's very common to have an equality between a strong-side column and
421 a weak-side column as a conjunct in the outer join condition (like, "ON
422 strong.pk = weak.foreign_key AND ..." or "ON strong.foreign_key = weak.pk AND
423 ..."); this satisfies the second condition. It's also common to have outer
424 joins only left-deep ("SELECT ... T1 LEFT JOIN T2 ON ... LEFT JOIN T3 ON ..."
425 is left-deep); this satisfies the first condition.
426 Note that the dependency found from TW.A=TS.B in an outer join condition
427 always satisfies the second condition.
428 
429 T1.A=constant in a WHERE clause is exterior to any join nest so does not need
430 to propagate, so does not need to be NULL-friendly.
431 
432 @subsection FDVIEW Functional dependencies in a view or a derived table
433 
434 In the rest of this text, we will use the term "view" for "view or derived
435 table". A view can be merged or materialized, in MySQL.
436 Consider a view V defined by a query expression.
437 If the query expression contains UNION or ROLLUP (which is theoretically based
438 on UNION) there are no functional dependencies in this view.
439 So let's assume that the query expression is a query specification (let's note
440 it VS):
441 @verbatim
442 CREATE VIEW V AS SELECT [DISTINCT] VE1 AS C1, VE2 AS C2, ... FROM ... WHERE ...
443 [GROUP BY ...] [HAVING ...] [ORDER BY ...]
444 @endverbatim
445 
446 If {VE1, VE2, VE3} are columns of tables of the FROM clause, and {VE1,
447 VE2} -> {VE3} has been deduced from rules in the previous sections [and is
448 NULL-friendly], then {C1, C2} -> { C3 } holds in the view [and is
449 NULL-friendly].
450 
451 If {VE1, VE2} are columns of tables of the FROM clause, and VE3 is a
452 deterministic expression depending only on VE1 and VE2, then {C1, C2} ->
453 { C3 } in the view.
454 It is not always NULL-friendly, for example: VE3 could be COALESCE(VE1,VE2,3):
455 if VE1 (C1) and VE2 (C2) are NULL, VE3 (C3) is 3: not NULL. Another example:
456 VE3 could be a literal; {}->{C3}, the left set is empty.
457 The same examples apply to a generated column in a base table - it is like a
458 merged view's expression. For example, in a base table T which has a generated
459 column C3 AS COALESCE(C1,C2,3): {C1, C2} -> { C3 } holds in T but is not
460 NULL-friendly.
461 
462 If VS is a grouped query (which, in MySQL, implies that the view is
463 materialized), then in the result of the grouping there is a functional
464 dependency from the set of all group expressions to the set of all selected
465 expressions (otherwise, this query expression would not have passed its own
466 only_full_group_by validation - in the implementation we validate each query
467 expression separately). Thus, if all group expressions of VS are in the select
468 list of VS, for example they are VE1 and VE2, then {C1, C2} -> {V.*}.
469 It is not NULL-friendly, for example: VE3 is COUNT(1): if the result of the
470 WHERE clause contains a row with group expressions VE1 and VE2 equal to NULL,
471 VE3 is not NULL.
472 
473 It's possible to infer functional dependencies from equality conditions in
474 HAVING, but we have not implemented it yet.
475 
476 Because some functional dependencies above are not NULL-friendly, they
477 exist in the view, but may not freely propagate in the result of join nests
478 containing the view. This includes examples just given in paragraphs above,
479 and the case of T1.A=constant in the WHERE clause of VS.
480 
481 Thus, when we know of a functional dependency A -> B in the query expression
482 of a view, we deduce from it a functional dependency in the view only if:
483 - this view is not on the weak side of any embedding join nest (so
484 NULL-friendliness is not required for propagation).
485 - or A contains at least one non-nullable expression, which makes A -> B
486 NULL-friendly.
487 
488 The above is fine for materialized views. For merged views, we cannot study the
489 query expression of the view, it has been merged (and scattered), so we use a
490 different rule:
491 - a merged view is similar to a join nest inserted in the parent query, so
492 for dependencies based on keys or join conditions, we simply follow
493 propagation rules of the non-view sections.
494 - For expression-based dependencies (VE3 depends on VE1 and VE2, VE3
495 belonging to the view SELECT list), which may not be NULL-friendly, we require
496  - the same non-weak-side criterion as above
497  - or that the left set of the dependency be non-empty and that if VE1 and
498 VE2 are NULL then VE3 must be NULL, which makes the dependency NULL-friendly.
499 - The same solution is used for generated columns in a base table.
500 
501  @{
502 
503 */
504 
505 #include <stddef.h>
506 #include <sys/types.h>
507 
508 #include "my_alloc.h"
509 #include "my_dbug.h"
510 #include "my_inttypes.h"
511 #include "my_table_map.h"
512 #include "sql/item.h"
513 #include "sql/item_cmpfunc.h" // Item_func_any_value
514 #include "sql/item_sum.h" // Item_sum
515 #include "sql/mem_root_array.h" // Mem_root_array
516 #include "sql/sql_lex.h"
517 
518 class Opt_trace_context;
519 class Opt_trace_object;
520 class THD;
521 struct TABLE_LIST;
522 template <class T>
523 class List;
524 
525 /**
526  Re-usable shortcut, when it does not make sense to do copy objects of a
527  class named "myclass"; add this to a private section of the class. The
528  implementations are intentionally not created, so if someone tries to use
529  them like in "myclass A= B" there will be a linker error.
530 */
531 #define FORBID_COPY_CTOR_AND_ASSIGN_OP(myclass) \
532  myclass(myclass const &); \
533  void operator=(myclass const &)
534 
535 /**
536  Utility mixin class to be able to walk() only parts of item trees.
537 
538  Used with PREFIX+POSTFIX walk: in the prefix call of the Item
539  processor, we process the item X, may decide that its children should not
540  be processed (just like if they didn't exist): processor calls stop_at(X)
541  for that. Then walk() goes to a child Y; the processor tests is_stopped(Y)
542  which returns true, so processor sees that it must not do any processing
543  and returns immediately. Finally, the postfix call to the processor on X
544  tests is_stopped(X) which returns "true" and understands that the
545  not-to-be-processed children have been skipped so calls restart(). Thus,
546  any sibling of X, any part of the Item tree not under X, can then be
547  processed.
548 */
550  protected:
553 
554  /// Stops walking children of this item
555  void stop_at(const Item *i) {
557  stopped_at_item = i;
558  }
559 
560  /**
561  @returns if we are stopped. If item 'i' is where we stopped, restarts the
562  walk for next items.
563  */
564  bool is_stopped(const Item *i) {
565  if (stopped_at_item) {
566  /*
567  Walking was disabled for a tree part rooted a one ancestor of 'i' or
568  rooted at 'i'.
569  */
570  if (stopped_at_item == i) {
571  /*
572  Walking was disabled for the tree part rooted at 'i'; we have now just
573  returned back to this root (POSTFIX call), left the tree part:
574  enable the walk again, for other tree parts.
575  */
577  }
578  // No further processing to do for this item:
579  return true;
580  }
581  return false;
582  }
583 
584  private:
587 };
588 
589 /**
590  Checks for queries which have DISTINCT.
591 */
593  public:
595  : select(select_arg), failed_ident(NULL) {}
596 
597  bool check_query(THD *thd);
598 
599  private:
600  /// Query block which we are validating
602  /// Identifier which triggered an error
604 
605  /**
606  Just because we need to go through Item::walk() to reach all items to
607  validate, some work must be delegated to "Item processors" (members of
608  Item); this work conceptually belongs to Distinct_check, and needs
609  privileged access to it.
610  */
611  friend bool Item_sum::aggregate_check_distinct(uchar *arg);
614 
616 };
617 
618 /**
619  Checks for queries which have GROUP BY or aggregate functions.
620 */
622  public:
623  Group_check(SELECT_LEX *select_arg, MEM_ROOT *root)
624  : select(select_arg),
627  table(NULL),
628  group_in_fd(~0ULL),
629  m_root(root),
630  fd(root),
631  whole_tables_fd(0),
633  mat_tables(root),
634  failed_ident(NULL) {}
635 
637  for (uint j = 0; j < mat_tables.size(); ++j) destroy(mat_tables.at(j));
638  }
639 
640  bool check_query(THD *thd);
641  void to_opt_trace(THD *thd);
642 
643  private:
644  /// Query block which we are validating
646 
647  /**
648  "Underlying" == expressions which are underlying in an identifier.
649  The identifier can be a column of a view or derived table, both merged or
650  materialized, or a generated column: all of those have an underlying
651  expression.
652  "Materialized table (mat table)" == view or derived table, materialized.
653  If this is true, is_in_fd() will look for FDs in underlying expressions
654  of columns.
655  */
657 
658  /**
659  This member is readable only if this is a child Group_check. Is true if
660  one expression of the SELECT list of this->select is non-nullable.
661  */
663 
664  /**
665  The Group_check employed to validate one query block, the one on which
666  check_query() runs, is named the "master".
667  If the query block references a materialized table, the master may create
668  a child Group_check, whose job is to discover FDs in the query expression
669  of the mat table (with the ultimate goal of deducing from them some FDs
670  in the mat table and thus in the parent Group_check). A child may have
671  children itself. If this Group_check is a child, 'table' points to the
672  materialized table, otherwise it is NULL.
673  */
675 
676  /**
677  Bit N is set if the N-th expression of GROUP BY is functionally dependent
678  on source columns.
679  It serves to find FDs in the query expression of a materialized table
680  having GROUP BY.
681  For a non-child Group-check, all bits are turned on from the start.
682  Currently limited to 64 bits => max 64 GROUP expressions; should probably
683  be MY_BITMAP.
684  */
686 
687  /// Memory for allocations (like of 'fd')
688  MEM_ROOT *const m_root;
689 
690  /**
691  Columns which are local to 'select' and functionally dependent on an
692  initial set of "source" columns defined like this:
693  - if !is_child(), the GROUP BY columns
694  - if is_child(), columns of the result of the query expression under
695  'table' which are themselves part of 'fd' of the parent Group_check.
696  */
698  /// Map of tables for which all columns can be considered part of 'fd'.
700  /// Map of tables for which we discovered known-not-nullable columns.
702  /// Children Group_checks of 'this'
704  /// Identifier which triggered an error
706 
707  bool is_fd_on_source(Item *item);
708  bool is_child() const { return table != NULL; }
709 
710  /// Private ctor, for a Group_check to build a child Group_check
711  Group_check(SELECT_LEX *select_arg, MEM_ROOT *root, TABLE_LIST *table_arg)
712  : select(select_arg),
715  table(table_arg),
716  group_in_fd(0ULL),
717  m_root(root),
718  fd(root),
719  whole_tables_fd(0),
721  mat_tables(root) {
723  }
724  bool check_expression(THD *thd, Item *expr, bool in_select_list);
725  /// Shortcut for common use of Item::local_column()
726  bool local_column(Item *item) const {
727  return item->local_column(select).is_true();
728  }
729  void add_to_fd(Item *item, bool local_column, bool add_to_mat_table = true);
731  whole_tables_fd |= m;
733  }
734  void add_to_source_of_mat_table(Item_field *item_field, TABLE_LIST *tl);
735  bool is_in_fd(Item *item);
737  Item *get_fd_equal(Item *item);
738  void analyze_conjunct(Item *cond, Item *conjunct, table_map weak_tables,
739  bool weak_side_upwards);
740  void analyze_scalar_eq(Item *cond, Item *left_item, Item *right_item,
741  table_map weak_tables, bool weak_side_upwards);
742  void find_fd_in_cond(Item *cond, table_map weak_tables,
743  bool weak_side_upwards);
744  void find_fd_in_joined_table(List<TABLE_LIST> *join_list);
746  void find_group_in_fd(Item *item);
748 
749  /// Enum for argument of do_ident_check()
752 
753  /**
754  Just because we need to go through Item::walk() to reach all items to
755  validate, some work must be delegated to "Item processors" (members of
756  Item); this work conceptually belongs to Group_check, and needs
757  privileged access to it.
758  */
759  friend bool Item_ident::aggregate_check_group(uchar *arg);
760  friend bool Item_sum::aggregate_check_group(uchar *arg);
763  friend bool Item_ident::is_column_not_in_fd(uchar *arg);
765 
767 };
768 
769 #endif
770 /// @} (end of group AGGREGATE_CHECKS ONLY_FULL_GROUP_BY)
Definition: item.h:3341
unsigned long long int ulonglong
Definition: my_inttypes.h:55
TABLE_LIST *const table
The Group_check employed to validate one query block, the one on which check_query() runs...
Definition: aggregate_check.h:674
unsigned char uchar
Definition: my_inttypes.h:51
bool is_fd_on_source(Item *item)
Tells if &#39;item&#39; is functionally dependent ("FD") on source columns.
Definition: aggregate_check.cc:333
Definition: item.h:3126
bool is_in_fd_of_underlying(Item_ident *item)
See if we can derive a FD from a column which has an underlying expression.
Definition: aggregate_check.cc:721
bool is_in_fd(Item *item)
is_in_fd() is low-level compared to is_fd_on_source().
Definition: aggregate_check.cc:661
void to_opt_trace2(Opt_trace_context *ctx, Opt_trace_object *parent)
Utility function for to_opt_trace(), as we need recursion in children Group_checks.
Definition: aggregate_check.cc:1145
ulonglong table_map
Definition: my_table_map.h:32
bool do_ident_check(Item_ident *i, table_map tm, enum enum_ident_check type)
Does one low-level check on one Item_ident.
Definition: aggregate_check.cc:1182
A JSON object (unordered set of key/value pairs).
Definition: opt_trace.h:813
bool check_query(THD *thd)
Rejects the query if it does aggregation or grouping, and expressions in its SELECT list...
Definition: aggregate_check.cc:177
Some integer typedefs for easier portability.
bool is_strong_side_column_not_in_fd(uchar *arg) override
Definition: item.cc:9591
Group_check(SELECT_LEX *select_arg, MEM_ROOT *root, TABLE_LIST *table_arg)
Private ctor, for a Group_check to build a child Group_check.
Definition: aggregate_check.h:711
bool local_column(Item *item) const
Shortcut for common use of Item::local_column()
Definition: aggregate_check.h:726
This class represents a query block, aka a query specification, which is a query consisting of a SELE...
Definition: sql_lex.h:971
void to_opt_trace(THD *thd)
Writes "check information" to the optimizer trace.
Definition: aggregate_check.cc:1132
void analyze_conjunct(Item *cond, Item *conjunct, table_map weak_tables, bool weak_side_upwards)
Searches for equality-based functional dependences in an AND-ed part of a condition (a conjunct)...
Definition: aggregate_check.cc:875
ulonglong group_in_fd
Bit N is set if the N-th expression of GROUP BY is functionally dependent on source columns...
Definition: aggregate_check.h:685
FORBID_COPY_CTOR_AND_ASSIGN_OP(Item_tree_walker)
Utility mixin class to be able to walk() only parts of item trees.
Definition: aggregate_check.h:549
bool check_query(THD *thd)
Rejects the query if it has a combination of DISTINCT and ORDER BY which could lead to randomly order...
Definition: aggregate_check.cc:108
Item * select_expression(uint idx)
Definition: aggregate_check.cc:575
FORBID_COPY_CTOR_AND_ASSIGN_OP(Distinct_check)
bool aggregate_check_group(uchar *arg) override
This function is expected to check if GROUPING function with its arguments is "group-invariant".
Definition: item_sum.cc:5899
Element_type & at(size_t n)
Definition: mem_root_array.h:97
Item_ident * failed_ident
Identifier which triggered an error.
Definition: aggregate_check.h:603
~Item_tree_walker()
Definition: aggregate_check.h:552
~Group_check()
Definition: aggregate_check.h:636
enum_ident_check
Enum for argument of do_ident_check()
Definition: aggregate_check.h:750
#define DBUG_ASSERT(A)
Definition: my_dbug.h:197
bool aggregate_check_group(uchar *arg) override
Definition: item_sum.cc:625
void add_to_source_of_mat_table(Item_field *item_field, TABLE_LIST *tl)
If we just added a column of a materialized table to &#39;fd&#39;, we record this fact in a new Group_check (...
Definition: aggregate_check.cc:599
bool non_null_in_source
This member is readable only if this is a child Group_check.
Definition: aggregate_check.h:662
FORBID_COPY_CTOR_AND_ASSIGN_OP(Group_check)
virtual Bool3 local_column(const SELECT_LEX *) const
Definition: item.h:2243
void find_group_in_fd(Item *item)
This function must be called every time we discover an item which is FD on source columns...
Definition: aggregate_check.cc:542
void find_fd_in_joined_table(List< TABLE_LIST > *join_list)
Searches for equality-based functional dependencies in the condition of a join nest, and recursively in all contained join nests.
Definition: aggregate_check.cc:1085
Definition: aggregate_check.h:523
bool is_stopped(const Item *i)
Definition: aggregate_check.h:564
SELECT_LEX *const select
Query block which we are validating.
Definition: aggregate_check.h:601
Group_check(SELECT_LEX *select_arg, MEM_ROOT *root)
Definition: aggregate_check.h:623
Mem_root_array< Item_ident * > fd
Columns which are local to &#39;select&#39; and functionally dependent on an initial set of "source" columns ...
Definition: aggregate_check.h:697
Definition: item.h:668
unsigned int uint
Definition: uca-dump.cc:29
Mem_root_array< Group_check * > mat_tables
Children Group_checks of &#39;this&#39;.
Definition: aggregate_check.h:703
size_t size() const
Definition: mem_root_array.h:379
bool is_child() const
Definition: aggregate_check.h:708
MEM_ROOT *const m_root
Memory for allocations (like of &#39;fd&#39;)
Definition: aggregate_check.h:688
bool aggregate_check_distinct(uchar *arg) override
Definition: item.cc:9539
A typesafe replacement for DYNAMIC_ARRAY.
Definition: mem_root_array.h:398
Item_tree_walker()
Definition: aggregate_check.h:551
SELECT_LEX *const select
Query block which we are validating.
Definition: aggregate_check.h:645
const Item * stopped_at_item
Definition: aggregate_check.h:585
bool is_true() const
Definition: item.h:594
A per-session context which is always available at any point of execution, because in practice it&#39;s ...
Definition: opt_trace_context.h:88
void analyze_scalar_eq(Item *cond, Item *left_item, Item *right_item, table_map weak_tables, bool weak_side_upwards)
Helper function.
Definition: aggregate_check.cc:972
bool is_column_not_in_fd(uchar *arg) override
Definition: item.cc:9599
int type
Definition: http_common.h:411
void destroy(T *ptr)
Definition: my_alloc.h:381
table_map recheck_nullable_keys
Map of tables for which we discovered known-not-nullable columns.
Definition: aggregate_check.h:701
#define NULL
Definition: types.h:55
bool search_in_underlying
"Underlying" == expressions which are underlying in an identifier.
Definition: aggregate_check.h:656
void stop_at(const Item *i)
Stops walking children of this item.
Definition: aggregate_check.h:555
bool aggregate_check_distinct(uchar *arg) override
Definition: item_sum.cc:601
void add_to_fd(table_map m)
Definition: aggregate_check.h:730
Item_ident * failed_ident
Identifier which triggered an error.
Definition: aggregate_check.h:705
Item * get_fd_equal(Item *item)
Definition: aggregate_check.cc:852
Definition: aggregate_check.h:750
Definition: aggregate_check.h:750
Distinct_check(SELECT_LEX *select_arg)
Definition: aggregate_check.h:594
bool check_expression(THD *thd, Item *expr, bool in_select_list)
Validates one expression (this forms one step of check_query()).
Definition: aggregate_check.cc:267
The MEM_ROOT is a simple arena, where allocations are carved out of larger blocks.
Definition: my_alloc.h:77
Definition: table.h:2468
bool aggregate_check_distinct(uchar *arg) override
Definition: item_cmpfunc.cc:6872
bool aggregate_check_group(uchar *arg) override
Definition: item_cmpfunc.cc:6865
Checks for queries which have GROUP BY or aggregate functions.
Definition: aggregate_check.h:621
void add_to_fd(Item *item, bool local_column, bool add_to_mat_table=true)
Definition: aggregate_check.cc:481
Definition: aggregate_check.h:750
table_map whole_tables_fd
Map of tables for which all columns can be considered part of &#39;fd&#39;.
Definition: aggregate_check.h:699
void find_fd_in_cond(Item *cond, table_map weak_tables, bool weak_side_upwards)
Searches for equality-based functional dependencies in a condition.
Definition: aggregate_check.cc:1061
#define false
Definition: config_static.h:43
This file follows Google coding style, except for the name MEM_ROOT (which is kept for historical rea...
Checks for queries which have DISTINCT.
Definition: aggregate_check.h:592
For each client connection we create a separate thread with THD serving as a thread/connection descri...
Definition: sql_class.h:778
bool aggregate_check_group(uchar *arg) override
Definition: item.cc:9586