MySQL  8.0.27
Source Code Documentation
aggregate_check.h
Go to the documentation of this file.
1 #ifndef AGGREGATE_CHECK_INCLUDED
2 #define AGGREGATE_CHECK_INCLUDED
3 
4 /* Copyright (c) 2014, 2021, Oracle and/or its affiliates.
5 
6  This program is free software; you can redistribute it and/or modify
7  it under the terms of the GNU General Public License, version 2.0,
8  as published by the Free Software Foundation.
9 
10  This program is also distributed with certain software (including
11  but not limited to OpenSSL) that is licensed under separate terms,
12  as designated in a particular file or component or in included license
13  documentation. The authors of MySQL hereby grant you an additional
14  permission to link the program and your derivative works with the
15  separately licensed software that they have included with MySQL.
16 
17  This program is distributed in the hope that it will be useful,
18  but WITHOUT ANY WARRANTY; without even the implied warranty of
19  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
20  GNU General Public License, version 2.0, for more details.
21 
22  You should have received a copy of the GNU General Public License
23  along with this program; if not, write to the Free Software
24  Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */
25 
26 /**
27  @file
28  Checks for some semantic constraints on queries using GROUP
29  BY, or aggregate functions, or DISTINCT. Enforced if
30  sql_mode contains 'only_full_group_by'.
31 */
32 
33 /**
34  @defgroup AGGREGATE_CHECKS Aggregate checks of ONLY_FULL_GROUP_BY
35 
36 Checks for some semantic constraints on queries using GROUP BY, or aggregate
37 functions, or DISTINCT (ONLY_FULL_GROUP_BY).
38 
39 We call "aggregation" the operation of taking a group of rows and replacing
40 it with a single row. There are three types of aggregation: DISTINCT,
41 implicit grouping, explicit grouping.
42 
43 This text describes MySQL's checks (why certain queries are rejected) and the
44 rationale behind them.
45 
46 References:
47 - WL#2489 "better only_full_group_by".
48 - if you have access to the SQL standard, we recommend the following parts of
49 "SQL2011 Foundation": query expression Syntax rule 28; column reference Syntax
50 rule 7 and Conformance rule 2; 4.19 functional dependencies.
51 
52 @section DISTINCT
53 
54 DISTINCT: all identical rows in the result of SELECT are "aggregated" to
55 one single row - duplicates are eliminated.
56 If the result of the SELECT without DISTINCT is
57 @verbatim
58 1 2
59 3 4
60 1 2
61 @endverbatim
62 then the result with DISTINCT is
63 @verbatim
64 1 2
65 3 4
66 @endverbatim
67 Here is a problematic query. Assume we have a table T which contains three
68 columns C1,C2,C3 and has the following rows:
69 @verbatim
70 C1 C2 C3
71 1 2 A
72 3 4 B
73 1 2 C
74 @endverbatim
75 and we do "SELECT DISTINCT C1, C2 FROM T ORDER BY C3".
76 Because we want the result to be ordered, we have to first eliminate
77 duplicates then order the result.
78 When we eliminate duplicates, should we keep the first or the third row? This
79 arbitrary choice will influence the retained value of C3, which will influence
80 ordering. In the end, we have arbitrary ordering, which is problematic.
81 To prevent this, if, in a query block 'sl', an ORDER BY expression
82 is not the same expression as one in the SELECT list of 'sl'
83 and contains a column which:
84 - is of a table whose qualifying query block is 'sl'
85 - and is not in the SELECT list of 'sl'
86 
87 then 'sl' should not have DISTINCT.
88 This rule makes the query above invalid.
89 Class Check_distinct implements this rule.
90 
91 Class Check_distinct implements a second rule, specific of set functions in
92 ORDER BY (a non-standard feature).
93 Consider these queries labelled (1), (2), (3), with DISTINCT and a set
94 function in ORDER BY:
95 @verbatim
96 SELECT DISTINCT MIN(C2) FROM T GROUP BY C1 ORDER BY MIN(C3);
97 SELECT DISTINCT MIN(C2) FROM T GROUP BY C1 ORDER BY MIN(C2);
98 SELECT DISTINCT C1, MIN(C2) FROM T GROUP BY C1 ORDER BY MIN(C3);
99 @endverbatim
100 (1) is a random-order query, (2) and (3) are not.
101 MySQL has traditionally been permissive, we want to allow (2) and (3).
102 
103 So the second rule is that Check_distinct rejects a query if it has DISTINCT
104 and a set function in ORDER BY which is not the same expression as one in the
105 SELECT list. This rejects (1) and allows (2).
106 It would reject (3); luckily, before Check_distinct checks, DISTINCT is
107 removed (it is redundant with GROUP BY) and thus the query is not processed by
108 Check_distinct; the second rule is thus by-passed and (3) is correctly
109 accepted.
110 
111 The implementation of Check_distinct works like this: if the query has
112 DISTINCT, examine each element of the ORDER BY list: if the element is not the
113 same expression as one in the SELECT list, then examine parts of the element
114 (using Item::walk()), to spot offending set functions or columns.
115 
116 @section IMPLICIT_GROUPING Implicit grouping
117 
118 A query with set functions without GROUP BY can be seen as having GROUP BY()
119 i.e. the set of grouping expressions is empty, all rows are part of a
120 single group and are replaced with a single row. So it is just a sub-case of
121 the next section.
122 
123 @section EXPLICIT_GROUPING Explicit grouping (GROUP BY)
124 
125 MySQL is more permissive than the standard, it allows to group by any
126 expression; it does not require that every element of the GROUP BY clause
127 should be a column of one table of the FROM clause.
128 
129 Here is a problematic query, using a table T with columns C1, C2, C3:
130 @verbatim
131 C1 C2 C3
132 1 2 A
133 3 4 B
134 1 2 C
135 
136 SELECT C1, C3 FROM T GROUP BY C1;
137 @endverbatim
138 we first form groups, in each group the value of C1 must be the same for all
139 rows. Consider the group made of the first and third row. We have to produce
140 one row out of it, and this row must have a value for C3 because C3 is in the
141 SELECT list. Among those two rows, which value of C3 should we choose? This is
142 arbitrary, and will give a random result.
143 
144 To prevent this, in a query with GROUP BY or aggregates (known as "a grouped
145 query"), any column referenced by an expression in the SELECT list or HAVING
146 condition and belonging to one table of the FROM clause, should be one of the
147 group columns (enforced by only_full_group_by in MySQL 5.6 and older) or, if
148 functional dependencies are supported (as in MySQL 5.7), should be
149 functionally dependent on the group columns.
150 In the table T above, C3 is obviously not functionally dependent on {C1,C2}:
151 the values of C1 and C2 for a row do not uniquely determine the value of
152 C3. In other words, if two rows have the same value in C1 and the same value
153 in C2 they do not necessarily have the same value in C3.
154 So this rule correctly rejects the query above.
155 
156 Note that NULL is treated as one value: two NULLs are considered equal.
157 
158 In WL#2489, we have implemented the optional standard feature T301
159 "functional dependencies" almost entirely.
160 
161 Here are the functional dependencies which we recognize.
162 
163 @subsection KEYFD Key-based, in a base table
164 
165 A key in this text is a unique constraint made of only non-NULLable
166 columns. For example, a primary key.
167 Considering a base table T, if two rows have the same values of all columns of
168 a key of T they are actually one single row, so:
169 { all columns of key} -> { T.* } (notation: what is on the right of the arrow
170 is functionally dependent on what is on the left).
171 
172 @subsection GCOLFD Generated-column-based, in a base table
173 
174 Considering a base table T, a generated column is functionally dependent on
175 the set of columns it references (the so-called parametric
176 columns). Note that the SQL standard doesn't allow a parametric column
177 to be a generated column, but MySQL does.
178 
179 @subsection INNEREQ Equality-based, in the result of an inner join
180 
181 Considering two tables T1 and T2, a condition C, and the result R of an inner
182 join T1 JOIN T2 ON C.
183 Note that T1/T2 are not necessarily base tables. For example, they can also be
184 views, or derived tables, or the results of some joins; in the rest of text,
185 we use the vague term "table" for those various objects.
186 
187 Note that C may only refer to columns of T1 or T2 (outer references
188 are forbidden by MySQL in join conditions).
189 
190 Assuming that C is a conjunction (i.e. is made of one or more conditions,
191 "conjuncts", chained together with AND):
192 - If one conjunct is of the form T1.A=constant, then {} -> {A} holds in R (the
193 value of A is "the constant" in all rows of R).
194 - If one conjunct is of the form T1.A=T2.B, then {T1.A} -> {T2.B} (and vice
195 versa) holds in R (the value of T2.B is that of T1.A in all rows of R).
196 
197 @subsection OUTEREQ Equality-based, in the result of an outer join
198 
199 Considering the result R of TS LEFT JOIN TW ON C.
200 Assuming that C is a conjunction. TS is said to be the strong table, TW is
201 said to be the weak table (the one which might be NULL-complemented in the
202 result of this join).
203 To make this really clear, note that, if we have
204 "t1 left join (t2 left join t3 on C) on D":
205 - in the t2-t3 join, t2 is strong and t3 is weak.
206 - in the t1-(t2-t3) join, t1 is strong, t2 is weak, t3 is weak.
207 
208 If C is deterministic and one conjunct is of the form TW.A=constant or
209 TW.A=TS.B, then DJS -> {TW.A} holds in R, where DJS is the set of all columns
210 of TS referenced by C.
211 Proof: consider in R two rows r1 and r2 which have the same values of DJS
212 columns. Consider r1. There are two possibilities when r1 was built from a row
213 of TS:
214 - no row in TW matched the row of TS (for no row of TW has C been true): so,
215 r1 is NULL-complemented for the columns of TW. Given that r2 has the same
216 values of DJS columns as r1, and given that C is deterministic, it is sure
217 that no row in TW matched when r2 was built. So r2 is also NULL-complemented
218 for the columns of TW. So r1 and r2 have the same value of TW.A (NULL).
219 - At least one row in TW matched: so, r1 contains real values from TW (not
220 NULL-complemented), matching C, and thus TW.A in r1 is equal to the constant
221 or to TS.B. Following the same reasoning as above, it is sure that it is also
222 the case for r2.
223 - In conclusion, we can see that r1 and r2 always have the same value of
224 TW.A.
225 
226 If one conjunct is of the form TW.A=TW.B then {TW.A} -> {TW.B} holds in R
227 Proof: consider r1 and r2 two rows in R having the same value of TW.A. Two
228 cases are possible:
229 - this value is NULL. Then both rows are NULL-complemented (if not, the
230 value of TW.A in TW would be NULL, which cannot match in an equality, so C is
231 not true, which is absurd). Thus, in r1 and r2 TW.B is NULL.
232 - This value is not NULL. Then both rows are not NULL-complemented, C
233 matched for both, so TW.B in r1/r2 is equal to TW.A in r1/r2.
234 - In conclusion, r1 and r2 have the same value of TW.B.
235 
236 If any conjunct references column T1.A and is not true if T1.A is NULL
237 (e.g. conjunct is T1.A > 3), and T1 is part of TW, we can replace T1 with
238  (SELECT * FROM T1 WHERE T1.A IS NOT NULL) AS T1.
239 We do not do such query rewriting, but a consequence is that we can and do
240 treat T1.A as "known not nullable in T1".
241 Proof: if we do this replacement, it means we remove from T1 some rows where
242 T1.A is NULL. For the sake of the proof, let's broaden this to "we remove from
243 and/or add to T1 rows where T1.A is NULL". By this operation,
244 - the result of T2 JOIN T1 ON C is affected in the same manner (i.e. it's like
245 if we remove from and/or add rows to the join's result, rows where T1.A is
246 NULL).
247 - the same is true for the result of T1 LEFT JOIN T2 ON C,
248 - regarding the result of T2 LEFT JOIN T1 ON C, consider a row r2 from T2:
249  - if r2 had a match with a row of T1 where T1.A is not NULL, this match is
250  preserved after replacement of T1
251  - due to removed rows, matches with rows of T1 where T1.A is NULL may be
252  lost, and so NULL-complemented rows may appear
253  - due to added rows, matches with rows of T1 where T1.A is NULL may appear
254  - so in the result of the left join, it's like if we remove from
255  and/or add rows which all have T1.A is NULL.
256 So, the net effect in all situations, on the join nest simply containing T1,
257 is that "we remove from and/or add to the nest's result rows where T1.A is
258 NULL".
259 By induction we can go up the nest tree, for every nest it's like if we remove
260 from and/or add to the join nest's result rows where T1.A is NULL.
261 Going up, finally we come to the nest TS LEFT JOIN TW ON C where our
262 NULL-rejecting conjunct is.
263 If T1 belonged to TS, we could not say anything. But we have assumed T1
264 belongs to TW: then the C condition eliminates added rows, and would have
265 eliminated removed rows anyway. Thus the result of TS LEFT JOIN TW ON C is
266 unchanged by our operation on T1. We can thus safely treat T1.A as not
267 nullable _in_T1_. Thus, if T1.A is part of a unique constraint, we may treat
268 this constraint as a primary key of T1 (@see WHEREEQ for an example).
269 
270 @subsection WHEREEQ Equality-based, in the result of a WHERE clause
271 
272 Same rule as the result of an inner join.
273 Additionally, the rule is extended to T1.A=outer_reference, because an outer
274 reference is a constant during one execution of this query.
275 
276 Moreoever, if any conjunct references column T1.A and is not true if T1.A is
277 NULL (e.g. conjunct is T1.A > 3), we can treat T1.A as not nullable in T1.
278 Proof: same as in previous section OUTEREQ.
279 We can then derive FD information from a unique index on a nullable column;
280 consider:
281  create table t1(a int null, b int null, unique(a));
282 While
283  select a,b from t1 group by a;
284 is invalid (because two null values of 'a' go in same group),
285  select a,b from t1 where 'a' is not null group by a;
286 is valid: 'a' can be treated as non-nullable in t1, so the unique index is
287 like "unique not null", so 'a' determines 'b'.
288 
289 Below we examine how functional dependencies in a table propagate to its
290 containing join nest.
291 
292 @subsection PROPAGOUTER Considering the result R of TS LEFT JOIN TW ON C.
293 
294 All functional dependencies in TS are also functional dependencies in R.
295 Proof: trivial.
296 The same is not necessarily true for TW.
297 Let's define a "NULL-friendly functional dependency" (NFFD) as a functional
298 dependency between two sets A and B, which has two properties:
299 - A is not empty
300 - if, in a row, all values of A are NULL, then all values of B are NULL.
301 
302 All NFFDs in TW are also NFFDs in R.
303 Proof: consider an NFFD A -> B in TW, and r1 and r2 two rows in R having the
304 same values of A columns. Two cases are possible:
305 - In r1 and r2, at least one value of A is not NULL. Then r1 is not
306 NULL-complemented. Its values for A and B come from TW. By application of the
307 functional dependency in TW, because values in A are equal in TW, values in B
308 are equal in TW and thus in r1/r2.
309 - In r1 and r2, all values of A are NULL. Two cases are possible:
310 i) r1 is not NULL-complemented. Its values for A and B come from TW. In the
311 row of TW values of A are all NULL. Because the functional dependency in
312 NULL-friendly, all values of B are NULL in the row of TW and thus in r1.
313 ii) r1 is NULL-complemented. Then all values of B in r1 are NULL.
314 iii) In conclusion, all values of B in r1 are NULL. The same reasoning applies
315 to r2. So, values of B are identical (NULL) in r1 and r2.
316 - In conclusion, values of B are identical in r1/r2, we have proved that
317 this is a functional dependency in R, and a NULL-friendly one.
318 
319 The concept of an NFFD is Guilhem's invention. It was felt it was necessary, to
320 have easy propagation of FDs from TW to R. It was preferred to the
321 alternative, simpler rule which says that a functional dependency A-> B in TW
322 is also a functional dependency in R if A contains a non-nullable
323 column. There are two points to note:
324 - the functional dependency of the simpler rule is an NFFD, so our rule is not
325 more restrictive than the simpler one
326 - this simpler rule prevents free propagation of functional dependencies
327 through join nests, which complicates implementation and leads to rejecting
328 queries which could be accepted. An example follows:
329 @verbatim
330 SELECT T3.A
331 FROM T1 LEFT JOIN (T2 LEFT JOIN T3 ON TRUE) ON TRUE
332 GROUP BY T3.PK;
333 @endverbatim
334 This is what the simpler rule says for this query:
335 * In T3, T3.PK->T3.A holds.
336 * Let R1 be the result of "(T2 LEFT JOIN T3 ON TRUE)", in R1 T3.PK->T3.A
337 holds, by application of: there is a functional dependency in the
338 weak side T3, and T3.PK is not nullable in T3.
339 * Let R2 be the result of "T1 LEFT JOIN (T2 LEFT JOIN T3 ON TRUE) ON TRUE",
340 in R2 T3.PK->T3.A doesn't hold anymore, because: it's a dependency in the
341 weak side (weak side is R1 here), and T3.PK is nullable _when
342 seen as a column of R1_ (in R1 T3.PK can be NULL, if the row of T3
343 is actually a NULL-complemented one).
344 
345 @subsection PROPAGINNER Considering the result R of T1 JOIN T2 ON C.
346 
347 All [NULL-friendly] functional dependencies in T1 are also [NULL-friendly]
348 functional dependencies in R. the same is true for T2.
349 Proof: trivial.
350 
351 @subsection PROPAGSUMMARY Summary rules for propagating FDs
352 
353 All NULL-friendly functional dependencies propagate freely through join nests
354 all the way up to the result of the WHERE clause.
355 The same is true for ordinary functional dependencies except if there are weak
356 tables along the propagation path between the table where the dependency holds
357 and the result of the WHERE clause; in other words, except if the table where
358 the dependency holds belongs to some embedding join nest which is on the weak
359 side of an outer join.
360 
361 @subsection NFFDS Which functional dependencies are NULL-friendly
362 
363 A key-based functional dependency A -> B in the base table is NULL-friendly,
364 because, by definition, there can never be a NULL value in any column of A.
365 
366 A functional dependency A -> B in a base table, between parametric columns A
367 and a generated column B, is not NULL-friendly; for more details, see
368 @ref FDVIEW .
369 
370 A functional dependency A->B in the result of T1 JOIN T2 ON C, if based on
371 equality of two columns, is NULL-friendly. Indeed, A is not empty and if there
372 was some NULL in A, it would not match the equality in C and thus it would not
373 exist in the result, absurd. If the equality is rather column=constant, A is
374 empty, the dependency is not NULL-friendly. However, in our implementation,
375 function @c simplify_joins() moves inner-join conditions to the embedding
376 outer-join nest's join condition, or to the WHERE clause. Because our analysis
377 of functional dependencies happens after simplify_joins(), when we analyze T1
378 JOIN T2 it is guaranteed to have no condition, and this paragraph is
379 irrelevant.
380 
381 A functional dependency in the result of TS LEFT JOIN TW ON C, if based on
382 equality of two columns, is NULL-friendly.
383 Proof: let's consider, in turn, the two types of equality-based functional
384 dependencies which exist in this result R. Let r1 be a row of R.
385 - If C is deterministic and one conjunct is of the form TW.A=constant or
386 TW.A=TS.B, then DJS -> {TW.A} holds in R, where DJS is the set of all columns
387 of TS referenced by C. For NULL-friendliness, we need DJS to not be
388 empty. Thus, we exclude the form TW.A=constant and consider only
389 TW.A=TS.B. We suppose that in r1 DJS contains all NULLs. Conjunct is TW.A=TS.B
390 then this equality is not true, so r1 is NULL-complemented: TW.A is NULL in
391 r1.
392 - If one conjunct is of the form TW.A=TW.B then {TW.A} -> {TW.B} holds in
393 R. If in r1 TW.A is NULL, again the equality in C is not true, and TW.B is
394 NULL in R1.
395 - In conclusion, this is NULL-friendly.
396 
397 A functional dependency in the result of a WHERE clause, if based on equality
398 of two columns, is NULL-friendly. If based on T1.A=constant, it is not, as it
399 has an empty set of source columns.
400 
401 Summary: all functional dependencies which we have seen so far are
402 NULL-friendly, except those inferred from TW.A=constant in an outer join
403 condition or in a WHERE clause, and those about generated columns.
404 
405 Thus, in the query with T1-T2-T3 previously seen, T3.PK->T3.A is
406 NULL-friendly and propagates, query is accepted.
407 
408 In our implementation, we take special care of TW.A=constant in an outer join
409 condition: we infer a functional dependency DJS->TW.A from such equality only
410 if one of these requirements are met:
411 - the join nest "TS LEFT JOIN TW ON TW.A=constant [AND...]" is not on the
412 weak side of any embedding join nest - in that case, propagation will not meet
413 any weak tables so we do not need the dependency to be NULL-friendly, it will
414 propagate anyway.
415 - DJS contains at least one column from a strong-side table which, if NULL,
416 makes the join condition not evaluate to TRUE - in that case, DJS->TW.A is
417 NULL-friendly.
418 
419 Either of these two conditions is satisfied in most practical cases. For
420 example, it's very common to have an equality between a strong-side column and
421 a weak-side column as a conjunct in the outer join condition (like, "ON
422 strong.pk = weak.foreign_key AND ..." or "ON strong.foreign_key = weak.pk AND
423 ..."); this satisfies the second condition. It's also common to have outer
424 joins only left-deep ("SELECT ... T1 LEFT JOIN T2 ON ... LEFT JOIN T3 ON ..."
425 is left-deep); this satisfies the first condition.
426 Note that the dependency found from TW.A=TS.B in an outer join condition
427 always satisfies the second condition.
428 
429 T1.A=constant in a WHERE clause is exterior to any join nest so does not need
430 to propagate, so does not need to be NULL-friendly.
431 
432 @subsection FDVIEW Functional dependencies in a view or a derived table
433 
434 In the rest of this text, we will use the term "view" for "view or derived
435 table". A view can be merged or materialized, in MySQL.
436 Consider a view V defined by a query expression.
437 If the query expression contains UNION or ROLLUP (which is theoretically based
438 on UNION) there are no functional dependencies in this view.
439 So let's assume that the query expression is a query specification (let's note
440 it VS):
441 @verbatim
442 CREATE VIEW V AS SELECT [DISTINCT] VE1 AS C1, VE2 AS C2, ... FROM ... WHERE ...
443 [GROUP BY ...] [HAVING ...] [ORDER BY ...]
444 @endverbatim
445 
446 If {VE1, VE2, VE3} are columns of tables of the FROM clause, and {VE1,
447 VE2} -> {VE3} has been deduced from rules in the previous sections [and is
448 NULL-friendly], then {C1, C2} -> { C3 } holds in the view [and is
449 NULL-friendly].
450 
451 If {VE1, VE2} are columns of tables of the FROM clause, and VE3 is a
452 deterministic expression depending only on VE1 and VE2, then {C1, C2} ->
453 { C3 } in the view.
454 It is not always NULL-friendly, for example: VE3 could be COALESCE(VE1,VE2,3):
455 if VE1 (C1) and VE2 (C2) are NULL, VE3 (C3) is 3: not NULL. Another example:
456 VE3 could be a literal; {}->{C3}, the left set is empty.
457 The same examples apply to a generated column in a base table - it is like a
458 merged view's expression. For example, in a base table T which has a generated
459 column C3 AS COALESCE(C1,C2,3): {C1, C2} -> { C3 } holds in T but is not
460 NULL-friendly.
461 
462 If VS is a grouped query (which, in MySQL, implies that the view is
463 materialized), then in the result of the grouping there is a functional
464 dependency from the set of all group expressions to the set of all selected
465 expressions (otherwise, this query expression would not have passed its own
466 only_full_group_by validation - in the implementation we validate each query
467 expression separately). Thus, if all group expressions of VS are in the select
468 list of VS, for example they are VE1 and VE2, then {C1, C2} -> {V.*}.
469 It is not NULL-friendly, for example: VE3 is COUNT(1): if the result of the
470 WHERE clause contains a row with group expressions VE1 and VE2 equal to NULL,
471 VE3 is not NULL.
472 
473 It's possible to infer functional dependencies from equality conditions in
474 HAVING, but we have not implemented it yet.
475 
476 Because some functional dependencies above are not NULL-friendly, they
477 exist in the view, but may not freely propagate in the result of join nests
478 containing the view. This includes examples just given in paragraphs above,
479 and the case of T1.A=constant in the WHERE clause of VS.
480 
481 Thus, when we know of a functional dependency A -> B in the query expression
482 of a view, we deduce from it a functional dependency in the view only if:
483 - this view is not on the weak side of any embedding join nest (so
484 NULL-friendliness is not required for propagation).
485 - or A contains at least one non-nullable expression, which makes A -> B
486 NULL-friendly.
487 
488 The above is fine for materialized views. For merged views, we cannot study the
489 query expression of the view, it has been merged (and scattered), so we use a
490 different rule:
491 - a merged view is similar to a join nest inserted in the parent query, so
492 for dependencies based on keys or join conditions, we simply follow
493 propagation rules of the non-view sections.
494 - For expression-based dependencies (VE3 depends on VE1 and VE2, VE3
495 belonging to the view SELECT list), which may not be NULL-friendly, we require
496  - the same non-weak-side criterion as above
497  - or that the left set of the dependency be non-empty and that if VE1 and
498 VE2 are NULL then VE3 must be NULL, which makes the dependency NULL-friendly.
499 - The same solution is used for generated columns in a base table.
500 
501  @{
502 
503 */
504 
505 #include <assert.h>
506 #include <sys/types.h>
507 
508 #include "my_alloc.h"
509 
510 #include "my_inttypes.h"
511 #include "my_table_map.h"
512 #include "sql/item.h"
513 #include "sql/item_cmpfunc.h" // Item_func_any_value
514 #include "sql/item_sum.h" // Item_sum
515 #include "sql/mem_root_array.h" // Mem_root_array
516 
517 class Opt_trace_context;
518 class Opt_trace_object;
519 class Query_block;
520 class THD;
521 struct TABLE_LIST;
522 
523 template <class T>
524 class mem_root_deque;
525 
526 /**
527  Checks for queries which have DISTINCT.
528 */
530  public:
532  : select(select_arg), failed_ident(nullptr) {}
533  Distinct_check(const Distinct_check &) = delete;
535 
536  bool check_query(THD *thd);
537 
538  private:
539  /// Query block which we are validating
541  /// Identifier which triggered an error
543 
544  /**
545  Just because we need to go through Item::walk() to reach all items to
546  validate, some work must be delegated to "Item processors" (members of
547  Item); this work conceptually belongs to Distinct_check, and needs
548  privileged access to it.
549  */
553 };
554 
555 /**
556  Checks for queries which have GROUP BY or aggregate functions.
557 */
559  public:
560  Group_check(Query_block *select_arg, MEM_ROOT *root)
561  : select(select_arg),
562  search_in_underlying(false),
563  non_null_in_source(false),
564  table(nullptr),
565  group_in_fd(~0ULL),
566  m_root(root),
567  fd(root),
568  whole_tables_fd(0),
570  mat_tables(root),
572 
574  for (uint j = 0; j < mat_tables.size(); ++j) destroy(mat_tables.at(j));
575  }
576  Group_check(const Group_check &) = delete;
577  Group_check &operator=(const Group_check &) = delete;
578 
579  bool check_query(THD *thd);
580  void to_opt_trace(THD *thd);
581 
582  private:
583  /// Query block which we are validating
585 
586  /**
587  "Underlying" == expressions which are underlying in an identifier.
588  The identifier can be a column of a view or derived table, both merged or
589  materialized, or a generated column: all of those have an underlying
590  expression.
591  "Materialized table (mat table)" == view or derived table, materialized.
592  If this is true, is_in_fd() will look for FDs in underlying expressions
593  of columns.
594  */
596 
597  /**
598  This member is readable only if this is a child Group_check. Is true if
599  one expression of the SELECT list of this->select is non-nullable.
600  */
602 
603  /**
604  The Group_check employed to validate one query block, the one on which
605  check_query() runs, is named the "master".
606  If the query block references a materialized table, the master may create
607  a child Group_check, whose job is to discover FDs in the query expression
608  of the mat table (with the ultimate goal of deducing from them some FDs
609  in the mat table and thus in the parent Group_check). A child may have
610  children itself. If this Group_check is a child, 'table' points to the
611  materialized table, otherwise it is NULL.
612  */
614 
615  /**
616  Bit N is set if the N-th expression of GROUP BY is functionally dependent
617  on source columns.
618  It serves to find FDs in the query expression of a materialized table
619  having GROUP BY.
620  For a non-child Group-check, all bits are turned on from the start.
621  Currently limited to 64 bits => max 64 GROUP expressions; should probably
622  be MY_BITMAP.
623  */
625 
626  /// Memory for allocations (like of 'fd')
627  MEM_ROOT *const m_root;
628 
629  /**
630  Columns which are local to 'select' and functionally dependent on an
631  initial set of "source" columns defined like this:
632  - if !is_child(), the GROUP BY columns
633  - if is_child(), columns of the result of the query expression under
634  'table' which are themselves part of 'fd' of the parent Group_check.
635  */
637  /// Map of tables for which all columns can be considered part of 'fd'.
639  /// Map of tables for which we discovered known-not-nullable columns.
641  /// Children Group_checks of 'this'
643  /// Identifier which triggered an error
645 
646  bool is_fd_on_source(Item *item);
647  bool is_child() const { return table != nullptr; }
648 
649  /// Private ctor, for a Group_check to build a child Group_check
650  Group_check(Query_block *select_arg, MEM_ROOT *root, TABLE_LIST *table_arg)
651  : select(select_arg),
652  search_in_underlying(false),
653  non_null_in_source(false),
654  table(table_arg),
655  group_in_fd(0ULL),
656  m_root(root),
657  fd(root),
658  whole_tables_fd(0),
660  mat_tables(root) {
661  assert(table);
662  }
663  bool check_expression(THD *thd, Item *expr, bool in_select_list);
664  /// Shortcut for common use of Item::local_column()
665  bool local_column(Item *item) const {
666  return item->local_column(select).is_true();
667  }
668  void add_to_fd(Item *item, bool local_column, bool add_to_mat_table = true);
670  whole_tables_fd |= m;
671  find_group_in_fd(nullptr);
672  }
673  void add_to_source_of_mat_table(Item_field *item_field, TABLE_LIST *tl);
674  bool is_in_fd(Item *item);
676  Item *get_fd_equal(Item *item);
677  void analyze_conjunct(Item *cond, Item *conjunct, table_map weak_tables,
678  bool weak_side_upwards);
679  void analyze_scalar_eq(Item *cond, Item *left_item, Item *right_item,
680  table_map weak_tables, bool weak_side_upwards);
681  void find_fd_in_cond(Item *cond, table_map weak_tables,
682  bool weak_side_upwards);
685  void find_group_in_fd(Item *item);
687 
688  /// Enum for argument of do_ident_check()
691 
692  /**
693  Just because we need to go through Item::walk() to reach all items to
694  validate, some work must be delegated to "Item processors" (members of
695  Item); this work conceptually belongs to Group_check, and needs
696  privileged access to it.
697  */
704 };
705 
706 #endif
707 /// @} (end of group AGGREGATE_CHECKS ONLY_FULL_GROUP_BY)
bool is_true() const
Definition: item.h:598
Checks for queries which have DISTINCT.
Definition: aggregate_check.h:529
Distinct_check & operator=(const Distinct_check &)=delete
Distinct_check(const Distinct_check &)=delete
Query_block *const select
Query block which we are validating.
Definition: aggregate_check.h:540
Item_ident * failed_ident
Identifier which triggered an error.
Definition: aggregate_check.h:542
Distinct_check(Query_block *select_arg)
Definition: aggregate_check.h:531
Checks for queries which have GROUP BY or aggregate functions.
Definition: aggregate_check.h:558
MEM_ROOT *const m_root
Memory for allocations (like of 'fd')
Definition: aggregate_check.h:627
Item_ident * failed_ident
Identifier which triggered an error.
Definition: aggregate_check.h:644
Group_check & operator=(const Group_check &)=delete
void add_to_fd(table_map m)
Definition: aggregate_check.h:669
TABLE_LIST *const table
The Group_check employed to validate one query block, the one on which check_query() runs,...
Definition: aggregate_check.h:613
Query_block *const select
Query block which we are validating.
Definition: aggregate_check.h:584
enum_ident_check
Enum for argument of do_ident_check()
Definition: aggregate_check.h:689
@ CHECK_COLUMN
Definition: aggregate_check.h:689
@ CHECK_STRONG_SIDE_COLUMN
Definition: aggregate_check.h:689
@ CHECK_GROUP
Definition: aggregate_check.h:689
Group_check(Query_block *select_arg, MEM_ROOT *root)
Definition: aggregate_check.h:560
Group_check(const Group_check &)=delete
~Group_check()
Definition: aggregate_check.h:573
bool search_in_underlying
"Underlying" == expressions which are underlying in an identifier.
Definition: aggregate_check.h:595
bool is_child() const
Definition: aggregate_check.h:647
Mem_root_array< Group_check * > mat_tables
Children Group_checks of 'this'.
Definition: aggregate_check.h:642
table_map recheck_nullable_keys
Map of tables for which we discovered known-not-nullable columns.
Definition: aggregate_check.h:640
bool local_column(Item *item) const
Shortcut for common use of Item::local_column()
Definition: aggregate_check.h:665
Group_check(Query_block *select_arg, MEM_ROOT *root, TABLE_LIST *table_arg)
Private ctor, for a Group_check to build a child Group_check.
Definition: aggregate_check.h:650
bool non_null_in_source
This member is readable only if this is a child Group_check.
Definition: aggregate_check.h:601
ulonglong group_in_fd
Bit N is set if the N-th expression of GROUP BY is functionally dependent on source columns.
Definition: aggregate_check.h:624
Mem_root_array< Item_ident * > fd
Columns which are local to 'select' and functionally dependent on an initial set of "source" columns ...
Definition: aggregate_check.h:636
table_map whole_tables_fd
Map of tables for which all columns can be considered part of 'fd'.
Definition: aggregate_check.h:638
Definition: item.h:4027
bool aggregate_check_group(uchar *arg) override
Definition: item_cmpfunc.cc:7336
bool aggregate_check_distinct(uchar *arg) override
Definition: item_cmpfunc.cc:7343
bool aggregate_check_group(uchar *arg) override
This function is expected to check if GROUPING function with its arguments is "group-invariant".
Definition: item_sum.cc:6162
Definition: item.h:3755
bool is_strong_side_column_not_in_fd(uchar *arg) override
Definition: item.cc:10264
bool aggregate_check_group(uchar *arg) override
Definition: item.cc:10259
bool is_column_not_in_fd(uchar *arg) override
Definition: item.cc:10272
bool aggregate_check_distinct(uchar *arg) override
Definition: item.cc:10213
bool aggregate_check_distinct(uchar *arg) override
Definition: item_sum.cc:598
bool aggregate_check_group(uchar *arg) override
Definition: item_sum.cc:622
Utility mixin class to be able to walk() only parts of item trees.
Definition: item.h:736
Base class that is used to represent any kind of expression in a relational query.
Definition: item.h:802
virtual Bool3 local_column(const Query_block *) const
Definition: item.h:2671
Element_type & at(size_t n)
Definition: mem_root_array.h:90
size_t size() const
Definition: mem_root_array.h:399
A per-session context which is always available at any point of execution, because in practice it's a...
Definition: opt_trace_context.h:89
A JSON object (unordered set of key/value pairs).
Definition: opt_trace.h:801
This class represents a query block, aka a query specification, which is a query consisting of a SELE...
Definition: sql_lex.h:1123
For each client connection we create a separate thread with THD serving as a thread/connection descri...
Definition: sql_class.h:821
A (partial) implementation of std::deque allocating its blocks on a MEM_ROOT.
Definition: mem_root_deque.h:109
Dialog Client Authentication nullptr
Definition: dialog.cc:352
void add_to_fd(Item *item, bool local_column, bool add_to_mat_table=true)
Definition: aggregate_check.cc:490
bool do_ident_check(Item_ident *i, table_map tm, enum enum_ident_check type)
Does one low-level check on one Item_ident.
Definition: aggregate_check.cc:1199
Item * select_expression(uint idx)
Definition: aggregate_check.cc:584
void find_fd_in_cond(Item *cond, table_map weak_tables, bool weak_side_upwards)
Searches for equality-based functional dependencies in a condition.
Definition: aggregate_check.cc:1079
void analyze_scalar_eq(Item *cond, Item *left_item, Item *right_item, table_map weak_tables, bool weak_side_upwards)
Helper function.
Definition: aggregate_check.cc:990
void to_opt_trace2(Opt_trace_context *ctx, Opt_trace_object *parent)
Utility function for to_opt_trace(), as we need recursion in children Group_checks.
Definition: aggregate_check.cc:1162
bool is_fd_on_source(Item *item)
Tells if 'item' is functionally dependent ("FD") on source columns.
Definition: aggregate_check.cc:335
void add_to_source_of_mat_table(Item_field *item_field, TABLE_LIST *tl)
If we just added a column of a materialized table to 'fd', we record this fact in a new Group_check (...
Definition: aggregate_check.cc:612
Item * get_fd_equal(Item *item)
Definition: aggregate_check.cc:869
bool check_query(THD *thd)
Rejects the query if it has a combination of DISTINCT and ORDER BY which could lead to randomly order...
Definition: aggregate_check.cc:110
void find_fd_in_joined_table(mem_root_deque< TABLE_LIST * > *join_list)
Searches for equality-based functional dependencies in the condition of a join nest,...
Definition: aggregate_check.cc:1103
bool check_query(THD *thd)
Rejects the query if it does aggregation or grouping, and expressions in its SELECT list,...
Definition: aggregate_check.cc:179
bool check_expression(THD *thd, Item *expr, bool in_select_list)
Validates one expression (this forms one step of check_query()).
Definition: aggregate_check.cc:267
void to_opt_trace(THD *thd)
Writes "check information" to the optimizer trace.
Definition: aggregate_check.cc:1149
bool is_in_fd(Item *item)
is_in_fd() is low-level compared to is_fd_on_source().
Definition: aggregate_check.cc:677
bool is_in_fd_of_underlying(Item_ident *item)
See if we can derive a FD from a column which has an underlying expression.
Definition: aggregate_check.cc:737
void analyze_conjunct(Item *cond, Item *conjunct, table_map weak_tables, bool weak_side_upwards)
Searches for equality-based functional dependences in an AND-ed part of a condition (a conjunct).
Definition: aggregate_check.cc:892
void find_group_in_fd(Item *item)
This function must be called every time we discover an item which is FD on source columns,...
Definition: aggregate_check.cc:551
This file follows Google coding style, except for the name MEM_ROOT (which is kept for historical rea...
Some integer typedefs for easier portability.
unsigned long long int ulonglong
Definition: my_inttypes.h:55
unsigned char uchar
Definition: my_inttypes.h:51
uint64_t table_map
Definition: my_table_map.h:29
static mysql_service_status_t destroy(reference_caching_channel channel) noexcept
Definition: component.cc:49
required string type
Definition: replication_group_member_actions.proto:33
The MEM_ROOT is a simple arena, where allocations are carved out of larger blocks.
Definition: my_alloc.h:78
Definition: table.h:2694
unsigned int uint
Definition: uca-dump.cc:29