WL#9727: Additional aggregate window functions
Affects: Server-8.0
—
Status: Complete
This is a followup of the work done in WL#9236 Add SQL window functions. Please refer to that WL for a general background and specifications. Under this work log we will add the aggregate window functions missing from that WL: - MIN - MAX - STD, STDDEV, STDDEV_POP (same function) - STDDEV_SAMP - VARIANCE, VAR_POP (same function) - VAR_SAMP These will be postponed till a later worklog: - BIT_AND - BIT_OR - BIT_XOR - JSON_ARRAYAGG - JSON_OBJECTAGG - GROUP_CONCAT
The evaluation of MIN/MAX is not possible using inversion, unless we know something about the ordering of the argument. We should evaluate using first/last value logic if there is an ORDER BY for the window which has as its first argument (in the list of ordering expressions), the expression which we seek MIN/MAX of. Otherwise, we need to visit all rows in a window's frame to determine MIN/MAX in a moving window, which gives a non-linear performance. NOTE: If a MIN/MAX is used on a window for which other window functions are also evaluated, the performance characteristics of those other window functions will also be non-linear, due to the presence of the MIN/MAX function. The evaluation of the statistics functions STD, STDDEV_SAMP, VARIANCE and VAR_SAMP (and their synonyms) can be evaluated in two modes: an optimized mode and a default mode, cf. the evaluation if SUM of floating point types in WL#9236. Ideally, we would always use optimized mode, but it we do not due to the following reason: For optimized mode we need to use an incremental algorithm to compute the aggregate functions. When the window frame moves, old values may move out of the frame and new values may move into the frame. In optimized mode, the function is evaluated by inverting the contribution to the function result from the row values that move out of the frame and by adding the contribution of those row values that move into the frame. There exists an incremental algorithm for computing variance in this way, but it produces slightly different results in the last significant digits than the algorithm MySQL uses for Grouped aggregates of the same functions. In practice the results are probably just fine, but in the interest of giving expected results, we do not use this incremental algorithm unless instructed by the user to do so. The same variable that is used for controlling optimized evaluation for floating values for SUM is used: windowing_use_high_precision [false / true (default) ] For large result sets the default algorithm may be too slow; so turning this variable to true should be considered.
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.