WL#12788: Iterator executor analytics queries

Affects: Server-8.0   —   Status: Complete

Building on the work in WL#12074 and WL#12470, this worklog seeks to broaden
the range of analytics queries the iterator executor can handle by supporting
window functions, rollup, and final deduplication (used for DISTINCT queries
that cannot deduplicate directly in the final temporary table).
No (new) functional requirements. All query results should remain the same.
Even performance schema results should be the same, although some warnings may
end up on different line numbers.

No (new) non-functional requirements. Performance should not regress.

The new features are:

Windowing functions: Two new iterators are added, WindowingIterator and
BufferingWindowingIterator. These work much like the existing aggregate
iterators, except that the aggregates are window aggregates; they reuse most
of the existing window function machinery behind the scenes.
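
To make the streaming case concrete, here is a minimal, self-contained sketch
in the spirit of WindowingIterator (the simplified RowIterator interface and
all names below are illustrative stand-ins, not the server's actual classes).
It computes SUM(value) OVER (ROWS UNBOUNDED PRECEDING) row by row; window
functions whose frames need look-ahead go through BufferingWindowingIterator,
which buffers rows before emitting them.

  // Illustrative, self-contained sketch only -- not the server's actual code.
  #include <memory>
  #include <optional>
  #include <utility>

  struct Row {
    long long value;      // input column
    long long wf_result;  // computed window function value
  };

  // Simplified stand-in for the executor's pull-based iterator interface.
  class RowIterator {
   public:
    virtual ~RowIterator() = default;
    virtual bool Init() = 0;                // returns false on error
    virtual std::optional<Row> Read() = 0;  // nullopt means end-of-stream
  };

  // Streaming case: a frame that only ever grows (here, ROWS UNBOUNDED
  // PRECEDING) can be evaluated row by row, like a streaming aggregate.
  class WindowingIterator : public RowIterator {
   public:
    explicit WindowingIterator(std::unique_ptr<RowIterator> source)
        : m_source(std::move(source)) {}

    bool Init() override {
      m_running_sum = 0;
      return m_source->Init();
    }

    std::optional<Row> Read() override {
      std::optional<Row> row = m_source->Read();  // pull a row from the child
      if (!row) return std::nullopt;              // propagate end-of-stream
      m_running_sum += row->value;                // update the window aggregate
      row->wf_result = m_running_sum;             // SUM(value) OVER (...)
      return row;  // emit immediately; this frame needs no buffering
    }

   private:
    std::unique_ptr<RowIterator> m_source;
    long long m_running_sum = 0;
  };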

Rollup: The biggest challenge with rollup is that we need to replace the grouping
items with NULLs in the rollup rows, even for non-nullable fields. In the old
executor, which was push-based, this was solved by switching the field list when
pushing the additional rows. With the current iterator design, which is pull-
based and has no explicit field list (arguably a shortcoming, but not one that
is easily fixed), we cannot do this. Instead, we replace the field list
with a set of Item_refs, which can then be pointed to different places when
rollup rows are to be returned.
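
A minimal standalone sketch of the indirection trick (RefSlot below is an
illustrative stand-in for Item_ref; the server's actual classes differ): each
output slot can be repointed at a shared NULL source while a rollup row is
emitted, then restored, without ever touching the non-nullable underlying
field.

  // Illustrative, self-contained sketch only; RefSlot stands in for the
  // server's Item_ref.
  #include <cstdio>
  #include <optional>
  #include <vector>

  // An indirection slot in the output row: reads through to the real
  // grouping value, or yields NULL while a rollup row is being emitted.
  struct RefSlot {
    static const std::optional<int> null_item;  // shared NULL source

    const std::optional<int> *real_item;  // the actual grouping item
    const std::optional<int> *current;    // what the slot points at now

    explicit RefSlot(const std::optional<int> *item)
        : real_item(item), current(item) {}
    void PointAtNull() { current = &null_item; }
    void PointAtReal() { current = real_item; }
    const std::optional<int> &Get() const { return *current; }
  };
  const std::optional<int> RefSlot::null_item = std::nullopt;

  int main() {
    std::optional<int> year = 2024, quarter = 3;  // non-nullable group columns
    std::vector<RefSlot> fields = {RefSlot(&year), RefSlot(&quarter)};

    // Rollup row aggregating away 'quarter': repoint the slot at NULL,
    // without touching the non-nullable underlying field at all.
    fields[1].PointAtNull();
    std::printf("quarter: %s\n", fields[1].Get() ? "value" : "NULL");
    fields[1].PointAtReal();  // restore before the next normal group row
    return 0;
  }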

Final deduplication: Normally, deduplication (SELECT DISTINCT) happens by adding
a unique index to the final temporary table. However, in the cases where we do
aggregation directly into a temporary table, we cannot use such an index, since
rows change during query execution and thus cannot be deduplicated on-the-fly.
(E.g., consider SELECT DISTINCT COUNT(*) FROM t1 GROUP BY f1; the counts keep
changing as input rows arrive, so a row that is unique when first written may
become a duplicate later.) The old executor solves this by adding a post-pass
that actually deletes rows from the temporary table; for small tables, it uses
a hash table to deduplicate, but for larger ones, it falls back to an O(n^2)
algorithm based on pairwise comparison, which is extremely slow. Neither fits
well in an iterator design, and thus, we replace this step with filesort, which
is consistently O(n log n) with a small constant factor. Filesort needs to be
extended with support for deduplicating rows, which is done as part of this
work.
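
As a minimal illustration of why sort-based deduplication is O(n log n): once
rows are sorted, duplicates are adjacent, and a single linear pass removes
them. The snippet below uses std::sort and std::unique as stand-ins for the
extended filesort machinery.

  // Illustrative snippet only; std::sort/std::unique stand in for the
  // extended filesort.
  #include <algorithm>
  #include <cassert>
  #include <vector>

  // Sorting makes identical rows adjacent, so one linear pass removes
  // duplicates: O(n log n) in total instead of O(n^2) pairwise checks.
  std::vector<int> SortDeduplicate(std::vector<int> rows) {
    std::sort(rows.begin(), rows.end());
    rows.erase(std::unique(rows.begin(), rows.end()), rows.end());
    return rows;
  }

  int main() {
    assert((SortDeduplicate({3, 1, 3, 2, 1}) == std::vector<int>{1, 2, 3}));
    return 0;
  }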

As part of this worklog, we are also finally removing the concept of execution
stages (creating sort index, sorting for group, sorting for order, sorting
result, removing duplicates); all of execution will now simply be reported as
"executing". Execution stages have not provided meaningful information for many,
many years, as the stages have long overlapped in practice, and with iterators,
they overlap even more, so it is meaningless to say that we are in one stage or
the other.