MySQL Blog Archive
For the latest blogs go to blogs.oracle.com/mysql
Filesort optimization in 5.7.3: pack values in the sort buffer

In MySQL, filesort is the catch-all algorithm for producing sorted results for ORDER-BY or GROUP-BY queries. MySQL has two algorithms for filesort, both the original and the modified algorithms are described in the user manual.  The most commonly used algorithm is the so called modified algorithm, it is used for all cases except when BLOB and TEXT column are involved.

In 5.7.3, Tor Didriksen in the optimizer team introduced one more optimization that applies to the modified algorithm.  Let us first take a look at how MySQL´s modified filesort algorithm worked up to 5.7.2.

  1. Read the rows that match the WHERE clause.
  2. For each row, record a tuple of values consisting of the sort key value and the additional fields referenced by the query.
  3. When the sort buffer becomes full, sort the tuples by sort key value in memory and write it to a temporary file.
  4. After merge-sorting the temporary file, retrieve the rows in sorted order, read the required columns directly from the sorted tuples

The size of the sort buffer is controlled by the system variable sort_buffer_size. Obviously, the more tuples we can pack into the sort buffer, the faster filesort will execute. Our new filesort optimization allows just that. Up to 5.7.2, the values of additional fields were not packed, as a result a lot of space in the sort buffer and in the external sort file was wasted. E.g. a value of the type varchar(255) of the length 3 characters took 255 bytes. It caused a significant slow down of the external sort procedure, as less sorted data can be allocated in the sort buffer and consequently more merge passes have to performed.

Starting from 5.7.3, for additional fields of type CHAR or VARCHAR, or any nullable fixed-size data type, the values are packed.  The actual performance improvements depend on the data to be sorted of course. If the data contains many strings that are shorter than the maximum length or there are many NULL values in nullable columns, we can get significant improvements.

More info can be found at WL#1509.