Documentation Home
MySQL 5.6 Reference Manual
Related Documentation Download this Manual
PDF (US Ltr) - 30.9Mb
PDF (A4) - 31.0Mb
PDF (RPM) - 30.2Mb
EPUB - 7.7Mb
HTML Download (TGZ) - 7.5Mb
HTML Download (Zip) - 7.5Mb
HTML Download (RPM) - 6.5Mb
Eclipse Doc Plugin (TGZ) - 8.2Mb
Eclipse Doc Plugin (Zip) - 10.1Mb
Man Pages (TGZ) - 181.3Kb
Man Pages (Zip) - 292.4Kb
Info (Gzip) - 2.9Mb
Info (Zip) - 2.9Mb
Excerpts from this Manual

8.2.1.15 ORDER BY Optimization

In some cases, MySQL can use an index to satisfy an ORDER BY clause without doing extra sorting.

The index can also be used even if the ORDER BY does not match the index exactly, as long as all unused portions of the index and all extra ORDER BY columns are constants in the WHERE clause. The following queries use the index to resolve the ORDER BY part:

SELECT * FROM t1
  ORDER BY key_part1,key_part2,... ;

SELECT * FROM t1
  WHERE key_part1 = constant
  ORDER BY key_part2;

SELECT * FROM t1
  ORDER BY key_part1 DESC, key_part2 DESC;

SELECT * FROM t1
  WHERE key_part1 = 1
  ORDER BY key_part1 DESC, key_part2 DESC;

SELECT * FROM t1
  WHERE key_part1 > constant
  ORDER BY key_part1 ASC;

SELECT * FROM t1
  WHERE key_part1 < constant
  ORDER BY key_part1 DESC;

SELECT * FROM t1
  WHERE key_part1 = constant1 AND key_part2 > constant2
  ORDER BY key_part2;

In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it still uses indexes to find the rows that match the WHERE clause. These cases include the following:

  • The query uses ORDER BY on different indexes:

    SELECT * FROM t1 ORDER BY key1, key2;
    
  • The query uses ORDER BY on nonconsecutive parts of an index:

    SELECT * FROM t1 WHERE key2=constant ORDER BY key_part2;
    
  • The query mixes ASC and DESC:

    SELECT * FROM t1 ORDER BY key_part1 DESC, key_part2 ASC;
    
  • The index used to fetch the rows differs from the one used in the ORDER BY:

    SELECT * FROM t1 WHERE key2=constant ORDER BY key1;
    
  • The query uses ORDER BY with an expression that includes terms other than the index column name:

    SELECT * FROM t1 ORDER BY ABS(key);
    SELECT * FROM t1 ORDER BY -key;
    
  • The query joins many tables, and the columns in the ORDER BY are not all from the first nonconstant table that is used to retrieve rows. (This is the first table in the EXPLAIN output that does not have a const join type.)

  • The query has different ORDER BY and GROUP BY expressions.

  • There is an index on only a prefix of a column named in the ORDER BY clause. In this case, the index cannot be used to fully resolve the sort order. For example, if only the first 10 bytes of a CHAR(20) column are indexed, the index cannot distinguish values past the 10th byte and a filesort will be needed.

  • The index does not store rows in order. For example, this is true for a HASH index in a MEMORY table.

Availability of an index for sorting may be affected by the use of column aliases. Suppose that the column t1.a is indexed. In this statement, the name of the column in the select list is a. It refers to t1.a, so for the reference to a in the ORDER BY, the index can be used:

SELECT a FROM t1 ORDER BY a;

In this statement, the name of the column in the select list is also a, but it is the alias name. It refers to ABS(a), so for the reference to a in the ORDER BY, the index cannot be used:

SELECT ABS(a) AS a FROM t1 ORDER BY a;

In the following statement, the ORDER BY refers to a name that is not the name of a column in the select list. But there is a column in t1 named a, so the ORDER BY uses that and the index can be used. (The resulting sort order may be completely different from the order for ABS(a), of course.)

SELECT ABS(a) AS b FROM t1 ORDER BY a;

By default, MySQL sorts all GROUP BY col1, col2, ... queries as if you specified ORDER BY col1, col2, ... in the query as well. If you include an explicit ORDER BY clause that contains the same column list, MySQL optimizes it away without any speed penalty, although the sorting still occurs.

Note

Relying on implicit GROUP BY sorting is deprecated. To achieve a specific sort order of grouped results, it is preferable to use an explicit ORDER BY clause. GROUP BY sorting is a MySQL extension that may change in a future release; for example, to make it possible for the optimizer to order groupings in whatever manner it deems most efficient and to avoid the sorting overhead.

If a query includes GROUP BY but you want to avoid the overhead of sorting the result, you can suppress sorting by specifying ORDER BY NULL. For example:

INSERT INTO foo
SELECT a, COUNT(*) FROM bar GROUP BY a ORDER BY NULL;

The optimizer may still choose to use sorting to implement grouping operations. ORDER BY NULL suppresses sorting of the result, not prior sorting done by grouping operations to determine the result.

With EXPLAIN SELECT ... ORDER BY, you can check whether MySQL can use indexes to resolve the query. It cannot if you see Using filesort in the Extra column. See Section 8.8.1, “Optimizing Queries with EXPLAIN”. Filesort uses a fixed-length row-storage format similar to that used by the MEMORY storage engine. Variable-length types such as VARCHAR are stored using a fixed length.

MySQL has two filesort algorithms for sorting and retrieving results. The original method uses only the ORDER BY columns. The modified method uses not just the ORDER BY columns, but all the columns referenced by the query.

The optimizer selects which filesort algorithm to use. It normally uses the modified algorithm except when BLOB or TEXT columns are involved, in which case it uses the original algorithm. For both algorithms, the sort buffer size is the sort_buffer_size system variable value.

The original filesort algorithm works as follows:

  1. Read all rows according to key or by table scanning. Skip rows that do not match the WHERE clause.

  2. For each row, store in the sort buffer a tuple consisting of a pair of values (the sort key value and the row ID).

  3. If all pairs fit into the sort buffer, no temporary file is created. Otherwise, when the sort buffer becomes full, run a qsort (quicksort) on it in memory and write it to a temporary file. Save a pointer to the sorted block.

  4. Repeat the preceding steps until all rows have been read.

  5. Do a multi-merge of up to MERGEBUFF (7) regions to one block in another temporary file. Repeat until all blocks from the first file are in the second file.

  6. Repeat the following until there are fewer than MERGEBUFF2 (15) blocks left.

  7. On the last multi-merge, only the row ID (the last part of the value pair) is written to a result file.

  8. Read the rows in sorted order using the row IDs in the result file. To optimize this, read in a large block of row IDs, sort them, and use them to read the rows in sorted order into a row buffer. The row buffer size is the read_rnd_buffer_size system variable value. The code for this step is in the sql/records.cc source file.

One problem with this approach is that it reads rows twice: One time during WHERE clause evaluation, and again after sorting the value pairs. And even if the rows were accessed successively the first time (for example, if a table scan is done), the second time they are accessed randomly. (The sort keys are ordered, but the row positions are not.)

The modified filesort algorithm incorporates an optimization to avoid reading the rows twice: It records the sort key value, but instead of the row ID, it records the columns referenced by the query. The modified filesort algorithm works like this:

  1. Read the rows that match the WHERE clause.

  2. For each row, store in the sort buffer a tuple consisting of the sort key value and the columns referenced by the query.

  3. When the sort buffer becomes full, sort the tuples by sort key value in memory and write it to a temporary file.

  4. After merge-sorting the temporary file, retrieve the rows in sorted order, but read the columns required by the query directly from the sorted tuples rather than by accessing the table a second time.

The tuples used by the modified filesort algorithm are longer than the pairs used by the original algorithm, and fewer of them fit in the sort buffer. As a result, it is possible for the extra I/O to make the modified approach slower, not faster. To avoid a slowdown, the optimizer uses the modified algorithm only if the total size of the extra columns in the sort tuple does not exceed the value of the max_length_for_sort_data system variable. (A symptom of setting the value of this variable too high is a combination of high disk activity and low CPU activity.)

If a filesort is done, EXPLAIN output includes Using filesort in the Extra column. Also, optimizer trace output includes a filesort_summary block. For example:

"filesort_summary": {
  "rows": 100,
  "examined_rows": 100,
  "number_of_tmp_files": 0,
  "sort_buffer_size": 25192,
  "sort_mode": "<sort_key, additional_fields>"
}

The sort_mode value provides information about the filesort algorithm used and the contents of tuples in the sort buffer:

  • <sort_key, rowid>: This indicates use of the original algorithm. Sort buffer tuples are pairs that contain the sort key value and row ID of the original table row. Tuples are sorted by sort key value and the row ID is used to read the row from the table.

  • <sort_key, additional_fields>: This indicates use of the modified algorithm. Sort buffer tuples contain the sort key value and columns referenced by the query. Tuples are sorted by sort key value and column values are read directly from the tuple.

For information about the optimizer trace, see MySQL Internals: Tracing the Optimizer.

Suppose that a table t1 has four VARCHAR columns a, b, c, and d and that the optimizer uses filesort for this query:

SELECT * FROM t1 ORDER BY a, b;

The query sorts by a and b, but returns all columns, so the columns referenced by the query are a, b, c, and d. Depending on which filesort algorithm the optimizer chooses, the query executes as follows:

For the original algorithm, sort buffer tuples have these contents:

(fixed size a value, fixed size b value,
row ID into t1)

The optimizer sorts on the fixed size values. After sorting, the optimizer reads the tuples in order and uses the row ID in each tuple to read rows from t1 to obtain the select list column values.

For the modified algorithm, sort buffer tuples have these contents:

(fixed size a value, fixed size b value,
a value, b value, c value, d value)

The optimizer sorts on the fixed size values. After sorting, the optimizer reads the tuples in order and uses the values for a, b, c, and d to obtain the select list column values without reading t1 again.

For slow queries for which filesort is not used, try lowering max_length_for_sort_data to a value that is appropriate to trigger a filesort.

To increase ORDER BY speed, check whether you can get MySQL to use indexes rather than an extra sorting phase. If this is not possible, you can try the following strategies:

  • Increase the sort_buffer_size variable value. Ideally, the value should be large enough for the entire result set to fit in the sort buffer (to avoid writes to disk and merge passes), but at minimum the value must be large enough to accommodate fifteen tuples.

    Take into account that the size of column values stored in the sort buffer is affected by the max_sort_length system variable value. For example, if tuples store values of long string columns and you increase the value of max_sort_length, the size of sort buffer tuples increases as well and may require you to increase sort_buffer_size. For column values calculated as a result of string expressions (such as those that invoke a string-valued function), the filesort algorithm cannot tell the maximum length of expression values, so it must allocate max_sort_length bytes for each tuple.

    To monitor the number of merge passes, check the Sort_merge_passes status variable.

  • Increase the read_rnd_buffer_size variable value.

  • Use less RAM per row by declaring columns only as large as they need to be to hold the values stored in them. For example, CHAR(16) is better than CHAR(200) if values never exceed 16 characters.

  • Change the tmpdir system variable to point to a dedicated file system with large amounts of free space. The variable value can list several paths that are used in round-robin fashion; you can use this feature to spread the load across several directories. Paths should be separated by colon characters (:) on Unix and semicolon characters (;) on Windows. The paths should name directories in file systems located on different physical disks, not different partitions on the same disk.

If an index is not used for ORDER BY but a LIMIT clause is also present, the optimizer may be able to avoid using a merge file and sort the rows in memory. For details, see Section 8.2.1.19, “LIMIT Query Optimization”.


User Comments
  Posted by Henrik Grubbström on March 24, 2006
To be certain that this optimization is done you often need to force the correct index to be used with FORCE INDEX.

Another problem is that the optimization is lost when you have an OR on a prefix of the index:

SELECT * FROM t1 FORCE INDEX (key1_key2_key3)
WHERE key1=1 OR key1=2 ORDER BY key2,key3 LIMIT 5, 5;

This can be worked around by using UNION:

(SELECT * FROM t1 FORCE INDEX (key1_key2_key3)
WHERE key1=1 ORDER BY key2,key3 LIMIT 10)
UNION
(SELECT * FROM t1 FORCE INDEX (key1_key2_key3)
WHERE key1=2 ORDER BY key2,key3 LIMIT 10)
ORDER BY key2,key3 LIMIT 5, 5

This rewrite can make the query go ~20000 times faster on a table with ~2M rows.
  Posted by Milen Kostadinov on August 29, 2006
===========================================================
Posted by Ravenous Bugblatter Beast on August 15 2006

If you are selecting a wide result set and using ORDER BY RAND() with a LIMIT, you can often speed things up by changing a query of the form:

SELECT id, col1, col2, ... , colN FROM tab WHERE conditions ORDER BY RAND() LIMIT m

To a query of the form:

SELECT id, col1, col2, ... , colN FROM tab WHERE id IN (SELECT id FROM tab WHERE conditions ORDER BY RAND() LIMIT m)

Although the second query has to perform an additional select, it only has to sort a result set containing the single id column, rather than the full result set you are returning from the query.
===========================================================

it's true, but if we have a big database with 1000ths of table rows and we must to join them ...... so oooops ;)
Lately I saw the following link http://jan.kneschke.de/projects/mysql/order-by-rand/ .....you can read here how to get more speed from your executing query in some cases. First read the explanation and see bottom of the page:

============================================================
Performance

Now let's see what happends to our performance. We have 3 different queries for solving our problems.

* Q1. ORDER BY RAND()
* Q2. RAND() * MAX(ID)
* Q3. RAND() * MAX(ID) + ORDER BY ID

Q1 is expected to cost N * log2(N), Q2 and Q3 are nearly constant.

The get real values we filled the table with N rows ( one thousand to one million) and executed each query 1000 times.
----------------------------------------------------------
Rows ||100 ||1.000 ||10.000 ||100.000 ||1.000.000
----------------------------------------------------------
Q1||0:00.718s||0:02.092s||0:18.684s||2:59.081s||58:20.000s
Q2||0:00.519s||0:00.607s||0:00.614s||0:00.628s||0:00.637s
Q3||0:00.570s||0:00.607s||0:00.614s|0:00.628s ||0:00.637s
----------------------------------------------------------

As you can see the plain ORDER BY RAND() is already behind the optimized query at only 100 rows in the table.
============================================================
  Posted by Hugo Azevedo on November 12, 2006
All this is much faster than an ORDER BY RAND(), the problem is, it does not do the same.

Imagine you have a table with two columns (id, name) of thousands of names of people.

If you want to produce a result of only one record, then this is by far the fastest way to do it. You select a random ID from the table, and query it directly.

The problem is, if you want to retrieve two random records from the table.

ORDER BY RAND() will give you, for example, ID 390 and ID 4957 in a result, but the sugested way, will calculate a random index (ex. 390) and return two records based on that index, ID 390 and ID 391...
This means, that the order of the result is not random, only the first "fetched" ID. If you query for more than two records, it will give you ID 390, 391, 392, 393...

In other words, this does not produce an actual random selected result on more than one record.
  Posted by Allan Bogh on February 21, 2007
When performing an ORDER BY col1 ASC - where col1 is an ENUM data type - the ORDER BY clause will not order by the natural language order, but the order of the values in the definition of the ENUM field.

To order by an ENUM field make sure the ENUM values are in alphabetical order or reverse-alphabetical order and then perform the ORDER BY clause.
  Posted by Dominique Thiebaut on May 16, 2007
If you need to "hard" sort a table, i.e. you want the table to be sorted in some predefined order, so that your query doesn't have to do it, use the following syntax:

alter table `yourtablename` order by `yourfieldname` desc;

In this case, when you execute:

select * from `yourtablename` limit 1;

you get the record for which yourfieldname is the largest.
  Posted by Markus Zeller on May 6, 2009
* Q2. RAND() * MAX(ID)
* Q3. RAND() * MAX(ID) + ORDER BY ID

You should NOT use these examples, because it could return invalid data:

With that you receive the highest stored index. Then a random number between 0 and MAX(id) is returned. 0 is a unwanted result. Imagine ID=3 is returned and this row has been deleted. Then this is a unwanted result, too.

If you need only 1 row in case of quotes or similar things then you ever should use the LIMIT 1 at the end of the query.
  Posted by Vladimir G. on February 26, 2012
About the problem with optimization lost when you have an OR in WHERE:

SELECT * FROM t1 FORCE INDEX (key1_key2_key3)
WHERE key1=1 OR key1=2 ORDER BY key2,key3 LIMIT 5, 5;

I found that IGNORE INDEX (key1) and use key3 instead works perfectly sometimes so you don't need to use UNION:

SELECT * FROM t1 IGNORE INDEX (key1)
WHERE key1=1 OR key1=2 ORDER BY key3 LIMIT 5, 5;

And it is interesting that the more elements I have - the faster it works, i.e. (key1=1 OR key1=2 OR key1=3 OR key1=4) works much faster on my table (>100 000 rows) than (key1=1 OR key1=2) .. (MySQL 5.0.77)
  Posted by Luc Vidal on October 25, 2012
With an InnoDB table you can do a query like this :

SELECT * FROM table WHERE col=<constant> ORDER BY id

where "id" is the primary key, and "col" is indexed.
MySQL will be able to use the index on "col" to sort. No need to create a composite index on (col, id).

I believe this is because InnoDB tables are stored with a clustered index (the primary key).

It doesn't work on MyISAM table however.
Always check with an "explain" if the optimisation is used (the Extra column from the explain must not contain "using filesort")
Sign Up Login You must be logged in to post a comment.