The most efficient way to process GROUP BY
is when the index is used to directly retrieve the group
fields. With this access method, MySQL uses the property of
some index types that the keys are ordered (for example,
BTREE). This property enables use of lookup
groups in an index without having to consider all keys in the
index that satisfy all WHERE conditions.
This access method considers only a fraction of the keys in an
index, so it is called a loose index
scan. When there is no WHERE
clause, a loose index scan reads as many keys as the number of
groups, which may be a much smaller number than that of all
keys. If the WHERE clause contains range
predicates (see the discussion of the range
join type in Section 7.2.1, “Optimizing Queries with EXPLAIN”), a loose index
scan looks up the first key of each group that satisfies the
range conditions, and again reads the least possible number of
keys. This is possible under the following conditions:
The query is over a single table.
The GROUP BY includes the first
consecutive parts of the index. (If, instead of
GROUP BY, the query has a
DISTINCT clause, all distinct
attributes refer to the beginning of the index.)
The only aggregate functions used (if any) are
MIN() and
MAX(), and all of them
refer to the same column.
Any other parts of the index than those from the
GROUP BY referenced in the query must
be constants (that is, they must be referenced in
equalities with constants), except for the argument of
MIN() or
MAX() functions.
The EXPLAIN output for such queries shows
Using index for group-by in the
Extra column.
The following queries fall into this category, assuming that
there is an index idx(c1,c2,c3) on table
t1(c1,c2,c3,c4):
SELECT c1, c2 FROM t1 GROUP BY c1, c2; SELECT DISTINCT c1, c2 FROM t1; SELECT c1, MIN(c2) FROM t1 GROUP BY c1; SELECT c1, c2 FROM t1 WHERE c1 <constGROUP BY c1, c2; SELECT MAX(c3), MIN(c3), c1, c2 FROM t1 WHERE c2 >constGROUP BY c1, c2; SELECT c2 FROM t1 WHERE c1 <constGROUP BY c1, c2; SELECT c1, c2 FROM t1 WHERE c3 =constGROUP BY c1, c2;
The following queries cannot be executed with this quick select method, for the reasons given:
There are aggregate functions other than
MIN() or
MAX(), for example:
SELECT c1, SUM(c2) FROM t1 GROUP BY c1;
The fields in the GROUP BY clause do
not refer to the beginning of the index, as shown here:
SELECT c1,c2 FROM t1 GROUP BY c2, c3;
The query refers to a part of a key that comes after the
GROUP BY part, and for which there is
no equality with a constant, an example being:
SELECT c1,c3 FROM t1 GROUP BY c1, c2;

User Comments
The following example will not be optimized (and should probably be added above) ...
SELECT c1, MIN(c4) FROM t1 GROUP BY c1;
The statement "The only aggregate functions used (if any) are MIN() and MAX(), and all of them refer to the same column." needs to be clearer. Specifically it isn't clear what is meant by 'the same column' - which I think should read 'refer to the same index'. This would explain the above.
The following example is optimized (according to EXPLAIN)...
SELECT c1, MIN(c3) FROM t1 GROUP BY c1;
Even though we don't group by c1 and c2.
--
Questions are a burden to others, answers a prison for ones self.
The query ...
SELECT c1, SUM(c2) FROM t1 GROUP BY c1;
is optimized even though it is listed above as "cannot be executed with this quick select method [as] there are other aggregate function than MIN() or MAX()"
Given the following setup...
create table t1 (c1 int,c2 int, c3 int, c4 int, index (c1,c2,c3), index(c4));
insert into t1
select * from
(select key_col from raf limit 4) as w inner join
(select key_col from raf limit 4) as x inner join
(select key_col from raf limit 4) as y inner join
(select key_col from raf limit 4) as z;
Query OK, 256 rows affected (0.27 sec)
Records: 256 Duplicates: 0 Warnings: 0
Where raf is a table whose 'key_col' is an integer primary key
explain SELECT c1, SUM(c2) FROM t1 GROUP BY c1 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: index
possible_keys: NULL
key: c1
key_len: 15
ref: NULL
rows: 256
Extra: Using index
Trying this again...
insert into t1
select * from (select key_col from raf limit 10) as w inner join
(select key_col from raf limit 10) as x inner join
(select key_col from raf limit 10) as y inner join
(select key_col from raf limit 10) as z;
Query OK, 10000 rows affected (1.27 sec)
explain SELECT c1, SUM(c2) FROM t1 GROUP BY c1 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: index
possible_keys: NULL
key: c1
key_len: 15
ref: NULL
rows: 10512
Extra: Using index
1 row in set (0.03 sec)
So there you go!
--
Questions are a burden to others, answers a prison for ones self.
The query above,
SELECT c1,c3 FROM t1 GROUP BY c1, c2;
is optimized even though the manual states "The query refers to a key part that is after the GROUP BY parts, and for which there is no equality with a constant:"
Also from the description above (at least for me) it isn't clear if an "equality with a constant" would make a difference if the query were...
SELECT c1,c4 FROM t1 GROUP BY c1, c2;
--
Questions are a burden to others, answers a prison for ones self.
Pay attention to the line:
"This is possible under the following conditions:
*The query is over a single table. "
What this means is that, if referencing a normalized table in a query that uses group by, YOU CANNOT AVOID a temporary table!
For example, this query will never optimize regardless of indexes, even with explicit join statements:
SELECT MAX(orders.id)
FROM orders, orders_status
WHERE orders_status.name = 'completed'
AND orders.status = orders_status.id;
Note that this doesn't work with TEXT fields in the index, presumably because you can't create an index with a TEXT field without truncating it. Even an index with a TINYTEXT field "mytinytextfield(256)" won't be used here.
Use VARCHAR instead.
Add your own comment.