WL#6742: Improve InnoDB SELECT COUNT(*) performance by using handler::records

Status: Complete   —   Priority: Medium

As a first step toward implementing WL#6605, where the committed record count of
a table is adjusted by transaction deltas, we need to use the handler::record()
call to return the record count to the optimizer instead of making it count
records itself.

1) Implement handler::records for InnoDB.  This task can be implemented
independently and checked into the trunk codebase separate from WL#6605.

2) Implement a count(*) testcase. This testcase will be needed to assure that
the new method of collecting count(*) numbers in WL#6605 is accurate. Only
records visible to the current transaction should be counted in count(*). The
accuracy of this change will most likely be correct since row_search_for_mysql()
is used to find visible records both before and after this change.

3) Improve general count(*) performance. This task also happens to improve the
cost of an in-memory table scan by about 20% since each record does not need to
be returned to the optimizer via the handler interface.  It effectively pushes
down the counting effort from the server into InnoDB.

The optimizer code uses a table flag called HA_HAS_RECORDS to determine if the
COUNT(*) task can be pushed down to the storage engine.  InnoDB needs to define
this flag and provide an implementation of handler::records() called

If this handler function has any error or problem, then it must return
HA_POS_ERROR, otherwise the number it returns is assumed to be the count of
records visible by the current transaction. If the optimizer sees that this
function failed, it will try a second time to get the count(*) by the normal
method.  Since this optimization always has a fallback method, errors that occur
are not considered critical and are not reported.

BETTER: The general performance of count(*) will be better because there is only
one handler call into the storage engine instead of one for every record.

WORSE: But the performance could be worse in the case where there is a large
record in the clustered index, which is always used by this implementation to
count records, and there happens to be a very small secondary index.  The
optimizer will normally choose the smaller secondary index to traverse since it
expects the secondary index to have fewer pages.  This performance gain from
using a smaller index can sometimes be greater than the gain from making one
call to InnoDB instead of one for every record.

BETTER: However, in a dynamic database with lots of updates to the secondary
index, that performance gain of using a secondary index can be lost since
secondary indexes can contain even uncommitted key entries.  If a secondary
index page has been modified by a transaction more recent than the COUNT(*)
transaction, InnoDB must also read the record in the clustered index and
determine if the record is visible.  This extra read is slower than just reading
the clustered index in the first place.

So in real world databases, generally, this change should improve the
performance of COUNT(*).  But each test may be different.  The next worklog,
WL#6605, is intended to return the COUNT(*) through this handler::records()
interface almost immediately in all conditions just by keeping track if the base
committed count along with transaction deltas.
During the prototyping of this it was discovered that the optimizer will choose
a smaller secondary index to traverse when the clustered index is significantly
larger than the available secondary index.  For example;
	c2 INT,	c3 INT,	c4 TEXT, INDEX(c2)
	) Engine=InnoDB;
INSERT INTO t1(c2,c3,c4) VALUES (1, 1, REPEAT('a',200));
# Do this until there are 1 million records 
INSERT INTO t1(c2,c3,c4) (SELECT c2,c3,c4 FROM t1);

# Here are some relative times on my laptop
       Query                Index:WL6742         Index:5.7
SELECT COUNT(*) FROM t1;    Clustered:14.61     Secondary:11.31
SELECT COUNT(c1) FROM t1;   Clustered:14.81     Secondary:11.85
SELECT COUNT(c2) FROM t1;   Secondary:11.43     Secondary:11.81
SELECT COUNT(c3) FROM t1;   Clustered:19.29     Clustered:18.43
SELECT COUNT(c4) FROM t1;   Clustered:56.81     Clustered:62.29
SELECT SUM(c1) FROM t1;     Secondary:12.42     Secondary:12.93
SELECT SUM(c2) FROM t1;     Secondary:12.36     Secondary:12.82
SELECT SUM(c3) FROM t1;     Clustered:20.28     Clustered:19.20

The chart above shows that COUNT(*) and COUNT(c1) are both affected by the new
implementation of handler::records().  This was confirmed with a breakpoint. 
The new records() implementation always uses the clustered index and the
optimizer will traverse the 'smaller' secondary index, which also contains c1
since it is the primary key.  In this case, the optimizer can traverse the index
faster since it can use a smaller index.

But the affect is the opposite when there are multiple record versions.  The
following is the same chart, but this time, 10% of the records have newer
pending versions. 

# Now do the same select when 10% of the file has newer, non visible records
Client 1;  BEGIN;
Client 2;  BEGIN; UPDATE t1 SET c2 = 2 WHERE MOD(c1,10) = 0;
Client 1;                   Index:WL6742        Index:5.7
SELECT COUNT(*) FROM t1;    Clustered:21.58     Secondary:146.26
SELECT COUNT(c1) FROM t1;   Clustered:21.51     Secondary:146.45
SELECT COUNT(c2) FROM t1;   Secondary:139.16    Secondary:147.68
SELECT COUNT(c3) FROM t1;   Clustered:26.76     Clustered:27.19
SELECT COUNT(c4) FROM t1;   Clustered:100.81    Clustered:68.03
SELECT SUM(c1) FROM t1;     Secondary:140.30    Secondary:152.91
SELECT SUM(c2) FROM t1;     Secondary:141.09    Secondary:154.75
SELECT SUM(c3) FROM t1;     Clustered:27.67     Clustered:29.63

This shows that the extra work involved in reading the clustered index slows
things down considerably.  So in a concurrent environment where lots of records
arre being updated, it is considerably faster to just use the clustered index
instead of a smaller secondary index.

This means that bypassing the optimizer's choice of which index to use, we
change the performance characteristics;
1) The general performance is about 20% faster by eliminating all the handler
interface calls during the traverse.  This can be seen by comparing COUNT(*)
with COUNT(C2).
2) Some of that performance gain can be lost if there exists a secondary index
that is much smaller than the clustered index.
3) However, if a table is being changed concurrently, chosing the clustered
index for traversal can be sonsiderable faster.