As a first step toward implementing WL#6605, where the committed record count of a table is adjusted by transaction deltas, we need to use the handler::record() call to return the record count to the optimizer instead of making it count records itself. PURPOSE ======= 1) Implement handler::records for InnoDB. This task can be implemented independently and checked into the trunk codebase separate from WL#6605. 2) Implement a count(*) testcase. This testcase will be needed to assure that the new method of collecting count(*) numbers in WL#6605 is accurate. Only records visible to the current transaction should be counted in count(*). The accuracy of this change will most likely be correct since row_search_for_mysql() is used to find visible records both before and after this change. 3) Improve general count(*) performance. This task also happens to improve the cost of an in-memory table scan by about 20% since each record does not need to be returned to the optimizer via the handler interface. It effectively pushes down the counting effort from the server into InnoDB.
IMPLEMENTATION ============== The optimizer code uses a table flag called HA_HAS_RECORDS to determine if the COUNT(*) task can be pushed down to the storage engine. InnoDB needs to define this flag and provide an implementation of handler::records() called ha_innobase::records(). ERROR CONDITIONS ================ If this handler function has any error or problem, then it must return HA_POS_ERROR, otherwise the number it returns is assumed to be the count of records visible by the current transaction. If the optimizer sees that this function failed, it will try a second time to get the count(*) by the normal method. Since this optimization always has a fallback method, errors that occur are not considered critical and are not reported. PERFORMANCE ============ BETTER: The general performance of count(*) will be better because there is only one handler call into the storage engine instead of one for every record. WORSE: But the performance could be worse in the case where there is a large record in the clustered index, which is always used by this implementation to count records, and there happens to be a very small secondary index. The optimizer will normally choose the smaller secondary index to traverse since it expects the secondary index to have fewer pages. This performance gain from using a smaller index can sometimes be greater than the gain from making one call to InnoDB instead of one for every record. BETTER: However, in a dynamic database with lots of updates to the secondary index, that performance gain of using a secondary index can be lost since secondary indexes can contain even uncommitted key entries. If a secondary index page has been modified by a transaction more recent than the COUNT(*) transaction, InnoDB must also read the record in the clustered index and determine if the record is visible. This extra read is slower than just reading the clustered index in the first place. So in real world databases, generally, this change should improve the performance of COUNT(*). But each test may be different. The next worklog, WL#6605, is intended to return the COUNT(*) through this handler::records() interface almost immediately in all conditions just by keeping track if the base committed count along with transaction deltas.
During the prototyping of this it was discovered that the optimizer will choose a smaller secondary index to traverse when the clustered index is significantly larger than the available secondary index. For example; -------------------------------------------------- CREATE TABLE t1 (c1 INT AUTO_INCREMENT PRIMARY KEY, c2 INT, c3 INT, c4 TEXT, INDEX(c2) ) Engine=InnoDB; INSERT INTO t1(c2,c3,c4) VALUES (1, 1, REPEAT('a',200)); # Do this until there are 1 million records INSERT INTO t1(c2,c3,c4) (SELECT c2,c3,c4 FROM t1); # Here are some relative times on my laptop Query Index:WL6742 Index:5.7 SELECT COUNT(*) FROM t1; Clustered:14.61 Secondary:11.31 SELECT COUNT(c1) FROM t1; Clustered:14.81 Secondary:11.85 SELECT COUNT(c2) FROM t1; Secondary:11.43 Secondary:11.81 SELECT COUNT(c3) FROM t1; Clustered:19.29 Clustered:18.43 SELECT COUNT(c4) FROM t1; Clustered:56.81 Clustered:62.29 SELECT SUM(c1) FROM t1; Secondary:12.42 Secondary:12.93 SELECT SUM(c2) FROM t1; Secondary:12.36 Secondary:12.82 SELECT SUM(c3) FROM t1; Clustered:20.28 Clustered:19.20 -------------------------------------------------- The chart above shows that COUNT(*) and COUNT(c1) are both affected by the new implementation of handler::records(). This was confirmed with a breakpoint. The new records() implementation always uses the clustered index and the optimizer will traverse the 'smaller' secondary index, which also contains c1 since it is the primary key. In this case, the optimizer can traverse the index faster since it can use a smaller index. But the affect is the opposite when there are multiple record versions. The following is the same chart, but this time, 10% of the records have newer pending versions. -------------------------------------------------- # Now do the same select when 10% of the file has newer, non visible records Client 1; BEGIN; Client 2; BEGIN; UPDATE t1 SET c2 = 2 WHERE MOD(c1,10) = 0; Client 1; Index:WL6742 Index:5.7 SELECT COUNT(*) FROM t1; Clustered:21.58 Secondary:146.26 SELECT COUNT(c1) FROM t1; Clustered:21.51 Secondary:146.45 SELECT COUNT(c2) FROM t1; Secondary:139.16 Secondary:147.68 SELECT COUNT(c3) FROM t1; Clustered:26.76 Clustered:27.19 SELECT COUNT(c4) FROM t1; Clustered:100.81 Clustered:68.03 SELECT SUM(c1) FROM t1; Secondary:140.30 Secondary:152.91 SELECT SUM(c2) FROM t1; Secondary:141.09 Secondary:154.75 SELECT SUM(c3) FROM t1; Clustered:27.67 Clustered:29.63 -------------------------------------------------- This shows that the extra work involved in reading the clustered index slows things down considerably. So in a concurrent environment where lots of records arre being updated, it is considerably faster to just use the clustered index instead of a smaller secondary index. This means that bypassing the optimizer's choice of which index to use, we change the performance characteristics; 1) The general performance is about 20% faster by eliminating all the handler interface calls during the traverse. This can be seen by comparing COUNT(*) with COUNT(C2). 2) Some of that performance gain can be lost if there exists a secondary index that is much smaller than the clustered index. 3) However, if a table is being changed concurrently, chosing the clustered index for traversal can be sonsiderable faster.