WL#5652: InnoDB: Use HW CRC32
Status: Complete — Priority: Medium
The overhead of the existing checksum in InnoDB is too high, according to http://mysqlha.blogspot.com/2009/05/innodb-checksum-performance.html. Another related article is http://dammit.lt/2008/05/29/on-checksums/. Facebook has already provided new checksum code under the BSD license; it includes a hardware-based implementation that is used if the CPU supports it.
There are two 4-byte fields for checksums in each database page; let's call them FIL_PAGE_END_LSN_OLD_CHKSUM and FIL_PAGE_SPACE_OR_CHKSUM. When a page is written to disk, those two fields are set as follows.

FIL_PAGE_SPACE_OR_CHKSUM:
* InnoDB versions < 4.0.14 and < 4.1.1 always store 0
* InnoDB versions newer than the above store buf_calc_page_new_checksum()
* Any InnoDB version with innodb_checksums=OFF stores a constant magic number

FIL_PAGE_END_LSN_OLD_CHKSUM:
* Very old InnoDB stores the LSN
* Newer InnoDB stores buf_calc_page_old_checksum()
* Any InnoDB version with innodb_checksums=OFF stores a constant magic number

Both buf_calc_page_old_checksum() and buf_calc_page_new_checksum() use the same algorithm, but the _old version calculates the checksum only on the first 26 bytes of the page.

This WorkLog adds a new boolean parameter innodb_use_crc32 (OFF by default). When that option is turned ON (and innodb_checksums=ON), buf_calc_page_crc32() is stored in *both* checksum fields. This function uses CPU instructions to calculate CRC32 if the CPU supports them and falls back to a manual CRC32 calculation if it does not.

When checking whether a page is corrupted, the stored value in either field is also allowed to match buf_calc_page_crc32(), in addition to the values allowed before this WL. Before this WL, the stored value may be anything if innodb_checksums=OFF at the time of the check. If innodb_checksums=ON, the stored value must be one of the following.

FIL_PAGE_SPACE_OR_CHKSUM:
* 0
* buf_calc_page_new_checksum()
* the constant magic number

FIL_PAGE_END_LSN_OLD_CHKSUM:
* the LSN
* buf_calc_page_old_checksum()
* the constant magic number

Additionally, if either of the two fields contains buf_calc_page_crc32(), then the other field must contain the same value.
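The read-side rule above can be sketched as a small predicate. This is only an illustration under stated assumptions: the struct, its field names, and page_checksum_ok() are hypothetical, the three computed checksums are passed in rather than calling the real InnoDB functions, and the magic number is taken to be 0xdeadbeef as mentioned in the notes below. The sketch applies the "both fields must hold CRC32" rule literally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Magic value written when checksums are disabled (assumed 0xdeadbeef). */
#define PAGE_NO_CHECKSUM_MAGIC 0xDEADBEEFu

/* Hypothetical container for everything needed to validate one page. */
struct page_checksums {
    uint32_t stored_new;   /* FIL_PAGE_SPACE_OR_CHKSUM as read from disk */
    uint32_t stored_old;   /* FIL_PAGE_END_LSN_OLD_CHKSUM as read from disk */
    uint32_t expected_lsn; /* LSN value a very old InnoDB would have stored */
    uint32_t calc_new;     /* result of buf_calc_page_new_checksum() */
    uint32_t calc_old;     /* result of buf_calc_page_old_checksum() */
    uint32_t calc_crc32;   /* result of buf_calc_page_crc32() */
};

/* Returns true if the stored checksum fields are acceptable. */
bool page_checksum_ok(const struct page_checksums *p, bool checksums_enabled)
{
    assert(p != NULL);

    /* With innodb_checksums=OFF any stored value is accepted. */
    if (!checksums_enabled) {
        return true;
    }

    bool new_is_crc = (p->stored_new == p->calc_crc32);
    bool old_is_crc = (p->stored_old == p->calc_crc32);

    /* If either field holds the CRC32 value, the other must hold it too. */
    if (new_is_crc || old_is_crc) {
        return new_is_crc && old_is_crc;
    }

    /* Otherwise fall back to the values allowed before this WL. */
    bool new_ok = p->stored_new == 0
               || p->stored_new == p->calc_new
               || p->stored_new == PAGE_NO_CHECKSUM_MAGIC;
    bool old_ok = p->stored_old == p->expected_lsn
               || p->stored_old == p->calc_old
               || p->stored_old == PAGE_NO_CHECKSUM_MAGIC;

    return new_ok && old_ok;
}
```

A caller would fill in the struct from the page header and trailer plus the three computed checksums, then treat a false result as a corrupted page.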
==============================================
Ideas from Marko (note that innodb_checksums is a boolean parameter, so it would not be a good idea to change it to an integer: that would break existing configuration files that contain innodb_checksums=ON or innodb_checksums=OFF):

This is straightforward, except for the configuration of the selected checksum algorithm. We already have a read-only (static) configuration parameter, innodb_use_checksum, which allows the use of two algorithms:

innodb_use_checksum=0 (no checksum, write a 0xdeadbeef magic value)
innodb_use_checksum=1 (the default algorithm)

There are two cases where checksums are calculated. One is when writing a page to disk. The other is when reading a page, to check that the page matches the checksum written on it. When reading a page, InnoDB currently allows any checksum algorithm to match. This increases the probability that a corrupted page is mistaken for a good one, especially when a checksum algorithm happens to produce the 0xdeadbeef magic value.

It could be useful to extend the configuration parameter as follows:

innodb_use_checksum=2 (write crc32, but allow any algorithm to match on read)
innodb_use_checksum=3 (write crc32, and require crc32 match on read)

The default value could remain innodb_use_checksum=1. When creating a new database from scratch, the recommended practice would be to add innodb_use_checksum=3 to the MySQL configuration. This parameter should not be made dynamic, because innodb_use_checksum=3 can only work if no pages exist that contain a checksum other than crc32.
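The four proposed settings can be pictured as one write policy and one read policy per mode. The following is a minimal sketch, not the implemented design: the enum, both function names, and the single-field simplification are assumptions made for illustration, and it further assumes that mode 0 skips verification on read, as innodb_checksums=OFF does in the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NO_CHECKSUM_MAGIC 0xDEADBEEFu

/* Hypothetical names for the proposed innodb_use_checksum values. */
enum checksum_mode {
    CHECKSUM_NONE = 0,          /* write the magic value */
    CHECKSUM_LEGACY = 1,        /* the default algorithm */
    CHECKSUM_CRC32_WRITE = 2,   /* write crc32, accept any algorithm on read */
    CHECKSUM_CRC32_STRICT = 3   /* write crc32, require crc32 on read */
};

/* Picks the value to store when a page is written. */
uint32_t checksum_to_write(enum checksum_mode mode,
                           uint32_t legacy, uint32_t crc32)
{
    assert((int)mode >= 0 && (int)mode <= 3);
    switch (mode) {
    case CHECKSUM_NONE:
        return NO_CHECKSUM_MAGIC;
    case CHECKSUM_LEGACY:
        return legacy;
    default:
        return crc32;           /* modes 2 and 3 both write crc32 */
    }
}

/* Decides whether a stored value is acceptable when a page is read. */
bool checksum_read_ok(enum checksum_mode mode, uint32_t stored,
                      uint32_t legacy, uint32_t crc32)
{
    assert((int)mode >= 0 && (int)mode <= 3);
    if (mode == CHECKSUM_NONE) {
        return true;            /* assumption: no verification at all */
    }
    if (mode == CHECKSUM_CRC32_STRICT) {
        return stored == crc32; /* only crc32 may match */
    }
    /* Modes 1 and 2: any known algorithm or the magic value may match. */
    return stored == legacy || stored == crc32
        || stored == NO_CHECKSUM_MAGIC;
}
```

Only the strict mode closes the gap described above, because it is the only one in which the magic value or a legacy checksum cannot make a corrupted page pass.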
* Link extra/innochecksum with storage/innobase/libinnobase.a and remove copy-pasted InnoDB functions from it
* Support the new CRC32 algo in innochecksum.c
* Move functions that are to be used in innochecksum from buf0buf.c into a separate file buf0checksum.c so that linking works
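For orientation, a manual (non-hardware) CRC32 fallback of the kind a standalone tool would need can be as small as a bitwise loop. The sketch below is not the code from this WorkLog: crc32c_sw() is a hypothetical name, and it assumes the CRC-32C (Castagnoli) polynomial, which is the one the SSE4.2 crc32 instruction computes; the text above only says "CRC32". A production fallback would normally be table-driven for speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise software CRC-32C over a byte buffer (reflected form,
   polynomial 0x82F63B78, initial value and final XOR 0xFFFFFFFF). */
uint32_t crc32c_sw(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift out one bit; XOR in the polynomial if that bit was set. */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0x82F63B78u & mask);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The standard check input "123456789" yields 0xE3069283 for CRC-32C, which makes a convenient self-test when comparing a software path against a hardware one.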