WL#5756: InnoDB: Support Page Sizes 4k and 8k

Status: Complete   —   Priority: Medium

This involves adding a new startup parameter innodb-page-size that can be
set to either 4k, 8k, or 16k.  This must be set before any InnoDB tablespaces
are created, before bootstrap, so that all InnoDB tablespaces have the same
page size.

Supporting smaller page sizes without recompiling the server has been asked for
years by MySQL users, including Facebook
Theoretically, SSD could benefit from smaller page sizes. But tuning SSD (with
smaller page sizes) should not be part of this worklog.
The tablespace page size must be stored in the tablespace header page so that
the value of innodb_page_size can be corrected if the tablepaces already exist.
 In the current file version, the compressed page size of a tablespace is stored
in the FSP_SPACE_FLAGS field of the tablespace header along with other flags
that can determine the table row type for file-per-tablespace. The physical
tablespace page size can be added to these flags, but once an expanded flags is
written to a tablespace, older engines will not be able to open that tablespace.

In order to prepare for this, the InnoDB code needs to be refactored in the way
that it handles table flags.  There needs to be a distinction between table
flags and filespace flags.  The bit manipulations also need to be isolated to
inline functions defined separately for table and tablespace flags. See WL#5873.

In order to know if an existing file-per-table tablespace containing a
compressed table can be opened by an engine using a fixed UNIV_PAGE_SIZE, we
will need to know the uncompressed page size.  Once the tablespace flags are
handled independently from the table flags, the Logical-Uncompressed Page Size
of the tablespace can be added to the tablespace flags.

In the future there will be more than one table per tablespace and we plan to
support multiple uncompressed page sizes at some point. See WL#5628.  When we
can get more than one table per tablespace the physical page size will be the
same for all tables in that tablespace.  So if there are multiple compressed
tables, they MUST all have the same KEY_BLOCK_SIZE or compressed page size.  At
that point, we can continue to rely on the ZIP_SIZE flags currently in bit
positions 2-5 in FSP_SPACE_FLAGS to determine the physical page size because
they will refer to the tablespace, not just one table in it.

There are ~450 occurrences of UNIV_PAGE_SIZE. For this worklog, UNIV_PAGE_SIZE
can be set to a global variable instead of the predefined constant.  Then there
are many places where UNIV_PAGE_SIZE is used in a precompiled macro which need
to be replaced. Often, UNIV_PAGE_SIZE can be replaced by UNIV_PAGE_SIZE_MAX or
UNIV_PAGE_SIZE_DEF which are still constants.

Debugging should look into any problems with the length of records found in
non-index pages such as the problem found in BUG#54924.  

This task may also show problems with index key lengths when tests are run with
smaller page sizes.  

One such problem is Bug #13336585-CHANGE BUFFERING WITH 4K PAGES CAN ASSERT IF
SECONDARY KEY IS NEAR MAX.  The maximum key length allowed by MySQL is 3072
which is less than 1/4th of the free space on a 16k page.  The worst case for a
large key length is where both the the primary and secondary key is the max
length.  Since InnoDB must have at least 2 records per index page, the maximum
key length must be less than 1/4th of the free space on a page.  InnoDB can
calculate the free space on a page and the record overhead, but it needs to know
the table metadata first.  MySQL asks each engine what the maximum key length is
before the CREATE TABLE statement.  Since InnoDB had the advantage of a maximum
key length that will easily fit 4 to a 16k page, this worklog will implement
similar limits on 8k and 4k page sizes.

The maximum key length for 16k pages is 3072 bytes, so the limit for 8k pages
will be 1536 bytes and the limit for 4k pages will be 768 bytes.  These limits
easily allow record data plus record overhead to be less than 1/4th of the free
space on any page.

A node pointer record in a non-unique secondary index has the most overhead,
because it will store tuples
(sec_col1,sec_col2,...,sec_colN,pk_col1,pk_col2,...,pk_colM,child_page_no). A
unique index would only store (sec_col1,sec_col2,...,sec_colM,child_page_no).
The maximum number of columns is N+M=16+16 when none of the columns of the
secondary index belong to the primary key. The maximum payload length is
3072+3072 bytes (3072 for the secondary index and 3072 for the primary key). The
maximum record header size is 2*(N+M)+6 bytes in ROW_FORMAT=COMPACT and one byte
less in other formats. The 1/4 is a bit inaccurate (it is a sufficient
condition, but sometimes unnecessarily strict). What InnoDB assumes is that each
node pointer page must have space for at least two node pointer records.

A future effort may be to increase these limits by enforcing them in CREATE
TABLE if customers need this.  But it is generally not wise to make indexes
where the key is so long that there is only 2 or 3 records per page because it
increases the depth of the Btree.