WL#2902: Store rows in blocks (MyISAM)

Affects: Server-7.1   —   Status: Un-Assigned

The task "store rows in blocks in MyISAM" was declared 
in the Chicago planning meeting (September 2005) to be 
"class A", that is, for version 5.2. It's a code cleanup 
task requested by Monty. All details here are guesses; 
we would have nothing to discuss if we only had the vague 
words "store rows in blocks". 
 
A "block", which is our preferred word (not "page") is 
the minimal read-write unit, so MySQL will read and write 
an integral number of blocks at a time. (Initially 
"1 block at a time" is a good bet.). For the moment, 
a block will also be the minimal allocation unit. 
 
Objectives: 
- to reduce fragmentation 
- to align the read-write size with the operating 
  system's / file system's sizes, for better performance 
- to emulate InnoDB 
- to make way for a Write-Ahead Log 
- to make locking easier 
- to make hot backup easier 
- to minimize movement after slight changes in row length. 
  
Syntax 
------ 
 
CREATE TABLE ... [ BLOCKSIZE n ] 
The clause "BLOCKSIZE n" is in the "table_option" section 
of the CREATE TABLE statement. It is legal only if 
"engine=myisam" is declared or is implicit. 
(Perhaps "BLOCK_LENGTH" would be a nicer word.) 
 
If "BLOCKSIZE n" is not specified, then MySQL will create 
a .MYD file which is exactly like the one in version 5.0, 
unblocked. We won't consider this legacy option further. 
 
The legal values of "n" are powers of 2: 
1024, 2048, 4096, 8192, 16384, 32768. 
There is probably an optimal block size, 
but it would depend on: 
- whether access will be random or sequential, reading or writing 
- whether organization indexed, which improves the likelihood that 
  two rows in the same block will be required for the same operation 
- row length i.e. maximum number of rows that will fit in one block 
- the units used by the operating system or file system 
NTFS allocation size, stripe size, PAGE_SIZE 
- block size should be a multiple of OS block size 
- expected number of rows in total, i.e. is the file small 
- keyblock size, if index and data will share the same buffer pool. 
That's too complicated, so MySQL won't try to calculate an 
optimal size. 
 
There is no option to specify PCTFREE, the percentage of 
each page to leave free when inserting. 
 
The block size is the same for all partitions. 
 
A file in the new format still has the extension .MYD, but 
somewhere the .frm file has a flag saying the format is new. 
 
Integral number of rows in a block 
---------------------------------- 
 
We do not want a row to "span" a block boundary. That is, 
if there are only 10 bytes free in a block, and we want to 
insert a 20-byte row, then the new row will go in a new 
block. Normally, then, a block has some filler bytes which 
don't belong to any row. These filler bytes should be 0x00s. 
 
If a row contains long VARCHAR or BLOB columns, it may be 
larger than a block. In that case, it begins at the start 
of a new block and takes as many blocks as necessary. 
 
The row format is unchanged. 
 
Block header 
------------ 
 
Blocks are split into three parts: the header, the filler, 
and the data rows. The header will grow downward within 
the block, the data rows will grow upward, so the unused 
space is always in the middle of the block. 
 
The header contains a row directory (which tells you where 
each row starts within the page), and housekeeping 
information similar to what InnoDB and Oracle have in 
their block headers. 
 
Row addresses 
------------- 
 
The address of a row is no longer an address within the 
file. Instead, it is a block number, plus one byte for 
"row number within the block". Since we use one byte, 
the maximum number of rows in a block is 256. 
 
This row address is what we use in index pointers. 
 
The advantage is that we can shift rows around (slightly) 
within a block, if updates occur. The disadvantage is 
that we need space for the directory in the block header. 
 
ALTER TABLE 
----------- 
 
We will allow "ALTER TABLE ... BLOCKSIZE n". This will 
be slow, and we have no clever "online alter table" plan. 
 
References 
---------- 
 
"MySQL Internals Manual Chapter 13: InnoDB Page Structure" 
http://dev.mysql.com/doc/internals/en/innodb-page-structure.html 
 
"Internet page containing a picture of an Oracle block" 
http://database.sarang.net/database/oracle/concept/oracle-internal/db_block.html 
 
And the following old raw ideas, which are probably useless now: 
WL#796 MyISAM chunked growth to avoid fragmentation 
WL#602 Keyblock should be specified on table and index level