MySQL 5.7 Reference Manual  /  ...  /  LINEAR HASH Partitioning

#### 19.2.4.1 LINEAR HASH Partitioning

MySQL also supports linear hashing, which differs from regular hashing in that linear hashing utilizes a linear powers-of-two algorithm whereas regular hashing employs the modulus of the hashing function's value.

Syntactically, the only difference between linear-hash partitioning and regular hashing is the addition of the `LINEAR` keyword in the ```PARTITION BY``` clause, as shown here:

```CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT,
store_id INT
)
PARTITION BY LINEAR HASH( YEAR(hired) )
PARTITIONS 4;
```

Given an expression `expr`, the partition in which the record is stored when linear hashing is used is partition number `N` from among `num` partitions, where `N` is derived according to the following algorithm:

1. Find the next power of 2 greater than `num`. We call this value `V`; it can be calculated as:

````V` = POWER(2, CEILING(LOG(2, `num`)))
```

(Suppose that `num` is 13. Then `LOG(2,13)` is 3.7004397181411. `CEILING(3.7004397181411)` is 4, and `V` = `POWER(2,4)`, which is 16.)

2. Set `N` = `F`(`column_list`) & (`V` - 1).

3. While `N` >= `num`:

• Set `V` = CEIL(`V` / 2)

• Set `N` = `N` & (`V` - 1)

Suppose that the table `t1`, using linear hash partitioning and having 6 partitions, is created using this statement:

```CREATE TABLE t1 (col1 INT, col2 CHAR(5), col3 DATE)
PARTITION BY LINEAR HASH( YEAR(col3) )
PARTITIONS 6;
```

Now assume that you want to insert two records into `t1` having the `col3` column values `'2003-04-14'` and `'1998-10-19'`. The partition number for the first of these is determined as follows:

````V` = POWER(2, CEILING( LOG(2,6) )) = 8
`N` = YEAR('2003-04-14') & (8 - 1)
= 2003 & 7
= 3

(3 >= 6 is FALSE: record stored in partition #3)
```

The number of the partition where the second record is stored is calculated as shown here:

````V` = 8
`N` = YEAR('1998-10-19') & (8-1)
= 1998 & 7
= 6

(6 >= 6 is TRUE: additional step required)

`N` = 6 & CEILING(8 / 2)
= 6 & 3
= 2

(2 >= 6 is FALSE: record stored in partition #2)
```

The advantage in partitioning by linear hash is that the adding, dropping, merging, and splitting of partitions is made much faster, which can be beneficial when dealing with tables containing extremely large amounts (terabytes) of data. The disadvantage is that data is less likely to be evenly distributed between partitions as compared with the distribution obtained using regular hash partitioning.

Posted by Chris Wagner on April 20, 2011
Linear hashing is a bad name considering that it creates a bell curve style distribution. Here are some distribution patterns on a keyed list of integers (0 - 64k) with some associated data volume. The first column is the partition, the second is the number of rows that would go in that partition, and the percentage is that of the total data volume. U can see that if the number of partitions is a power of 2 then the two methods are the same.

Basically, with linear hashing the middle partions will have 2x the size of the outer ones.

8 partitions

Modulo hashing distribution

0 1286793 12.43%
1 1293976 12.50%
2 1281279 12.38%
3 1295948 12.52%
4 1307476 12.63%
5 1291118 12.48%
6 1294605 12.51%
7 1297231 12.54%

Linear hashing distribution

0 1286793 12.43%
1 1293976 12.50%
2 1281279 12.38%
3 1295948 12.52%
4 1307476 12.63%
5 1291118 12.48%
6 1294605 12.51%
7 1297231 12.54%

9 partitions

Modulo hashing distribution

0 1137765 10.99%
1 1152313 11.14%
2 1139055 11.01%
3 1153697 11.15%
4 1150596 11.12%
5 1158082 11.19%
6 1148080 11.09%
7 1155944 11.17%
8 1152894 11.14%

Linear hashing distribution

0 638360 6.17%
1 1293976 12.50%
2 1281279 12.38%
3 1295948 12.52%
4 1307476 12.63%
5 1291118 12.48%
6 1294605 12.51%
7 1297231 12.54%
8 648433 6.27%

7 partitions

Modulo hashing distribution

0 1483555 14.34%
1 1470934 14.21%
2 1487062 14.37%
3 1481081 14.31%
4 1477711 14.28%
5 1468336 14.19%
6 1479747 14.30%

Linear hashing distribution

0 1286793 12.43%
1 1293976 12.50%
2 1281279 12.38%
3 2593179 25.06%
4 1307476 12.63%
5 1291118 12.48%
6 1294605 12.51%

12 partitions

Linear hashing distribution

0 638360 6.17%
1 643268 6.22%
2 637077 6.16%
3 650900 6.29%
4 1307476 12.63%
5 1291118 12.48%
6 1294605 12.51%
7 1297231 12.54%
8 648433 6.27%
9 650708 6.29%
10 644202 6.23%
11 645048 6.23%

For comparison, this is what it looks like just on the integer alone (no data volume).

7 partitions

Modulo hashing distribution

0 9363 14.29%
1 9362 14.29%
2 9362 14.29%
3 9362 14.29%
4 9362 14.29%
5 9362 14.29%
6 9362 14.29%

Linear hashing distribution

0 8192 12.50%
1 8192 12.50%
2 8192 12.50%
3 16383 25.00%
4 8192 12.50%
5 8192 12.50%
6 8192 12.50%

Posted by Steven Roussey on June 22, 2011
For linear, the number of partitions ought to be a power of 2. So 4, 8, 16, 32, etc., would be good choices...
Posted by Gabriel Ramirez Melgarejo on November 25, 2015
Also by using LINEAR HASH or LINEAR KEY enables the use of sub partitioning for the partitioning types LINEAR and KEY.