Robin Schumacher is MySQL's Director of Product Management and has over 13 years of database experience in DB2, MySQL, Oracle, SQL Server and other database engines. Before joining MySQL, Robin was Vice President of Product Management at Embarcadero Technologies.
By Robin Schumacher
A few years ago, I wrote an article entitled "The Foundation of Excellent Performance" (still available at http://www.tdan.com/i016fe03.htm ) where I argued against the notion that SQL code was the number one contributor to performance in a database-driven system. Instead, I stated in the article that I firmly believed how good physical database design was far and away the leading component of superior database performance. In addition, I showed that Oracle's own research illustrated how poor design was the main culprit behind database downtime (planned or unplanned). In the years since then, I've not changed my stance and still think that any DBA who wants a high-performance database has got to invest in intelligent and savvy physical design to produce the kind of response times that make end users smile instead of scream.
One of the reasons I'm very excited about the release of MySQL 5.1 is that it contains a potent new weapon for designing supercharged databases that any MySQL DBA should quickly learn how to use and exploit. By smartly using the new 5.1 partitioning feature, a DBA can oftentimes dramatically improve the performance of most any VLDB or data warehouse they happen to be in charge of.
Partitioning is a physical database design technique that many data modelers and DBAs are quite familiar with. Although partitioning can be used to accomplish a number of various objectives, the main goal is to reduce the amount of data read for particular SQL operations so that overall response time is reduced.
There are two major forms of partitioning:
Horizontal Partitioning – this form of partitioning segments table rows so that distinct groups of physical row-based datasets are formed that can be addressed individually (one partition) or collectively (one-to-all partitions). All columns defined to a table are found in each set of partitions so no actual table attributes are missing. An example of horizontal partitioning might be a table that contains ten years worth of historical invoice data being partitioned into ten distinct partitions, where each partition contains a single year's worth of data.
Vertical Partitioning – this partitioning scheme is traditionally used to reduce the width of a target table by splitting a table vertically so that only certain columns are included in a particular dataset, with each partition including all rows. An example of vertical partitioning might be a table that contains a number of very wide text or BLOB columns that aren't addressed often being broken into two tables that has the most referenced columns in one table and the seldom-referenced text or BLOB data in another.
Before database vendors began building partitioning (mainly horizontal) into their engines, DBAs and data modelers had to physically design separate table structures to hold the desired partitions, which either held redundant data (separate tables with data that were based off a live parent table) or were linked together to form one logical parent object (usually via a view). This practice has since been made obsolete for the most part for horizontal partitioning, although it is sometimes still done for vertical partitioning.
One of the great new features in MySQL 5.1 is support for horizontal partitioning. The really good news about MySQL and the new 5.1 partitioning feature is all the major forms of partitioning are supported:
Range – this partitioning mode allows a DBA to specify various ranges for which data is assigned. For example, a DBA may create a partitioned table that is segmented by three partitions that contain data for the 1980's, 1990's, and everything beyond and including the year 2000.
Hash – this partitioning mode allows a DBA to separate data based on a computed hash key that is defined on one or more table columns, with the end goal being an equal distribution of values among partitions. For example, a DBA may create a partitioned table that has ten partitions that are based on the table's primary key.
Key – a special form of Hash where MySQL guarantees even distribution of data through a system-generated hash key.
List – this partitioning mode allows a DBA to segment data based on a pre-defined list of values that the DBA specifies. For example, a DBA may create a partitioned table that contains three partitions based on the years 2004, 2005, and 2006.
Composite – this final partitioning mode allows a DBA to perform sub-partitioning where a table is initially partitioned by, for example range partitioning, but then each partition is segmented even further by another method (for example, hash).
There are a number of benefits that come with partitioning, but the two main advantages are:
From a design-for-performance standpoint, we're mainly interested in point one above. By smartly using partitioning and matching the design to properly coded queries, a dramatic performance impact can be realized. Let's take a quick test drive of partitioning in MySQL 5.1 to see this in action. Note that all tests below were done on a Dell Optiplex box with a Pentium 4 3.00GHz processor, 1GB of RAM, running Fedora Core 4 and MySQL 5.1.6 alpha.
To see the positive benefit partitioning can have on a database, let's create identical MyISAM tables that contain date sensitive information, but let's partition one and leave the other a standard heap table. For our partitioned table, we'll partition based on range and use a function that segments the data based on year:
mysql> CREATE TABLE part_tab -> ( c1 int default NULL, -> c2 varchar(30) default NULL, -> c3 date default NULL -> -> ) engine=myisam -> PARTITION BY RANGE (year(c3)) (PARTITION p0 VALUES LESS THAN (1995), -> PARTITION p1 VALUES LESS THAN (1996) , PARTITION p2 VALUES LESS THAN (1997) , -> PARTITION p3 VALUES LESS THAN (1998) , PARTITION p4 VALUES LESS THAN (1999) , -> PARTITION p5 VALUES LESS THAN (2000) , PARTITION p6 VALUES LESS THAN (2001) , -> PARTITION p7 VALUES LESS THAN (2002) , PARTITION p8 VALUES LESS THAN (2003) , -> PARTITION p9 VALUES LESS THAN (2004) , PARTITION p10 VALUES LESS THAN (2010), -> PARTITION p11 VALUES LESS THAN MAXVALUE ); Query OK, 0 rows affected (0.00 sec)
Notice that we designed partitions for a particular year and finished with one catch-all partition to get anything that doesn't fall into any of the specific date partitions. Now let's create a mirror MyISAM table that's not partitioned:
mysql> create table no_part_tab -> (c1 int(11) default NULL, -> c2 varchar(30) default NULL, -> c3 date default NULL) engine=myisam; Query OK, 0 rows affected (0.02 sec)
Now let's create a procedure (thanks to Peter Gulutzan for the code…) that will fill our partitioned table with 8 million rows that distributes data fairly evenly across the various partitions. Once filled, we'll then insert the same data into our non-partitioned MyISAM clone table:
mysql> delimiter // mysql> CREATE PROCEDURE load_part_tab() -> begin -> declare v int default 0; -> while v < 8000000 -> do -> insert into part_tab -> values (v,'testing partitions',adddate('1995-01-01',(rand(v)*36520) mod 3652)); -> set v = v + 1; -> end while; -> end -> // Query OK, 0 rows affected (0.00 sec) mysql> delimiter ; mysql> call load_part_tab(); Query OK, 1 row affected (8 min 17.75 sec) mysql> insert into no_part_tab select * from part_tab; Query OK, 8000000 rows affected (51.59 sec) Records: 8000000 Duplicates: 0 Warnings: 0
With our tables now ready, let's issue a simple date range query on both tables – the non-partitioned table first and then the partitioned table - followed by EXPLAIN's, and see what MySQL does:
mysql> select count(*) from no_part_tab where -> c3 > date '1995-01-01' and c3 < date '1995-12-31'; +----------+ | count(*) | +----------+ | 795181 | +----------+ 1 row in set (38.30 sec) mysql> select count(*) from part_tab where -> c3 > date '1995-01-01' and c3 < date '1995-12-31'; +----------+ | count(*) | +----------+ | 795181 | +----------+ 1 row in set (3.88 sec) mysql> explain select count(*) from no_part_tab where -> c3 > date '1995-01-01' and c3 < date '1995-12-31'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: no_part_tab type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 8000000 Extra: Using where 1 row in set (0.00 sec) mysql> explain partitions select count(*) from part_tab where -> c3 > date '1995-01-01' and c3 < date '1995-12-31'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: part_tab partitions: p1 type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 798458 Extra: Using where 1 row in set (0.00 sec)
The power of proper partition and query design can easily be seen as the partitioned table access delivers a whopping 90% response time reduction over the non-partitioned table. The EXPLAIN plans showcase why this is (notice the new EXPLAIN syntax for partitioned objects) as only the first partition in the partitioned table is accessed with all others being skipped.
As a MySQL DBA, it's easy to get excited about the potential benefits that partitioning can provide, but you always want to make sure that the tool you use for database design matches the requirements and scenario of your particular application. Partitioning is best suited for VLDB's that contain a lot of query activity that targets specific portions/ranges of one or more database tables. Of course, other situations lend themselves to partitioning as well (e.g. data archiving, etc.)
Although MySQL 5.1 automates horizontal partitioning, don't lose sight of vertical partitioning schemes when designing your databases. Although you have to do vertical partitioning manually, you can benefit from the practice in certain circumstances. For example, let's say you didn't normally need to reference or use the VARCHAR column defined in our previously shown partitioned table. Would the elimination of this column help query speed? Let's find out:
mysql> desc part_tab; +-------+-------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+-------------+------+-----+---------+-------+ | c1 | int(11) | YES | | NULL | | | c2 | varchar(30) | YES | | NULL | | | c3 | date | YES | | NULL | | +-------+-------------+------+-----+---------+-------+ 3 rows in set (0.03 sec) mysql> alter table part_tab drop column c2; Query OK, 8000000 rows affected (42.20 sec) Records: 8000000 Duplicates: 0 Warnings: 0 mysql> desc part_tab; +-------+---------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+---------+------+-----+---------+-------+ | c1 | int(11) | YES | | NULL | | | c3 | date | YES | | NULL | | +-------+---------+------+-----+---------+-------+ 2 rows in set (0.00 sec) mysql> select count(*) from part_tab where -> c3 > date '1995-01-01' and c3 < date '1995-12-31'; +----------+ | count(*) | +----------+ | 795181 | +----------+ 1 row in set (0.34 sec)
By removing the VARCHAR column from the design, you actually get another 90+% reduction in query response time. Beyond partitioning, this speaks to the effect wide tables can have on queries and why you should always ensure that all columns defined to a table are actually needed.
A short article like this can't possibly cover all the benefits and mechanics of MySQL 5.1's partitioning, but a few notes of interest include:
From a performance standpoint, the main take-away is that MySQL 5.1 partitioning is a powerful new tool that can be used in many physical database designs to dramatically improve performance and ease DBA management burdens. For more information on MySQL partitioning, you can visit out the online reference manual at http://dev.mysql.com/doc/refman/5.1/en/partitioning.html and visit the MySQL forums as there is a forum section devoted to partitioning, which can be referenced at http://forums.mysql.com/list.php?106.
Download a copy of MySQL 5.1 (which is now is beta) today and give partitioning a try. I think you will be pleased with all the new possibilities partitioning provides when it comes to creating a top-notch physical database design, which is the number one contributor to overall database performance.