By Peter Zaitsev
At some point, almost every project using the MySQL database server will need to decide on which platform to deploy MySQL, taking into account the project's performance and redundancy requirements, as well as the project's budget.
Selecting an operating system is a good place to begin. MySQL supports many operating systems, but some of them are known to run MySQL better than others. Some operating systems are known to have issues with the kernel or system libraries, which can limit scalability or stability. Some operating systems may perform better than others on the same hardware. Finally, because some operating systems are used by more MySQL users, any bugs specific to those operating systems are more likely to have already been identified and fixed.
Because MySQL uses threads extensively, it is important to consider how well the operating system supports them. Good thread support is critical for performance, stability and scalability. Most modern operating systems support kernel-level threads, which offer the best performance and scalability because they can run on multiple processors and so allow a higher level of concurrency. Operating systems that provide thread support through a user-level library typically do not take advantage of multiple processors, which limits concurrency.
It is a good idea to read the section of the MySQL Reference Manual that covers issues for various operating systems, which you can find online at http://www.mysql.com/doc/en/Operating_System_Specific_Notes.html.
The two main challenges in hardware selection are finding hardware that provides good performance and making sure that it is also very stable. MySQL does not work with hardware directly, so the hardware you plan to use needs good support from the operating system. Reliable, good-quality hardware is unfortunately not enough by itself to ensure that it will work well for you. Check whether your operating system of choice officially supports the hardware, and also search the Internet for user feedback, as official papers can sometimes be too optimistic. Still, nothing can replace testing: take a sample of the hardware you plan to use, put a stress load on it, and try to simulate and then exceed the load you expect on the system. The MySQL super-smack package, as well as some scripts in the /test directory of the MySQL distribution, will help you simulate a stress load on your system.
Which hardware to choose is a question that is often asked, but it has no direct answer. Everything depends on your load and your application. If you already have MySQL running and just plan to upgrade your system, the best idea is to analyze the current load to spot the bottlenecks. On Unix platforms you can use tools like vmstat, iostat and sar. The CPU is a likely bottleneck if the load on at least one of the system's processors is close to 100%. If it is well below that, the disk or network subsystem is likely to be the bottleneck. The best way to check the load on a disk is to look at the number of read and write operations per second it performs.
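The rule of thumb above can be written down as a small check. This is only a sketch: the function name and the 95% threshold standing in for "close to 100%" are assumptions, and the per-processor readings would come from a tool such as vmstat or sar.

```python
def likely_bottleneck(per_cpu_busy_percent, saturation_threshold=95.0):
    """Classify the likely bottleneck from per-processor busy percentages.

    saturation_threshold is an assumed stand-in for "close to 100%".
    """
    if not per_cpu_busy_percent:
        raise ValueError("need at least one CPU reading")
    # One saturated processor is enough, even if the others are idle.
    if max(per_cpu_busy_percent) >= saturation_threshold:
        return "cpu"
    return "disk or network"


if __name__ == "__main__":
    # Hypothetical readings for two dual-processor machines.
    print(likely_bottleneck([99.0, 12.0]))
    print(likely_bottleneck([35.0, 40.0]))
```

Note that the check looks at the busiest processor rather than the average, since a single saturated CPU can hide behind a low overall figure.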
Modern disks can do 150-250 transactions (I/O operations) per second, or up to 300-500 per second for a mix with a good share of sequential accesses. If a single drive in your system is accessed at a constant rate close to these figures, it is likely to be the limiting factor. For RAID systems the potential limit is up to <number of drives> times higher, though in practice they usually do not scale to more than 50% of that.
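As a rough illustration of this arithmetic, the sketch below estimates what an array might sustain. The function name is hypothetical, and the 0.5 scaling factor simply encodes the "not more than 50%" observation as an upper bound; real arrays should be measured, not computed.

```python
def estimated_array_ops_per_second(drive_count, ops_per_drive, scaling_efficiency=0.5):
    """Estimate the I/O operations per second a RAID array might sustain.

    The theoretical limit is drive_count * ops_per_drive; scaling_efficiency
    (assumed 0.5 here) reduces it to what arrays usually reach in practice.
    """
    if drive_count < 1 or ops_per_drive <= 0:
        raise ValueError("drive_count and ops_per_drive must be positive")
    return drive_count * ops_per_drive * scaling_efficiency


if __name__ == "__main__":
    # Hypothetical array: 4 drives at 200 random operations per second each.
    print(estimated_array_ops_per_second(4, 200))
```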
A network bottleneck is rather simple to identify. You can use any of the various network diagnostic tools to check how much load you have on the network, and many network switches and routers have such tools built in. Check the load in both the inbound and outbound directions, as well as the error rate. A load of more than 80% in either direction may increase latency and so decrease performance. A non-zero error rate on a local network is also worth examining. For Ethernet networks it is a good idea to make sure you are running in full duplex mode with flow control enabled.
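The checks described above are easy to automate once you have byte and error counters from your monitoring tool. The sketch below is one possible shape for that; the function names and the sample figures are assumptions, and only the 80% threshold comes from the text.

```python
def link_utilization_percent(bytes_transferred, interval_seconds, link_bits_per_second):
    """Return link utilization in one direction as a percentage of capacity."""
    if interval_seconds <= 0 or link_bits_per_second <= 0:
        raise ValueError("interval and link speed must be positive")
    return 100.0 * (bytes_transferred * 8) / (interval_seconds * link_bits_per_second)


def network_findings(in_percent, out_percent, error_count, load_threshold=80.0):
    """List the conditions worth investigating for one network interface."""
    findings = []
    if in_percent > load_threshold:
        findings.append("inbound load above threshold")
    if out_percent > load_threshold:
        findings.append("outbound load above threshold")
    if error_count > 0:
        findings.append("non-zero error count")
    return findings


if __name__ == "__main__":
    # Hypothetical sample: 10 seconds of traffic on a 100 Mbit/s link.
    inbound = link_utilization_percent(112_500_000, 10, 100_000_000)
    outbound = link_utilization_percent(20_000_000, 10, 100_000_000)
    print(network_findings(inbound, outbound, error_count=0))
```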
If your system is CPU bound, the most obvious idea is to get a faster CPU, but the CPU itself is not always the issue. Excessive CPU load can also be the consequence of slow memory or a slow system bus, so you may wish to start by choosing a good motherboard with a powerful chipset. The motherboard should support a fast memory type and, if possible, memory interleaving to increase bandwidth. Another way to optimize memory speed is to use a CPU with a large cache to offload the memory bus. The benefit of a large cache depends heavily on the application and can range from almost nothing to an increase of 20% or more. Cache usually matters most when you have many concurrent connections and a large amount of memory available. A high-performance system bus, or several system buses with the high-usage devices (network, disk and so on) spread across them, also helps.
If the operating system has kernel-level thread support, MySQL can take advantage of several CPUs and of HyperThreading. However, the benefit you get depends heavily on the type of load generated by your application. If you have a heavily CPU-bound load coming from many concurrent connections, you can get close to a 100% performance increase from an extra CPU, as well as some 30% from HyperThreading. On the other hand, if you run queries over just a single connection, you are not likely to see any major performance improvement, because MySQL executes each query in a single thread and so uses only one CPU for it.
Memory is one of the most vital resources for good performance. MySQL uses it to speed up query execution by using larger buffers to perform reads and sorts or to store temporary tables in memory. Memory is also used for caching data to avoid disk reads. MySQL has some internal caches and also uses the operating system's cache to speed up disk I/O. On 32-bit hardware there is a limit on the amount of memory that can be used by internal buffers. This limit ranges from 1GB to 3GB depending on the operating system. There may also be limits on the file cache available in the operating system. For example, on 32-bit Windows you will seldom be able to make efficient use of more than 4GB of memory.
Increasing the amount of memory usually gives an uneven benefit to performance. Typically, at small memory sizes adding memory gives a good benefit up to some point, after which the benefit from each extra MB becomes smaller and smaller. The optimal amount of memory for MySQL can usually be found by examining MySQL status variables. If MySQL does not need to use the disk to perform sorts, that is, it does not need to create temporary tables on disk for sorting purposes, then the amount of memory for query execution is often sufficient. Memory dedicated to caching is usually tuned by using the cache hit rate. If the cache hit rate does not grow much when you increase the available memory, the benefit from extra memory will not be large. You should examine the hit rates of the MySQL caches (MyISAM key cache, InnoDB buffer pool) as well as the OS cache hit rate. Of course, if you can fit the whole database into cache, then giving MySQL a bit of extra memory for query execution buffers should be sufficient, and increasing memory even further will not give you any extra benefit.
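A hit rate is simply the share of read requests that did not have to go to disk. The sketch below computes it from a dictionary of status counters. The counter names are the ones SHOW STATUS reports in MySQL versions that expose them (the InnoDB counters are not present in older releases, so treat their availability as an assumption for your version), and the sample values are made up for illustration.

```python
def hit_rate(read_requests, physical_reads):
    """Return the cache hit rate in the range 0..1, or None without traffic."""
    if read_requests == 0:
        return None
    return 1.0 - physical_reads / read_requests


if __name__ == "__main__":
    # Hypothetical counter values, as they might be copied from SHOW STATUS.
    status = {
        "Key_read_requests": 2_500_000,
        "Key_reads": 12_500,
        "Innodb_buffer_pool_read_requests": 9_000_000,
        "Innodb_buffer_pool_reads": 450_000,
        "Sort_merge_passes": 0,
        "Created_tmp_disk_tables": 3,
    }
    key_cache = hit_rate(status["Key_read_requests"], status["Key_reads"])
    buffer_pool = hit_rate(
        status["Innodb_buffer_pool_read_requests"],
        status["Innodb_buffer_pool_reads"],
    )
    print("MyISAM key cache hit rate:", key_cache)
    print("InnoDB buffer pool hit rate:", buffer_pool)
    # Non-zero values here suggest that sorts or temporary tables spill to disk.
    print("Sort merge passes:", status["Sort_merge_passes"])
    print("Temporary tables on disk:", status["Created_tmp_disk_tables"])
```

Comparing the hit rate before and after a memory increase, under the same load, shows whether the extra memory was worth it.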
For various applications, tuning the performance can be very interesting. Imagine some sort of search application that has 100MB of very frequently accessed data and 10GB of data that is accessed relatively rarely, with an even distribution. If you increase the cache from 20MB to 40MB you might improve performance by some 40%, while increasing it to 120MB might lead to an improvement of 100% or even more. Increasing it further will give relatively slow growth, while increasing it from 8GB to 10GB may once again give a good performance boost by sharply decreasing the miss rate. Detailed application analysis is often required to really answer the question of what the pattern is for your system. Another good idea is to benchmark your system to find this out in practice. MySQL 4.0 adds one more type of memory consumer, the query cache. Query cache tuning was described in detail in one of the previous newsletters, so let's omit it here. In many cases you would allocate to the query cache as much memory as it can actually use.
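The shape of this curve can be illustrated with a toy model. It is only a sketch under strong assumptions: the cache is idealized and always holds the hottest data first, and the share of accesses that go to the hot data (70% here) is an invented parameter, so the numbers will not match the percentages quoted above or any real system.

```python
def miss_rate(cache_mb, hot_mb=100.0, cold_mb=10240.0, hot_access_share=0.7):
    """Estimate the cache miss rate for a hot set plus evenly accessed cold data.

    Assumes an ideal cache that fills with hot data first; hot_access_share
    is an assumed fraction of all accesses that target the hot set.
    """
    hot_cached = min(cache_mb, hot_mb) / hot_mb
    cold_cached = min(max(cache_mb - hot_mb, 0.0), cold_mb) / cold_mb
    hit = hot_access_share * hot_cached + (1.0 - hot_access_share) * cold_cached
    return 1.0 - hit


if __name__ == "__main__":
    # Cache sizes in MB, following the sizes discussed in the text.
    for size_mb in (20, 40, 120, 1024, 8192, 10240):
        print(size_mb, "MB cache -> miss rate", round(miss_rate(size_mb), 4))
```

In this model the miss rate falls quickly until the hot set fits, then only slowly, and then drops sharply again as the cache approaches the size of the whole data set, which is the pattern the paragraph describes.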
The disk subsystem brings up many questions: which type of disk subsystem to use, how to configure it, and what MySQL layout to select. Entry-level systems usually have simple hard drives, possibly organized in software RAID0/1 for extra performance or redundancy. Both SCSI and IDE disks work well, though SCSI has historically given somewhat better performance than IDE drives with the same parameters, along with lower CPU usage.
In most cases databases have a random data access pattern, so disk transfer speed is less critical than random access speed, and selecting 10000-15000RPM drives with a small seek time is a good idea. Having too many disks organized in software RAID can be a bad idea because of the extra CPU, memory and system bus usage, so hardware RAID is the better choice in that case.
As databases often have random data access, it can be a good idea to use relatively small stripes for best performance, as this allows better utilization of the RAID controller cache. 32K-64K is a good value at which to start benchmarking. The amount of memory in the RAID controller is most critical for RAID5; in other cases increasing the cache size may not give much of a performance increase as long as it is smaller than the operating system disk cache. A battery-backed cache is a good idea, especially when using InnoDB tables, as it speeds up disk flushes significantly.
Selecting which RAID level to use is a challenging task. RAID5 is often selected for databases as it gives good space utilization; if you can afford to sacrifice space to get better performance, RAID10 is usually the better choice. Depending on availability requirements, RAID0 can be used for the temporary directory, though this is only worthwhile if you have very large and complex queries whose speed depends on on-disk sort buffers and temporary tables. InnoDB logs and binary logs may be worth placing on a RAID1 device, as they are written sequentially.
Which layout to use? The ideal advice would be to store everything on its own array or disk (database, temporary directory, binary logs and InnoDB logs), but usually you do not have that many devices to spare. Instead, you may wish to do a bit of profiling to find out which of these areas has the most I/O and split them among the available devices. If you are running with innodb_flush_log_at_trx_commit=1 and have a lot of updating transactions, it may be worth putting the InnoDB log on a dedicated disk, as this keeps the writes sequential and so increases performance.
In high-load configurations it is often worth using an external storage system, which gives two major benefits in addition to the performance and redundancy it can provide. First, such systems usually have volume management functions that allow you to take consistent backups with a very small performance penalty, though some major operating systems also offer such functionality, for example LVM in Linux. Second, you can connect two servers to such a device, so that if one fails the other can take over quickly.
With today's tight budgets, the question is often raised whether a memory or a disk upgrade is the better choice for a disk-bound performance issue. As you can see from the explanations above, the answer is very load dependent. If adding more memory can significantly decrease the amount of physical I/O your load requires, it is the best choice. However, if you do not expect this to happen, for example because your database is evenly accessed and several times larger than the amount of memory you can afford, it is a much better idea to invest in the disk subsystem, assuming of course that your database schema and your application are well optimized. If you have a badly behaving application or a very suboptimal schema, even the best platform is not likely to help you.

