WL#4601: Remove fastmutex from the server sources

Affects: Server-8.0   —   Status: Complete

The MySQL SQL-layer (i.e. not InnoDB) has a custom spin-lock mutex
implementation called "fast mutex". This mutex implementation is enabled
by default on release builds (-DBUILD_CONFIG=mysql_release) for Linux.

MySQL "fast" mutexes have a series of shortcomings and should be removed.

1) Spin-wait loops are hard to get right

Due to the superscalar nature of modern processors, busy waiting can incur
significant costs if not done properly as the processor generally needs to
enforce certain constraints. But how to properly delay the loop tends to vary a
lot between specific processors and is a burden that should be handled by the
system provided implementation and not us. For example, Intel suggests the use
of the PAUSE instruction as a hint for the processor and highlights that it's
very important for hyperthreaded CPUs. There are also specific delay strategies
on ppc, s390, ia64, etc.

More details in subsection 2.1:

http://www.intel.com/cd/ids/developer/asmo-na/eng/17689.htm

The point is that system provided implementations know better how to deal with
this, which may eventually lead to better performance and lower power
consumption.
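
As an illustration of the kind of processor-specific hint discussed above, here
is a minimal sketch of a bounded spin-wait using C11 atomics. The names
cpu_relax(), spin_try_acquire() and spin_release() are hypothetical and are not
taken from the server sources. The sketch assumes an x86 compiler that provides
the _mm_pause() intrinsic; on other architectures the hint is left as an empty
placeholder, which is exactly the per-platform burden this item describes.

```c
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()  /* x86 PAUSE hint for spin-wait loops */
#else
#define cpu_relax() ((void)0)    /* placeholder: other CPUs need their own hint */
#endif

/* Hypothetical helper: try to set the flag, spinning at most max_spins times.
   Returns true if the flag was acquired, false if the caller should give up
   spinning (and, in a real lock, fall back to blocking). */
static bool spin_try_acquire(atomic_flag *flag, unsigned max_spins)
{
    for (unsigned i = 0; i < max_spins; i++) {
        if (!atomic_flag_test_and_set_explicit(flag, memory_order_acquire))
            return true;         /* flag was clear, we now own it */
        cpu_relax();             /* tell the CPU we are busy waiting */
    }
    return false;
}

/* Hypothetical helper: release a flag taken by spin_try_acquire(). */
static void spin_release(atomic_flag *flag)
{
    atomic_flag_clear_explicit(flag, memory_order_release);
}
```

Even this small loop already needs an architecture switch, and a sensible
value for max_spins is likely to differ between machines.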

2) Useless on single processor systems

Busy waiting on single processor systems is completely useless: the lock holder
cannot run and release the lock while the waiter occupies the only CPU, so the
waiting thread simply spins until its time quantum expires.

System implementations of spinlocks/mutexes usually know when the code is built
for or is running on a uniprocessor system and will skip the busy waiting.
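
A rough sketch of such a runtime check follows. The helper names
online_cpu_count() and effective_spin_count() are hypothetical. The sketch
assumes a platform where sysconf() accepts _SC_NPROCESSORS_ONLN, which is a
widely available extension (Linux, Solaris, the BSDs) rather than a strict
POSIX requirement.

```c
#include <unistd.h>

/* Hypothetical helper: number of CPUs currently online, never less than 1.
   Assumes the _SC_NPROCESSORS_ONLN extension is available. */
static long online_cpu_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;        /* treat errors as a uniprocessor system */
}

/* Hypothetical helper: spinning only makes sense if another CPU can run the
   lock holder in the meantime, so return 0 spins on a single CPU. */
static unsigned effective_spin_count(long online_cpus, unsigned wanted)
{
    return online_cpus > 1 ? wanted : 0;
}
```

A caller would compute effective_spin_count(online_cpu_count(), N) once at
startup and block immediately whenever the result is 0.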

3) Floating-point arithmetic

The 'fast' mutex implementation relies on a PRNG to produce values for the
spin delay. The formula for calculating the spin count includes a floating-point
division operation that can be a bottleneck for multiprocessor systems that
only have a single floating point unit (FPU) (e.g. Sun SPARC Enterprise T5240).
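
To make the cost concrete, the sketch below contrasts a floating-point scaling
of a random value with an integer-only equivalent. Both functions are
hypothetical illustrations of the general shape of such a formula; they are
not the expression used in the server sources, and the 32-bit random input is
an assumption made for the example.

```c
#include <stdint.h>

/* Floating-point style: map a 32-bit random value r onto [1, max_delay].
   Needs a floating-point division and multiplication on every call. */
static unsigned delay_fp(uint32_t r, unsigned max_delay)
{
    return (unsigned)(((double)r / 4294967296.0) * max_delay) + 1;
}

/* Integer-only style: same mapping using a 64-bit multiply and a shift,
   so the FPU is never touched. */
static unsigned delay_int(uint32_t r, unsigned max_delay)
{
    return (unsigned)(((uint64_t)r * max_delay) >> 32) + 1;
}
```

The integer form avoids the shared FPU, but it is one more piece of tuning
that a system provided mutex makes unnecessary.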

4) Adaptive Mutex

Nowadays most mainstream systems implement adaptive locks that provide a
balance between busy waiting benefits and disadvantages by spinning on a
lock only for a limited period and by not spinning at all if the lock is
being held by a thread that is not currently running. Solaris mutexes are
adaptive by default, Linux provides an attribute to make mutexes adaptive,
etc.

Another advantage of relying on system provided adaptive locks is that they
usually offer environment variables or other means by which one can control
the spin count, number of spinners, etc.
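
For reference, requesting an adaptive mutex on Linux can look roughly like the
sketch below. It assumes glibc, where the non-portable PTHREAD_MUTEX_ADAPTIVE_NP
type is available and the PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP macro can be
used to detect it when _GNU_SOURCE is defined. The helper name
init_adaptive_mutex() is hypothetical. On systems without the extension the
sketch falls back to a default mutex.

```c
#define _GNU_SOURCE              /* expose glibc's non-portable extensions */
#include <pthread.h>

/* Hypothetical helper: initialise *m as an adaptive mutex where glibc
   supports it, otherwise as a default mutex. Returns 0 on success or a
   pthread error code. */
static int init_adaptive_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
#ifdef PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP
    /* glibc only: spin briefly before sleeping on a contended lock */
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
#endif
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

The resulting mutex is used with the ordinary pthread_mutex_lock() and
pthread_mutex_unlock() calls, so callers need no spin logic of their own.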

5) Starvation

Constantly polling the mutex with pthread_mutex_trylock without eventually
blocking until the mutex becomes available may lead to resource starvation
if there is a high demand for the lock. Theoretically this may happen if the
thread polling the mutex is the victim of unfavorable scheduling.
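
One common way to avoid this is to bound the polling and then fall back to a
blocking lock. The sketch below shows that general pattern; the helper name
lock_spin_then_block() and the max_polls parameter are hypothetical, and this
is not a description of what the 'fast' mutex code does.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: poll the mutex at most max_polls times, then block.
   Returns 0 once the mutex is held, or a pthread error code. */
static int lock_spin_then_block(pthread_mutex_t *m, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        int rc = pthread_mutex_trylock(m);
        if (rc != EBUSY)
            return rc;           /* 0 on success, otherwise a real error */
    }
    /* Give up polling and sleep until the mutex becomes available. */
    return pthread_mutex_lock(m);
}
```

Because the final call blocks, the thread stops competing with other pollers
and is woken by the system when the lock is released.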

6) Offers no measurable advantage

Various user reports and benchmarks have shown that there is no measurable
performance advantage when MySQL is compiled with 'fast' mutex.

7) Naïveté

Pretending that we can implement faster mutexes with such simple code is just
naive, and labeling them as "fast mutex" without any evidence is misleading to
our users.

Associated bugs:

BUG#38941: fast mutexes in MySQL 5.1 have mutex contention when calling random()
BUG#37703: MySQL performance with and without Fast Mutexes using Sysbench Workload

BUG#72805: mutex_delay() creating excess memory traffic, GCC mem barrier needed
BUG#72806: mutex_delay() missing x86 pause instruction optimization
BUG#72807: Set thread priority in my_pthread_fastmutex_lock

User Documentation
==================

http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-8.html

NF-1: Removing FAST_MUTEX support should have no negative performance implications.
I-1: The WL will remove the WITH_FAST_MUTEXES CMake build option. Since this
     option was enabled by default for release builds on Linux, it means that
     the release builds will change to now use default OS mutexes.