WL#9250: Split LOCK_thd_list and LOCK_thd_remove mutexes

Affects: Server-8.0   —   Status: Complete

The major mutex bottlenecks for connect/disconnect performance are currently
LOCK_thd_list and LOCK_thd_remove. Both protect the global list of current 
connections (THDs).

This WL is about splitting these two mutexes to remove the bottleneck.

Performance testing of a draft patch shows a 5.5% performance improvement
(measured as TPS for point selects with a reconnect between each query).
The results also show that the draft patch removes the remaining internal mutex
bottlenecks, so further performance improvements will have to come from
elsewhere.

NF-1: This WL will not introduce any user-visible changes beyond increased
      connect/disconnect performance.

I-1: LOCK_thd_list, LOCK_thd_remove and COND_thd_list will be split. This will
     affect Performance Schema instrumentation of these, as there will be
     more than one instance of each. No other changes to the interface
     specification.

1) Split Global_THD_manager::thd_list, LOCK_thd_list, LOCK_thd_remove and
   COND_thd_list. Eight partitions seem to be enough to remove the bottleneck.
   There is no plan to make the number of partitions user-configurable.
2) Convert Global_THD_manager::global_thd_count to an atomic counter since it
   is no longer updated while holding a single mutex. Use C++11 atomics.
3) Change functions such as Global_THD_manager::do_for_all_thd() so that they
   lock one instance of LOCK_thd_list, process the matching thd_list partition
   and unlock before proceeding to the next partition. That is, we will not
   lock all instances of LOCK_thd_list before processing.
4) Reduce the critical section of LOCK_global_system_variables to avoid
   malloc/free while holding this mutex. This applies to THD::init() and
   alloc_and_copy_thd_dynamic_variables().
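
Items 1) to 3) can be illustrated with a minimal, standalone sketch. It is not
the server code: Thd, PartitionedThdRegistry, kPartitions and the choice of
partition by id modulo the partition count are all assumptions made for
illustration, and LOCK_thd_remove and COND_thd_list are left out for brevity.
The sketch shows one mutex per list partition, a C++11 atomic connection
count, and iteration that holds only one partition lock at a time.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_set>

// Hypothetical stand-in for the server's THD; only the id matters here.
struct Thd {
  std::uint64_t id;
};

// Partition count taken from item 1); fixed at compile time.
constexpr std::size_t kPartitions = 8;

// Illustrative registry, not Global_THD_manager itself.
class PartitionedThdRegistry {
 public:
  void add(Thd *thd) {
    Partition &p = partition_for(thd->id);
    std::lock_guard<std::mutex> guard(p.lock);
    p.thds.insert(thd);
    // No single mutex covers the count any more, so it must be atomic.
    count_.fetch_add(1, std::memory_order_relaxed);
  }

  void remove(Thd *thd) {
    Partition &p = partition_for(thd->id);
    std::lock_guard<std::mutex> guard(p.lock);
    // Only decrement if the THD was actually registered.
    if (p.thds.erase(thd) == 1) {
      count_.fetch_sub(1, std::memory_order_relaxed);
    }
  }

  std::size_t count() const { return count_.load(std::memory_order_relaxed); }

  // Lock one partition, process it, unlock, then move to the next one.
  // All partition locks are never held at the same time.
  template <typename Func>
  void for_each(Func func) {
    for (Partition &p : partitions_) {
      std::lock_guard<std::mutex> guard(p.lock);
      for (Thd *thd : p.thds) func(thd);
    }
  }

 private:
  struct Partition {
    std::mutex lock;                 // plays the role of one LOCK_thd_list
    std::unordered_set<Thd *> thds;  // plays the role of one thd_list part
  };

  // Assumed mapping: partition chosen by id modulo the partition count.
  Partition &partition_for(std::uint64_t id) {
    return partitions_[id % kPartitions];
  }

  std::array<Partition, kPartitions> partitions_;
  std::atomic<std::size_t> count_{0};
};
```

Because for_each() releases each partition lock before taking the next, a walk
over all partitions is not an atomic snapshot: a connection added to an
already-visited partition during the walk will not be seen. That is the
trade-off implied by item 3).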