WL#8356: Improve scalability by not acquiring unnecessary locks for internal temp tables

Affects: Server-5.7 — Status: Complete

Description
Requirements
High Level Architecture

After WL#6671 "Improve scalability by not using thr_lock.c locks for InnoDB
tables" was implemented, LOCK_plugin and THR_LOCK_lock became more prominent
as scalability bottlenecks in workloads that involve creation of internal
in-memory temporary tables (like Sysbench's SELECT_DISTINCT test for InnoDB).

The goal of this WL is to remove these bottlenecks. There is no real reason why
we should acquire these locks for internal temporary tables.

NF0: This task is not supposed to cause any changes in user-visible
     behavior.

NF1: There should not be any significant performance/scalability regressions
     caused by this WL.

NF2: Together with WL#8355, this patch should improve performance/scalability
     in Sysbench DISTINCT_RANGES/InnoDB and possibly other tests on systems
     with many cores (a preliminary patch showed a noticeable improvement on
     a system with 40 cores).

After WL#6671 was implemented, DimitriK's benchmarks identified the
following scalability bottlenecks related to creation of internal temporary
tables, which caused problems in some Sysbench tests (e.g. DISTINCT_RANGES):

1) BITMAP::mutex lock. This lock protects the bitmap which serves as a pool
   of possible names for internal temporary tables. The idea of this pool
   is to reduce the set of names used for temporary tables. Using a unique
   name for each new table apparently caused some performance problems on
   Linux in the past. The pool is not used on non-Linux systems.

   Since it is easy to avoid this bottleneck by turning the temporary table
   names pool off (using the --temp-pool=0 start-up option), we are not going
   to do anything about this bottleneck in the scope of this task.

2) "THR_LOCK_lock" lock. This lock protects the global list of all THR_LOCK
   objects in the thr_lock.c lock manager. It turned out that for each internal
   temporary table created in the Heap SE we still allocate and initialize a
   THR_LOCK object even though it is not going to be used for such tables.
   Since the thr_lock_init() call used for this initialization acquires this
   global mutex, we get a totally unnecessary scalability bottleneck.

   The simple solution to this problem is to skip initialization of the
   THR_LOCK and THR_LOCK_DATA structs for internal temporary tables in the
   Heap SE. This doesn't require any additional changes since they are not
   really used for such tables.

   As a bonus we will also remove the HP_SHARE::intern_lock mutex, which is
   not used for anything in the current code.

3) LOCK_plugin lock. When we construct a TABLE_SHARE object for an internal
   temporary table, we fill in the TABLE_SHARE::db_plugin member by calling
   the ha_lock_engine() function. This function obtains a proper reference
   to the SE plugin from its handlerton object. In the process the global
   LOCK_plugin is acquired, creating a scalability bottleneck.

   It is necessary for ha_lock_engine() to acquire LOCK_plugin for non-built-in
   SEs since we count references to plugins in these cases. This also happens
   for built-in SEs in debug builds.
   On the other hand, we do not use reference counting for built-in plugins
   in production builds. Thus it becomes possible to remove this bottleneck,
   at least for built-in SEs in production, by taking a shortcut and not
   acquiring LOCK_plugin for them. Luckily, all 3 SEs used for internal
   temporary tables (Heap, MyISAM, InnoDB) are built-in.
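
The two fixes proposed in items 2) and 3) can be illustrated with a small
stand-alone C++ sketch. It is not the server code: every type and function
name ending in "Sketch" or "_sketch" is hypothetical, the two std::mutex
objects merely play the roles of THR_LOCK_lock and LOCK_plugin, and the
acquisition counters exist only to make the effect observable. The debug-build
switch is modelled as a plain function argument, which is also an assumption.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical stand-ins; these are NOT the real MySQL definitions.
struct ThrLockSketch {
  bool initialized = false;
};

struct PluginSketch {
  bool builtin = false;  // compiled into the server, never unloaded
  int ref_count = 0;     // maintained only on the reference-counted path
};

// Plays the role of THR_LOCK_lock: guards a global list of lock objects.
static std::mutex thr_lock_list_mutex;
static std::vector<ThrLockSketch *> thr_lock_list;
static int thr_lock_list_acquisitions = 0;

// Plays the role of LOCK_plugin: guards plugin reference counts.
static std::mutex plugin_mutex;
static int plugin_mutex_acquisitions = 0;

// Registers a lock object in the global list. Every call takes the global
// mutex, which is the contended path described in item 2).
void thr_lock_init_sketch(ThrLockSketch *lock) {
  std::lock_guard<std::mutex> guard(thr_lock_list_mutex);
  ++thr_lock_list_acquisitions;
  lock->initialized = true;
  thr_lock_list.push_back(lock);
}

// Item 2): internal temporary tables never use the lock object, so its
// initialization (and the global mutex acquisition) is skipped for them.
void open_heap_table_sketch(ThrLockSketch *lock, bool internal_tmp_table) {
  if (!internal_tmp_table) thr_lock_init_sketch(lock);
}

// Item 3): built-in plugins are not reference-counted in production builds,
// so a reference can be handed out without touching the global mutex.
PluginSketch *lock_engine_sketch(PluginSketch *plugin, bool debug_build) {
  if (plugin->builtin && !debug_build) return plugin;  // lock-free shortcut

  std::lock_guard<std::mutex> guard(plugin_mutex);
  ++plugin_mutex_acquisitions;
  ++plugin->ref_count;
  return plugin;
}
```

In this sketch, opening an internal temporary table or locking a built-in
engine in a production build leaves both acquisition counters untouched, while
regular tables, dynamically loaded plugins and debug builds still go through
the corresponding mutex.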