WL#8356: Improve scalability by not acquiring unnecessary locks for internal temp tables
Affects: Server-5.7 — Status: Complete
After WL#6671 "Improve scalability by not using thr_lock.c locks for InnoDB tables" was implemented, LOCK_plugin and THR_LOCK_lock became more prominent scalability bottlenecks in workloads that create internal in-memory temporary tables (such as Sysbench's SELECT_DISTINCT test for InnoDB). The goal of this WL is to remove these bottlenecks, since there is no real reason to acquire these locks for internal temporary tables.
NF0: This task is not supposed to cause any changes in user-visible behavior.

NF1: There should not be any significant performance/scalability regressions caused by this WL.

NF2: Together with WL#8355, this patch should improve performance/scalability in Sysbench DISTINCT_RANGES/InnoDB, and possibly in other tests, on systems with many cores (a preliminary patch showed a noticeable improvement on a system with 40 cores).
After WL#6671 was implemented, DimitriK's benchmarks identified the following scalability bottlenecks related to the creation of internal temporary tables, which caused problems in some Sysbench tests (e.g. DISTINCT_RANGES):

1) BITMAP::mutex lock. This lock protects the bitmap that serves as a pool of possible names for internal temporary tables. The purpose of the pool is to reduce the set of names used for temporary tables: using a unique name for each new table apparently caused performance problems on Linux in the past. The pool is not used on non-Linux systems. Since this bottleneck is easy to avoid by turning the temporary table name pool off (with the --temp-pool=0 start-up option), we are not going to address it in the scope of this task.

2) "THR_LOCK_lock" lock. This lock protects the global list of all THR_LOCK objects in the thr_lock.c lock manager. It turned out that for each internal temporary table created in the Heap SE we still allocate and initialize a THR_LOCK object, even though it is never going to be used for such tables. Since the thr_lock_init() call used for this initialization acquires the global mutex, we get a totally unnecessary scalability bottleneck. The simple solution is to skip initialization of the THR_LOCK and THR_LOCK_DATA structs for internal temporary tables in the Heap SE. This requires no additional changes, since these structs are not really used for such tables. As a bonus, we will also remove the HP_SHARE::intern_lock mutex, which is not used for anything in the current code.

3) LOCK_plugin lock. When we construct a TABLE_SHARE object for an internal temporary table, we fill in the TABLE_SHARE::db_plugin member by calling the ha_lock_engine() function. This function obtains a proper reference to the SE plugin from its handlerton object. In the process, the global LOCK_plugin is acquired, creating a scalability bottleneck. ha_lock_engine() has to acquire LOCK_plugin for non-built-in SEs, since we count references to plugins in these cases. This also happens for built-in SEs in debug builds. On the other hand, we do not use reference counting for built-in plugins in production builds. It is therefore possible to remove this bottleneck, at least for built-in SEs in production builds, by taking a shortcut and not acquiring LOCK_plugin for them. Luckily, all 3 SEs used for internal temporary tables (Heap, MyISAM, InnoDB) are built-in.
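
The two shortcuts proposed for bottlenecks 2) and 3) can be illustrated with a small self-contained C++ model. This is only a sketch under stated assumptions, not the server code: every type and function below (CountedMutex, model_thr_lock_init(), model_heap_create_share(), model_lock_engine() and so on) is a hypothetical stand-in named after the real object it imitates, and the acquisition counters exist only to make the effect of each shortcut observable.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical model only: none of these are the real MySQL definitions.

// A mutex that counts how many times it has been acquired.
struct CountedMutex {
  std::mutex m;
  std::size_t acquisitions = 0;
};

// RAII guard: takes the mutex, then bumps the counter while holding it.
class CountedGuard {
 public:
  explicit CountedGuard(CountedMutex &cm) : lock_(cm.m) { ++cm.acquisitions; }

 private:
  std::lock_guard<std::mutex> lock_;
};

CountedMutex model_THR_LOCK_lock;  // stands in for the global THR_LOCK list mutex
CountedMutex model_LOCK_plugin;    // stands in for the global plugin mutex

// --- Bottleneck 2: registration of a lock object in a global list ---
struct ModelThrLock {
  bool registered = false;
};

struct ModelHeapShare {
  ModelThrLock lock;
};

// Registering a lock object requires the global mutex.
void model_thr_lock_init(ModelThrLock &lock) {
  CountedGuard guard(model_THR_LOCK_lock);
  lock.registered = true;
}

// Shortcut: internal temporary tables never use the lock object,
// so its initialization (and the global mutex) is skipped for them.
void model_heap_create_share(ModelHeapShare &share, bool internal_tmp_table) {
  if (!internal_tmp_table) model_thr_lock_init(share.lock);
}

// --- Bottleneck 3: reference counting of storage engine plugins ---
struct ModelPlugin {
  bool built_in = false;
  int ref_count = 0;
};

// Shortcut: built-in plugins are not reference counted in this model
// (as in production builds), so no global mutex is needed for them.
// A debug build that counts references for built-ins too would always
// take the slow path below.
ModelPlugin *model_lock_engine(ModelPlugin &plugin) {
  if (plugin.built_in) return &plugin;
  CountedGuard guard(model_LOCK_plugin);
  ++plugin.ref_count;
  return &plugin;
}
```

In this model, creating a share for an internal temporary table or taking a reference to a built-in engine leaves both acquisition counters unchanged, whereas a regular table or a dynamically loaded engine still goes through the corresponding global mutex.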
Copyright (c) 2000, 2020, Oracle Corporation and/or its affiliates. All rights reserved.