WL#8355: Improve scalability by partitioning LOCK_grant lock.

Affects: Server-5.7   —   Status: Complete   —   Priority: Medium

After WL#6671 "Improve scalability by not using thr_lock.c locks for InnoDB
tables" was implemented, the LOCK_grant lock became more visible as a
scalability bottleneck in some workloads (e.g. the 1-table InnoDB
POINT_SELECT Sysbench test).


The goal of this WL is to improve scalability by partitioning the
LOCK_grant lock.

NF0: This task is not supposed to cause any changes in user-visible
     behavior.

NF1: There should not be any significant performance/scalability regressions
     caused by this WL.

NF2: This patch should improve performance/scalability in Sysbench
     POINT_SELECT/InnoDB and possibly other tests on systems with many
     cores (a preliminary patch showed a clear improvement on a system
     with 40 cores).

The goal of this task is to remove the scalability bottleneck caused by
LOCK_grant in some workloads (e.g. Sysbench POINT_SELECT/InnoDB).
Since these workloads do not change privilege information, they acquire
LOCK_grant only in read mode. The scalability issues therefore come from
cache invalidation and concurrent atomic operations on the rwlock
object, rather than from anything inherent to the problem this rwlock
is meant to solve.

The idea is simple: split the LOCK_grant rwlock into several partitions
and make each read lock request acquire a read lock on only one
partition (write lock requests have to lock all partitions). The
partition to use is determined by thread id. As a result, concurrent
read lock acquisitions by different threads are likely to use different
partitions, which reduces the negative effects of cache invalidation
and concurrent atomic operations.

The number of partitions will be a constant, to be determined during
benchmarking.

A new class implementing an rwlock partitioned by THD/thread id is to
be added. Code in the ACL subsystem is to be adjusted to use an object
of this class instead of the LOCK_grant rwlock.
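
The partitioning scheme described above could look roughly like the
sketch below. It is an illustration only, not the server code: the class
name PartitionedRwlock, its method names, the use of std::shared_mutex
in place of the server's own rwlock primitive, the modulo mapping from
thread id to partition, and the 64-byte cache line size are all
assumptions made for the example.

```cpp
#include <array>
#include <cstddef>
#include <shared_mutex>

// Hypothetical sketch of an rwlock partitioned by thread id.
// N is the (constant) number of partitions.
template <std::size_t N>
class PartitionedRwlock {
  static_assert(N > 0, "at least one partition is required");

 public:
  // Map a thread id to a partition (assumed mapping: simple modulo).
  static std::size_t index(unsigned long thread_id) { return thread_id % N; }

  // Readers lock only the partition chosen by their thread id, so
  // readers with different ids usually touch different cache lines.
  void rdlock(unsigned long thread_id) {
    parts_[index(thread_id)].lock.lock_shared();
  }
  void rdunlock(unsigned long thread_id) {
    parts_[index(thread_id)].lock.unlock_shared();
  }

  // Writers lock every partition, always in the same order, so two
  // concurrent writers cannot deadlock against each other.
  void wrlock() {
    for (auto &p : parts_) p.lock.lock();
  }
  void wrunlock() {
    for (auto it = parts_.rbegin(); it != parts_.rend(); ++it)
      it->lock.unlock();
  }

 private:
  // Each partition gets its own cache line (64 bytes assumed) so that
  // readers on different partitions do not share a line.
  struct alignas(64) Partition {
    std::shared_mutex lock;
  };
  std::array<Partition, N> parts_;
};
```

With such a class, a read-only privilege check would call
rdlock(thread_id) and rdunlock(thread_id), while statements that change
privilege information would call wrlock() and wrunlock(). The trade-off
is that write locking becomes N times more expensive, which is
acceptable only because privilege changes are rare compared to reads.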