WL#6616: PERFORMANCE_SCHEMA, INDEXES

Affects: Server-8.0   —   Status: Complete

Implement indexes in the performance_schema storage engine itself.

This would allow the optimizer to have more choices (beside a full table scan),
and make queries more efficient.

Rationale for this task
=======================

With the performance schema in general, people are mostly concerned with "overhead".

Overhead can come from two sources:
- (a) overhead generated when instrumenting the application workload
- (b) overhead generated by the monitoring application itself when executing a
performance schema query.

For years now, everybody has been strongly focused on (a) alone.
All the performance improvements have been targeted to improve (a) as a result.

However, overhead from (b) is expected to be -- or become -- an issue, as
adoption of the performance schema, and helpers like sys views, increases.

With sys views being part of the mainstream server starting with MySQL 5.7,
overhead generated by sys queries in performance_schema tables is expected to
increase.

Example of query with and without index:

SELECT * from performance_schema.threads where PROCESSLIST_ID= 1000;

Without index:
- the SELECT is a full table scan
- the performance schema returns ALL rows
- the optimizer applies the where clause to every rows, only keeping 1

This consumes a lot of CPU, which indirectly affects the server performances at
peak load.

With an index on PROCESSLIST_ID:
- the optimizer uses the index to read data
- the performance schema is provided the Id to look for (1000)
- the performance schema returns only ONE row, not all

This is more efficient, consuming less CPU, for the same result.

Note that for tables where each row is computed on the fly, like most aggregate
statistics in the performance schema, the benefit of having an index makes an
even bigger difference, as statistics for rows that are not part of the final
result do not need to be -- computed --, which saves even more CPU.

This task is to reduce overhead from (b), to avoid impacting the server in
production simply by monitoring it.

Performance Characteristics
===========================

Performance Overhead

The row retrieval mechanism is the same for indexed and non-indexed queries, so
there should be no direct performance impact when fetching a row via an index.
The trivial cost of key matching is more than offset by the gains achieved with
the reduced size of the result set.

As an example, a primary key on performance_schema.variables_by_thread will
considerably reduce the size of the result set on a simple query, as evidenced
by the status counters of the storage engine API.

First, the query with indexes disabled:

  SELECT * FROM variables_by_thread IGNORE INDEX (primary) WHERE thread_id = 28
AND variable_name = 'time_zone'; (results omitted)

The handler status counters shows 135 reads:
  +-----------+-----------------------+----------------+
  | THREAD_ID | VARIABLE_NAME         | VARIABLE_VALUE |
  +-----------+-----------------------+----------------+
  |        28 | Handler_read_first    | 0              |
  |        28 | Handler_read_key      | 0              |
  |        28 | Handler_read_last     | 0              |
  |        28 | Handler_read_next     | 0              |
  |        28 | Handler_read_prev     | 0              |
  |        28 | Handler_read_rnd      | 0              |
  |        28 | Handler_read_rnd_next | 135            |
  +-----------+-----------------------+----------------+

Now, the same query with indexes enabled:

  SELECT * FROM variables_by_thread WHERE thread_id = 28 AND variable_name =
'time_zone';

The handler status counters shows 1 read:
+-----------+-----------------------+----------------+
| THREAD_ID | VARIABLE_NAME         | VARIABLE_VALUE |
+-----------+-----------------------+----------------+
|        28 | Handler_read_first    | 0              |
|        28 | Handler_read_key      | 1              |
|        28 | Handler_read_last     | 0              |
|        28 | Handler_read_next     | 0              |
|        28 | Handler_read_prev     | 0              |
|        28 | Handler_read_rnd      | 0              |
|        28 | Handler_read_rnd_next | 0              |
+-----------+-----------------------+----------------+

In this example there is only one user connection. With 1000 user connections,
the non-indexed result set would be 136,000 and the indexed result set would
still be 1.

How Performance Schema Indexes Differ from Regular Indexes
==========================================================

1) Indexes are fixed.

The indexes on Performance Schema tables are predefined. They cannot be deleted,
added or altered.

2) Indexes are virtual.

A Performance Schema index is implemented as a filtered scan across the
existing table data, rather than a traversal through a separate data structure.
There are no B-trees or hash tables to be constructed, updated or otherwise managed.

Performance Schema table indexes are also virtual in sense that table data does
not persist beyond server shutdown.

3) Indexes are unordered.

Performance Schema tables indexes behave like hash indexes in that a) they
quickly retrieve the desired rows, and b) do not provide row ordering, leaving
the server to sort the result set if necessary. However, depending on the query,
indexes obviate the need for a full table scan and will return a considerably
smaller result set.

Reported performance issues
===========================

http://mysqlserverteam.com/using-sys-session-as-an-alternative-to-show-processlist/

See Simon Mudd comments:

http://mysqlserverteam.com/using-sys-session-as-an-alternative-to-show-processlist/#comment-11037

"
On a busier server with 1400 connections SHOW PROCESSLIST still takes 0.00
seconds, and the select on sys.session took 1 minute 38 seconds.
"

"
This seems to indicate that some sort of indexing in P_S might be appropriate on
tables with a large number of rows? I have discussed this with a couple of
people and understand it complicates things so agree that avoiding if possible
is better, but people will expect to get good performance out of P_S/sys and
this shows that currently that is not always possible.
"