WL#5384: PERFORMANCE_SCHEMA Histograms

Affects: Server-8.0   —   Status: Complete   —   Priority: Very High

Currently, the statement instrumentation provide the following statistics
per statements digests, or globally:
- minimum latency
- average latency
- maximum latency

This is not enough for a DBA to assess overall the system performance,
assess if the server is operating efficiently, and assess if the server
performance is stable (narrow distribution of query latencies).

This feature provides histograms of statements latency,
to have a better visibility of the distribution of data.

Also, measurements of quantiles ("P95", "P99" and "P999" percentiles)
are computed from the histogram collected.
These percentiles are often used as indicators of quality of service.

Histograms are provided only for statements,
not for lower level instruments like stages or waits.

Histograms are aggregated per statement digest, and globally.

User Documentation
==================

* https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-1.html
* https://dev.mysql.com/doc/refman/8.0/en/statement-histogram-summary-tables.html
Requirements
============

(1) Table performance_schema.events_statements_histogram_by_digest.
-------------------------------------------------------------------

Func-Req (1.1): A fresh MySQL installation of CURRENT-VERSION must create the
table performance_schema.events_statements_histogram_by_digest.

Func-Req (1.2): An upgrade from PREVIOUS-VERSION to CURRENT-VERSION must create
the table performance_schema.events_statements_histogram_by_digest.

Func-Req (1.3): Security privileges for table
events_statements_histogram_by_digest are enforced.
This table can be only selected from, by users with the proper grants.

Func-Req (1.4): TRUNCATE TABLE
performance_schema.events_statements_summary_by_digest
implicitly truncates data in table
performance_schema.events_statements_histogram_by_digest.

Func-Req (1.5): TRUNCATE TABLE
performance_schema.events_statements_histogram_by_digest
reset the table statistics.

(2) Table performance_schema.events_statements_histogram_global.
----------------------------------------------------------------

Func-Req (2.1): A fresh MySQL installation of CURRENT-VERSION must create the
table performance_schema.events_statements_histogram_global.

Func-Req (2.2): An upgrade from PREVIOUS-VERSION to CURRENT-VERSION must create
the table performance_schema.events_statements_histogram_global.

Func-Req (2.3): Security privileges for table events_statements_histogram_global
are enforced.
This table can be only selected from, by users with the proper grants.

Func-Req (2.4): TRUNCATE TABLE
performance_schema.events_statements_summary_global_by_event_name
implicitly truncates data in table
performance_schema.events_statements_histogram_global.

Func-Req (2.5): TRUNCATE TABLE
performance_schema.events_statements_histogram_global
reset the table statistics.

(3) Table performance_schema.events_statements_summary_by_digest.
-----------------------------------------------------------------

Func-Req (3.1): A fresh MySQL installation of CURRENT-VERSION must create the
table performance_schema.events_statements_summary_by_digest with the new
quantile columns.

Func-Req (3.2): An upgrade from PREVIOUS-VERSION to CURRENT-VERSION must add the
new quantile columns in table
performance_schema.events_statements_summary_by_digest.

TABLE performance_schema.events_statements_histogram_by_digest
==============================================================

New table.
----------

CREATE TABLE `events_statements_histogram_by_digest` (
  `SCHEMA_NAME` varchar(64),
  `DIGEST` varchar(32),
  `BUCKET_NUMBER` int(10) unsigned NOT NULL,
  `BUCKET_TIMER_LOW` bigint(20) unsigned NOT NULL,
  `BUCKET_TIMER_HIGH` bigint(20) unsigned NOT NULL,
  `COUNT_BUCKET` bigint(20) unsigned NOT NULL,
  `COUNT_BUCKET_AND_LOWER` bigint(20) unsigned NOT NULL,
  `BUCKET_QUANTILE` double(7,6) NOT NULL,
  UNIQUE KEY `SCHEMA_NAME` (`SCHEMA_NAME`,`DIGEST`,`BUCKET_NUMBER`)
) ENGINE=PERFORMANCE_SCHEMA DEFAULT CHARSET=utf8

Privileges
----------

Only SELECT and TRUNCATE are supported.

A user must be granted the proper table privileges to
perform an operation.

Semantic
--------

Each statement digest (row in table events_statements_summary_by_digest)
is associated with an histogram.

An histogram consists of N buckets, displayed in table
events_statements_histogram_by_digest, where each row represents one bucket.

* SCHEMA_NAME, DIGEST: foreign key pointing to a statement digest in table
events_statements_summary_by_digest.

* BUCKET_NUMBER: Number of the current bucket

* BUCKET_TIMER_LOW, BUCKET_TIMER_HIGH: A bucket counts statements which have a
latency measured between BUCKET_TIMER_LOW and BUCKET_TIMER_HIGH. This timer is
expressed in picoseconds.

By definition, the value of BUCKET_TIMER_LOW for the
first bucket (BUCKET_NUMBER = 0) is zero.

The value of BUCKET_TIMER_LOW for a
bucket (BUCKET_NUMBER = k) is the same as BUCKET_TIMER_HIGH for the previous
bucket (BUCKET_NUMBER = k-1)

For the last bucket, BUCKET_TIMER_HIGH is MAX_UINT64,
because the last bucket is a catch all for statements which have
a latency that exceeds previous buckets in the histogram.

* COUNT_BUCKET: Number of statements measured with a latency in the
[BUCKET_TIMER_LOW, BUCKET_TIMER_HIGH[ interval.

* COUNT_BUCKET_AND_LOWER: Number of statements measured with a latency in the
[0, BUCKET_TIMER_HIGH[ interval.

* BUCKET_QUANTILE: proportion of statements what falls into this or lower
bucket. This proportion corresponds by definition to COUNT_BUCKET_AND_LOWER /
sum(COUNT_BUCKET), and is displayed as a convenience column.

Index:

Rows are indexed per digest (SCHEMA_NAME, DIGEST) and then per bucket
(BUCKET_NUMBER).
This helps performances with queries that inspect the histogram for a single
digest, which is a use case expected to be frequent.

Truncate:

A truncate on table events_statements_summary_by_digest
truncates all digests, and implicitly truncates data from
table events_statements_histogram_by_digest.

A truncate on table events_statements_histogram_by_digest
reset all statistics.

Behavior for the "NULL" digest:

Whenever table events_statements_summary_by_digest is full,
this table collects statistics about extra digests in a special
row, with a NULL SCHEMA_NAME and a NULL DIGEST.
In this case, table events_statements_histogram_by_digest
collects histograms for a a NULL SCHEMA_NAME and a NULL DIGEST.

TABLE performance_schema.events_statements_histogram_global
===========================================================

New table.
----------

CREATE TABLE `events_statements_histogram_global` (
  `BUCKET_NUMBER` int(10) unsigned NOT NULL,
  `BUCKET_TIMER_LOW` bigint(20) unsigned NOT NULL,
  `BUCKET_TIMER_HIGH` bigint(20) unsigned NOT NULL,
  `COUNT_BUCKET` bigint(20) unsigned NOT NULL,
  `COUNT_BUCKET_AND_LOWER` bigint(20) unsigned NOT NULL,
  `BUCKET_QUANTILE` double(7,6) NOT NULL,
  PRIMARY KEY (`BUCKET_NUMBER`)
) ENGINE=PERFORMANCE_SCHEMA DEFAULT CHARSET=utf8

Privileges
----------

Only SELECT and TRUNCATE are supported.

A user must be granted the proper table privileges to
perform an operation.

Semantic
--------

Histograms are aggregated globally in this table.

The semantic for each columns is the same as for table
events_statements_histogram_by_digest.

Index:

Rows are indexed per bucket (BUCKET_NUMBER).

Truncate:

A truncate on table events_statements_summary_global_by_event_name
truncates all statements statistics,
and implicitly truncates data from table events_statements_histogram_global.

A truncate on table events_statements_histogram_global
resets all statistics.

Table performance_schema.events_statements_summary_by_digest
============================================================

Existing table.

New columns:
  `QUANTILE_95` bigint(20) unsigned NOT NULL,
  `QUANTILE_99` bigint(20) unsigned NOT NULL,
  `QUANTILE_999` bigint(20) unsigned NOT NULL,

Semantic:

* QUANTILE_95: 95th percentile, expressed in picoseconds, of the statement
latency. This percentile is a high estimate, computed from the histogram data
collected. In other words, for a given digest, 95 percents of the statements
measured have a latency lower that QUANTILE_95.

* QUANTILE_99: Likewise, for the 99th percentile.

* QUANTILE_999: Likewise, for the 99.9th percentile.

Note on statistics accuracy
---------------------------

Logically, the following statistics are the same:
- SELECT SUM(COUNT_STAR)
  FROM performance_schema.events_statements_summary_by_digest
- SELECT SUM(COUNT_BUCKET)
  FROM performance_schema.events_statements_histogram_by_digest

However, for performance reasons:
- statistics in events_statements_histogram_by_digest can be inaccurate,
because no lock is held when the statistics are updated.

Statistics in events_statements_histogram_by_digest are updated atomically,
and are accurate.

As a result, the two SELECT SUM() above may differ under heavy concurrent load.

Buckets used in histograms
==========================

Buckets are sized given the following formula:

bucket_timer_high = base * factor ^ bucket_number.

The first bucket (starting at base) is chosen to be low enough to cover very
fast statements.

The factor is not 10, not even 2, but chosen to be much lower, to collect
histograms with a very narrow bucket size.

This is required to compute 95th, 99th and 99.9th percentiles (see the
events_statements_summary_by_digest table) with a reasonable precision.

To compute power of 10 histograms easily, it is desirable to have:
  factor ^ K = 10
so that K buckets can be grouped together to compute one power of 10 bucket.

The number of buckets is chosen to have enough buckets to cover long running
statements.

Values are subject to change during implementation,
and are currently set to:

- base = 10 micro second
- K = 50
- factor = 1.0471285480508996 (increase by 4.7 percent, derived from K = 50)
- number = 450 buckets

With these values, an histogram covers times ranging:
- from 10 micro second
- to 10,000 seconds, or 2h46m40s.
- by 4.7 percent increases.
Grouping 50 buckets together makes one bucket for a power of 10 histogram.