WL#7415: PERFORMANCE SCHEMA, STAGE PROGRESS

Affects: Server-5.7   —   Status: Complete

Improve the stage instrumentation to keep track of a progress indicator.

This allows, for example, monitor the progress of long statements,
such as ALTER TABLE.

User Documentation:

===============================================================================
FUNCTIONAL REQUIREMENTS
===============================================================================
CURRENT-VERSION = 5.7
PREVIOUS-VERSION = 5.6

Install
=======
F-1 A fresh MySQL installation of CURRENT-VERSION must create the following
tables with the new schema:
- 1.1 TABLE performance_schema.events_stages_current
- 1.2 TABLE performance_schema.events_stages_history
- 1.3 TABLE performance_schema.events_stages_history_long

Upgrade
=======
F-2 An upgrade from PREVIOUS-VERSION to CURRENT-VERSION must upgrade the
following tables to the new schema:
- 2.1 TABLE performance_schema.events_stages_current
- 2.2 TABLE performance_schema.events_stages_history
- 2.3 TABLE performance_schema.events_stages_history_long

Progress Instrumentation
========================

F-3 Instrumentation of the existing stage "stage/sql/copy to tmp table" must
provide the current number of rows copied in the WORK_COMPLETED column,
and the number of rows to be copied in the WORK_ESTIMATED column.

Existing tables are changed as follows:

Table performance_schema.events_stages_current
==============================================

CREATE TABLE `events_stages_current` (
  `THREAD_ID` bigint(20) unsigned NOT NULL,
  `EVENT_ID` bigint(20) unsigned NOT NULL,
  `END_EVENT_ID` bigint(20) unsigned DEFAULT NULL,
  `EVENT_NAME` varchar(128) NOT NULL,
  `SOURCE` varchar(64) DEFAULT NULL,
  `TIMER_START` bigint(20) unsigned DEFAULT NULL,
  `TIMER_END` bigint(20) unsigned DEFAULT NULL,
  `TIMER_WAIT` bigint(20) unsigned DEFAULT NULL,

New columns:
  `WORK_COMPLETED` bigint(20) unsigned DEFAULT NULL,
  `WORK_ESTIMATED` bigint(20) unsigned DEFAULT NULL,

  `NESTING_EVENT_ID` bigint(20) unsigned DEFAULT NULL,
  `NESTING_EVENT_TYPE` enum('TRANSACTION','STATEMENT','STAGE','WAIT') DEFAULT NULL
) ENGINE=PERFORMANCE_SCHEMA DEFAULT CHARSET=utf8

Table performance_schema.events_stages_history
==============================================

Likewise.

Table performance_schema.events_stages_history_long
===================================================

Likewise.

Semantic
========

For the stage represented by the current row:
- column WORK_COMPLETED indicate how many work units have been completed.
- column WORK_ESTIMATED indicate how many work units are expected for the stage.

The performance schema tables only provide a container to store that data,
in columns WORK_COMPLETED and WORK_ESTIMATED, but the performance schema makes
no assumption about the semantic of the metric itself.

The definition of "work unit" is left to the instrumentation code providing the
data, and depends on each stage definition, and therefore on each instrumentation.

A "work unit" is an integer metric that increases over time during execution,
such as:
- a number of bytes processed
- a number or rows processed
- a number of files processed
- a number of tables processed
- ...

The metric WORK_COMPLETED can increase 1 unit at a time,
or many units at a time, up to the instrumented code.

The metric WORK_ESTIMATED can change during the stage,
up to the instrumented code.

The code providing the stage instrumentation can implement one of the following
behavior:

No progress instrumentation
---------------------------

This is the most typical case, where no progress data is provided.
Columns WORK_COMPLETED and WORK_ESTIMATED are both NULL.

Unbounded progress instrumentation
----------------------------------

Only the column WORK_COMPLETED is provided, but no data is provided for column
WORK_ESTIMATED, which displays 0.

A monitoring application, by querying table
performance_schema.events_stage_current for the monitored session:
- can report how much work has been performed so far,
- can not report if the stage is near completion or not

Bounded progress instrumentation
--------------------------------

Both columns WORK_COMPLETED and WORK_ESTIMATED are provided.

A monitoring application, by querying table
performance_schema.events_stage_current for the monitored session:
- can report how much work has been performed so far,
- can report the overall completion percentage for the stage,
  by computing the WORK_COMPLETED / WORK_ESTIMATED ratio.

Server instrumentation provided
===============================

In general, instrumenting each stage is up to the maintainers of the calling
code, which is outside the scope of this task.

The amount of instrumentation provided in the server is up to the implementer.

Currently, during execution of an ALTER TABLE statement,
the stage "stage/sql/copy to tmp table" is currently used,
and this stage can execute potentially for a long time,
depending on the size of the data to copy.

As part of this task, stage "stage/sql/copy to tmp table" will be instrumented
to provided bounded progress information
(that is, column WORK_COMPLETED and WORK_ESTIMATED will be provided),
to demonstrate the feature.

For stage "stage/sql/copy to tmp table", the work unit used
is the number of rows copied.