WL#4813: PERFORMANCE_SCHEMA Instrumenting Stages

Affects: Server-Prototype Only   —   Status: Complete   —   Priority: Very High

Add EVENTS_STAGES viewed tables in the performance schema.   
Rows in the tables will have information about parts of statements   
(in progress or completed) for active users.   

What we say about stages applies mutatis mutandis to statements,
and stages might be the first nesting event which we
implement, so WL#2515 "PERFORMANCE_SCHEMA statements"
refers often to this worklog task.
Contents
--------
Terminology
No Overlaps
SETUP_INSTRUMENTS Table
EVENTS_STAGES Tables
Nesting-Event Summaries
Chip and Operating System Counters
SETUP_TIMERS Table
SETUP_CONSUMERS Table
Server Variables
Change to EVENTS_WAITS Tables
Requirements
References

Terminology
-----------

A stage is a process that takes place during
statement processing (it doesn't span a statement), and
examples that we've talked about include "parse" and
"filesort". The word "stages" is accepted. Other possible names were STATE
(which MySQL uses already), or SEGMENT, or STEP, or PHASE,
or PART OF A STATEMENT, or DATABASE CALL (which Oracle 11g uses),
or [ fill in your own favourite here ] -- nobody voted for them.

We follow the matryoshka design principle
http://en.wikipedia.org/wiki/Matryoshka
From biggest to smallest:
Session -> Transaction -> Statement -> Stage -> Wait.
We say that a wait is "nested" in a stage ("matryoshkificated"
is too long), so a stage is a "nesting event" for a wait.
In turn, the nesting event for a stage is a statement.
The worklog task WL#5556 Nesting-Event Summaries" will describe
the appropriate columns of any nesting event.

SETUP_INSTRUMENTS Table
-----------------------

There will be new rows in PERFORMANCE_SCHEMA.SETUP_INSTRUMENTS
for stages. But what are they? We considered these possibilities:
1. Use the "backup-progress" values.
2. Use the "General Thread States" values.
3. Use modified "thd_proc_info" values.
4. Follow Oracle 11g.
5. Overhaul.
6. From optimizer.
7. [ Fill in your own favourite here. ]
The winner was '2.'.
For details about the rejected possibilities, see 'Progress' for 2010-08-16.

The MySQL Reference Manual section 7.8.5.2. General Thread States
http://dev.mysql.com/doc/refman/5.5/en/general-thread-states.html
and following sections (Delayed-Insert Thread States,
Query Cache Thread States, Replication Master Thread States,
Replication Slave I/O Thread States, Replication Slave SQL Thread States,
Replication Slave Connection Thread States, Event Scheduler Thread States)
describe the values we are looking at. SHOW PROFILE CPU and
SHOW PROCESSLIST have similar bases.

Stages begin and end when thread state values change.
For example, when the general thread state changes to
"checking permissions", then an old stage ends and a
new stage named "checking permissions" begins.
For example, suppose a statement has 10 thread states:
starting
Opening tables
System lock
Table lock
init
end
query end
freeing items
logging slow query
cleaning up
When the state changes to 'cleaning up', then instrumenting
of the previous stage ('logging slow query') ends, and
instrumenting of the new stage ('cleaning up') begins.
So EVENTS_STAGES_CURRENT will have a 'cleaning up' row.

The state value becomes part of EVENT_NAME in SETUP_INSTRUMENTS
and in events tables, following our 'taxonomy' convention:
Class:     stage
Order      relevant code area, for example 'sql' or 'myisam'
Family     Thread States value
For example: 'stage/sql/cleaning up'

A stage's event_name has only three components,
this differs from waits which have five components.

So the columns of SETUP_INSTRUMENTS are:
EVENT_NAME       as above
ENABLED          'NO' means the stage is disabled,
                 'YES' means the stage is enabled.
TIMED            'NO' means the stage timers are off,
                 'YES' means the stage timers are on.

For example:

mysql> select * from setup_instruments where name like "stage/%" order by name
limit 10;
+-----------------------------------------------+---------+-------+
| NAME                                          | ENABLED | TIMED |
+-----------------------------------------------+---------+-------+
| stage/sql/After create                        | YES     | YES   |
| stage/sql/allocating local table              | YES     | YES   |
| stage/sql/Changing master                     | YES     | YES   |
| stage/sql/Checking master version             | YES     | YES   |
| stage/sql/checking permissions                | YES     | YES   |
| stage/sql/checking privileges on cached query | YES     | YES   |
| stage/sql/checking query cache for query      | YES     | YES   |
| stage/sql/cleaning up                         | YES     | YES   |
| stage/sql/closing tables                      | YES     | YES   |
| stage/sql/Connecting to master                | YES     | YES   |
+-----------------------------------------------+---------+-------+
10 rows in set (0.00 sec)

The ENABLED and TIMED values affect only the
instrumentation of the stage itself. They do
not affect instrumentation of waits nested
within the stage -- there is no 'cascading'.
So waits can be enabled even if stages are
disabled.


No Overlaps
-----------

Stages "should" be sequential. That is:
* one stage should not be nested within another stage
* one stage should not overlap with another stage.

But we must accept that nesting and overlapping will happen.
Keeping a stack is possible, but not for all the thousands of aggregator bytes.
We considered and rejected that we could have:
Row#1     First part of Event A
Row#2     All of Event B
Row#3     Second part of Event A
For this, we could recover the fact that we are resuming Event A
when Event B ends. So, when starting Event B, we could save that
we were in Event A. And when ending Event B, we could recover that fact.
And in Row#3 there could be an indication that "this is Part 2" or
"Continued". However, this will not be done.
Everything will appear to be linear.

EVENTS_STAGES Tables
--------------------

The EVENTS_STAGES tables in PERFORMANCE_SCHEMA are
EVENTS_STAGES_CURRENT
EVENTS_STAGES_HISTORY
EVENTS_STAGES_HISTORY_LONG
EVENTS_STAGES_SUMMARY_BY_THREAD_BY_EVENT_NAME
EVENTS_STAGES_SUMMARY_GLOBAL_BY_EVENT_NAME
They're analogous to EVENTS_WAITS tables (EVENTS_WAITS_CURRENT etc.)
and the column descriptions are the same as in EVENTS_WAITS tables,
plus summary columns (see section "Nesting-Event Summaries"),
plus extra timer columns (see section "Chip and Operating System Counters").

(We may summarize _BY_FAMILY or _BY_GENUS rather than BY_EVENT_NAME,
but until it's decided let us use _BY_EVENT_NAME.)

To be more specific about the columns that are in common
for CURRENT and HISTORY and HISTORY_LONG tables:
THREAD_ID             identifies the thread
EVENT_ID              is a unique identifier "of the event", that is,
                      it's the same counter that we use for waits,
                      therefore stage 101 does not follow stage 100,
                      it follows wait 100 (assuming both are instrumented).
EVENT_NAME            see section "SETUP_INSTRUMENTS Table"
SOURCE                identifies the source code where instrumented
TIMER_START           when instrumentation starts
TIMER_END             when instrumentation ends
                      (the assumption is that there are no blank
                      areas between stages, so the end of one stage
                      is always equal to the start of the next stage)
TIMER_WAIT            TIMER_END - TIMER_START
NESTING_EVENT_ID      the EVENT_ID of the nesting event, usually statement
NESTING_EVENT_TYPE    'statement' or 'stage' or 'wait'

Notice the removal of the columns SPINS, OBJECT_SCHEMA,
OBJECT_NAME, OBJECT_TYPE, OBJECT_INSTANCE_BEGIN,
OPERATION, NUMBER_OF_BYTES, FLAGS.

So why have a different set of tables for stages?
One thought was: their content is backed up by a different set
of internal buffers. Another thought was: they obviously aren't
WAITS. Indeed, they're obviously not EVENTS either, but we're
sticking to the word EVENTS for all the tables with instrumentation
data.

EVENTS_STAGES_SUMMARY_BY_THREAD_BY_EVENT_NAME is the STAGES
summary table corresponding to EVENTS_WAITS_SUMMARY_BY_THREAD_BY_EVENT_NAME.
  THREAD_ID BIGINT /* or INTEGER */ NOT NULL
  EVENT_NAME VARCHAR(128) NOT NULL        /* i.e. stage name */
  COUNT_STAR BIGINT unsigned NOT NULL
  SUM_TIMER_WAIT BIGINT UNSIGNED NOT NULL
  MIN_TIMER_WAIT BIGINT UNSIGNED NOT NULL
  AVG_TIMER_WAIT BIGINT UNSIGNED NOT NULL
  MAX_TIMER_WAIT BIGINT UNSIGNED NOT NULL

EVENTS_STAGES_SUMMARY_GLOBAL_BY_EVENT_NAME is the STAGES
summary table corresponding to EVENTS_WAITS_SUMMARY_GLOBAL_BY_EVENT_NAME.
  EVENT_NAME VARCHAR(128) NOT NULL        /* i.e. stage name */
  COUNT_STAR BIGINT unsigned NOT NULL
  SUM_TIMER_WAIT BIGINT UNSIGNED NOT NULL
  MIN_TIMER_WAIT BIGINT UNSIGNED NOT NULL
  AVG_TIMER_WAIT BIGINT UNSIGNED NOT NULL
  MAX_TIMER_WAIT BIGINT UNSIGNED NOT NULL

There are also of course the nesting-event summaries.
We will consider alternative ways of filling nesting-event
summary columns in these tables, in WL#5556 Nesting-Event Summaries.

Chip and Operating System Counters
----------------------------------

[ 2010-09-03 This section has been moved to WL#2515. Due to voting,
stages will not have timer2 or timer3, and getrusage-like timers are
okay for statements. ]

SETUP_TIMERS Table
------------------

There will be an additional "stages" row in performance_schema.SETUP_TIMERS:
mysql> select * from SETUP_TIMERS;
+--------------------+--------------+
| NAME               | TIMER_NAME   |
+--------------------+--------------+
| wait               | CYCLE        |
| stage              | NANOSECOND   |
+--------------------+--------------+

If the TIMER_NAME column (which is updatable) is set to 'NANOSECOND'
for the 'stage' row, then stage instrumentation will be done with a
nanosecond timer.

Possible values for TIMER_NAME are what's in PERFORMANCE_TIMERS
(CYCLE, NANOSECOND, MICROSECOND, MILLISECOND, TICK).
The timer for 'stage' can differ from the timer for 'wait'. 

SETUP_CONSUMERS Table
---------------------

For every 'events_waits_...' row in the PERFORMANCE_SCHEMA.SETUP_CONSUMERS
table, there will be a corresponding 'events_stages_...' row, thus:
mysql> select * from SETUP_CONSUMERS;
+----------------------------+---------+
| NAME                       | ENABLED |
+----------------------------+---------+
| events_stages_current      | YES     |
| events_stages_history      | YES     |
| events_stages_history_long | YES     |
+----------------------------+---------+

There might be more consumers depending on how aggregates are
implemented, the implementor may decide this.
For example, we may have events_stage_summary_... consumers. 

Server Variables
----------------

The default number of history rows is 10; the default number of
history_long rows is 10000; this is settable with mysqld and
visible thus:

mysql> show variables like 'performance%stages%';
+----------------------------------------------------+---------+
| Variable_name                                      | Value   |
+----------------------------------------------------+---------+
| performance_schema_events_stages_history_long_size | 10000   |
| performance_schema_events_stages_history_size      | 10      |
+----------------------------------------------------+---------+

Change to EVENTS_WAITS Tables
-----------------------------

The NESTING_EVENT_ID column in EVENTS_WAITS_CURRENT etc.
(that is, EVENTS_WAITS_CURRENT and EVENTS_WAITS_HISTORY and
EVENTS_WAITS_HISTORY_LONG) is NULL.
It should be the EVENT_ID of the wait's STAGES row, if any.
Or, it should be the EVENT_ID of another wait that this wait is
nested in, and in that case NESTING_EVENT_TYPE should be 'stage'.
(NESTING_EVENT_TYPE will have to be a new EVENTS_WAITS_CURRENT column.)

Requirements
============

Note: This section is refining the general specification written in this HLS
into more detailed technical or functional requirements, to help scope the tests.
This is *not* yet another spec, just some logical consequences of the HLS
sections written above.

(1) Table performance_schema.events_stages_current.
---------------------------------------------------

Func-Req (1.1): A fresh MySQL installation of CURRENT-VERSION must create the
table performance_schema.events_stages_current.

Func-Req (1.2): An upgrade from PREVIOUS-VERSION to CURRENT-VERSION must create
the table performance_schema.events_stages_current.

Func-Req (1.3): Security privileges for events_stages_current
are enforced. Legal operations are SELECT, TRUNCATE.
CREATE TABLE and DROP TABLE are currently also legal, as they are used during
install/upgrade.

Func-Req (1.4): Table performance_schema.events_stages_current
is visible in the information_schema.

Func-Req (1.5): Table performance_schema.events_stages_current
is visible in SHOW TABLES.

(2) Table performance_schema.events_stages_history.
---------------------------------------------------

Func-Req (2.1): A fresh MySQL installation of CURRENT-VERSION must create the
table performance_schema.events_stages_history.

Func-Req (2.2): An upgrade from PREVIOUS-VERSION to CURRENT-VERSION must create
the table performance_schema.events_stages_history.

Func-Req (2.3): Security privileges for events_stages_history
are enforced. Legal operations are SELECT, TRUNCATE.
CREATE TABLE and DROP TABLE are currently also legal, as they are used during
install/upgrade.

Func-Req (2.4): Table performance_schema.events_stages_history
is visible in the information_schema.

Func-Req (2.5): Table performance_schema.events_stages_history
is visible in SHOW TABLES.

(3) Table performance_schema.events_stages_history_long.
--------------------------------------------------------

Func-Req (3.1): A fresh MySQL installation of CURRENT-VERSION must create the
table performance_schema.events_stages_history_long.

Func-Req (3.2): An upgrade from PREVIOUS-VERSION to CURRENT-VERSION must create
the table performance_schema.events_stages_history_long.

Func-Req (3.3): Security privileges for events_stages_history_long
are enforced. Legal operations are SELECT, TRUNCATE.
CREATE TABLE and DROP TABLE are currently also legal, as they are used during
install/upgrade.

Func-Req (3.4): Table performance_schema.events_stages_history_long
is visible in the information_schema.

Func-Req (3.5): Table performance_schema.events_stages_history_long
is visible in SHOW TABLES.

(4) Table performance_schema.events_stages_summary_by_thread_by_event_name.
---------------------------------------------------------------------------

Func-Req (4.1): A fresh MySQL installation of CURRENT-VERSION must create the
table performance_schema.events_stages_summary_by_thread_by_event_name.

Func-Req (4.2): An upgrade from PREVIOUS-VERSION to CURRENT-VERSION must create
the table performance_schema.events_stages_summary_by_thread_by_event_name.

Func-Req (4.3): Security privileges for
events_stages_summary_by_thread_by_event_name
are enforced. Legal operations are SELECT, TRUNCATE.
CREATE TABLE and DROP TABLE are currently also legal, as they are used during
install/upgrade.

Func-Req (4.4): Table
performance_schema.events_stages_summary_by_thread_by_event_name
is visible in the information_schema.

Func-Req (4.5): Table
performance_schema.events_stages_summary_by_thread_by_event_name
is visible in SHOW TABLES.

(5) Table performance_schema.events_stages_summary_global_by_event_name.
------------------------------------------------------------------------

Func-Req (5.1): A fresh MySQL installation of CURRENT-VERSION must create the
table performance_schema.events_stages_summary_global_by_event_name.

Func-Req (5.2): An upgrade from PREVIOUS-VERSION to CURRENT-VERSION must create
the table performance_schema.events_stages_summary_global_by_event_name.

Func-Req (5.3): Security privileges for events_stages_summary_global_by_event_name
are enforced. Legal operations are SELECT, TRUNCATE.
CREATE TABLE and DROP TABLE are currently also legal, as they are used during
install/upgrade.

Func-Req (5.4): Table performance_schema.events_stages_summary_global_by_event_name
is visible in the information_schema.

Func-Req (5.5): Table performance_schema.events_stages_summary_global_by_event_name
is visible in SHOW TABLES.

(6) Table performance_schema.setup_timers.
------------------------------------------

Func-Req (6.1): Table setup_timers has a new timer named "stage".

(7) Table performance_schema.setup_instruments.
-----------------------------------------------

Func-Req (7.1): Table setup_instruments displays new rows for the stage
instrumentation, with a prefix "stage/".

(8) Table performance_schema.setup_consumers.
---------------------------------------------

Func-Req (8.1): Table setup_consumers has a new consumer named
"events_stages_current", which control the table of the same name.

Func-Req (8.2): Table setup_consumers has a new consumer named
"events_stages_history", which control the table of the same name.

Func-Req (8.3): Table setup_consumers has a new consumer named
"events_stages_history_long", which control the table of the same name.

(9) Server start options and variables.
---------------------------------------

Func-Req (9.1): A new server start option is available,
"performance-schema-events-stages-history-size", to control the size of the
table events_stage_history.

Func-Req (9.2): A new server start option is available,
"performance-schema-events-stages-history-long-size", to control the size of the
table events_stage_history_long.

Func-Req (9.3): A new server start option is available,
"performance-schema-max-stage-classes", to control the maximum number of stage
instrument classes.

(10) Server status.
-------------------

Func-Req (10.1): A new server status variable is available,
"Performance_schema_stage_classes_lost".

(11) Table performance_schema.events_waits_current.
--------------------------------------------------

Func-Req (11.1): A fresh MySQL installation of CURRENT-VERSION must create
table events_waits_current with the new NESTING_EVENT_TYPE column.

Func-Req (11.2): An upgrade from PREVIOUS-VERSION to CURRENT-VERSION must upgrade
table events_waits_current to add the new NESTING_EVENT_TYPE column.

(12) Table performance_schema.events_waits_history.
--------------------------------------------------

Func-Req (12.1): A fresh MySQL installation of CURRENT-VERSION must create
table events_waits_history with the new NESTING_EVENT_TYPE column.

Func-Req (12.2): An upgrade from PREVIOUS-VERSION to CURRENT-VERSION must upgrade
table events_waits_history to add the new NESTING_EVENT_TYPE column.

(13) Table performance_schema.events_waits_history_long.
-------------------------------------------------------

Func-Req (13.1): A fresh MySQL installation of CURRENT-VERSION must create
table events_waits_history_long with the new NESTING_EVENT_TYPE column.

Func-Req (13.2): An upgrade from PREVIOUS-VERSION to CURRENT-VERSION must upgrade
table events_waits_history_long to add the new NESTING_EVENT_TYPE column.


References
----------

dev-private email thread "Tables for showing what stage a statement has reached"
starting with
[mysql internal address]/secure/mailarchive/mail.php?folder=4&mail=21981

WL#1023 Check if we can get CPU time used by thread (1023)
WL#4116 Online Backup: Record Online Backup Progress
WL#4170 Provide progress indication for long queries and operations

Blog article “Show profile” + “Information_schema.profiling”
http://blogs.mysql.com/peterg/2008/11/06/show-profile-information_schemaprofiling/
The code in bzr will contain the low-level
structures and main algorithms, commented and set
apart from other code with if/endif as described in
the HLS. That suffices as an LLD for a task of this nature.