WL#4816: PERFORMANCE_SCHEMA Summaries

Affects: Server-5.6   —   Status: Complete   —   Priority: Low

Make tables in PERFORMANCE_SCHEMA which summarize instrumented data.

The initial description here was shifted from the "Summaries"
section of WL#2360 Performance Schema, which is now complete.
The description existed to show that, with WL#2360 as a basis,
we can produce what we'll want. Now we commit to the plan.

[ ADVANCE PSEUDO GROUP BY ]

Suppose the EVENTS_WAITS_HISTORY table contained rows, not just for the
last 10 events, but for all events since the instance started.
Then we could say:
SELECT EVENT_NAME, COUNT(*) AS COUNT_STAR, SUM(TIMER_WAIT) AS SUM_TIMER_WAIT
FROM EVENTS_WAITS_HISTORY
WHERE /* filter_condition, e.g. "in a previous statement" */
GROUP BY EVENT_NAME
And we'd have a lovely table like this:
EVENT_NAME                              COUNT_STAR    SUM_TIMER_WAIT
----------                              ----------    --------------
wait/synch/mutex/sql/LOCK_open                  14               140
wait/synch/rwlock/innodb/buf_block_lock          8                80
wait/io/file/csv/data                           22               220

Terminology notes: here EVENT_NAME is the "grouping column", COUNT and
SUM are "aggregates", TIMER_WAIT is an "aggregating column", the
result set is a "summary table". You'll see those terms several times.

All summary information is based on groupings and aggregatings.
The only flaw is that it's "pseudo". There is no table that goes
back forever. Shortly after producing an EVENTS_WAITS_HISTORY row, we risk
overwriting it (see section "Ring Buffers ..." in WL#2360). So, to provide
summary information, we'll have to do the grouping at the same time as, or
shortly after, the production of the EVENTS_WAITS_CURRENT row.

To some extent, that means we must guess what the user will want.
But we can guess well. We can provide a lot of tables for the user
to pick from (by enabling in SETUP_CONSUMERS). We can make sure that
the tables include anything that's been requested before or provided
before for MySQL or for some other DBMS.

[ WHAT WE CAN USE FOR AGGREGATING ]

These are the possible aggregations, in order of increasing difficulty.
COUNT(*)           Easy. If there's a row, then ++COUNT_STAR somewhere.
SUM(TIMER_WAIT)    Easy. Subtract TIMER_START from TIMER_END (or from
                   current time if TIMER_END is still null), and add
                   that to SUM_TIMER_WAIT somewhere.
AVG(TIMER_WAIT)    Easy. Not stored at all; the consumer routine
                   calculates the average as SUM / COUNT.
MIN(TIMER_WAIT)    A bit harder. The typical algorithm for calculating
or                 minimum|maximum in a list will have a line like this:
MAX(TIMER_WAIT)    "if (curr_value < min_value) then min_value = curr_value;"
                   but that might cause a 'jump' instruction at assembly
                   level. See "#72 Replace the min+max calculations with
                   tricky alternatives" in WL#5414.
STDDEV(TIMER_WAIT) Impossible. We won't do equivalents for any of the other
                   MySQL aggregating functions like BIT_XOR, GROUP_CONCAT,
                   STDDEV, and the others listed in the reference manual
                   http://dev.mysql.com/doc/refman/6.0/en/group-by-functions.html
                   because even if we could do them, who would care?
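
To make the MIN|MAX concern concrete, here is a minimal C sketch of an
aggregator update written so that the compiler can emit conditional-move
instructions for the ternaries instead of jumps. The type and function
names here are illustrative only, not the server's:

```c
#include <stdint.h>

/* Hypothetical per-event aggregate; initialize min to UINT64_MAX
   and max to 0 so the first sample sets both. */
typedef struct {
  uint64_t count_star;
  uint64_t sum_timer_wait;
  uint64_t min_timer_wait;
  uint64_t max_timer_wait;
} waits_aggregate;

/* Called from the instrumentation 'end' when a wait completes.
   The ternaries are branch-friendly: a compiler can use CMOV. */
static void aggregate_wait(waits_aggregate *agg, uint64_t timer_wait)
{
  agg->count_star++;
  agg->sum_timer_wait += timer_wait;
  agg->min_timer_wait =
      timer_wait < agg->min_timer_wait ? timer_wait : agg->min_timer_wait;
  agg->max_timer_wait =
      timer_wait > agg->max_timer_wait ? timer_wait : agg->max_timer_wait;
}
```

Note there is no avg field: the consumer derives AVG as
sum_timer_wait / count_star.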

TIMER_WAIT is the basis for most aggregations. There were other possibilities:
MIN|MAX(START_TIME|END_TIME) if the timer is wall-clock this makes some sense;
SUM(SPINS) remembering that there might be other 'extra' columns like SPINS.
We ignore these other possibilities.

[ WHAT WE CAN USE FOR GROUPING ]

These are the possible grouping columns:
THREAD_ID           Actually grouping "by thread" is not a problem because
                    it's implicit. Almost all summarization is done within
                    a thread. We do not update counters of other threads,
                    and we do not update global counters, except in two
                    special cases (see the later discussion of "global
                    summaries", and notice that object summary tables are
                    global). This is one of the reasons we won't need
                    mutexes.
EVENT_NAME          The components of EVENT_NAME are all candidates for
                    grouping. To answer questions like "how many mutex_locks
                    occurred for LOCK_open" or "what is the average time for each
                    mutex event", we'd need a summary table that has a row for
                    each species. And species could be accumulated into
                    genera, which could be accumulated into orders, which
                    could be accumulated into classes, all the way up the
                    taxonomy. But we'll leave the higher accumulation
                    levels out of WL#4816; we'll go by species.
OPERATION           Originally this was part of EVENT_NAME, and originally the
                    idea was to group by operations as well. But that idea ran
                    into some opposition.

Notice that EVENT_NAME, our favourite grouping column, has a fixed number of
possible values and they are all knowable in advance. (Actually a few can be
added by storage engines during server initialization but that's still "in
advance" for most practical purposes.) This is an important consideration.
It means that we can preallocate memory for the grouping permanently, and
it means that we can reach that memory quickly, because access is by a
fixed offset.
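
The fixed-offset idea can be sketched in a few lines of C. All names
here are hypothetical; the point is only that, because the set of event
names is known at initialization, lookup is array indexing rather than
hashing or searching:

```c
#include <stdint.h>

/* The full set of EVENT_NAMEs is knowable in advance, so one flat
   array is preallocated at startup; the limit of 500 is illustrative. */
enum { MAX_EVENT_CLASSES = 500 };

typedef struct {
  uint64_t count_star;
  uint64_t sum_timer_wait;
} event_aggregate;

static event_aggregate per_event[MAX_EVENT_CLASSES];

/* event_key is assigned once, when the instrument registers;
   reaching the slot is just base + key * sizeof(event_aggregate). */
static event_aggregate *aggregate_for(uint32_t event_key)
{
  return &per_event[event_key];
}
```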

We might get a request for a "per minute" grouping, based on
ROUND(TIMER_START/60000000000) or something similar. We won't
do it because we can't precalculate the possible values. But
it's irrelevant anyway, because groupings by time periods can
be done with "Samplings". (See WL#2360 section "Sampling".)
So we won't use any timer columns as grouping columns.

We might get a request for a grouping based on OBJECT_INSTANCE_BEGIN.
This would be great for answering really detailed questions like
"how many times did somebody access row#5" or "what was the time
elapsed when we wrote to a file starting at byte#16384". Sorry,
there are just too many possible values; nobody saw a way to do it.
Histograms for OBJECT_INSTANCE_BEGIN might be a substitute, but we
won't have them either.

I (Peter) will talk later about "per statement" and "per object" groupings.

I think it's unlikely that people would want a grouping by EVENT_ID
or SPINS. And when something's unlikely, it's out of the question.

Combining the considerations about aggregating and grouping, we
conclude that there will be an 'aggregators' struc containing these fields:
  NAME             LENGTH  DESCRIPTION
  grouping_type         2  Identifies the event, correlates with event_name
  count_star            8  We ++count_star whenever we end a wait
  sum_time_elapsed      8  We add to sum_time_elapsed when we end a wait
  min_time_elapsed      8  We calculate min_time_elapsed when we end a wait
  max_time_elapsed      8  We calculate max_time_elapsed when we end a wait
Total length = 34 bytes for one aggregators-struc occurrence.
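
The field list above can be transcribed directly into C. The packing
directive is for illustration only (a real build would likely accept
natural alignment and padding), but it confirms the 34-byte total:

```c
#include <stdint.h>

/* Direct transcription of the aggregators-struc field list above. */
#pragma pack(push, 1)
typedef struct {
  uint16_t grouping_type;     /* 2 bytes: correlates with EVENT_NAME */
  uint64_t count_star;        /* 8 bytes */
  uint64_t sum_time_elapsed;  /* 8 bytes */
  uint64_t min_time_elapsed;  /* 8 bytes */
  uint64_t max_time_elapsed;  /* 8 bytes */
} aggregators;                /* 2 + 8 + 8 + 8 + 8 = 34 bytes packed */
#pragma pack(pop)
```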

[ DEFINITION OF A TYPICAL TABLE ]

(Note added 2010-06-30: Formerly there were references
to "by genus" and EVENTS_WAITS_SUMMARY_BY_THREAD_BY_GENUS.
Now it's "by species", that is, by the whole event name.)

Table name = EVENTS_WAITS_SUMMARY_BY_THREAD_BY_EVENT_NAME
List of columns and data types:
  THREAD_ID             BIGINT /* or INTEGER */ NOT NULL
  EVENT_NAME            VARCHAR(128) NOT NULL
  COUNT_STAR            BIGINT UNSIGNED NOT NULL
  SUM_TIMER_WAIT        BIGINT UNSIGNED NOT NULL
  MIN_TIMER_WAIT        BIGINT UNSIGNED NOT NULL
  AVG_TIMER_WAIT        BIGINT UNSIGNED NOT NULL
  MAX_TIMER_WAIT        BIGINT UNSIGNED NOT NULL

Compare the definition of EVENTS_WAITS_CURRENT:
columns EVENT_ID + SOURCE + SPINS + OBJECT_SCHEMA + OBJECT_NAME
+ OBJECT_TYPE + OBJECT_INSTANCE_BEGIN + OPERATION + NUMBER_OF_BYTES are gone,
column TIMER_WAIT is summarized,
column COUNT_STAR is added.

We prefer to have the columns for the effective GROUP BY clause
together and in order according to the effective groupings,
followed by the aggregate columns. We prefer the aggregate columns
to follow the effective GROUP BY columns, in the order -- this
is an arbitrary choice -- COUNT, SUM, MIN, AVG, MAX.


[ NESTING-EVENT SUMMARIES ]

We have moved the specification for nesting-event summaries to
WL#4813 PERFORMANCE_SCHEMA Instrumenting Stages.
At time of writing we believe that "Stages" will be the first
nesting event which we implement, so nesting-event summaries
fit better in that worklog task, which will come later.

[ HISTOGRAMS ]

We have moved the specification for histograms to
WL#5384 PERFORMANCE_SCHEMA Histograms.
At time of writing we believe that we will not implement
histograms for a long time, but they're there for later
consideration.

[ SETUP_OBJECTS ]

(2010-07-12 This section was moved out of OBJECT_SUMMARIES
because it is not specifically for OBJECT_SUMMARIES any more.)

We have a SETUP table. Call it SETUP_OBJECTS. It contains:
OBJECT_TYPE ENUM('TABLE') NOT NULL DEFAULT 'TABLE'
OBJECT_SCHEMA VARCHAR(64) DEFAULT '%'
OBJECT_NAME VARCHAR(64) NOT NULL DEFAULT '%'
TIMED ENUM('YES','NO') NOT NULL DEFAULT 'YES'

The OBJECT_TYPE column may only contain 'TABLE'.
'TABLE', meaning base table, is the only possible type for now;
the column exists for future use.

The OBJECT_SCHEMA column identifies the schema that an
object is in. It is nullable in case someday we decide
to include objects which are not part of schemas.
If the value is the wildcard '%', it means "any schema".

The OBJECT_NAME column identifies the object.
If the value is the wildcard '%', it means "any object".

The TIMED column is 'YES' if events for the object are
to be timed, rather than merely counted.

One may SELECT, INSERT, UPDATE, or DELETE from
SETUP_OBJECTS provided one has the appropriate privilege.

The presence of a row in SETUP_OBJECTS means that we
want to enable instrumentation for events which occur
on the object identified by OBJECT_TYPE plus OBJECT_SCHEMA
plus OBJECT_NAME. This affects instrumentation, not merely
summaries. The enablement is combined (ANDed) with the
settings in SETUP_INSTRUMENTS for the event names that
begin with 'wait/io/table...' or 'wait/lock/table...'.
This also assumes that the thread is enabled; see WL#4674.

Examples:
1. INSERT INTO SETUP_OBJECTS VALUES ('TABLE','TEST','T','YES');
   This means "enable instrumentation for events on table
   TEST.T if SETUP_INSTRUMENTS.ENABLED = 'YES', and enable
   timing as well if SETUP_INSTRUMENTS.TIMED = 'YES'".
2. INSERT INTO SETUP_OBJECTS VALUES ('TABLE','TEST','%','NO');
   This means "enable instrumentation for events on any table
   in schema TEST (again, provided SETUP_INSTRUMENTS also
   says enabled), but do not turn the timer on, even if
   SETUP_INSTRUMENTS.TIMED is 'YES'".
3. INSERT INTO SETUP_OBJECTS VALUES ('TABLE','TEST','T','YES');
   INSERT INTO SETUP_OBJECTS VALUES ('TABLE','TEST','%','NO');
   This enables just as indicated for examples (1.) and (2.).
   The question might arise what happens for table TEST.T,
   since both rows we're inserting match it, but one row says
   TIMED='YES' while the other says TIMED='NO'. The answer is:
   it's still enabled, but whether it's timed or not is
   implementation-dependent.
   Implementor choice: 'TEST', 'T' is more specific than
   'TEST', '%', so the more specific row prevails,
   and the table is timed in this example.
   Likewise, 'TEST', '%' is more specific than '%', '%'
   and takes precedence.
4. DELETE FROM SETUP_OBJECTS;
   When SETUP_OBJECTS is empty, instrumentation is off.
   Turning off instrumentation will not affect any summaries
   which already exist, however.
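
The implementor choice in examples (3.) and (4.) amounts to a
"most specific row wins" lookup. A minimal sketch in C, with
hypothetical row layout and names (exact-match comparison stands in
for the implementation-dependent collation):

```c
#include <stdbool.h>
#include <string.h>

/* One SETUP_OBJECTS row; "%" is the wildcard. */
typedef struct {
  const char *object_schema;
  const char *object_name;
  bool timed;
} setup_object_row;

/* Exact schema + exact name beats schema + '%', which beats '%' + '%'. */
static int specificity(const setup_object_row *r)
{
  int s = 0;
  if (strcmp(r->object_schema, "%") != 0) s += 2;
  if (strcmp(r->object_name, "%") != 0) s += 1;
  return s;
}

static bool part_matches(const char *pattern, const char *value)
{
  return strcmp(pattern, "%") == 0 || strcmp(pattern, value) == 0;
}

/* Returns true if instrumentation is enabled for schema.name and
   stores the winning TIMED flag; an empty table means: not enabled. */
static bool lookup(const setup_object_row *rows, int n,
                   const char *schema, const char *name, bool *timed)
{
  int best = -1;
  for (int i = 0; i < n; i++) {
    if (part_matches(rows[i].object_schema, schema) &&
        part_matches(rows[i].object_name, name) &&
        specificity(&rows[i]) > best) {
      best = specificity(&rows[i]);
      *timed = rows[i].timed;
    }
  }
  return best >= 0;
}
```

With the two rows of example (3.), TEST.T is timed (the exact row
prevails) while any other table in TEST is enabled but not timed.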

Initially there is one row with ('TABLE','%','%','YES').

We won't instrument for tables in PERFORMANCE_SCHEMA
itself, or in INFORMATION_SCHEMA, regardless of what
values are in SETUP_OBJECTS.

The maximum number of rows in SETUP_OBJECTS is 100,
controlled by performance_schema_setup_objects_size.

We use the term "matching" when asking "how does MySQL
compare the table name it's working on with the table
name in SETUP_OBJECTS". The matching effectively takes
place at the time the event happens (although in practice
the matching only has to happen once unless SETUP_OBJECTS
changes.) The matching uses an implementation-dependent
collation. For tables, matching happens on both persistent
and temporary tables. For a wildcard, matching is always true.

Summaries only take place for events which are instrumented.

[ OBJECT SUMMARIES ]

A grouping "by object" is what people are asking for when they say
"* Track blocks read per file * Track blocks written per file
* Track I/O per second average per file"
[mysql inside intranet address]/wiki/ServerMonitoringWishlist
And one of the Google patches can give counts of reads and
writes per table, or per index.
I've also heard a request for "how often has a procedure been called".

Generally we want to summarize for "database objects" (base tables,
indexes, stored procedures, functions, events, triggers, modules,
constraints) or named "session objects" (prepared statements,
cursors, conditions, variables, savepoints). We don't worry about
summarizing for things like mutexes because they'd be both easy and
unnecessary. We can get summaries for all the lock_open mutexes without
needing to know the address that was used for lock_open.

At first glance it looks simple: just declare that OBJECT_NAME etc.
are grouping columns, or do the same things that we did for
"nesting-event summaries". But both those methods fail because:
* the number of objects isn't fixed and might be large
* the location of the object struc isn't known; it must be looked up
* the object name columns are strings, so they take time to compare.

So we need WL#2760 "Super-Database (real Data Dictionary)".
I'm assuming that WL#2760 will give us persistent in-memory descriptors
for all schema objects, so we'll be able to use an address or offset
whenever we're using the object. When we have that, object summaries will
be easy. However, WL#2760 will not be available for a long time.

In the meantime, we offer a restricted-functionality solution.

This works only for base tables. MySQL sets up an internal list for
each table that MySQL becomes aware of, in a way which the
implementor defines. The maximum number of rows in this internal
table is 10000, controlled by performance_schema_max_table_handles.
The internal table includes information to identify the object.

If an event takes place which affects a table,
if the internal-table schema+name match schema+name in any SETUP_OBJECTS row,
and if instrumentation is enabled as described in section SETUP_OBJECTS,
then
  instrumentation and aggregation take place
  (MySQL adds to the aggregator fields, in the same manner as
  when it adds to nesting-event aggregator fields).

So now we can describe what is in the summary table.

Table name = OBJECTS_SUMMARY_GLOBAL_BY_TYPE
List of columns and data types:
   OBJECT_TYPE           VARCHAR(64) NOT NULL
   OBJECT_SCHEMA         VARCHAR(64) NOT NULL
   OBJECT_NAME           VARCHAR(64) NOT NULL
   COUNT_STAR            BIGINT UNSIGNED NOT NULL
   SUM_TIMER_WAIT        BIGINT UNSIGNED NOT NULL
   MIN_TIMER_WAIT        BIGINT UNSIGNED NOT NULL
   AVG_TIMER_WAIT        BIGINT UNSIGNED NOT NULL
   MAX_TIMER_WAIT        BIGINT UNSIGNED NOT NULL

The values for OBJECT_TYPE, OBJECT_SCHEMA, and OBJECT_NAME
are as in SETUP_OBJECTS. The values for COUNT_STAR,
SUM_TIMER_WAIT, MIN_TIMER_WAIT, AVG_TIMER_WAIT, MAX_TIMER_WAIT
are as in other summary tables such as
EVENTS_WAITS_SUMMARY_BY_EVENT_NAME.
By searching the global table, you'll see the counts and timings
for all the table io done on the table.
There are two matters which the implementor may decide:
* If the SETUP_OBJECTS row is '%','%', and the event is
  on table TEST.T, shall the OBJECTS_SUMMARY_GLOBAL_BY_TYPE
  row be
  'TABLE','TEST','T',1,2,3,4,5
  or
  'TABLE','%','%',1,2,3,4,5
Implementor choice: 'TABLE','TEST','T',1,2,3,4,5
  The real object schema and object name are used.
* If a row is dropped from the internal table,
  shall it still appear in OBJECTS_SUMMARY_GLOBAL_BY_TYPE,
  or shall it disappear at any implementation-dependent time,
  or shall its counters be added to a "summary summary" row.
Implementor choice:
  When a row is dropped from the internal table,
  it is also dropped from OBJECTS_SUMMARY_GLOBAL_BY_TYPE.

One may SELECT from, or TRUNCATE, OBJECTS_SUMMARY_GLOBAL_BY_TYPE
provided one has the appropriate privileges.

Only one data-change statement is legal:
TRUNCATE OBJECTS_SUMMARY_GLOBAL_BY_TYPE;
which will reset all the bigint columns to 0 or NULL, but will
not actually remove any rows.

The implementor will doubtless have some trouble with robustness,
given that users might be updating SETUP_OBJECTS, and the contents
of the internal table might be changing, or another thread might
be working on the same object, while aggregating occurs.
Once again, we will just say: very occasionally there will be
erroneous information, and we simply document that this is the way
things are.

Once WL#2760 is done and we add support for other object types
besides 'TABLE', very little will change from the user's point
of view. The table and column descriptions will be the same.

For comparison with another DBMS, see dev-private email
[ mysql intranet archive ] /secure/mailarchive/mail.php?folder=4&mail=35591


[ INFINITE HISTORIES ]

The proposal is "no infinite histories", but let's consider them.

At the outset, I said that we have to update summaries during the
instrumentation because "There is no table that goes
back forever." But what if there was? There are two possibilities:
* Add 12GB of memory to your machine. Now there is enough space for
  making EVENTS_WAITS_HISTORY_LONG into a "long long long" table,
  by changing performance_schema_events_waits_history_long_size.
* Enable file trace. Now there is a disk file instead of memory.
  We can convert it to a MySQL table.

With such scenarios, the instrumentation doesn't need to do any
summarizing at all. We can assume that users will do their own
ad-hoc grouping and aggregating for everything.

But, if we require lots of memory, we make the instrumentation
obtrusive. Our objective is to make all users comfortable with
monitoring always on. And benchmarks now show that the performance
schema collects up to 800,000 events per second, which would
overflow a 12GB history in one minute. Really, this indicates
that EVENTS_WAITS_HISTORY_LONG is "mostly for debug".

And trace files have high overhead, as mentioned elsewhere (WL#4878).

So assume there are no infinite histories.

[ HIGHER-LEVEL SUMMARIES ]

This section is informative; it is not part of the WL#4816 requirement.

Earlier we said that there could be multiple summary tables, each for a
different EVENT_NAME level, for example EVENTS_WAITS_SUMMARY_BY_FAMILY and
EVENTS_WAITS_SUMMARY_BY_GENUS. But there is only one underlying
structure, the "by species" summaries. For all tables above the species
level (genus, family, order, class, etc.), the instrumentation does
nothing at all.
We know that the Consumer can calculate the per-genus summaries by
adding up the per-species summaries.
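
The per-genus roll-up the Consumer performs can be sketched in C.
The row type and function are hypothetical; the event names and
counts reuse the example table from the start of this document:

```c
#include <stdint.h>
#include <string.h>

/* One row of a per-species summary, as the Consumer sees it. */
typedef struct {
  const char *event_name;
  uint64_t count_star;
} species_row;

/* A genus (or family, order, ...) total is the sum of the species
   rows whose EVENT_NAME starts with the higher-level prefix. */
static uint64_t genus_count(const species_row *rows, int n,
                            const char *prefix)
{
  uint64_t total = 0;
  size_t len = strlen(prefix);
  for (int i = 0; i < n; i++)
    if (strncmp(rows[i].event_name, prefix, len) == 0)
      total += rows[i].count_star;
  return total;
}
```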

Higher-level summaries are calculated, so they're disabled if the
lower-level summaries are disabled.

[ GLOBAL SUMMARIES ]

A global summary would be processed thus:
There are also occurrences of performance-structure-summary in global memory.
But these global summaries are not totals for all current jobs.
They are only for non-current jobs, that is, jobs that have terminated.
Why don't we update the global summaries at the same time as the per-thread
summaries? Because we fear conflicts, with two threads updating the global
summaries at the same time. That might be harmless if the only operation
is ADD. But the second reason is that we want the Provider's End procedure
("end_wait") to be as fast as possible, and so we'll pass extra work to the
Consumer. When the Provider thread is about to terminate, it gets a mutex
(call it Terminator mutex) and then adds all per-thread summaries to the
global summaries.
When a Consumer thread is about to display global information, for example
"the total number of mutex acquisitions for all sessions since the server
started", it must go through all the per-thread summaries and add them in
its temporary storage, and then add the global summaries.
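
The Consumer-side computation just described can be sketched as a C
function: walk the live per-thread aggregates, then add the dead-threads
area that terminating Providers filled in under the Terminator mutex.
All names here are hypothetical:

```c
#include <stdint.h>

/* Minimal per-thread aggregate, reduced to two fields for the sketch. */
typedef struct {
  uint64_t count_star;
  uint64_t sum_timer_wait;
} thread_aggregate;

static thread_aggregate global_total(const thread_aggregate *per_thread,
                                     int n_threads,
                                     const thread_aggregate *dead_threads)
{
  /* Start from the dead-threads area, then add each live thread.
     Live threads are read without locks, so occasional inaccuracy
     is accepted and documented. */
  thread_aggregate total = *dead_threads;
  for (int i = 0; i < n_threads; i++) {
    total.count_star += per_thread[i].count_star;
    total.sum_timer_wait += per_thread[i].sum_timer_wait;
  }
  return total;
}
```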

So the important thing to understand is that all such summaries are done
"within a thread", thus removing contention problems, which is important
because Providers never use mutexes. But if a thread is killed and its
memory is deallocated, then all its tables are copied to a single
"global summaries" area. This copying has to be under mutex control,
and Consumers must see the mutex, but it's not an exception to our
"Providers never use mutexes" rule; that is, the instrumentation itself
has no mutexes.

This does mean that monitoring a dead thread is impossible.
But we have merely decided that information about dead threads is less
important, so it ends up in summaries, samples, and user-generated
histograms. Its details disappear.

The consumer of a "global" summary table will have to go through all the
threads and add them up, and then add the summaries from the dead-threads
global area.

Global summaries are calculated, so they're disabled if the
by-thread summaries are disabled.

For example, EVENTS_WAITS_SUMMARY_GLOBAL_BY_EVENT_NAME would
have the same columns as EVENTS_WAITS_SUMMARY_BY_THREAD_BY_EVENT_NAME
except for the first column, THREAD_ID.

[ CLEARING ]

Low-level summary tables can be cleared by saying
TRUNCATE EVENTS_WAITS_SUMMARY_BY_THREAD_BY_EVENT_NAME
(or some other summary-table name).
Appropriate privileges on the table are necessary.
The effect is zeroing of the aggregate columns, not removal of the rows.

This functionality is apparently desirable -- the InnoDB metrics table
has a "reset" feature, and Marc Alff says "we need a way to clean summaries
after the configuration of SETUP_INSTRUMENTS / SETUP_CONSUMERS / etc
has changed, so that the results can be meaningful and not tainted by
previous aggregates made using different conditions."


[ THE ROUTINE ]

Simply put, a per-thread summary would be processed thus:
In the End procedure ("end_wait"), the Provider takes values from
performance-structure and adds them to a performance-structure-aggregator.
The summary is "by class of event", still within the thread. For example,
when we finish a mutex call, we calculate the total canonical time units waited
(.timer_end - .timer_start) and add it to
performance-structure-aggregators [mutex] . sum_timer_wait. And we
increment performance-structure-aggregators [mutex] . count_star.
We will need this summary for another table, too.

Remember that instrumentation has a 'start' (before the instrumented code)
and an 'end' (after the instrumented code). All summarizing is in the 'end'
("end_wait"). The pseudocode for a WAIT instrumentation is:

/* Do the non-summary instrumentation code. */
if (summary table for statement is enabled)
  ++statement.summary.count_star;
  statement.summary.sum_elapsed+=elapsed-time-for-this-instrument;
  conditional-move(statement.summary.min_elapsed,elapsed-time-for-this-instrument);
  conditional-move(statement.summary.max_elapsed,elapsed-time-for-this-instrument);
if (summary table for this wait is enabled)
  ++wait.summary[event].count_star;
  wait.summary[event].sum_elapsed+=elapsed-time-for-this-instrument;
  conditional-move(wait.summary[event].min_elapsed,elapsed-time-for-this-instrument);
  conditional-move(wait.summary[event].max_elapsed,elapsed-time-for-this-instrument);

In assembler, with wonderful optimization, with all summaries enabled,
this will take 2 not-taken conditional-jump instructions, 4 increment
or add instructions, 1 multiplication to get canonical time units from
the units that the timer returns, 4 conditional moves, and 1/5 cache miss.
55 cycles.
Eventually there will be 2 other nesting-event classes besides 'statements', so
multiply by 2, and say 110 cycles.

This is a good time to estimate the total cost.
It's 110 cycles overhead in speed terms, and in memory terms it's
34 bytes per 'aggregators' struc
times 500 occurrences of 'aggregators' in per-event-name summaries
plus 2 occurrences of 'aggregators' in statement or stage summaries
times 200 threads
plus 500 occurrences of 'aggregators' in global summaries
Typically, then, it will all add up to somewhere around 1MB.

Our cost estimate, then, is 110 extra cycles per instrument and 1MB extra memory.

(Peter did some tests to see whether the above estimating is realistic
nowadays. The practice doesn't match the theory, but the architecture review
box can be ticked anyway. Ignore the estimating, it is useless for review purposes.)

[ TRICKS ]

These are tricks; they are recommendations; they are not requirements.

1. Be cheap with SETUP_CONSUMERS.
Effectively SETUP_CONSUMERS acts as a filter on production of summary
tables. Disabling saves a tiny amount of time, but more importantly,
it can save memory. Suppose there are 1000 summary tables, but we set
a fixed limit: users cannot enable more than 100 of them.
Now, allocate memory for only 100 strucs, plus one list of pointers
from the enabled-summaries list to the strucs themselves.

2. Group by genus or family, not by species. This is because we might have
too many species (so far we have about 200). Another way to accomplish
the same purpose is to disable instrumenting for many species, permanently.

See also WL#5414 PERFORMANCE_SCHEMA Optimize Instrumentation.

[ EXAMPLES ]

Assuming the default schema is performance_schema.

"What is the total number of wait calls of any type since instance start?"
SELECT SUM(COUNT_STAR) FROM EVENTS_WAITS_SUMMARY_GLOBAL_BY_EVENT_NAME;

"For the current statement on thread 17, show the statement elapsed time,
the total time spent in the last 10 mutexes, and the average time spent on
all mutexes of type 'lock_open'".
SELECT TIMER_WAIT FROM EVENTS_STATEMENTS_CURRENT WHERE THREAD_ID = 17
UNION ALL
SELECT SUM(TIMER_WAIT) FROM EVENTS_WAITS_HISTORY WHERE THREAD_ID = 17
UNION ALL
SELECT WAIT_SYNCH_MUTEX_AVG_TIMER_WAIT FROM EVENTS_STATEMENTS_CURRENT
WHERE WAIT_SYNCH_MUTEX_EVENT_NAME = 'wait/synch/mutex/sql/LOCK_open';

"Show contents of all waits summary tables."
(Note added 2010-06-30: removed some class/family/order/class examples.) 
SELECT * FROM EVENTS_WAITS_SUMMARY_BY_THREAD_BY_EVENT_NAME;
...
SELECT * FROM EVENTS_STAGES_SUMMARY_BY_THREAD_BY_EVENT_NAME;
...
SELECT * FROM EVENTS_STATEMENTS_SUMMARY_BY_THREAD_BY_EVENT_NAME;
...
SELECT * FROM EVENTS_WAITS_SUMMARY_GLOBAL_BY_EVENT_NAME;
...
SELECT * FROM EVENTS_STAGES_SUMMARY_GLOBAL_BY_EVENT_NAME;
...
SELECT * FROM EVENTS_STATEMENTS_SUMMARY_GLOBAL_BY_EVENT_NAME;
/* many tables in all */


[ REQUIREMENTS ]

Note: This section is refining the general specification written in this HLS
into more detailed technical or functional requirements, to help scope the tests.
This is *not* yet another spec, just some logical consequences of the HLS
sections written above.

(1) Table performance_schema.setup_objects.
-------------------------------------------

Func-Req (1.1): A fresh MySQL installation of 5.6 must create the table
performance_schema.setup_objects.
See the HLS for the table structure.

Func-Req (1.2): An upgrade from 5.5 to 5.6 must create the table
performance_schema.setup_objects.

Func-Req (1.3): Security privileges for performance_schema.setup_objects are
enforced.
Legal operations are SELECT, INSERT, UPDATE, DELETE, TRUNCATE.
CREATE TABLE and DROP TABLE are currently also legal, as they are used during
install/upgrade.

Func-Req (1.4): Table performance_schema.setup_objects is visible in the
information_schema.

Func-Req (1.5): Table performance_schema.setup_objects is visible in SHOW TABLES.

Func-Req (1.6): Rows in performance_schema.setup_objects control whether
or not TABLE instrumentation is enabled and/or timed. See the HLS for
details.

(2) Table performance_schema.objects_summary_global_by_type.
------------------------------------------------------------

Func-Req (2.1): A fresh MySQL installation of 5.6 must create the table
performance_schema.objects_summary_global_by_type.
See the HLS for the table structure.

Func-Req (2.2): An upgrade from 5.5 to 5.6 must create the table
performance_schema.objects_summary_global_by_type.

Func-Req (2.3): Security privileges for
performance_schema.objects_summary_global_by_type are enforced.
Legal operations are SELECT, and TRUNCATE.
CREATE TABLE and DROP TABLE are currently also legal, as they are used during
install/upgrade.

Func-Req (2.4): Table performance_schema.objects_summary_global_by_type is
visible in the information_schema.

Func-Req (2.5): Table performance_schema.objects_summary_global_by_type is
visible in SHOW TABLES.

Func-Req (2.6): Rows in performance_schema.objects_summary_global_by_type
are "per object" aggregations.
This aggregation is the logical equivalent of:
SELECT object_type, object_schema, object_name,
       count(*) as count_star,
       sum(timer_wait) as sum_timer_wait,
       min(timer_wait) as min_timer_wait,
       avg(timer_wait) as avg_timer_wait,
       max(timer_wait) as max_timer_wait
FROM performance_schema.events_waits_history_infinite
WHERE object_type = 'TABLE'
GROUP BY object_type, object_schema, object_name;

Func-Req (2.7): TRUNCATE TABLE
performance_schema.objects_summary_global_by_type resets the statistics
collected.

(3) Existing performance schema tables.
---------------------------------------

Func-Req (3.1): Table performance_schema.events_waits_summary_global_by_event_name
shows aggregations for the table io instrument "wait/io/table/sql/handler".

Func-Req (3.2): Table
performance_schema.events_waits_summary_by_thread_by_event_name
shows aggregations for the table io instrument "wait/io/table/sql/handler".

(4) Server variables.
---------------------

Func-Req (4.1): There is a new server variable
performance_schema_setup_objects_size.
This variable is global and readonly.

Func-Req (4.2): There is a new server command line option
--performance-schema-setup-objects-size.

Server implementation.
----------------------

Tech-Req: The internal implementation of summaries in the performance
schema has changed significantly with this WL.
This is a *technical* requirement that affects only the design; it cannot
be tested by a functional test.
Note from Marc: I consider this one already covered by code review.
The detailed design is documented in doxygen format.

See instructions in the file Doxyfile-perfschema.

In particular, see the doxygen page about aggregation.