Events are collected by means of instrumentation added to the server source code. Instruments time events, which is how the Performance Schema provides an idea of how long events take. It is also possible to configure instruments not to collect timing information. This section discusses the available timers and their characteristics, and how timing values are represented in events.
Performance Schema timers vary in precision and amount of
overhead. To see what timers are available and their
characteristics, check the
performance_timers
table:
mysql> SELECT * FROM performance_schema.performance_timers;
+-------------+-----------------+------------------+----------------+
| TIMER_NAME | TIMER_FREQUENCY | TIMER_RESOLUTION | TIMER_OVERHEAD |
+-------------+-----------------+------------------+----------------+
| CYCLE | 2389029850 | 1 | 72 |
| NANOSECOND | 1000000000 | 1 | 112 |
| MICROSECOND | 1000000 | 1 | 136 |
| MILLISECOND | 1036 | 1 | 168 |
| THREAD_CPU | 339101694 | 1 | 798 |
+-------------+-----------------+------------------+----------------+
If the values associated with a given timer name are
NULL
, that timer is not supported on your
platform.
The columns have these meanings:
The
TIMER_NAME
column shows the names of the available timers.CYCLE
refers to the timer that is based on the CPU (processor) cycle counter.TIMER_FREQUENCY
indicates the number of timer units per second. For a cycle timer, the frequency is generally related to the CPU speed. The value shown was obtained on a system with a 2.4GHz processor. The other timers are based on fixed fractions of seconds.TIMER_RESOLUTION
indicates the number of timer units by which timer values increase at a time. If a timer has a resolution of 10, its value increases by 10 each time.TIMER_OVERHEAD
is the minimal number of cycles of overhead to obtain one timing with the given timer. The overhead per event is twice the value displayed because the timer is invoked at the beginning and end of the event.
The Performance Schema assigns timers as follows:
The wait timer uses
CYCLE
.The idle, stage, statement, and transaction timers use
NANOSECOND
on platforms where theNANOSECOND
timer is available,MICROSECOND
otherwise.
At server startup, the Performance Schema verifies that assumptions made at build time about timer assignments are correct, and displays a warning if a timer is not available.
To time wait events, the most important criterion is to reduce
overhead, at the possible expense of the timer accuracy, so
using the CYCLE
timer is the best.
The time a statement (or stage) takes to execute is in general
orders of magnitude larger than the time it takes to execute a
single wait. To time statements, the most important criterion
is to have an accurate measure, which is not affected by
changes in processor frequency, so using a timer which is not
based on cycles is the best. The default timer for statements
is NANOSECOND
. The extra
“overhead” compared to the
CYCLE
timer is not significant, because the
overhead caused by calling a timer twice (once when the
statement starts, once when it ends) is orders of magnitude
less compared to the CPU time used to execute the statement
itself. Using the CYCLE
timer has no
benefit here, only drawbacks.
The precision offered by the cycle counter depends on
processor speed. If the processor runs at 1 GHz (one billion
cycles/second) or higher, the cycle counter delivers
sub-nanosecond precision. Using the cycle counter is much
cheaper than getting the actual time of day. For example, the
standard gettimeofday()
function can take
hundreds of cycles, which is an unacceptable overhead for data
gathering that may occur thousands or millions of times per
second.
Cycle counters also have disadvantages:
End users expect to see timings in wall-clock units, such as fractions of a second. Converting from cycles to fractions of seconds can be expensive. For this reason, the conversion is a quick and fairly rough multiplication operation.
Processor cycle rate might change, such as when a laptop goes into power-saving mode or when a CPU slows down to reduce heat generation. If a processor's cycle rate fluctuates, conversion from cycles to real-time units is subject to error.
Cycle counters might be unreliable or unavailable depending on the processor or the operating system. For example, on Pentiums, the instruction is
RDTSC
(an assembly-language rather than a C instruction) and it is theoretically possible for the operating system to prevent user-mode programs from using it.Some processor details related to out-of-order execution or multiprocessor synchronization might cause the counter to seem fast or slow by up to 1000 cycles.
MySQL works with cycle counters on x386 (Windows, macOS, Linux, Solaris, and other Unix flavors), PowerPC, and IA-64.
Rows in Performance Schema tables that store current events
and historical events have three columns to represent timing
information: TIMER_START
and
TIMER_END
indicate when an event started
and finished, and TIMER_WAIT
indicates
event duration.
The setup_instruments
table has
an ENABLED
column to indicate the
instruments for which to collect events. The table also has a
TIMED
column to indicate which instruments
are timed. If an instrument is not enabled, it produces no
events. If an enabled instrument is not timed, events produced
by the instrument have NULL
for the
TIMER_START
, TIMER_END
,
and TIMER_WAIT
timer values. This in turn
causes those values to be ignored when calculating aggregate
time values in summary tables (sum, minimum, maximum, and
average).
Internally, times within events are stored in units given by the timer in effect when event timing begins. For display when events are retrieved from Performance Schema tables, times are shown in picoseconds (trillionths of a second) to normalize them to a standard unit, regardless of which timer is selected.
The timer baseline (“time zero”) occurs at
Performance Schema initialization during server startup.
TIMER_START
and
TIMER_END
values in events represent
picoseconds since the baseline. TIMER_WAIT
values are durations in picoseconds.
Picosecond values in events are approximate. Their accuracy is
subject to the usual forms of error associated with conversion
from one unit to another. If the CYCLE
timer is used and the processor rate varies, there might be
drift. For these reasons, it is not reasonable to look at the
TIMER_START
value for an event as an
accurate measure of time elapsed since server startup. On the
other hand, it is reasonable to use
TIMER_START
or
TIMER_WAIT
values in ORDER
BY
clauses to order events by start time or
duration.
The choice of picoseconds in events rather than a value such
as microseconds has a performance basis. One implementation
goal was to show results in a uniform time unit, regardless of
the timer. In an ideal world this time unit would look like a
wall-clock unit and be reasonably precise; in other words,
microseconds. But to convert cycles or nanoseconds to
microseconds, it would be necessary to perform a division for
every instrumentation. Division is expensive on many
platforms. Multiplication is not expensive, so that is what is
used. Therefore, the time unit is an integer multiple of the
highest possible TIMER_FREQUENCY
value,
using a multiplier large enough to ensure that there is no
major precision loss. The result is that the time unit is
“picoseconds.” This precision is spurious, but
the decision enables overhead to be minimized.
While a wait, stage, statement, or transaction event is executing, the respective current-event tables display current-event timing information:
events_waits_current
events_stages_current
events_statements_current
events_transactions_current
To make it possible to determine how long a not-yet-completed event has been running, the timer columns are set as follows:
TIMER_START
is populated.TIMER_END
is populated with the current timer value.TIMER_WAIT
is populated with the time elapsed so far (TIMER_END
−TIMER_START
).
Events that have not yet completed have an
END_EVENT_ID
value of
NULL
. To assess time elapsed so far for an
event, use the TIMER_WAIT
column.
Therefore, to identify events that have not yet completed and
have taken longer than N
picoseconds thus far, monitoring applications can use this
expression in queries:
WHERE END_EVENT_ID IS NULL AND TIMER_WAIT > N
Event identification as just described assumes that the
corresponding instruments have ENABLED
and
TIMED
set to YES
and
that the relevant consumers are enabled.