The MySQL server is capable of maintaining statement digest information. The digesting process converts each SQL statement to normalized form (the statement digest) and computes a SHA-256 hash value (the digest hash value) from the normalized result. Normalization permits statements that are similar to be grouped and summarized to expose information about the types of statements the server is executing and how often they occur. For each digest, a representative statement that produces the digest is stored as a sample. This section describes how statement digesting and sampling occur and how they can be useful.
Digesting occurs in the parser regardless of whether the Performance Schema is available, so that other features such as MySQL Enterprise Firewall and query rewrite plugins have access to statement digests.
When the parser receives an SQL statement, it computes a statement digest if that digest is needed, which is true if any of the following conditions are true:
Performance Schema digest instrumentation is enabled
MySQL Enterprise Firewall is enabled
A query rewrite plugin is enabled
The parser is also used by the
STATEMENT_DIGEST_TEXT()
and
STATEMENT_DIGEST()
functions,
which applications can call to compute a normalized statement
digest and a digest hash value, respectively, from an SQL
statement.
The max_digest_length
system
variable value determines the maximum number of bytes available
per session for computation of normalized statement digests.
Once that amount of space is used during digest computation,
truncation occurs: no further tokens from a parsed statement are
collected or figure into its digest value. Statements that
differ only after that many bytes of parsed tokens produce the
same normalized statement digest and are considered identical if
compared or if aggregated for digest statistics.
Setting the max_digest_length
system variable to zero disables digest production, which also
disables server functionality that requires digests.
After the normalized statement has been computed, a SHA-256 hash value is computed from it. In addition:
If MySQL Enterprise Firewall is enabled, it is called and the digest as computed is available to it.
If any query rewrite plugin is enabled, it is called and the statement digest and digest value are available to it.
If the Performance Schema has digest instrumentation enabled, it makes a copy of the normalized statement digest, allocating a maximum of
performance_schema_max_digest_length
bytes for it. Consequently, ifperformance_schema_max_digest_length
is less thanmax_digest_length
, the copy is truncated relative to the original. The copy of the normalized statement digest is stored in the appropriate Performance Schema tables, along with the SHA-256 hash value computed from the original normalized statement. (If the Performance Schema truncates its copy of the normalized statement digest relative to the original, it does not recompute the SHA-256 hash value.)
Statement normalization transforms the statement text to a more standardized digest string representation that preserves the general statement structure while removing information not essential to the structure:
Object identifiers such as database and table names are preserved.
Literal values are converted to parameter markers. A normalized statement does not retain information such as names, passwords, dates, and so forth.
Comments are removed and whitespace is adjusted.
Consider these statements:
SELECT * FROM orders WHERE customer_id=10 AND quantity>20
SELECT * FROM orders WHERE customer_id = 20 AND quantity > 100
To normalize these statements, the parser replaces data values
by ?
and adjusts whitespace. Both statements
yield the same normalized form and thus are considered
“the same”:
SELECT * FROM orders WHERE customer_id = ? AND quantity > ?
The normalized statement contains less information but is still representative of the original statement. Other similar statements that have different data values have the same normalized form.
Now consider these statements:
SELECT * FROM customers WHERE customer_id = 1000
SELECT * FROM orders WHERE customer_id = 1000
In this case, the normalized statements differ because the object identifiers differ:
SELECT * FROM customers WHERE customer_id = ?
SELECT * FROM orders WHERE customer_id = ?
If normalization produces a statement that exceeds the space
available in the digest buffer (as determined by
max_digest_length
), truncation
occurs and the text ends with “...”. Long
normalized statements that differ only in the part that occurs
following the “...” are considered the same.
Consider these statements:
SELECT * FROM mytable WHERE cola = 10 AND colb = 20
SELECT * FROM mytable WHERE cola = 10 AND colc = 20
If the cutoff happens to be right after the
AND
, both statements have this normalized
form:
SELECT * FROM mytable WHERE cola = ? AND ...
In this case, the difference in the second column name is lost and both statements are considered the same.
In the Performance Schema, statement digesting involves these elements:
A
statements_digest
consumer in thesetup_consumers
table controls whether the Performance Schema maintains digest information. See Statement Digest Consumer.The statement event tables (
events_statements_current
,events_statements_history
, andevents_statements_history_long
) have columns for storing normalized statement digests and the corresponding digest SHA-256 hash values:DIGEST_TEXT
is the text of the normalized statement digest. This is a copy of the original normalized statement that was computed to a maximum ofmax_digest_length
bytes, further truncated as necessary toperformance_schema_max_digest_length
bytes.DIGEST
is the digest SHA-256 hash value computed from the original normalized statement.
See Section 29.12.6, “Performance Schema Statement Event Tables”.
The
events_statements_summary_by_digest
summary table provides aggregated statement digest information. This table aggregates information for statements perSCHEMA_NAME
andDIGEST
combination. The Performance Schema uses SHA-256 hash values for aggregation because they are fast to compute and have a favorable statistical distribution that minimizes collisions. See Section 29.12.20.3, “Statement Summary Tables”.
Some Performance Tables have a column that stores original SQL statements from which digests are computed:
The
SQL_TEXT
column of theevents_statements_current
,events_statements_history
, andevents_statements_history_long
statement event tables.The
QUERY_SAMPLE_TEXT
column of theevents_statements_summary_by_digest
summary table.
The maximum space available for statement display is 1024 bytes
by default. To change this value, set the
performance_schema_max_sql_text_length
system variable at server startup. Changes affect the storage
required for all the columns just named.
The
performance_schema_max_digest_length
system variable determines the maximum number of bytes available
per statement for digest value storage in the Performance
Schema. However, the display length of statement digests may be
longer than the available buffer size due to internal encoding
of statement elements such as keywords and literal values.
Consequently, values selected from the
DIGEST_TEXT
column of statement event tables
may appear to exceed the
performance_schema_max_digest_length
value.
The
events_statements_summary_by_digest
summary table provides a profile of the statements executed by
the server. It shows what kinds of statements an application is
executing and how often. An application developer can use this
information together with other information in the table to
assess the application's performance characteristics. For
example, table columns that show wait times, lock times, or
index use may highlight types of queries that are inefficient.
This gives the developer insight into which parts of the
application need attention.
The
events_statements_summary_by_digest
summary table has a fixed size. By default the Performance
Schema estimates the size to use at startup. To specify the
table size explicitly, set the
performance_schema_digests_size
system variable at server startup. If the table becomes full,
the Performance Schema groups statements that have
SCHEMA_NAME
and DIGEST
values not matching existing values in the table in a special
row with SCHEMA_NAME
and
DIGEST
set to NULL
. This
permits all statements to be counted. However, if the special
row accounts for a significant percentage of the statements
executed, it might be desirable to increase the summary table
size by increasing
performance_schema_digests_size
.
For applications that generate very long statements that differ
only at the end, increasing
max_digest_length
enables
computation of digests that distinguish statements that would
otherwise aggregate to the same digest. Conversely, decreasing
max_digest_length
causes the
server to devote less memory to digest storage but increases the
likelihood of longer statements aggregating to the same digest.
Administrators should keep in mind that larger values result in
correspondingly increased memory requirements, particularly for
workloads that involve large numbers of simultaneous sessions
(the server allocates
max_digest_length
bytes per
session).
As described previously, normalized statement digests as
computed by the parser are constrained to a maximum of
max_digest_length
bytes,
whereas normalized statement digests stored in the Performance
Schema use
performance_schema_max_digest_length
bytes. The following memory-use considerations apply regarding
the relative values of
max_digest_length
and
performance_schema_max_digest_length
:
If
max_digest_length
is less thanperformance_schema_max_digest_length
:Server features other than the Performance Schema use normalized statement digests that take up to
max_digest_length
bytes.The Performance Schema does not further truncate normalized statement digests that it stores, but allocates more memory than
max_digest_length
bytes per digest, which is unnecessary.
If
max_digest_length
equalsperformance_schema_max_digest_length
:Server features other than the Performance Schema use normalized statement digests that take up to
max_digest_length
bytes.The Performance Schema does not further truncate normalized statement digests that it stores, and allocates the same amount of memory as
max_digest_length
bytes per digest.
If
max_digest_length
is greater thanperformance_schema_max_digest_length
:Server features other than the Performance Schema use normalized statement digests that take up to
max_digest_length
bytes.The Performance Schema further truncates normalized statement digests that it stores, and allocates less memory than
max_digest_length
bytes per digest.
Because the Performance Schema statement event tables might
store many digests, setting
performance_schema_max_digest_length
smaller than max_digest_length
enables administrators to balance these factors:
The need to have long normalized statement digests available to server features outside the Performance Schema
Many concurrent sessions, each of which allocates digest-computation memory
The need to limit memory consumption by the Performance Schema statement event tables when storing many statement digests
The
performance_schema_max_digest_length
setting is not per session, it is per statement, and a session
can store multiple statements in the
events_statements_history
table. A
typical number of statements in this table is 10 per session, so
each session consumes 10 times the memory indicated by the
performance_schema_max_digest_length
value, for this table alone.
Also, there are many statements (and digests) collected
globally, most notably in the
events_statements_history_long
table. Here, too, N
statements stored
consumes N
times the memory indicated
by the
performance_schema_max_digest_length
value.
To assess the amount of memory used for SQL statement storage
and digest computation, use the
SHOW ENGINE
PERFORMANCE_SCHEMA STATUS
statement, or monitor these
instruments:
mysql> SELECT NAME
FROM performance_schema.setup_instruments
WHERE NAME LIKE '%.sqltext';
+------------------------------------------------------------------+
| NAME |
+------------------------------------------------------------------+
| memory/performance_schema/events_statements_history.sqltext |
| memory/performance_schema/events_statements_current.sqltext |
| memory/performance_schema/events_statements_history_long.sqltext |
+------------------------------------------------------------------+
mysql> SELECT NAME
FROM performance_schema.setup_instruments
WHERE NAME LIKE 'memory/performance_schema/%.tokens';
+----------------------------------------------------------------------+
| NAME |
+----------------------------------------------------------------------+
| memory/performance_schema/events_statements_history.tokens |
| memory/performance_schema/events_statements_current.tokens |
| memory/performance_schema/events_statements_summary_by_digest.tokens |
| memory/performance_schema/events_statements_history_long.tokens |
+----------------------------------------------------------------------+
The Performance Schema uses statement sampling to collect
representative statements that produce each digest value in the
events_statements_summary_by_digest
table. These columns store sample statement information:
QUERY_SAMPLE_TEXT
(the text of the
statement), QUERY_SAMPLE_SEEN
(when the
statement was seen), and
QUERY_SAMPLE_TIMER_WAIT
(the statement wait
or execution time). The Performance Schema updates all three
columns each time it chooses a sample statement.
When a new table row is inserted, the statement that produced the row digest value is stored as the current sample statement associated with the digest. Thereafter, when the server sees other statements with the same digest value, it determines whether to use the new statement to replace the current sample statement (that is, whether to resample). Resampling policy is based on the comparative wait times of the current sample statement and new statement and, optionally, the age of the current sample statement:
Resampling based on wait times: If the new statement wait time has a wait time greater than that of the current sample statement, it becomes the current sample statement.
Resampling based on age: If the
performance_schema_max_digest_sample_age
system variable has a value greater than zero and the current sample statement is more than that many seconds old, the current statement is considered “too old” and the new statement replaces it. This occurs even if the new statement wait time is less than that of the current sample statement.
By default,
performance_schema_max_digest_sample_age
is 60 seconds (1 minute). To change how quickly sample
statements “expire” due to age, increase or
decrease the value. To disable the age-based part of the
resampling policy, set
performance_schema_max_digest_sample_age
to 0.