The MySQL server is capable of maintaining statement digest information. The digesting process converts each SQL statement to normalized form (the statement digest) and computes an MD5 hash value (the digest hash value) from the normalized result. Normalization permits statements that are similar to be grouped and summarized to expose information about the types of statements the server is executing and how often they occur. This section describes how statement digesting occurs and how it can be useful.
As of 5.6.24, digesting occurs in the parser regardless of whether the Performance Schema is available, so that other server components such as MySQL Enterprise Firewall have access to statement digests. Before MySQL 5.6.24, statement digesting was a function of the Performance Schema.
When the parser receives an SQL statement, it computes a statement digest if that digest is needed, which is true if any of the following conditions are true:
Performance Schema digest instrumentation is enabled
MySQL Enterprise Firewall is enabled
variable value determines the maximum number of bytes available
for computing normalized statement digests. Once that amount of
space is used during digest computation, truncation occurs: no
further tokens from a parsed statement are collected or figure
into its digest value. Statements that differ only after that
many bytes of parsed tokens produce the same normalized
statement digest and are considered identical if compared or if
aggregated for digest statistics.
After the normalized statement has been computed, an MD5 hash value is computed from it. In addition:
If MySQL Enterprise Firewall is enabled, it is called and the digest as computed is available to it.
If the Performance Schema has digest instrumentation enabled, it makes a copy of the normalized statement digest, allocating a maximum of
performance_schema_max_digest_lengthbytes for it. Consequently, if
performance_schema_max_digest_lengthis less than
max_digest_length, the copy is truncated relative to the original. The copy of the normalized statement digest is stored in the appropriate Performance Schema tables, along with the MD5 hash value computed from the original normalized statement. (If the Performance Schema truncates its copy of the normalized statement digest relative to the original, it does not recompute the MD5 hash value.)
In MySQL 5.6.24 and 5.6.25,
is not available and
max_digest_length applies to
all digest computation. Before MySQL 5.6.24, neither
are available and a fixed maximum of 1024 bytes applies to all
Statement normalization transforms the statement text to a more standardized digest string representation that preserves the general statement structure while removing information not essential to the structure:
Object identifiers such as database and table names are preserved.
Literal values are converted to parameter markers. A normalized statement does not retain information such as names, passwords, dates, and so forth.
Comments are removed and whitespace is adjusted.
Consider these statements:
SELECT * FROM orders WHERE customer_id=10 AND quantity>20 SELECT * FROM orders WHERE customer_id = 20 AND quantity > 100
To normalize these statements, the parser replaces data values
? and adjusts whitespace. Both statements
yield the same normalized form and thus are considered
SELECT * FROM orders WHERE customer_id = ? AND quantity > ?
The normalized statement contains less information but is still representative of the original statement. Other similar statements that have different data values have the same normalized form.
Now consider these statements:
SELECT * FROM customers WHERE customer_id = 1000 SELECT * FROM orders WHERE customer_id = 1000
In this case, the normalized statements differ because the object identifiers differ:
SELECT * FROM customers WHERE customer_id = ? SELECT * FROM orders WHERE customer_id = ?
If normalization produces a statement that exceeds the space
available in the digest buffer (as determined by
occurs and the text ends with “...”. Long
normalized statements that differ only in the part that occurs
following the “...” are considered the same.
Consider these statements:
SELECT * FROM mytable WHERE cola = 10 AND colb = 20 SELECT * FROM mytable WHERE cola = 10 AND colc = 20
If the cutoff happens to be right after the
AND, both statements have this normalized
SELECT * FROM mytable WHERE cola = ? AND ...
In this case, the difference in the second column name is lost and both statements are considered the same.
In the Performance Schema, statement digesting involves these components:
The statement event tables (
events_statements_history_long) have columns for storing normalized statement digests and the corresponding digest MD5 hash values:
DIGEST_TEXTis the text of the normalized statement digest. This is a copy of the original normalized statement that was computed to a maximum of
max_digest_lengthbytes, further truncated as necessary to
DIGESTis the digest MD5 hash value computed from the original normalized statement.
events_statements_summary_by_digestsummary table provides aggregated statement digest information. This table aggregates information for statements per
DIGESTcombination. The Performance Schema uses MD5 hash values for aggregation because they are fast to compute and have a favorable statistical distribution that minimizes collisions. See Section 220.127.116.11, “Statement Summary Tables”.
The statement event tables also have an
SQL_TEXT column that contains the original
SQL statement. The maximum space available for statement display
is 1024 bytes.
system variable determines the maximum number of bytes available
for digest value storage in the Performance Schema. However, the
display length of statement digests may be longer than the
available buffer size due to internal encoding of statement
components such as keywords and literal values. Consequently,
values selected from the
of statement event tables may appear to exceed the
summary table provides a profile of the statements executed by
the server. It shows what kinds of statements an application is
executing and how often. An application developer can use this
information together with other information in the table to
assess the application's performance characteristics. For
example, table columns that show wait times, lock times, or
index use may highlight types of queries that are inefficient.
This gives the developer insight into which parts of the
application need attention.
summary table has a fixed size. By default the Performance
Schema estimates the size to use at startup. To specify the
table size explicitly, set the
system variable at server startup. If the table becomes full,
the Performance Schema groups statements that have
values not matching existing values in the table in a special
DIGEST set to
permits all statements to be counted. However, if the special
row accounts for a significant percentage of the statements
executed, it might be desirable to increase the summary table
size by increasing
For applications that generate very long statements that differ
only at the end, increasing
computation of digests that distinguish statements that would
otherwise aggregate to the same digest. Conversely, decreasing
max_digest_length causes the
server to devote less memory to digest storage but increases the
likelihood of longer statements aggregating to the same digest.
Administrators should keep in mind that larger values result in
correspondingly increased memory requirements, particularly for
workloads that involve large numbers of simultaneous sessions
(the server allocates
max_digest_length bytes per
As described previously, normalized statement digests as
computed by the parser are constrained to a maximum of
whereas normalized statement digests stored in the Performance
bytes. The following memory-use considerations apply regarding
the relative values of
Server components other than the Performance Schema use normalized statement digests that take up to
The Performance Schema does not further truncate normalized statement digests that it stores, but allocates more memory than
max_digest_lengthbytes per digest, which is unnecessary.
Because the Performance Schema statement event tables might
store many digests, setting
enables administrators to balance these factors:
The need to have long normalized statement digests available for server components outside the Performance Schema
Many concurrent sessions, each of which allocates digest-computation memory
The need to limit memory consumption by the Performance Schema statement event tables when storing many statement digests