As of MySQL 5.6.5, the MySQL server is capable of maintaining statement digest information. The digesting process converts an SQL statement to normalized form and computes a hash value for the result. Normalization permits statements that are similar to be grouped and summarized to expose information about the types of statements the server is executing and how often they occur. This section describes how statement normalizing occurs and how it can be useful.
Before MySQL 5.6.24, statement digesting was a function of the Performance Schema. As of 5.6.24, digesting occurs at the SQL level regardless of whether the Performance Schema is available, so that other server functions such as MySQL Enterprise Firewall have access to statement digests.
In the Performance Schema, statement digesting involves these components:
A statements_digest consumer in the setup_consumers table controls whether the Performance Schema maintains digest information.
The statement event tables have DIGEST_TEXT and DIGEST columns. DIGEST_TEXT is the text of the normalized statement digest. DIGEST is the digest MD5 hash value.
The maximum space available for digest computation is 1024 bytes by default. This value can be changed at server startup by setting the performance_schema_max_digest_length system variable. In MySQL 5.6.24 and 5.6.25, use max_digest_length instead. Before 5.6.24, the value cannot be changed.
The statement event tables also have a SQL_TEXT column that contains the original SQL statement. The maximum space available for statement display is 1024 bytes.
The events_statements_summary_by_digest table provides aggregated statement digest information.
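As a minimal sketch of how these components fit together, the following statements check and enable the digest consumer at runtime and then read the digest columns from one of the statement event tables. It assumes that the Performance Schema is enabled, that the consumer appears in setup_consumers under the name statements_digest, and that the account has sufficient privileges on the performance_schema tables.

```sql
-- Check whether digest collection is currently enabled.
SELECT NAME, ENABLED
FROM performance_schema.setup_consumers
WHERE NAME = 'statements_digest';

-- Enable it at runtime if needed (the setting is not kept across restarts).
UPDATE performance_schema.setup_consumers
SET ENABLED = 'YES'
WHERE NAME = 'statements_digest';

-- Compare the original text with the normalized text and its MD5 hash
-- for the most recent statement of each monitored thread.
SELECT THREAD_ID, SQL_TEXT, DIGEST_TEXT, DIGEST
FROM performance_schema.events_statements_current;
```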
Statement normalization transforms the statement text to a more standardized digest string representation that preserves the general statement structure while removing information not essential to the structure:
Object identifiers such as database and table names are preserved.
Literal values are converted to parameter markers. A normalized statement does not retain information such as names, passwords, dates, and so forth.
Comments are removed and whitespace is adjusted.
Consider these statements:
SELECT * FROM orders WHERE customer_id=10 AND quantity>20
SELECT * FROM orders WHERE customer_id = 20 AND quantity > 100
To normalize these statements, the Performance Schema replaces data values with ? and adjusts whitespace. Both statements yield the same normalized form and thus are considered “the same”:
SELECT * FROM orders WHERE customer_id = ? AND quantity > ?
The normalized statement contains less information but is still representative of the original statement. Other similar statements that have different comparison values have the same normalized form.
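One way to confirm this grouping is to look the statements up in the digest summary table after running them. The query below is a sketch that assumes a hypothetical orders table with customer_id and quantity columns, as in the example, and that digest collection is enabled. The LIKE pattern uses wildcards between tokens because the stored digest text may quote identifiers and space tokens differently from the form shown here.

```sql
-- After executing both example SELECT statements, they are expected to
-- share one row, with COUNT_STAR reflecting the combined executions.
SELECT DIGEST, DIGEST_TEXT, COUNT_STAR
FROM performance_schema.events_statements_summary_by_digest
WHERE DIGEST_TEXT LIKE 'SELECT%orders%customer_id%quantity%';
```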
Now consider these statements:
SELECT * FROM customers WHERE customer_id = 1000
SELECT * FROM orders WHERE customer_id = 1000
In this case, the statements are not “the same.” The object identifiers differ, so the statements yield different normalized forms:
SELECT * FROM customers WHERE customer_id = ?
SELECT * FROM orders WHERE customer_id = ?
If normalization produces a statement that exceeds the space available in the digest buffer, the text ends with “...”. Long statements that differ only in the part that occurs following the “...” are considered to be the same. Consider these statements:
SELECT * FROM mytable WHERE cola = 10 AND colb = 20
SELECT * FROM mytable WHERE cola = 10 AND colc = 20
If the cutoff happens to be right after the AND, both statements have this normalized form:
SELECT * FROM mytable WHERE cola = ? AND ...
In this case, the difference in the second column name is lost and both statements are considered the same.
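To see whether truncation affects a given workload, the summary table can be searched for digest text that ends with the truncation marker. This is a heuristic sketch: it assumes the marker is stored literally as three trailing periods, and the digest_text_tail alias is introduced here only for readability.

```sql
-- List digests whose normalized text appears to have been cut off,
-- most frequently executed first.
SELECT DIGEST,
       COUNT_STAR,
       RIGHT(DIGEST_TEXT, 40) AS digest_text_tail
FROM performance_schema.events_statements_summary_by_digest
WHERE TRIM(DIGEST_TEXT) LIKE '%...'
ORDER BY COUNT_STAR DESC;
```

Rows returned by this query may each represent several distinct statements that differ only after the cutoff point.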
For each normalized statement, the Performance Schema computes a hash digest value and stores the statement and its MD5 hash value in the DIGEST_TEXT and DIGEST columns of the statement event tables. In addition, statement digests are summarized in the events_statements_summary_by_digest table, which aggregates information for statements that have the same SCHEMA_NAME and DIGEST values. The Performance Schema uses MD5 hash values because they are fast to compute and have a favorable statistical distribution that minimizes collisions.
The statement digest summary table provides a profile of the statements executed by the server. It shows what kinds of statements an application is executing and how often. An application developer can use this information together with other information in the table to assess the application's performance characteristics. For example, table columns that show wait times, lock times, or index use may highlight types of queries that are inefficient. This gives the developer insight into which parts of the application need attention.
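A profile of this kind might be obtained with a query such as the following sketch, which ranks statement types by total time spent. The column aliases are illustrative. Performance Schema timer columns are expressed in picoseconds, so the values are divided by 1e12 to yield seconds.

```sql
-- Ten statement types that account for the most total execution time.
SELECT DIGEST_TEXT,
       COUNT_STAR            AS exec_count,
       SUM_TIMER_WAIT / 1e12 AS total_wait_sec,
       SUM_LOCK_TIME / 1e12  AS total_lock_sec,
       SUM_NO_INDEX_USED     AS no_index_used_count
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
```

A high no_index_used_count relative to exec_count suggests a statement type that frequently performs table scans and may be worth examining first.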
The statement digest summary table has a fixed size. When it becomes full, statements with SCHEMA_NAME and DIGEST values not matching existing values in the table are grouped in a special row with SCHEMA_NAME and DIGEST set to NULL. This permits all statements to be counted. However, if the special row accounts for a significant percentage of the statements executed, it might be desirable to increase the size of the summary table by setting the performance_schema_digests_size system variable to a larger value at server startup. If no value is given, the server estimates the value to use at startup. (Before MySQL 5.6.9, there is no SCHEMA_NAME column and the special row has DIGEST set to NULL.)
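The share of executions that fall into the special row can be estimated with a query like the following sketch. The aliases are illustrative, and the variable shown is assumed to be the one that sizes the summary table (performance_schema_digests_size).

```sql
-- Fraction of all counted statements that landed in the catch-all row
-- (the row whose DIGEST is NULL).
SELECT SUM(IF(DIGEST IS NULL, COUNT_STAR, 0)) AS overflow_count,
       SUM(COUNT_STAR)                        AS total_count,
       SUM(IF(DIGEST IS NULL, COUNT_STAR, 0)) / SUM(COUNT_STAR) AS overflow_ratio
FROM performance_schema.events_statements_summary_by_digest;

-- Current number of rows allotted to the summary table.
SHOW GLOBAL VARIABLES LIKE 'performance_schema_digests_size';
```

If overflow_ratio is more than a small fraction, a larger table size at the next server startup may give a more complete profile.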
The digest length system variable (performance_schema_max_digest_length, or max_digest_length in MySQL 5.6.24 and 5.6.25) determines the maximum number of bytes available in the digest buffer for digest computation. However, the display length of statement digests may be longer than the available buffer size due to internal encoding of statement components such as keywords and literal values. Consequently, values selected from the DIGEST_TEXT column of statement event tables may appear to exceed the configured digest length.
For applications that generate very long statements that differ only at the end, increasing the digest length enables computation of digests that distinguish statements that would otherwise aggregate to the same digest. Conversely, decreasing it causes the server to devote less memory to digest storage but increases the likelihood of longer statements aggregating to the same digest. Administrators should keep in mind that larger values result in correspondingly increased memory requirements, particularly for workloads that involve large numbers of simultaneous sessions (that many bytes are allocated per session).
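Before changing the digest length, it can help to compare the configured value with the digest text the server currently holds. The statements below are a sketch: the wildcard pattern is used so that the same statement matches whichever of the two variable names the server version provides, and the longest_digest_text alias is illustrative.

```sql
-- Show the configured digest length under either variable name.
SHOW GLOBAL VARIABLES LIKE '%max_digest_length';

-- Longest normalized text currently held in the summary table, in characters.
-- This is display length, which may exceed the configured buffer size.
SELECT MAX(CHAR_LENGTH(DIGEST_TEXT)) AS longest_digest_text
FROM performance_schema.events_statements_summary_by_digest;
```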