WL#5259: PERFORMANCE_SCHEMA HOST_CACHE
Affects: Server-Prototype Only
—
Status: Complete
The server has an internal table containing information about the host cache, including 'host name' and 'ip address' and 'error count'. Expose this name via a new table in the performance_schema. The benefits are: * Users will know when the 'error count' approaches the value in the system variable 'max_connect_errors' * Users will know how many hosts are cached. According to support people, there are several support issues which this would solve.
Background information ====================== Currently in mysql 5.6.4 (and previous releases), the server behavior is as follows. host cache: - there is an internal host cache maintained in memory. - this host cache has a fixed size (#define HOST_CACHE_SIZE 128) - each entry consist of an ip address, a host name, and an error counter. - the cache hash table, that keeps up to HOST_CACHE_SIZE records. - the oldest entry is discarded to make room for new ones, if needed. connection checks: - when a client connection is accepted at the socket level, various network related checks are performed, before going through security for authentication and authorization. The operations involved are: - get the IP address of the client connecting to the server - get the hostname for that IP - check the hostname format for special cases - get all the IP addresses for that hostname - verify that the IP address from the client actually belong to the hostname, so that resolving IP <--> HOSTNAME is consistent in DNS. The hostname found as part of this resolution is: - cached in the host cache to avoid future DNS lookups - used for authorization Authentication and authorization can later fail due to various reasons, some related to the user (bad username / password), some related to the configuration of the client and/or server (usage of plugins, protocol handshake errors) Current limitations and problems ================================ Assume a client application attempts to connect to a server and fails. When the failure is in the network itself (no route to host, connection refused, etc), a system admin has all the network related tools to diagnose the problem and fix it (ping, telnet, trace route, DNS, netstat, ...). When the network delivered the client connection all the way to the mysqld server, on the listening port, and when the failure is in the logic internal to the mysqld process itself, the only visible sign that a connection was refused is an increment on the server "Aborted_connects" status counter. There is no way to tell why the connect fails, where the connection came from, etc, and whether this is expected or not. In practice, this makes administration very difficult: given that there are many checks implemented that could cause a connect to be rejected, and that there are many configuration settings that can have an influence, trouble shooting why a given connection fails is not easy. Design goals and proposed improvements ====================================== The goal of this feature is to expose how the internal host_cache function, so that failures can be clearly identified, and corrected. Every time a connection is accepted at the socket level (on the listening port) and then rejected by the server, the DBA should be able to see: - the exact reason for the failure - the identity of the client affected, if known - timestamp information, to help understanding the system status. Specification ============= Connection_errors_select status variable ---------------------------------------- This status variable counts the number of errors found when calling select / poll on the listening port. A failure to select / poll may not necessarily means a client connection was rejected, but it is nonetheless a server error condition that should be accounted for, as part of the general server connection process. This variable is global. This variable can be flushed. Connection_errors_accept status variable ---------------------------------------- This status variable counts the number of errors found when calling accept on the listening port. This variable is global. This variable can be flushed. Connection_errors_tcpwrap status variable ----------------------------------------- This status variable counts the number of connections refused by libwrap. This variable is global. This variable can be flushed. Connection_errors_internal status variable ------------------------------------------ This status variable counts the number of connections refused due to internal errors in the server. Internal errors can be: - failure to start a new thread - out of memory conditions This variable is global. This variable can be flushed. Connection_errors_max_connections status variable ------------------------------------------------ This status variable counts the number of connections refused due to the server max_connections limit. This variable is global. This variable can be flushed. Connection_errors_peer_addr status variable ------------------------------------------- This status variable counts the number of errors found when searching for the connecting client IP. This variable is global. This variable can be flushed. host_cache_size system variable ------------------------------- This variable controls the size of the internal host cache. It is global. It can be changed dynamically. Changing the variable size causes implicitly a FLUSH HOSTS. Zero is a valid value, changing the host cache size to 0 means to disable the host cache. TABLE performance_schema.host_cache ----------------------------------- Columns: `IP` varchar(64) NOT NULL, `HOST` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL, `HOST_VALIDATED` enum('YES','NO') NOT NULL, `SUM_CONNECT_ERRORS` bigint(20) NOT NULL, `COUNT_HOST_BLOCKED_ERRORS` bigint(20) NOT NULL, `COUNT_NAMEINFO_TRANSIENT_ERRORS` bigint(20) NOT NULL, `COUNT_NAMEINFO_PERMANENT_ERRORS` bigint(20) NOT NULL, `COUNT_FORMAT_ERRORS` bigint(20) NOT NULL, `COUNT_ADDRINFO_TRANSIENT_ERRORS` bigint(20) NOT NULL, `COUNT_ADDRINFO_PERMANENT_ERRORS` bigint(20) NOT NULL, `COUNT_FCRDNS_ERRORS` bigint(20) NOT NULL, `COUNT_HOST_ACL_ERRORS` bigint(20) NOT NULL, `COUNT_NO_AUTH_PLUGIN_ERRORS` bigint(20) NOT NULL, `COUNT_AUTH_PLUGIN_ERRORS` bigint(20) NOT NULL, `COUNT_HANDSHAKE_ERRORS` bigint(20) NOT NULL, `COUNT_PROXY_USER_ERRORS` bigint(20) NOT NULL, `COUNT_PROXY_USER_ACL_ERRORS` bigint(20) NOT NULL, `COUNT_AUTHENTICATION_ERRORS` bigint(20) NOT NULL, `COUNT_SSL_ERRORS` bigint(20) NOT NULL, `COUNT_MAX_USER_CONNECTIONS_ERRORS` bigint(20) NOT NULL, `COUNT_MAX_USER_CONNECTIONS_PER_HOUR_ERRORS` bigint(20) NOT NULL, `COUNT_DEFAULT_DATABASE_ERRORS` bigint(20) NOT NULL, `COUNT_INIT_CONNECT_ERRORS` bigint(20) NOT NULL, `COUNT_LOCAL_ERRORS` bigint(20) NOT NULL, `COUNT_UNKNOWN_ERRORS` bigint(20) NOT NULL, `FIRST_SEEN` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00', `LAST_SEEN` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00', `FIRST_ERROR_SEEN` timestamp NULL DEFAULT '0000-00-00 00:00:00', `LAST_ERROR_SEEN` timestamp NULL DEFAULT '0000-00-00 00:00:00' Operations: SELECT is supported. TRUNCATE is supported, and is used to flush the host cache. Normal privilege checks do apply: a user must be granted the proper table privileges to perform an operation. Note that the existing way of flushing the table cache: FLUSH HOSTS still works, and now has a new alternative syntax: TRUNCATE TABLE performance_schema.host_cache. Both statements have the same effect, but use a different privilege. Semantic for each column: IP: Shows the IP address of the client that connected to the server. This address is expressed as a string. Note: IPv4 requires a length of 15, IPv6 a length of 46 bytes. This column is exposed as a varchar(64) in utf8. HOST: Shows the resolved DNS hostname for that client IP, if any. Note that hostname length is 255, see RFC 2181. HOST_VALIDATED: Indicates if the IP --> HOST --> IP DNS resolution was performed. When HOST_VALIDATED is still NO, the DNS resolution is attempted again for each connect, until it eventually completes, either with a valid result, or with a permanent error. The reason for this column is to avoid caching bad / missing hostnames during temporary DNS failures, which would affect clients forever. When HOST_VALIDATED is YES, the HOST column is used as host name, avoiding calls to DNS. SUM_CONNECT_ERRORS: Number of connect errors that are deemed 'blocking', when using the max_connect_errors system variable. Currently, only protocol handshake errors are counted, and only for hosts that passed validation (HOST_VALIDATED). This is an existing MySQL feature. COUNT_HOST_BLOCKED_ERRORS: Number of connections that were blocked, because SUM_CONNECT_ERRORS exceeded max_connect_errors. This is an existing MySQL feature. COUNT_NAMEINFO_TRANSIENT_ERRORS: Number of transient errors in DNS, during the IP --> HOSTNAME resolution. This is an existing MySQL feature. COUNT_NAMEINFO_PERMANENT_ERRORS: Number of permanent errors in DNS, during the IP --> HOSTNAME resolution. This is an existing MySQL feature. COUNT_FORMAT_ERRORS: Number of hostname format errors. Hostnames can not start with a number followed by a ".", This is an existing MySQL feature. COUNT_ADDRINFO_TRANSIENT_ERRORS: Number of transient errors in DNS, during the HOSTNAME --> IP reverse resolution. This is an existing MySQL feature. COUNT_ADDRINFO_PERMANENT_ERRORS: Number of permanent errors in DNS, during the HOSTNAME --> IP reverse resolution. This is an existing MySQL feature. COUNT_FCRDNS_ERRORS: Number of Forward-confirmed reverse DNS errors. These errors occur when the IP --> HOSTNAME --> IP resolution does not match the client originating IP. This is an existing MySQL feature. COUNT_HOST_ACL_ERRORS: Number of errors caused when no user from the client host can possibly log in: the server returns ER_HOST_NOT_PRIVILEGED and does not even ask for a user name / password. This is an existing MySQL feature. COUNT_NO_AUTH_PLUGIN_ERRORS: Number of errors seen when an auth plugin is requested, but is not loaded in the server. This is an existing MySQL feature. COUNT_AUTH_PLUGIN_ERRORS: Number of errors reported by the auth plugin itself. This is an existing MySQL feature. Note: authentication plugin can report different error codes to indicate the root cause of a failure. Based on the plugin return code when an error occurs, one of the following columns: - COUNT_HANDSHAKE_ERRORS - COUNT_AUTH_PLUGIN_ERRORS - COUNT_AUTHENTICATION_ERRORS is incremented in the host cache. New return codes are an optional extension to the existing plugin API. Unknown or unexpected plugin errors are counted as COUNT_AUTH_PLUGIN_ERRORS. COUNT_HANDSHAKE_ERRORS: Number or errors detected at the wire protocol level. This is an existing MySQL feature. COUNT_PROXY_USER_ERRORS: Number of errors detected when using proxy users for authentication, and when the proxy user does not exist. This is an existing MySQL feature. COUNT_PROXY_USER_ACL_ERRORS: Number of errors detected when using proxy users for authentication, and when the PROXY privilege is not granted. This is an existing MySQL feature. COUNT_AUTHENTICATION_ERRORS: Number of errors caused by failed authentication. This is an existing MySQL feature. COUNT_SSL_ERRORS: Number of errors caused by SSL. This is an existing MySQL feature. COUNT_MAX_USER_CONNECTIONS_ERRORS: Number of errors caused by exceeding per user connections quotas. This is an existing MySQL feature. COUNT_MAX_USER_CONNECTIONS_PER_HOUR_ERRORS: Number of errors caused by exceeding per user connections per hour quotas. This is an existing MySQL feature. COUNT_DEFAULT_DATABASE_ERRORS: Number of errors related to the default database. This is an existing MySQL feature. COUNT_INIT_CONNECT_ERRORS: Number of errors caused by failures in the init-connect statement execution. This is an existing MySQL feature. COUNT_LOCAL_ERRORS: Number of errors local to the server implementation and not related to the network, authentication or authorization. For example, out of memory conditions fit into this category. COUNT_UNKNOWN_ERRORS: Number of other, unknown errors. At time of writing, all errors are known and accounted for in the previous columns in this table. This column is reserved for future use, in case new error conditions need to be reported, and if preserving the backward compatibility and table structure of the host_cache table is required. FIRST_SEEN: Timestamp of the first connect seen from this client IP. LAST_SEEN: Timestamp of the last connect seen from this client IP. FIRST_ERROR_SEEN: Timestamp of the first error seen from this client IP. LAST_ERROR_SEEN: Timestamp of the last error seen from this client IP. Requirements ============ (1) Table performance_schema.host_cache --------------------------------------- Func-Req (1.1): A fresh MySQL installation of CURRENT-VERSION must create the table performance_schema.host_cache. (2) Server options and variables --------------------------------- Func-Req (2.1): The status variable Connection_errors_select is defined. Func-Req (2.2): The status variable Connection_errors_accept is defined. Func-Req (2.3): The status variable Connection_errors_tcpwrap is defined. Func-Req (2.4): The status variable Connection_errors_internal is defined. Func-Req (2.5): The status variable Connection_errors_max_connections is defined. Func-Req (2.6): The status variable Connection_errors_peer_addr is defined. Func-Req (2.7): The system variable host_cache_size is defined. (3) Functional behavior ----------------------- Func-Req (3.1): When a TCP/IP client connection reaches the server, a corresponding row for this client IP is inserted in the host_cache. The HOST column is set to NULL, the VALIDATED column to NO. This happens regardless of whether the connection is later accepted or rejected. Func-Req (3.2): When subsequent TCP/IP client connection reaches the server from the same originating IP, the corresponding row for this client IP is updated in the host_cache. This happens regardless of whether the connection is later accepted or rejected. Func-Req (3.3): Upon successful IP --> HOSTNAME resolution, the corresponding row for this IP is updated in the host_cache: the HOST column is updated with the resolved host name, the VALIDATED column is set to YES. Func-Req (3.4): Upon permanent failures during IP --> HOSTNAME resolution, the corresponding row for this IP is updated in the host_cache: the HOST column is NULL, the VALIDATED column is set to YES. Func-Req (3.5): Upon transient failures during IP --> HOSTNAME resolution, the HOST and VALIDATED columns are unchanged in the host_cache. Func-Req (3.6): When an error XXX occurs when processing an incoming client connection from a given IP address, the corresponding COUNT_XXX_ERRORS statistics columns for the given IP are updated in the host_cache. See each column description for the kind of errors accounted for. Func-Req (3.7): Setting a value to the host_cache_size server variable implicitly performs a FLUSH HOSTS. Func-Req (3.8): TRUNCATE TABLE performance_schema.host_cache implicitly performs a FLUSH HOSTS. References ========== BUG#22821 Adding "SHOW HOST_CACHE" patch from Jeremy Cole BUG 11746048 - 22821: ADDING "SHOW HOST_CACHE" PATCH FROM JEREMY COLE BUG#24906 No command to monitor blocked hosts BUG 11746269 - 24906: NO COMMAND TO MONITOR BLOCKED HOSTS BUG#45817 Please add SHOW command for inc_host_errors(max_connect_errors) BUG 11754244 - 45817: PLEASE ADD SHOW COMMAND FOR INC_HOST_ERRORS(MAX_CONNECT_ERRORS) BUG#59404 More info from host cache BUG 11766316 - 59404: MORE INFO FROM HOST CACHE dev-private thread "BUG#45817" starting with [ mysql intranet ] /secure/mailarchive/mail.php?folder=4&mail=32668 Some observations about CONNECT errors on a Facebook page http://www.facebook.com/notes/mysqlfacebook/monitoring-for-errors/293204560932 More examples of complaints, where host_cache could help: http://stackoverflow.com/questions/9369981/mysql-completely-random-unable-to-connect-to-any-of-the-specified-mysql-hosts Impact on documentation ======================= Notes for the doc team: beside the new table, new server variables etc, the documentation of return codes for auth plugins **MAY** need to be revised also.
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.