WL#17272: MySQL Router Host Cache – configuration and management
Motivation
MySQL Router today provides controls for passthrough client connections, such as limiting concurrent connections (max_connections) and tracking connection errors per host (max_connect_errors). These mechanisms protect against excessive concurrency and repeated failures, but they do not address the overhead of repeated hostname resolution during frequent connect/disconnect cycles: when connections succeed, error-based limits never apply.
As a result, a large volume of incoming connection attempts (including abusive patterns, or frequent reconnect storms from a small set of hosts) can trigger excessive DNS lookups. This increases latency, adds load to DNS infrastructure, and can degrade Router performance.
Introducing an in-process DNS/host cache (with appropriate TTLs and negative caching for deterministic failures) will reduce repeated resolver work, improve connection-handling efficiency, and provide better observability into problematic client host behavior.
For comparison, MySQL Server already provides a host cache that can be configured and inspected; see "host-cache" (https://dev.mysql.com/doc/refman/8.4/en/host-cache.html#host-cache-configuration).
Goals
- Improve connection-handling efficiency: Reduce overhead from repeated DNS lookups and improve response to frequent connection attempts from the same hosts.
- Enhance monitoring and troubleshooting: Provide administrators with insight into routing activity patterns and possible connection issues.
- Ensure conservative behavior for transient DNS errors: timeouts/SERVFAIL-class errors should not be cached by default, enabling quick detection when DNS service returns to normal operation.
Scope
Host Cache Implementation:
- in-memory cache of host entries,
- each entry should include: host, IP address, connection attempt counter, last activity.
Configurable Parameters - Administrators should be able to configure via a new '[host_cache]' section in mysqlrouter.conf:
- Feature flag: enabled/disabled,
- Maximum cache size (number of entries),
- Algorithm: LRU.....,
- Expiration policy:
- separate Time-to-Live for successful and negative (deterministic) results,
- do not cache transient DNS errors (timeouts/SERVFAIL),
- jitter to avoid synchronized expiry bursts.
Note: A. These TTLs are Router-controlled cache TTLs and are independent of the DNS record TTL values returned by the DNS server (honoring record TTLs is a potential future improvement). B. "negative" results are intended to mean deterministic DNS outcomes such as NXDOMAIN / NODATA (NOERROR with no records). Transient resolver errors (e.g., timeout, SERVFAIL, REFUSED) should be re-queried by default.
Monitoring:
- expose through REST interface metrics like cache utilization, cache entries.
Consumers of DNS cache:
- 'routing' plugin,
- 'routing_guidelines' plugin.
Special consumer of DNS cache:
- destination_status plugin — cache refresh only (no cache reads):
- For quarantined hosts, the plugin MUST always bypass the host cache on the read path and perform a fresh DNS resolution for each check.
- If the fresh resolution succeeds, the plugin MUST update the host cache (refresh an existing entry or insert a new one) so subsequent consumers can benefit from the validated result.
Will not use DNS cache (DNS lookup performed internally by mysqlclient / OS resolver):
- 'metadata_cache' plugin,
- 'mysql_rest_service' plugin,
- 'http' (authentication backend - metadata_cache).
Clarification: these components establish outbound MySQL connections via libmysqlclient, which performs name resolution internally. While Router could theoretically resolve hostnames itself and pass a numeric IP address to mysql_real_connect(), this can break TLS identity verification when VERIFY_IDENTITY is enabled. The server certificate might be issued for a hostname, and verification may fail if the client connects using an IP address that does not appear in the certificate. Therefore, these code paths rely on libmysqlclient’s own hostname-based name resolution rather than Router’s host cache.
The decision of which section each plugin is assigned to (consumer, special consumer, will not use cache) was driven by (a) whether the plugin can realistically use Router’s host cache (see clarification on libmysqlclient/TLS constraints), and (b) the expected rate at which it triggers hostname resolution / DNS activity:
routing plugin: resolves MySQL Server destination hostnames as part of routing each new incoming client connection to Router. This can become a high-frequency DNS driver under reconnect storms and is the primary path the Router host cache is intended to optimize.
destination_status plugin: when a destination is quarantined, it is re-checked periodically; each check performs a fresh DNS resolution per quarantined destination per check interval (default ~1s) until the destination is validated again. This is typically a lower, steady-rate source of DNS queries compared to the routing path and, on success, the result is used to refresh/update the host cache for other consumers.
metadata_cache plugin: generally keeps outbound MySQL connections open and reuses them, so it usually opens fewer new connections (and therefore triggers fewer new resolutions) than routing. However, in ClusterSet scenarios it may establish/connect to each cluster on its own TTL cadence (default ~0.5s). Hostname resolution in this plugin is performed internally by libmysqlclient rather than Router’s host cache.
References
Bug#38813214 Reduce Excessive DNS Queries by Caching Resolved Hostnames in MySQL Router
Functional Requirements
- FR-1: Without any user changes, the DNS cache SHALL be enabled and operate with a default configuration (see HLS/default values for FR-2).
- FR-2: The user SHALL be able to change the DNS cache configuration by editing the MySQL Router configuration file under the [host_cache] section:
- FR-2.1: The user SHALL be able to disable/enable the DNS cache by changing the "enabled" option:
- FR-2.1.1: The user SHALL be able to disable the DNS cache by setting "enabled" to 0 or false.
- FR-2.1.2: The user SHALL be able to enable the DNS cache by setting "enabled" to 1 or true.
- FR-2.1.3: Any other value for enabled SHALL cause Router startup to fail with a clear configuration error.
- FR-2.1.4 (default): If the "enabled" option is not specified, Router SHALL default to "enabled=true".
- FR-2.2: The user SHALL be able to change the DNS cache size (maximum number of entries stored in the cache) by setting the "max_entries" option:
- FR-2.2.1: The user SHALL be able to set "max_entries" to an integer value between 1 and 10,000 (inclusive).
- FR-2.2.2: Any other value for "max_entries" SHALL cause Router startup to fail with a clear configuration error.
- FR-2.2.3 (default): If the "max_entries" option is not specified, Router SHALL default to "max_entries=250".
- FR-2.3: The user SHALL be able to change how long a successful DNS resolution result is cached by setting the ttl_success_seconds option:
- FR-2.3.1: The user SHALL be able to set ttl_success_seconds to an integer value between 1 and 86400 seconds (inclusive).
- FR-2.3.2: Any other value for ttl_success_seconds SHALL cause Router startup to fail with a clear configuration error.
- FR-2.3.3 (definition): ttl_success_seconds SHALL apply to cached entries created from DNS lookups that return at least one usable address (e.g., A/AAAA result).
- FR-2.3.4 (default): If the "ttl_success_seconds" option is not specified, Router SHALL default to "ttl_success_seconds=60".
- FR-2.4: The user SHALL be able to change how long a deterministic negative DNS result is cached by setting the ttl_negative_seconds option:
- FR-2.4.1: The user SHALL be able to set ttl_negative_seconds to an integer value between 1 and 86400 seconds (inclusive).
- FR-2.4.2: Any other value for ttl_negative_seconds SHALL cause Router startup to fail with a clear configuration error.
- FR-2.4.3 (definition): ttl_negative_seconds SHALL apply only to deterministic negative DNS outcomes (e.g., NXDOMAIN and optionally NODATA/NOERROR with no records) and SHALL NOT apply to transient resolver errors (e.g., timeout, SERVFAIL, REFUSED).
- FR-2.4.4 (default): If the "ttl_negative_seconds" option is not specified, Router SHALL default to "ttl_negative_seconds=10".
- FR-2.5: The user SHALL be able to configure TTL jitter for cached entries by setting the ttl_jitter_ratio option:
- FR-2.5.1: The user SHALL be able to set ttl_jitter_ratio to a numeric value between 0.0 and 0.5 (inclusive).
- FR-2.5.2: Setting ttl_jitter_ratio to 0.0 SHALL disable jitter.
- FR-2.5.3: Any other value for ttl_jitter_ratio SHALL cause Router startup to fail with a clear configuration error.
- FR-2.5.4: When enabled, jitter SHALL be applied as a random adjustment to the effective TTL for each cache entry using the formula: effective_ttl = base_ttl * (1 ± ttl_jitter_ratio) where base_ttl is the configured TTL (ttl_success_seconds or ttl_negative_seconds).
- FR-2.5.5 (default): If the "ttl_jitter_ratio" option is not specified, Router SHALL default to "ttl_jitter_ratio=0.2".
- FR-3: The DNS cache SHALL manage entries using a Least Recently Used (LRU) policy when the cache reaches max_entries.
- FR-3.1: When inserting a new entry into a full cache, the Router SHALL evict the entry that has not been accessed for the longest time.
- FR-3.2: “Access” SHALL include both cache hits (read) and updates (write/refresh).
- FR-3.3: The cache SHALL treat entries with expired TTL as invalid and SHALL NOT return them.
- FR-3.4: When inserting into a full cache, the Router SHALL purge any expired entries first; only if the cache is still full SHALL it evict an entry according to LRU.
- FR-4: Host resolution/cache
- FR-4.1: When resolving a hostname, if the cache contains a non-expired entry for that hostname, the Router SHALL return the cached resolution result without performing a new DNS lookup. /Cache hit/
- FR-4.2: When resolving a hostname, if the cache does not contain a valid (non-expired) entry, the Router SHALL perform a DNS lookup and process the result as defined below: /Cache miss/
- FR-4.2.1: Before performing the DNS lookup, the Router SHALL create an in-flight (temporary) cache record for the hostname to represent a resolution in progress.
- FR-4.2.2: After the DNS lookup completes, the Router SHALL process the result as follows:
- FR-4.2.2.1: A successful resolution that returns one or more usable IP addresses (e.g., A and/or AAAA records) SHALL result in inserting a final cache entry (replacing the in-flight record) with an effective TTL derived from ttl_success_seconds with jitter applied. The cached entry for the hostname SHALL include the complete set of usable addresses returned by the resolver.
- FR-4.2.2.2: A deterministic negative result (e.g., NXDOMAIN/NODATA) SHALL result in inserting a final cache entry (replacing the in-flight record) with an effective TTL derived from ttl_negative_seconds with jitter applied.
- FR-4.2.2.3: Transient resolver errors (e.g., timeout/SERVFAIL) SHALL NOT be cached.
- FR-4.3: If multiple concurrent requests attempt to resolve the same hostname while an in-flight record exists, the Router SHALL synchronize those requests on the in-flight record such that only one DNS lookup is performed:
- FR-4.3.1: All concurrent requests SHALL wait for the in-flight resolution to complete and then SHALL use the resulting final cache entry (or error result).
- FR-5: Observability - The Router SHALL expose a REST endpoint providing observability for the DNS cache.
- FR-5.1: The response SHALL include cache counters:
- FR-5.1.1: hits (number of cache hits)
- FR-5.1.2: misses (number of cache misses)
- FR-5.1.3: inserts (total number of inserted final entries)
- FR-5.1.4: evictions (number of entries evicted due to max_entries)
- FR-5.1.5: expired_purges (number of entries removed due to TTL expiry)
- FR-5.2: The response SHALL include cache capacity and current utilization:
- FR-5.2.1: enabled state
- FR-5.2.2: max_entries
- FR-5.2.3: current number of entries
- FR-5.2.4: current number of temporary entries
- FR-5.3: The response SHALL include current cached hostnames and their remaining TTLs:
- FR-5.3.1: The response SHALL NOT expose cached/resolved IP addresses.
- FR-5.3.2: For each cached hostname, the response SHALL include the remaining TTL in seconds.
- FR-5.4: The response SHALL include in-flight (temporary) entries:
- FR-5.4.1: list hostnames currently being resolved
- FR-5.4.2: number of waiting requests per hostname
- FR-5.4.3: in-flight age (time since resolution started)
Overview
MySQL Router includes an in-process host cache that stores hostname resolution outcomes for a limited time and reuses them on subsequent requests. The cache is enabled by default and runs with default settings unless overridden in configuration.
The cache is designed for:
- fast reuse of recent hostname resolutions,
- predictable behavior under concurrency (avoiding duplicate DNS lookups),
- operational visibility via a REST endpoint.
The functionality is implemented in MySQL Router as the "host_cache" plugin, which by default reads its configuration from the 'mysqlrouter.conf' file section '[host_cache]'.
Configuration
The 'host_cache' plugin reads the following parameters:
- 'enabled' — enables/disables the host cache.
- 'max_entries' — sets the maximum number of entries stored in the LRU cache.
- 'ttl_success_seconds' — defines how long successful DNS resolution results are cached (i.e., lookups that return at least one usable address).
- 'ttl_negative_seconds' — defines how long deterministic negative DNS results are cached (e.g., NXDOMAIN and optionally NODATA/NOERROR with no records). This does not apply to transient resolver errors.
- 'ttl_jitter_ratio' — configures TTL jitter to avoid synchronized expiry bursts by randomly adjusting the effective TTL per entry.
Here is a summary of types, ranges, and defaults:
| Parameter | Type/Unit | Allowed values | Default value |
|---|---|---|---|
| enabled | Boolean/- | 0,1,true,false | true |
| max_entries | Integer/- | [1 .. 10000] | 250 |
| ttl_success_seconds | Integer/seconds | [1..86400] | 60 |
| ttl_negative_seconds | Integer/seconds | [1..86400] | 10 |
| ttl_jitter_ratio | Float/ratio | [0.0 .. 0.5] | 0.2 |
If any parameter value is outside its allowed range, or cannot be parsed according to the expected type, MySQL Router startup fails with an appropriate error message.
If the configuration file does not contain the '[host_cache]' section, the cache is still configured and enabled, which is equivalent to the following defaults:
[host_cache]
enabled=true
max_entries=250
ttl_success_seconds=60
ttl_negative_seconds=10
ttl_jitter_ratio=0.2
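The defaulting and range checks described above can be illustrated with a short sketch. This is not Router code: it uses Python's configparser as a stand-in for Router's own configuration parser, and the function name load_host_cache_config is hypothetical. The defaults and ranges are taken from the table above. Treating boolean spellings as case-sensitive is an assumption of this sketch, not something the requirements state.

```python
import configparser

# Defaults and allowed ranges copied from the summary table above.
DEFAULTS = {
    "enabled": "true",
    "max_entries": "250",
    "ttl_success_seconds": "60",
    "ttl_negative_seconds": "10",
    "ttl_jitter_ratio": "0.2",
}
INT_RANGES = {
    "max_entries": (1, 10000),
    "ttl_success_seconds": (1, 86400),
    "ttl_negative_seconds": (1, 86400),
}
# Assumption: only these exact spellings are accepted.
BOOLEANS = {"0": False, "false": False, "1": True, "true": True}


def load_host_cache_config(text):
    """Parse and validate [host_cache]; raise ValueError on a bad value."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    raw = dict(DEFAULTS)
    if parser.has_section("host_cache"):
        raw.update(parser["host_cache"])  # user values override defaults

    enabled = raw["enabled"].strip()
    if enabled not in BOOLEANS:
        raise ValueError(f"enabled: invalid value {enabled!r}")
    config = {"enabled": BOOLEANS[enabled]}

    for name, (low, high) in INT_RANGES.items():
        try:
            value = int(raw[name])
        except ValueError:
            raise ValueError(f"{name}: not an integer: {raw[name]!r}") from None
        if not low <= value <= high:
            raise ValueError(f"{name}: {value} is outside [{low}..{high}]")
        config[name] = value

    try:
        ratio = float(raw["ttl_jitter_ratio"])
    except ValueError:
        raise ValueError("ttl_jitter_ratio: not a number") from None
    if not 0.0 <= ratio <= 0.5:  # also rejects NaN and infinity
        raise ValueError(f"ttl_jitter_ratio: {ratio} is outside [0.0..0.5]")
    config["ttl_jitter_ratio"] = ratio
    return config
```

Calling load_host_cache_config("") yields the same values as the explicit section shown above, which mirrors the rule that a missing '[host_cache]' section still results in an enabled cache.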
Impacted code
Most DNS lookups are initiated by the routing plugin. A lookup occurs when a client connects to Router and Router attempts to forward traffic to one of the destination addresses, provided either statically or via the metadata_cache plugin. In both cases, the routing plugin may resolve the same destination hostname frequently; therefore, the host resolution logic is integrated with the routing plugin and the destination_status plugin.
The destination_status plugin periodically checks quarantined MySQL Server destination hostnames to determine whether they have become available again. Because quarantine may be triggered by DNS-related failures, this plugin must always bypass the host cache for the read path and perform a fresh DNS resolution for each check (i.e., it must not use cached results when verifying availability). If the fresh DNS resolution succeeds, the resulting outcome must be used to update the host cache (refresh an existing entry or insert a new one), so that subsequent consumers benefit from the validated resolution result.
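The "refresh only, no reads" rule for quarantined destinations can be sketched as follows. All names here are hypothetical: resolve stands for any callable that returns a list of addresses or raises OSError, and cache is a plain dictionary standing in for the host cache. The sketch only shows the ordering of steps, not the plugin's real interfaces.

```python
import time


def check_quarantined_host(hostname, resolve, cache, ttl_seconds,
                           now=time.monotonic):
    """Resolve a quarantined host without reading the cache.

    On success the cache is refreshed so that other consumers benefit
    from the validated result. Returns True if the host resolved.
    """
    try:
        addresses = resolve(hostname)  # always a fresh lookup; cache is not read
    except OSError:
        return False                   # failure: leave the cache untouched
    if not addresses:
        return False
    # Refresh an existing entry or insert a new one.
    cache[hostname] = (list(addresses), now() + ttl_seconds)
    return True
```

Even when the cache already holds an entry for the hostname, the function never consults it, so a stale address cannot mask a DNS-related failure.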
Host Resolution Flow
The algorithm uses two containers for DNS entries:
- a container for temporary items (hostname resolution in progress),
- an LRU cache for final (resolved) entries.
Resolution flow:
- A lookup request first consults the temporary (in-flight) container. If an entry is present, the request waits until the resolution completes and then returns the result to the caller.
- Otherwise, the LRU cache is checked for a valid (non-expired) entry. If such an entry exists (cache hit), it is returned.
- If neither the temporary container nor the LRU cache contains the requested hostname, Router creates a temporary entry, inserts it into the temporary container, and performs the resolution in the requesting thread.
- After the DNS query completes, all threads waiting on the temporary entry are notified and receive the same result (or error).
- The temporary entry is removed; unless the outcome is a transient error, it is replaced by a final entry, which is inserted into the LRU cache (cache write).
This algorithm ensures single-flight behavior (per-host synchronization) for DNS handling.
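The flow above can be sketched with a lock, a dictionary of in-flight records, and one event per record. This is an illustration under simplifying assumptions, not the Router implementation: the class and attribute names are hypothetical, a plain dictionary stands in for the LRU cache, TTLs are omitted, and every exception raised by the resolver is treated as a transient (uncached) error. The sketch writes the final entry before waking the waiters, which is one possible ordering.

```python
import threading


class _Flight:
    """Temporary record for a resolution in progress."""

    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None


class SingleFlightResolver:
    """At most one concurrent lookup per hostname; waiters share its result."""

    def __init__(self, resolve):
        self._resolve = resolve   # hypothetical callable: hostname -> addresses
        self._lock = threading.Lock()
        self._in_flight = {}      # hostname -> _Flight (temporary container)
        self._cache = {}          # hostname -> addresses (stand-in for the LRU)

    def lookup(self, hostname):
        with self._lock:
            flight = self._in_flight.get(hostname)
            leader = flight is None
            if leader:
                if hostname in self._cache:   # cache hit
                    return self._cache[hostname]
                flight = _Flight()            # cache miss: start a resolution
                self._in_flight[hostname] = flight
        if not leader:
            flight.done.wait()                # wait for the in-flight lookup
            if flight.error is not None:
                raise flight.error
            return flight.result
        try:
            # The resolution runs in the requesting thread, outside the lock.
            flight.result = self._resolve(hostname)
        except Exception as exc:
            flight.error = exc                # assumption: errors are not cached
        with self._lock:
            if flight.error is None:
                self._cache[hostname] = flight.result   # cache write
            del self._in_flight[hostname]
        flight.done.set()                     # notify all waiting threads
        if flight.error is not None:
            raise flight.error
        return flight.result
```

Holding the lock only around the container updates, and never during the DNS query itself, keeps lookups for unrelated hostnames from blocking each other.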
Additionally, depending on the DNS outcome, Router takes the following actions:
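Successful resolution inserts an LRU cache entry containing all resolved addresses for the hostname. The TTL is calculated as follows, where jitter is the configured 'ttl_jitter_ratio' and rand() is assumed to return a uniformly distributed value in [0, 1]: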
entry_ttl = (1 + (2*rand() - 1) * jitter) * ttl_success_seconds
Resolution with a deterministic negative outcome (NXDOMAIN, or NODATA, i.e., NOERROR with no records and therefore an empty address list) inserts an LRU cache entry with TTL:
entry_ttl = (1 + (2*rand() - 1) * jitter) * ttl_negative_seconds
Transient resolver errors are not cached (e.g., timeout, SERVFAIL, REFUSED).
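As a worked illustration of the three cases, the sketch below selects the base TTL by outcome and applies the jitter formula given above. The function name entry_ttl and the outcome labels "success", "negative", and "transient" are hypothetical; how Router classifies resolver results internally is not specified here.

```python
import random


def entry_ttl(outcome, ttl_success_seconds=60, ttl_negative_seconds=10,
              jitter=0.2, rand=random.random):
    """Return the entry TTL in seconds, or None if the outcome is not cached."""
    if outcome == "success":        # at least one usable address
        base = ttl_success_seconds
    elif outcome == "negative":     # NXDOMAIN / NODATA
        base = ttl_negative_seconds
    else:                           # transient: timeout, SERVFAIL, REFUSED
        return None
    # entry_ttl = (1 + (2*rand() - 1) * jitter) * base
    return (1 + (2 * rand() - 1) * jitter) * base
```

With the defaults, a successful entry therefore lives between 48 and 72 seconds and a negative entry between 8 and 12 seconds; a jitter of 0.0 makes the TTL equal to the base value.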
LRU cache
The cache size is limited by the configuration variable 'max_entries'. The cache uses Least Recently Used (LRU) eviction once it reaches 'max_entries'. When inserting into a full cache, Router evicts the entry that has gone the longest without being accessed.
“Access” includes both reads and updates: cache hit, cache write, and cache refresh.
When inserting into a full cache, Router purges expired entries first; only if the cache is still full does it evict via LRU.
Entries whose TTL has reached 0 are considered expired (invalid) and are purged first when space is required in the LRU cache.
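A minimal sketch of these rules, assuming an ordered dictionary as the underlying structure; the class name HostLruCache and the injectable clock are hypothetical and chosen to keep the example testable. It shows that expired entries are never returned, that both reads and writes refresh recency, and that expired entries are purged before any LRU eviction.

```python
from collections import OrderedDict


class HostLruCache:
    """LRU cache with per-entry expiry; least recently used entry first."""

    def __init__(self, max_entries, now):
        self._max = max_entries
        self._now = now                  # hypothetical clock callable (seconds)
        self._entries = OrderedDict()    # hostname -> (value, expires_at)

    def __len__(self):
        return len(self._entries)

    def get(self, hostname):
        item = self._entries.get(hostname)
        if item is None or item[1] <= self._now():
            return None                  # missing or expired: never returned
        self._entries.move_to_end(hostname)   # a cache hit counts as access
        return item[0]

    def put(self, hostname, value, ttl):
        now = self._now()
        if hostname not in self._entries and len(self._entries) >= self._max:
            # Purge expired entries first.
            expired = [name for name, (_, expires_at) in self._entries.items()
                       if expires_at <= now]
            for name in expired:
                del self._entries[name]
            # Only if the cache is still full, evict the LRU entry.
            if len(self._entries) >= self._max:
                self._entries.popitem(last=False)
        self._entries[hostname] = (value, now + ttl)
        self._entries.move_to_end(hostname)   # write/refresh counts as access
```

The purge here scans all entries, which is acceptable for the small bound on 'max_entries'; a production implementation might track expiry order separately.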
Observability
Observability is implemented jointly by:
- the 'host_cache' plugin, responsible for collecting counters and state, and
- the 'rest_host_cache' plugin, which follows the pattern of other REST plugins and exposes data as a JSON document via an HTTP endpoint.
The Router REST interface already defines mechanisms such as realms/authentication; therefore, this plugin must respect existing REST security configuration.
Counters and data collected by 'host_cache':
- effectiveness counters: hits, misses, inserts, evictions, expired_purges,
- capacity and utilization: enabled, max_entries, number of cache entries, number of temporary entries,
- temporary entries: list of hostnames with age and number of waiting consumers,
- cached entries: list of hostnames with remaining TTL (do not expose IP addresses).
The following endpoints are defined:
I. GET /host_cache/config — effective cache configuration
Returns the effective runtime configuration of the host cache (after applying defaults). Fields use camelCase.
Response fields:
- enabled (boolean): whether the host cache is enabled.
- maxEntries (number): maximum number of entries allowed in the LRU cache.
- ttlSuccessSeconds (number): base TTL applied to successful DNS resolutions (lookups returning at least one usable address).
- ttlNegativeSeconds (number): base TTL applied to deterministic negative outcomes (e.g., NXDOMAIN/NODATA/NOERROR-with-no-records).
- jitterRatio (number): TTL jitter ratio in range [0.0 .. 0.5]. When non-zero, Router randomizes each entry TTL around the base TTL to avoid synchronized expiry bursts (effective TTL ≈ base TTL × (1 ± jitterRatio)).
Example:
{
  "enabled": true,
  "maxEntries": 250,
  "ttlSuccessSeconds": 60,
  "ttlNegativeSeconds": 10,
  "jitterRatio": 0.2
}
II. GET /host_cache/status — usage and current cache state
Returns counters and high-level state about cache effectiveness and utilization.
Response fields:
- numberOfEntries (number): current number of entries stored in the LRU cache.
- numberOfTemporaryEntries (number): current number of in-flight (temporary) entries (hostnames being resolved).
- cache (object): effectiveness and maintenance counters:
  - hits (number): number of cache hits (lookup served from a non-expired LRU entry).
  - misses (number): number of cache misses (lookup required DNS resolution).
  - inserts (number): number of inserted final entries.
  - evictions (number): number of LRU evictions due to reaching maxEntries.
  - expiredPurges (number): number of entries purged because they expired.
Example:
{
  "numberOfEntries": 12,
  "numberOfTemporaryEntries": 1,
  "cache": {
    "hits": 100,
    "misses": 25,
    "inserts": 20,
    "evictions": 3,
    "expiredPurges": 5
  }
}
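To show how an operator might read these counters, the sketch below derives a hit ratio and a utilization figure from a status document. The function name summarize_status is hypothetical, and max_entries is passed in separately because the status response above does not include it (it is available from the config endpoint).

```python
import json


def summarize_status(document, max_entries):
    """Derive hit ratio and utilization from a /host_cache/status document."""
    status = json.loads(document)
    cache = status["cache"]
    lookups = cache["hits"] + cache["misses"]
    return {
        # None when no lookups have happened yet, to avoid dividing by zero.
        "hitRatio": cache["hits"] / lookups if lookups else None,
        "utilization": status["numberOfEntries"] / max_entries,
        "inFlight": status["numberOfTemporaryEntries"],
    }
```

For the example document above and a 'max_entries' of 250, the hit ratio is 100 / (100 + 25) = 0.8 and the utilization is 12 / 250 = 0.048.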
III. GET /host_cache/entries — LRU entries and in-flight resolutions
Returns the current content of:
- the LRU cache (final entries), and
- the in-flight container (temporary entries).
To keep the payload compact and stable, both entries and inProgress are represented as JSON objects keyed by hostname (not arrays). This also makes hostname lookup trivial for operators/tools.
Response fields:
- entries (object): map of hostname -> entryInfo
  - secondsRemainingTtl (number): remaining TTL in seconds for the cached entry.
  - cacheHits (number): number of times this entry was returned as a cache hit (optional; if per-entry hits are not tracked, the field is omitted rather than returned with dummy values).
  - ttl (number): initial effective TTL (in seconds) assigned to this cache entry at insertion time (i.e., after applying ttlSuccessSeconds/ttlNegativeSeconds and jitter).
  - singleFlight (number): number of concurrent consumers that waited on the same in-flight resolution that produced this entry (i.e., the waiter count observed for the single-flight DNS lookup at the time it completed).
- inProgress (object): map of hostname -> progressInfo
  - ageMilliseconds (number): time since the resolution started.
  - consumers (number): number of concurrent waiters for the in-flight resolution.
Important: This endpoint MUST NOT expose resolved IP addresses.
Example:
{
  "entries": {
    "myhostname1": {
      "secondsRemainingTtl": 10,
      "cacheHits": 1
    }
  },
  "inProgress": {
    "myhostname2": { "ageMilliseconds": 1200, "consumers": 1 },
    "myhostname3": { "ageMilliseconds": 200, "consumers": 3 }
  }
}
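Because both maps are keyed by hostname, simple tooling can work on the response directly. The following sketch lists in-flight resolutions older than a threshold, slowest first. The function name slow_resolutions and the 1000 ms default threshold are hypothetical choices for illustration.

```python
import json


def slow_resolutions(document, threshold_ms=1000):
    """Return (hostname, ageMilliseconds, consumers) for slow in-flight lookups."""
    data = json.loads(document)
    slow = [
        (hostname, info["ageMilliseconds"], info["consumers"])
        for hostname, info in data.get("inProgress", {}).items()
        if info["ageMilliseconds"] >= threshold_ms
    ]
    # Oldest resolution first.
    return sorted(slow, key=lambda item: item[1], reverse=True)
```

Applied to the example above, only myhostname2 (1200 ms) meets the default threshold; myhostname3 (200 ms) is below it.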