WL#10719: Invalidate Metadata Cache based on Group Replication Notification
The Router maintains its view of the world in the metadata cache. The Router:
- fetches the topology information from the metadata server(s)
- polls the health state of the servers on an interval
- invalidates the cache based on connection errors
After an error, the health check has to guess/narrow down the reason why the connection failed:
- server is dead
- network failed
- idle connection timeout
and decide if the backend needs to be taken out of the pool or not.
The xplugin can expose different kinds of information that make the failover behaviour more reliable and stable:
- notification on idle connection close
- notification on group replication view change
As the notifications are a feature that is only available through the xprotocol, the router has to connect to the backends via the xprotocol and enable the reception of those notifications.
To support cases where the xplugin isn't active, the current health check via the classic protocol should continue to work.
On reception of a GR view change notification from the xplugin, the metadata cache should invalidate its cache for that cluster and trigger a refresh of the group status.
With this feature in place the Router (hence the user application) will get notified about most of the cluster changes asynchronously, right after they happen. Currently we encourage setting a low ttl for metadata refresh (the current default is 0.5s). That causes some overhead from reconnecting to the metadata servers and querying them quite often. With the GR notification feature, the ttl can be set to a higher value and treated as an additional safeguard, not as the primary means of keeping the information about the cluster state up to date.
- The "metadata refresh triggered by the GR notification" feature MUST be optional and disabled by default.
To enable this feature while bootstrapping, the parameter --conf-use-gr-notifications MUST be used in the bootstrap command line.
To enable the feature for the existing configuration, use_gr_notifications=1 MUST be added to the [metadata_cache] section of the Router's configuration file.
The use_gr_notifications option MUST only allow the values 0 and 1. 0 MUST be treated as "disabled", 1 as "enabled". Any value other than 0 and 1 for this option MUST result in a configuration error.
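The validation rule above can be sketched as follows. This is only an illustration in Python, not the Router's implementation: the names `ConfigError`, `parse_use_gr_notifications` and `read_use_gr_notifications` are hypothetical, and the plain `[metadata_cache]` section name is taken from this document.

```python
import configparser


class ConfigError(ValueError):
    """Hypothetical error type for an invalid option value."""


def parse_use_gr_notifications(raw: str) -> bool:
    """Map the raw option text to a bool; only "0" and "1" are accepted."""
    value = raw.strip()
    if value == "0":
        return False  # disabled
    if value == "1":
        return True   # enabled
    raise ConfigError(f"use_gr_notifications must be 0 or 1, got {raw!r}")


def read_use_gr_notifications(config_text: str,
                              section: str = "metadata_cache") -> bool:
    """Read the option from an INI-style text; section name is an assumption."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # A missing option means the feature stays disabled, the required default.
    raw = parser.get(section, "use_gr_notifications", fallback="0")
    return parse_use_gr_notifications(raw)
```

Values such as `2`, `true` or an empty string all raise `ConfigError` in this sketch, which mirrors the "any value other than 0 and 1" wording of the requirement.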
If the parameter --conf-use-gr-notifications is used while bootstrapping, the TTL for the metadata refresh SHOULD use a higher value: ttl=60.0. It MUST stay at its current value of 0.5 seconds when the GR notifications feature is not enabled.
When the GR notifications feature is enabled the Router MUST refresh the
metadata on each of the four notifications that Group Replication sends: (
- When the GR notifications feature is enabled, the Router should register and wait for the notifications from all the cluster nodes that were reported during the last metadata refresh and have a valid port for an XProtocol connection in the metadata.
- Failure to connect to any of the XProtocol ports for the notification or to read from that port should not be fatal. The router should operate without this feature (the TTL that is configured should be used as a fallback metadata refresh strategy in that case).
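The two requirements above (listen only on nodes with a valid XProtocol port, and treat connection failures as non-fatal) can be sketched like this. It is a minimal illustration under stated assumptions: `Node`, `has_valid_xport` and `open_listeners` are hypothetical names, the `connect` callable stands in for whatever opens the real XProtocol session, and "valid port" is assumed here to mean a value in the range 1 to 65535.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

log = logging.getLogger("gr_notifications_sketch")


@dataclass(frozen=True)
class Node:
    """Hypothetical view of one cluster node from the last metadata refresh."""
    host: str
    xport: Optional[int]  # XProtocol port from the metadata, if any


def has_valid_xport(node: Node) -> bool:
    # Assumption: "valid" means a usable TCP port number.
    return node.xport is not None and 0 < node.xport <= 65535


def open_listeners(nodes: Iterable[Node],
                   connect: Callable[[str, int], object]) -> List[object]:
    """Try to open one notification listener per eligible node."""
    listeners = []
    for node in nodes:
        if not has_valid_xport(node):
            continue  # no XProtocol port known: nothing to listen on
        try:
            listeners.append(connect(node.host, node.xport))
        except OSError as exc:
            # Non-fatal: the configured TTL still drives periodic refreshes.
            log.warning("no GR notifications from %s:%s (%s)",
                        node.host, node.xport, exc)
    return listeners
```

If the returned list is empty, the caller simply keeps running with the TTL-based refresh as the only trigger.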
When the GR notification is received and processed (metadata is read back
from the cluster) the current remaining TTL MUST be restarted.
For example, if ttl=60.0 in the configuration and 50 seconds have passed since the last metadata refresh when a GR notification is received, the next refresh triggered by the ttl should be in 60 seconds, not in 10 seconds.
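The TTL restart can be illustrated with a small timer sketch. The class `RefreshTimer` and its methods are hypothetical names used only for this example; the clock is injectable so the 50-second scenario from the text can be replayed without waiting.

```python
import time
from typing import Callable


class RefreshTimer:
    """Tracks when the next TTL-driven metadata refresh is due."""

    def __init__(self, ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._last_refresh = clock()

    def mark_refreshed(self) -> None:
        """Call after every refresh, whether TTL- or notification-driven."""
        self._last_refresh = self._clock()

    def seconds_until_next_refresh(self) -> float:
        elapsed = self._clock() - self._last_refresh
        return max(0.0, self._ttl - elapsed)


# Replay of the scenario in the text with a fake clock.
now = [0.0]
timer = RefreshTimer(60.0, clock=lambda: now[0])
now[0] = 50.0                      # 50 seconds since the last refresh
before = timer.seconds_until_next_refresh()   # 10 seconds remain
timer.mark_refreshed()             # GR notification handled, metadata re-read
after = timer.seconds_until_next_refresh()    # full 60 seconds again
```

The key point is that a notification-triggered refresh calls the same `mark_refreshed()` as a TTL-triggered one, so the remaining time is restarted rather than carried over.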
- The new GR notifications feature should only be used as an additional trigger for metadata refresh. The refresh itself will be done using the classic protocol connection; this WL does not change that.
- When the GR notifications feature is enabled, each connection loss discovered on the permanent connections that the Router keeps on the XProtocol port for that feature MUST trigger a metadata refresh.
For each notification received, the Router MUST check whether the view_id value that comes with the notification differs from the view_id of the last handled notification. The metadata refresh should be triggered only if it differs. This way the Router debounces multiple notifications that are received as a result of a single change.
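The view_id check can be sketched as below. `ViewChangeDebouncer` is a hypothetical name, the view_id is treated as an opaque string, and the sketch assumes that the very first notification (when no view_id has been handled yet) always triggers a refresh; the text does not spell out that case.

```python
from typing import Optional


class ViewChangeDebouncer:
    """Suppresses repeated notifications that carry the same view_id."""

    def __init__(self) -> None:
        self._last_view_id: Optional[str] = None

    def should_refresh(self, view_id: str) -> bool:
        """Return True only when view_id differs from the last handled one."""
        if view_id == self._last_view_id:
            return False  # same view as before: part of the same change
        self._last_view_id = view_id
        return True
```

With several nodes reporting the same group view change, only the first notification for a given view_id leads to a metadata refresh; the rest are dropped until the view_id changes again.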
- For each x-protocol connection the Router makes to listen for GR notifications, it MUST use the same ssl_mode and other ssl parameters that are configured in the [metadata_cache] section and were used for the classic metadata connection so far.
By default the new GR notifications feature is disabled.
To enable it, the new --conf-use-gr-notifications parameter should be used while bootstrapping. For example:
mysqlrouter -B 127.0.0.1:5000 --directory=test --conf-use-gr-notifications
That will add the use_gr_notifications parameter to the configuration file. For example.
One can also add it manually in the metadata_cache section of the existing configuration file to enable this feature.