WL#10719: Invalidate Metadata Cache based on Group Replication Notification

Affects: Server-8.0 — Status: Complete

Motivation

The Router maintains its view of the world in the metadata cache.

The cache:

fetches the topology information from the metadata server(s)
polls the health state of the servers on an interval
invalidates the cache based on connection errors

After an error, the health check has to guess or narrow down why the connection failed:

server is dead
network failed
idle connection timeout

and then decide whether the backend needs to be taken out of the pool.

Background: xplugin

The xplugin can expose different kinds of information that make the failover behaviour more reliable and stable:

notification on idle connection close
notification on group replication view change

Expected Behaviour

Because the notifications are only available through the xprotocol, the Router has to connect to the backends via the xprotocol and enable the reception of those notifications.

To support cases where the xplugin isn't active, the current health check via the classic protocol should continue to work.

On reception of a GR view change notification from the xplugin, the metadata cache should invalidate its cache for that cluster and trigger a refresh of the group status.
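
The fallback described above can be sketched as follows. This is only an illustration under stated assumptions, not the Router's implementation: the `Node` type, `open_notification_listeners` and the injected `connect_x` callable are hypothetical names, and `fake_connect` stands in for a real xprotocol connection. The point is that a node without a usable xprotocol port, or with an inactive xplugin, is skipped without aborting, so the classic-protocol checks keep covering it.

```python
from typing import Callable, Dict, Iterable, NamedTuple, Optional, Tuple


class Node(NamedTuple):
    """Hypothetical cluster member as reported by a metadata refresh."""
    host: str
    classic_port: int
    x_port: Optional[int]  # None when the metadata has no xprotocol port


def open_notification_listeners(
    nodes: Iterable[Node],
    connect_x: Callable[[str, int], object],
) -> Dict[Tuple[str, int], object]:
    """Try to open one xprotocol listener per node; failures are non-fatal."""
    listeners: Dict[Tuple[str, int], object] = {}
    for node in nodes:
        if node.x_port is None:
            continue  # no xprotocol port known; rely on the classic checks
        try:
            listeners[(node.host, node.x_port)] = connect_x(node.host, node.x_port)
        except OSError as exc:
            # xplugin inactive or unreachable: carry on without notifications
            print(f"no GR notifications from {node.host}:{node.x_port}: {exc}")
    return listeners


def fake_connect(host: str, port: int) -> object:
    """Stand-in for a real connect; host 'b' simulates an inactive xplugin."""
    if host == "b":
        raise ConnectionRefusedError("xplugin not active")
    return (host, port)


nodes = [Node("a", 3306, 33060), Node("b", 3306, 33060), Node("c", 3306, None)]
listeners = open_notification_listeners(nodes, fake_connect)
```

Only node "a" ends up with a listener here; nodes "b" and "c" are still served by the periodic refresh over the classic protocol.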

Benefits

With this feature in place, the Router (and hence the user application) will be notified about most cluster changes asynchronously, right after they happen. Currently we encourage setting a low ttl for the metadata refresh (the current default is 0.5s). That causes some overhead, because the Router reconnects to the metadata servers and queries them quite often. With the GR notification feature, the ttl can be set to a higher value and treated as an additional safeguard, not as the primary means of keeping the information about the cluster state up to date.

Functional Requirements

FR1: The "metadata refresh triggered by the GR notification" feature MUST be optional and disabled by default.
FR2: To enable this feature while bootstrapping, the parameter --conf-use-gr-notifications MUST be used in the bootstrap command line.
FR3: To enable the feature for an existing configuration, use_gr_notifications=1 MUST be added to the [metadata_cache] section of the Router's configuration file.
FR4: The use_gr_notifications option MUST only allow the values 0 and 1. 0 MUST be treated as "disabled", 1 as "enabled". Any value other than 0 or 1 for this option MUST result in a configuration error.
FR5: If the parameter --conf-use-gr-notifications is used while bootstrapping, the TTL for the metadata refresh SHOULD use a higher value: ttl=60.0. It MUST stay on its current value of 0.5 seconds when the GR notifications feature is not enabled.
FR6: When the GR notifications feature is enabled the Router MUST refresh the metadata on each of the four notifications that Group Replication sends: (group_replication/membership/quorum_loss, group_replication/membership/view, group_replication/status/role_change, group_replication/status/state_change)
FR7: When the GR notifications feature is enabled, the Router should register and wait for the notifications from all the cluster nodes that were reported during the last metadata refresh and have a valid port for the XProtocol connection in the metadata.
FR8: Failure to connect to any of the XProtocol ports for the notification or to read from that port should not be fatal. The router should operate without this feature (the TTL that is configured should be used as a fallback metadata refresh strategy in that case).
FR9: When the GR notification is received and processed (metadata is read back from the cluster), the current remaining TTL MUST be restarted. For example, if ttl=60.0 in the configuration and a GR notification is received 50 seconds after the last metadata refresh, the next refresh triggered by the ttl should happen 60 seconds after the notification, not 10 seconds after it.
FR10: The new GR notifications feature should only be used as an additional trigger for metadata refresh. The refresh itself will be done using the classic protocol connection; this WL does not change that.
FR11: When the GR notifications feature is enabled, each connection loss discovered on the permanent connections that the Router keeps on XProtocol port for that feature, MUST trigger a metadata refresh.
FR12: For each notification received, the Router MUST check whether the view_id value that comes with the notification differs from the view_id of the last handled notification. The metadata refresh should be triggered only if it differs. This way the Router debounces multiple notifications that are received as a result of a single change.
FR13: For each x-protocol connection the Router makes to listen for GR notifications, it MUST use the same ssl_mode and other ssl parameters that are configured in the [metadata_cache] section and have been used for the classic metadata connection so far.
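
The interaction of FR9, FR11 and FR12 can be illustrated with a small sketch. It is not the Router's code: the `RefreshTrigger` class, its method names and the explicit `now` argument (seconds on an arbitrary clock) are assumptions made for the example, and view_id is treated as an opaque string. The sketch shows a refresh restarting the full TTL, a repeated view_id being ignored, and a lost listener connection forcing a refresh.

```python
from typing import Callable, Optional


class RefreshTrigger:
    """Hypothetical helper combining TTL expiry with GR notification handling."""

    def __init__(self, ttl: float, refresh: Callable[[], None], now: float = 0.0):
        self.ttl = ttl
        self.refresh = refresh
        self.last_view_id: Optional[str] = None
        self.next_deadline = now + ttl

    def _do_refresh(self, now: float) -> None:
        self.refresh()
        self.next_deadline = now + self.ttl  # FR9: restart the full TTL

    def on_notification(self, view_id: str, now: float) -> bool:
        """Return True if the notification triggered a refresh."""
        if view_id == self.last_view_id:
            return False  # FR12: same view as the last handled notification
        self.last_view_id = view_id
        self._do_refresh(now)
        return True

    def on_connection_lost(self, now: float) -> None:
        self._do_refresh(now)  # FR11: a lost listener connection forces a refresh

    def on_tick(self, now: float) -> bool:
        """Periodic check; refresh only when the TTL deadline has passed."""
        if now >= self.next_deadline:
            self._do_refresh(now)
            return True
        return False


calls = []
trigger = RefreshTrigger(ttl=60.0, refresh=lambda: calls.append("refresh"))
first = trigger.on_notification("view-1", now=50.0)   # triggers a refresh
second = trigger.on_notification("view-1", now=50.1)  # duplicate, ignored
```

With ttl=60.0 and a notification at the 50 second mark, the next TTL-driven refresh is due at 110 seconds rather than at 60, which matches the example in FR9.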

By default, the new GR notifications feature is disabled. To enable it, the new --conf-use-gr-notifications parameter should be used while bootstrapping. For example:


mysqlrouter -B 127.0.0.1:5000 --directory=test --conf-use-gr-notifications

That will add the use_gr_notifications parameter to the configuration file. For example:


[metadata_cache:test]
router_id=3
user=mysql_router3_oigbmjxrmaid
metadata_cluster=test
ttl=60.0
use_gr_notifications=1

To enable this feature for an existing setup, one can also add the option manually to the metadata_cache section of the existing configuration file.
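
As a rough illustration of the validation rule in FR4, the sketch below reads the option with Python's standard configparser and rejects anything other than 0 or 1. The function name `gr_notifications_enabled` and the trimmed `SAMPLE` text are assumptions for the example; the Router itself parses its configuration differently.

```python
import configparser

# Trimmed, illustrative version of a [metadata_cache] section.
SAMPLE = """\
[metadata_cache:test]
router_id=3
metadata_cluster=test
use_gr_notifications=1
"""


def gr_notifications_enabled(config_text: str, section: str) -> bool:
    """Return True if the feature is enabled; raise on an invalid value."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    # A missing option means the feature is disabled (FR1).
    raw = parser.get(section, "use_gr_notifications", fallback="0")
    if raw not in ("0", "1"):
        # FR4: anything other than 0 or 1 is a configuration error.
        raise ValueError(f"use_gr_notifications must be 0 or 1, got {raw!r}")
    return raw == "1"


enabled = gr_notifications_enabled(SAMPLE, "metadata_cache:test")
```

Passing a section text with use_gr_notifications=2 makes the function raise ValueError, mirroring the configuration error that FR4 requires.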