WL#10719: Invalidate Metadata Cache based on Group Replication Notification

Affects: Server-8.0   —   Status: Complete

Motivation

The Router maintains its view of the world in the metadata cache.

The cache:

  • fetches the topology information from the metadata server(s)
  • polls the health state of the servers on an interval
  • invalidates the cache based on connection errors

After an error, the health check has to guess or narrow down why the connection failed:

  • server is dead
  • network failed
  • idle connection timeout

and decide if the backend needs to be taken out of the pool or not.

Background: xplugin

The xplugin can expose different kinds of information that make the failover behaviour more reliable and stable:

  • notification on idle connection close
  • notification on group replication view change

Expected Behaviour

As notifications are only available through the xprotocol, the Router has to connect to the backends via the xprotocol and enable the reception of those notifications.

To support cases where the xplugin isn't active, the current health check via the classic protocol should continue to work.

On receiving a GR view change notification from the xplugin, the metadata cache should invalidate its cached data for that cluster and trigger a refresh of the group status.
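The expected behaviour can be sketched as follows. This is a minimal, non-authoritative illustration, not the Router's actual metadata cache: the names NoticeKind, Notice, MetadataCacheSketch, on_notice, is_stale and refresh_stale are hypothetical, the real xprotocol notice types are not modelled, and the refresh itself is abstracted as a callback. A GR view change marks only the affected cluster as stale, while an idle-connection-close notice is treated as proof that the server is alive, so the backend is not taken out of the pool.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical notice kinds; the actual xprotocol notice types are not modelled.
enum class NoticeKind { kGrViewChange, kIdleConnectionClose, kOther };

struct Notice {
  NoticeKind kind;
  std::string cluster;  // cluster the notifying server belongs to
};

// Hypothetical class; not the Router's actual metadata cache.
class MetadataCacheSketch {
 public:
  // refresh_fn stands in for re-reading the metadata and group status of a cluster.
  explicit MetadataCacheSketch(std::function<void(const std::string&)> refresh_fn)
      : refresh_fn_(std::move(refresh_fn)) {
    assert(refresh_fn_ && "a refresh function is required");
  }

  // Called by the thread that reads notices from the xprotocol connection.
  void on_notice(const Notice& n) {
    std::lock_guard<std::mutex> lock(mtx_);
    switch (n.kind) {
      case NoticeKind::kGrViewChange:
        // Membership changed: invalidate only the affected cluster.
        stale_.insert(n.cluster);
        break;
      case NoticeKind::kIdleConnectionClose:
        // The server closed an idle session, so it is alive: no invalidation,
        // the caller just reconnects and the backend stays in the pool.
        break;
      case NoticeKind::kOther:
        break;
    }
  }

  bool is_stale(const std::string& cluster) const {
    std::lock_guard<std::mutex> lock(mtx_);
    return stale_.count(cluster) != 0;
  }

  // Called from the refresh thread: refreshes each invalidated cluster once,
  // however many notices arrived for it, and returns the refreshed clusters.
  // Notices that arrive while refreshing are kept for the next round.
  std::vector<std::string> refresh_stale() {
    std::set<std::string> todo;
    {
      std::lock_guard<std::mutex> lock(mtx_);
      todo.swap(stale_);
    }
    for (const auto& cluster : todo) refresh_fn_(cluster);
    return std::vector<std::string>(todo.begin(), todo.end());
  }

 private:
  mutable std::mutex mtx_;
  std::set<std::string> stale_;
  std::function<void(const std::string&)> refresh_fn_;
};
```

Collecting invalidations in a set coalesces a burst of view change notices (for example, several members reporting the same change) into a single refresh per cluster. In this sketch a failed refresh is not retried until the next notice arrives, which is one reason to keep the ttl-based refresh as a safeguard.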

Benefits

With this feature in place, the Router (and hence the user application) is notified about most cluster changes asynchronously, right after they happen. Currently we encourage setting a low ttl for the metadata refresh (the current default is 0.5s), which causes the overhead of reconnecting to the metadata servers and querying them frequently. With GR notifications, the ttl can be set to a higher value and treated as an additional safeguard rather than the primary means of keeping the cluster state up to date.
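The change in the role of the ttl can be sketched as a refresh decision. This is a minimal illustration under stated assumptions: refresh_due and refreshes_per_hour are hypothetical helpers, not Router APIs, and the model assumes the cache is refreshed roughly once per ttl when no notification arrives.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: should the refresh loop query the metadata servers now?
// A GR notification sets `invalidated`; the ttl only bounds staleness when a
// notification is missed (for example, while the xprotocol connection is down).
bool refresh_due(Clock::time_point now, Clock::time_point last_refresh,
                 std::chrono::milliseconds ttl, bool invalidated) {
  assert(ttl.count() > 0 && "ttl must be positive");
  return invalidated || now - last_refresh >= ttl;
}

// Hypothetical helper: rough number of ttl-driven refreshes per hour for one Router.
long long refreshes_per_hour(std::chrono::milliseconds ttl) {
  assert(ttl.count() > 0 && "ttl must be positive");
  return std::chrono::hours(1) / ttl;
}
```

At the 0.5s default, refreshes_per_hour yields 7200 refreshes per hour for each Router; with a ttl of 60s (an arbitrary value chosen for illustration, not a recommendation) it yields 60, while refresh_due still reacts immediately whenever a notification has invalidated the cache.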