WL#11318: persist last known set of metadata servers

Affects: Server-8.0   —   Status: Complete

Motivation

After bootstrap, MySQL Router stores the addresses of the metadata servers in its configuration file.

Group Replication allows servers to be added and removed on the fly.

When MySQL Router has been running for a long time and gets restarted, it will try to connect to the servers it was initially bootstrapped with, instead of the last known set.

The servers listed in the configuration may by then even be part of different clusters, because their IP addresses got reused.

Goal

  1. Router should track and persist the addresses of the metadata-servers.
  2. After a restart, Router should prefer the persisted list of members when connecting to the metadata-servers.
  3. The persisted state should include an identifier of the group-replication group, and Router should check it when it connects to a metadata-server.

Currently the whole MySQLRouter configuration is static: it is stored in the *.conf configuration file, which is read once at application startup. This WL introduces the notion of dynamic configuration.

:: Functional Requirements:

FR1
When MySQLRouter discovers a change to the metadata server list in the InnoDB cluster metadata, it MUST save the new list to persistent storage so that this more current data is used after a restart.
FR2
MySQLRouter MUST store the replication group ID to be able to check whether the current metadata still belongs to the replication group for which the configuration was created.
FR3
If the current metadata does not belong to that replication group, it MUST be discarded with a proper error message.
FR4
The persistent storage for the dynamic configuration MUST survive MySQLRouter restart and operating system reset.
FR5
The persisted storage MUST be versioned so that a potential incompatibility can be discovered and reported when loading.
FR6
A configuration file generated before this feature was introduced MUST still be valid (backward compatibility). For such a configuration file the dynamic configuration functionality should be disabled.
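
FR1 to FR3 can be sketched roughly as follows. This is an illustration only, not the Router's implementation: the function `refresh_state`, the exception `GroupMismatchError`, and the field name "cluster-metadata-servers" are assumptions made for the example; only "group-replication-id" is a field name this WL mentions.

```python
from typing import List, Optional


class GroupMismatchError(Exception):
    """Hypothetical error: the metadata belongs to another replication group."""


def refresh_state(state: dict, reported_group_id: str,
                  discovered_servers: List[str]) -> Optional[dict]:
    """Return the state that should be persisted, or None if nothing changed.

    `state` stands for the last persisted data; its layout is assumed here.
    """
    expected = state["group-replication-id"]
    if reported_group_id != expected:
        # FR3: metadata from a different group is discarded with an error.
        raise GroupMismatchError(
            f"metadata belongs to group {reported_group_id!r}, "
            f"expected {expected!r}; discarding it")
    if discovered_servers == state["cluster-metadata-servers"]:
        return None  # no change, nothing to write
    # FR1: the server list changed, so a new state has to be saved.
    return {**state, "cluster-metadata-servers": list(discovered_servers)}
```

The caller would write the returned state to persistent storage whenever it is not None.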

:: Non-Functional Requirements:

NFR1
A schema for the persisted storage MUST exist so that the validity of the storage can be verified automatically.
User visible implementation decisions:
1
Dynamic configuration shall be stored in a JSON file.
2
The dynamic configuration JSON file shall contain a "version" field with a version in x.y.z format. If the x and/or y part in the file differs from the one required by the running MySQLRouter instance, the configuration will be discarded with a proper error message. A different z (patch) part should not break compatibility.
3
The dynamic configuration file shall be verified against the JSON schema when loaded. If it does not match the schema, MySQLRouter shall exit with a proper error message.
4
Dynamic configuration json file shall be created during the bootstrap with the initial data.
5
Dynamic configuration file shall be (by default) placed in MySQLRouter's data directory.
6
Path to the dynamic configuration file shall be added to the [DEFAULT] section of the static configuration file:
    [DEFAULT]
    dynamic_config=/home/areliga/dev/router/install/x2/data/mysqlrouter_dynamic_conf.json
7
For now the initial data in the dynamic configuration file shall be InnoDB Cluster metadata server addresses and group replication ID.
8
Upon startup, the metadata_cache component shall load the initial metadata server addresses and replication group ID from the dynamic configuration file.
9
While MySQLRouter is running, whenever the "metadata_cache" component discovers changes in the metadata (for now, the addresses of the metadata servers), it shall update the dynamic configuration file with the new data.
10
The static configuration file created during the bootstrap shall no longer contain the bootstrap_server_addresses entry in the [metadata_cache] section. This information will be stored in the dynamic configuration JSON file.
11
If the dynamic configuration file is specified in the [DEFAULT] section of the static configuration file and bootstrap_server_addresses is also present in the [metadata_cache] section, the static configuration file is invalid and shall be rejected with a proper error message.
12
For backward compatibility, if the [DEFAULT] section of the static configuration file does not contain the "dynamic_config" entry, MySQLRouter shall not use the dynamic configuration functionality and the static bootstrap_server_addresses entry shall still be required.
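
Decisions 2, 11 and 12 amount to two small checks, sketched below under stated assumptions. The function names `is_compatible` and `validate_static_config` are hypothetical, and Python's `configparser` merely stands in for the Router's own configuration parser; only the option and section names come from this WL.

```python
import configparser


def is_compatible(file_version: str, required_version: str) -> bool:
    """Decision 2: x and y must match, the patch part z is ignored."""
    file_major, file_minor, _ = (int(p) for p in file_version.split("."))
    req_major, req_minor, _ = (int(p) for p in required_version.split("."))
    return (file_major, file_minor) == (req_major, req_minor)


def validate_static_config(text: str) -> bool:
    """Return True if dynamic configuration is enabled, False for the
    backward-compatible static mode; raise ValueError if invalid."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    has_dynamic = "dynamic_config" in parser.defaults()
    has_static = parser.has_option("metadata_cache",
                                   "bootstrap_server_addresses")
    if has_dynamic and has_static:
        # Decision 11: both sources of server addresses are present.
        raise ValueError("dynamic_config and bootstrap_server_addresses "
                         "must not be used together")
    if not has_dynamic and not has_static:
        # Decision 12: without dynamic_config the static list is required.
        raise ValueError("bootstrap_server_addresses is required when "
                         "dynamic_config is not set")
    return has_dynamic
```

With these rules, a file versioned 1.0.3 loads on a Router that requires 1.0.0, while 1.1.0 or 2.0.0 is rejected.
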
Proposed tasks split:

1) Create a generic dynamic configuration handler in MySQLRouter
   - reads the initial configuration from the JSON file
   - allows router components to read a selected section (by name) as a JSON object
   - allows router components to write new state to the JSON (flushed to file)
   - synchronizes the access
   - includes writing UTs

2) Use the dynamic configuration handler to write the initial data while bootstrapping
   - create the handler object on bootstrap
   - use it to write the metadata cache data from bootstrap (metadata cache servers and group replication ID) to the JSON file
   - put the path to that file in the static configuration file

3) Add usage of the dynamic configuration handler in mysqlrouter
   - creates the handler object with the file path read from the static configuration
   - adds an API to expose it to other components (DIM)
   - checks the version for compatibility
   - verifies the schema (need to embed the schema in the binary?)

4) Add handling of the dynamic configuration in the metadata cache
   - check that the group replication ID from the configuration did not change (error handling if it did)
   - load and use the last stored set of metadata cache servers
   - save the new set of metadata cache servers each time a change is discovered during runtime

5) Write component level tests
   - metadata cache scenarios:
     - group replication ID doesn't match
     - bootstrap server list changed during runtime
   - error case scenarios (wrong schema, wrong config file version)
   - and more
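
The handler described in task 1 could look roughly like the sketch below. It is a minimal illustration, not the Router's actual class: the name `DynamicState`, its methods, and the write-to-temporary-file-then-rename flush are assumptions of this example.

```python
import json
import os
import tempfile
import threading


class DynamicState:
    """Hypothetical handler for the dynamic configuration JSON file."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._lock = threading.Lock()  # synchronizes access between components
        with open(path, encoding="utf-8") as f:
            self._doc = json.load(f)  # initial configuration from the file

    def get_section(self, name: str) -> dict:
        """Return a copy of the named section (empty dict if absent)."""
        with self._lock:
            # Round-trip through JSON to hand out an independent deep copy.
            return json.loads(json.dumps(self._doc.get(name, {})))

    def update_section(self, name: str, value: dict) -> None:
        """Replace the named section and flush the document to disk."""
        with self._lock:
            self._doc[name] = value
            self._flush()

    def _flush(self) -> None:
        # Assumed design: write a temporary file in the same directory and
        # rename it over the target, so readers never see a partial file.
        directory = os.path.dirname(os.path.abspath(self._path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self._doc, f, indent=2)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self._path)
```

A component such as the metadata cache would call `get_section("metadata-cache")` at startup and `update_section(...)` whenever it discovers a changed server list.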

Schema validation:

The JSON specification does not say directly what should happen if a JSON object has duplicated fields. The RapidJSON schema validator does not allow marking an object field as unique, as it does for array elements, for example. In that case RapidJSON returns the value of the first occurrence, but this is implementation specific and we leave this behavior undefined. So, for example, if the "group-replication-id" field appears twice in the "metadata-cache" section of the dynamic file, the behavior is not specified (currently the first value is used).
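
That the outcome depends on the parser can be seen with Python's standard `json` module, which keeps the last value of a duplicated field rather than the first. The sketch below also shows one way a loader could reject duplicates outright; this is not something this WL requires, and the helper names `reject_duplicate_keys` and `load_strict` are hypothetical.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that fails on a repeated field name."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate field {key!r}")
        seen[key] = value
    return seen


def load_strict(text: str):
    """Parse JSON text, raising ValueError if any object repeats a field."""
    return json.loads(text, object_pairs_hook=reject_duplicate_keys)


duplicated = ('{"metadata-cache": {"group-replication-id": "a", '
              '"group-replication-id": "b"}}')
# The default Python parser silently keeps the last occurrence ("b").
last_wins = json.loads(duplicated)["metadata-cache"]["group-replication-id"]
```

Calling `load_strict(duplicated)` raises `ValueError` instead of silently picking one of the two values.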