WL#13417: Handle the metadata upgrade in the Router
This WL's goal is to make the Router handle the Shell's AdminAPI metadata versioning.
The Router is expected to understand and properly handle the "0.0.0" version and "metadata_upgrade_in_progress" lock.
FR1: New Routers (8.0.19) shall refuse to bootstrap against a Cluster with metadata 1.x.x.
- FR1.1: The Router must exit with an appropriate error message, advising the user to upgrade the cluster metadata.
FR2: When the Router discovers that the metadata version is 0.0.0 (meaning a metadata upgrade is in progress), it must refuse to bootstrap. The error message should inform the user that the metadata upgrade is in progress and advise them to try again once it is done.
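
The bootstrap checks in FR1 and FR2 could be sketched roughly as below. This is only an illustration in Python, not the Router's actual code: the names BootstrapError and check_bootstrap_allowed, the tuple representation of the version, and the message wording are all assumptions made for this example.

```python
from typing import Tuple

# (major, minor, patch) as read from the metadata; representation is assumed.
Version = Tuple[int, int, int]


class BootstrapError(Exception):
    """Hypothetical error raised when the Router must refuse to bootstrap."""


def check_bootstrap_allowed(version: Version) -> None:
    """Raise BootstrapError if bootstrapping must be refused for this version."""
    if version == (0, 0, 0):
        # FR2: version 0.0.0 marks a metadata upgrade in progress.
        raise BootstrapError(
            "The cluster metadata upgrade is in progress; "
            "try again once it has finished."
        )
    if version[0] == 1:
        # FR1 / FR1.1: new Routers do not bootstrap against 1.x.x metadata.
        raise BootstrapError(
            "The cluster metadata version %d.%d.%d is not supported for "
            "bootstrap; upgrade the cluster metadata first." % version
        )
    # Any other version is accepted in this sketch; the requirements above
    # only name 0.0.0, 1.x.x and 2.x.x.
```

Keeping the check in one function that raises means both refusals share a single exit path, and the caller only has to print the message and stop.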
FR3: In the regular (non-bootstrap) mode the Router should work with both old (1.x.x) and new (2.x.x) metadata.
FR4: The Router must properly handle a metadata version change between two consecutive metadata refreshes, which it performs every TTL.
- FR4.1: Routers need to continue to work (without causing any downtime) and use the new metadata version whenever it is detected, regardless of whether it is newer or older than the previous one.
FR5: When the Router discovers the metadata version 0.0.0 (upgrade in progress), it should not proceed with the metadata refresh it was about to begin. Instead it should keep using the last metadata it has cached. All the existing user connections should be kept, and new connections should be routed according to the cached metadata.
- FR5.1: The metadata refresh shall resume once the metadata version is no longer 0.0.0.
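
One way to picture FR4 and FR5 together is a refresh step that reads the version first and either skips the cycle or replaces the cache. The sketch below is an assumption-laden illustration: the MetadataCache class, its callbacks read_version and fetch_topology, and the dict used for the topology are hypothetical and do not describe the Router's internals.

```python
from typing import Callable, Optional, Tuple

Version = Tuple[int, int, int]
UPGRADE_IN_PROGRESS: Version = (0, 0, 0)


class MetadataCache:
    """Hypothetical cache that is refreshed once per TTL."""

    def __init__(
        self,
        read_version: Callable[[], Version],
        fetch_topology: Callable[[Version], dict],
    ) -> None:
        self._read_version = read_version
        self._fetch_topology = fetch_topology
        self.version: Optional[Version] = None
        self.topology: Optional[dict] = None

    def refresh(self) -> bool:
        """Run one refresh cycle; return False if the cycle was skipped."""
        version = self._read_version()
        if version == UPGRADE_IN_PROGRESS:
            # FR5: do not refresh; routing keeps using the cached topology.
            return False
        # FR4.1: use whatever version is found now, whether it is newer or
        # older than the one seen on the previous cycle.
        self.topology = self._fetch_topology(version)
        self.version = version
        return True
```

Because a skipped cycle leaves version and topology untouched, the next TTL tick simply retries, which matches FR5.1 without any extra state.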
FR6: To prevent inconsistent data when an upgrade starts while the Router is in the middle of a metadata refresh:
- all the queries that the Router runs during a refresh should be executed inside a single transaction
- the first SQL statement of that transaction should be a query on the schema_version view
- this should prevent the Shell from starting the upgrade once the Router has read the version, because the first thing the Shell does is drop that view, and that statement should block until the Router commits or rolls back its transaction
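
The statement ordering required by FR6 can be sketched as follows. The execute callback stands in for whatever sends SQL to the server, and refresh_in_transaction and topology_queries are hypothetical names; the only details taken from the requirements are the single transaction and the schema_version query coming first. The schema name mysql_innodb_cluster_metadata is taken from FR8.

```python
from typing import Callable, List, Sequence

SCHEMA_VERSION_SQL = "SELECT * FROM mysql_innodb_cluster_metadata.schema_version"


def refresh_in_transaction(
    execute: Callable[[str], list],
    topology_queries: Sequence[str],
) -> List[list]:
    """Run the version query and all refresh queries in one transaction."""
    execute("START TRANSACTION")
    try:
        # The version query goes first, so the view is in use by this
        # transaction before any other metadata is read.
        results = [execute(SCHEMA_VERSION_SQL)]
        for sql in topology_queries:
            results.append(execute(sql))
        execute("COMMIT")
        return results
    except Exception:
        # Ending the transaction on failure also releases the view.
        execute("ROLLBACK")
        raise
```

The design relies on the server keeping the view locked for the lifetime of the transaction, so the Shell's drop of schema_version waits until the COMMIT or ROLLBACK above has run.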
FR7: When bootstrapping against the new metadata, the Router should register its version in the routers table (new routers.version field).
FR8: When bootstrapping, the new Router should grant the newly created user the SELECT, INSERT, DELETE and UPDATE rights on the routers table (for the version update) and the EXECUTE right on mysql_innodb_cluster_metadata.*.
FR9: When starting in the routing mode the Router should always update the routers.version with its version.
- FR9.1: If it discovers the old metadata, it should store its version in the routers.attributes $.version JSON field.
- FR9.2: If it discovers the new metadata, it should store its version in the routers.version field.
- FR9.3: In both cases it is assumed that the db user that the Router uses will have the proper rights for the update (the Shell is supposed to grant it). If that is not the case the Router should continue working, logging a warning about not being able to update the version in the metadata.
FR10: The running router should periodically update the routers.last_check_in field with the current timestamp. Failing to do so should also not be fatal.
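
A rough sketch of FR9 and FR10 follows: pick the update statement by metadata major version, and treat a failed update as a warning rather than an error. Everything here is illustrative. The helper names, the router_id column in the WHERE clause, the %s placeholder style and the exact SQL text are assumptions, and handling of a NULL attributes column is left out.

```python
import logging
from typing import Callable, Tuple

log = logging.getLogger("router_sketch")

# Assumed statement for FR10; the router_id filter is an assumption.
LAST_CHECK_IN_SQL = (
    "UPDATE mysql_innodb_cluster_metadata.routers "
    "SET last_check_in = NOW() WHERE router_id = %s"
)


def version_update_sql(schema_major: int) -> str:
    """Return the statement that records the Router version."""
    if schema_major == 1:
        # FR9.1: old metadata keeps the version inside the attributes JSON.
        return (
            "UPDATE mysql_innodb_cluster_metadata.routers "
            "SET attributes = JSON_SET(attributes, '$.version', %s) "
            "WHERE router_id = %s"
        )
    # FR9.2: new metadata has a dedicated routers.version field.
    return (
        "UPDATE mysql_innodb_cluster_metadata.routers "
        "SET version = %s WHERE router_id = %s"
    )


def try_update(
    execute: Callable[[str, Tuple], object],
    sql: str,
    params: Tuple,
    what: str,
) -> bool:
    """Run an update; on failure log a warning and keep going (FR9.3, FR10)."""
    try:
        execute(sql, params)
        return True
    except Exception as exc:
        log.warning("Could not update %s in the metadata: %s", what, exc)
        return False
```

A caller would use try_update once at startup with version_update_sql(...) and then periodically with LAST_CHECK_IN_SQL, ignoring a False return apart from the logged warning.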
Router bootstrapping errors
Bootstrapping a Router that does not support the latest metadata schema version (2.0) will automatically fail, since the hosts table has been removed in metadata schema version 2.0.3. This is intentional, and the error printed needs to be documented to indicate that an update of the Router is required.
Bootstrapping a Router that supports the latest metadata schema version (2.0) against a Cluster that uses the old metadata schema version (1.0) must fail with an error instructing the user to fully upgrade the metadata schema on the cluster.