WL#7648: Decoupling provisioning from updates to state store
Status: In-Documentation
GOAL
====
The aim of this work is to decouple operations that update the state store from
those that are responsible for setting up servers, such as provisioning them or
simply configuring replication or any other high-availability solution. This is
key to having an easy-to-use product that can be easily adopted and deployed in
different environments.
In other words, this work is a stepping stone towards supporting different
provisioning mechanisms (e.g. MySQL Enterprise Backup, AWS Support, etc.) and
different high-availability solutions (e.g. MySQL Cluster, DRBD, etc.). Besides,
it will allow users who want to use Fabric but rely on their own scripts to
continue doing so.
CONTEXT
=======
Fabric provides three main sub-systems: High-availability, Sharding and
Provisioning. Fabric organizes servers into high-availability groups for
providing resilience to failures and shards are assigned to groups in order to
take advantage of their high-availability features. Currently, though, only
standard MySQL Replication is supported.
Appropriate interfaces are provided so that groups and shards can be set up and
provisioned by a database administrator. On the other hand, connectors and, in
particular, users' applications don't need to manage groups and shards but need
to fetch information on them to find out which servers are responsible for a
group or shard. So Fabric must provide interfaces that allow users to:
. fetch information - Retrieve information on high-availability groups and
shards stored in the state store, etc.
. update information - Add/remove servers to/from a group or define/move/
split/remove shards, etc. In other words, update the state store.
. provision the system - Install new servers; take backups and restore
them; configure replication among servers, etc.
Usually when information on groups or shards is updated, provisioning steps are
executed as well. For example, moving a shard from one group to another may
require restoring a backup of the source group to the destination group before
updating any information that maps the shard to a group. However, in some cases,
users may want to execute their own provisioning steps because:
. Fabric provisioning steps are not a good fit to their environment;
. Their provisioning steps provide better performance;
. Integrating their provisioning steps into Fabric requires time, etc.
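The ordering described for moving a shard can be illustrated with a minimal sketch. Everything here is hypothetical: the in-memory dict stands in for the state store, and default_provisioning only records the backup/restore/synchronize steps instead of running them. The point is that provisioning completes before the shard-to-group mapping changes, and that the provisioning step is a pluggable callable, so users could pass their own scripts instead.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for the state store's shard-to-group mapping.
shard_to_group: Dict[int, str] = {1: "group_a"}
# Records what would have happened, in order.
log: List[str] = []


def default_provisioning(source: str, destination: str) -> None:
    """Hypothetical placeholder for built-in provisioning steps."""
    log.append(f"backup {source}")
    log.append(f"restore {source} -> {destination}")
    log.append(f"sync {source} -> {destination}")


def move_shard(shard_id: int, destination: str,
               provision: Callable[[str, str], None] = default_provisioning) -> None:
    source = shard_to_group[shard_id]
    # Provisioning runs first, so the mapping never points at a group
    # that does not hold the shard's data yet.
    provision(source, destination)
    # Only afterwards is the state store updated.
    shard_to_group[shard_id] = destination
    log.append(f"map shard {shard_id} -> {destination}")
```

Calling move_shard(1, "group_b") records the backup, restore and sync steps followed by the remapping; a user-supplied function with the same (source, destination) signature can replace default_provisioning.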
PROPOSAL
========
Any command that may execute a provisioning step must provide the --update_only
option so that users can choose whether to execute the provisioning steps or
skip them. By default, the option is false, meaning that the provisioning steps
are executed. If the option is set to true, the command only updates the state
store.
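A minimal sketch of how a command could honor the --update_only option, using group.add as the example. StateStore, Provisioner and group_add are hypothetical stand-ins, not Fabric's actual classes: the state store is always updated, while the provisioning step (here, configuring replication) runs by default and is skipped only when update_only is true.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StateStore:
    """Hypothetical stand-in for the state store: server UUID -> group ID."""
    membership: Dict[str, str] = field(default_factory=dict)


@dataclass
class Provisioner:
    """Hypothetical stand-in that records the provisioning steps it would run."""
    steps: List[str] = field(default_factory=list)

    def configure_replication(self, server_uuid: str, group_id: str) -> None:
        self.steps.append(f"configure replication for {server_uuid} in {group_id}")


def group_add(store: StateStore, prov: Provisioner, group_id: str,
              server_uuid: str, update_only: bool = False) -> None:
    # The state store is updated in every case.
    store.membership[server_uuid] = group_id
    # Provisioning is the default; update_only=True skips it.
    if not update_only:
        prov.configure_replication(server_uuid, group_id)
```

With update_only=True the server is recorded in the group but no replication step is issued, leaving that work to the user's own scripts.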
In the future, every provisioning method shall also be available as a command.
By doing so, users will be able to choose the appropriate provisioning method
while setting up their environment.
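One way to let users pick a provisioning method by name is a simple registry, sketched below. This is an assumption about a possible design, not something this WL specifies: the method names "dump_restore" and "custom_script", the decorator and the provision function are all hypothetical, and each method only returns a description of what it would do.

```python
from typing import Callable, Dict

ProvisioningMethod = Callable[[str, str], str]

# Hypothetical registry; method names are illustrative, not Fabric commands.
PROVISIONING_METHODS: Dict[str, ProvisioningMethod] = {}


def provisioning_method(name: str) -> Callable[[ProvisioningMethod], ProvisioningMethod]:
    """Decorator that registers a provisioning method under `name`."""
    def register(func: ProvisioningMethod) -> ProvisioningMethod:
        PROVISIONING_METHODS[name] = func
        return func
    return register


@provisioning_method("dump_restore")
def dump_restore(source: str, destination: str) -> str:
    return f"dump {source} and restore into {destination}"


@provisioning_method("custom_script")
def custom_script(source: str, destination: str) -> str:
    return f"run site script from {source} to {destination}"


def provision(method: str, source: str, destination: str) -> str:
    """Look up the chosen method and run it; unknown names are rejected."""
    try:
        func = PROVISIONING_METHODS[method]
    except KeyError:
        raise ValueError(f"unknown provisioning method: {method}") from None
    return func(source, destination)
```

A registry keeps commands that need provisioning independent of any particular method: adding a new mechanism means registering one more function.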
Currently, Fabric only supports a provisioning method that is deeply rooted in
the sharding.move/split routines. Besides, the group.add and server.set_status
commands configure replication, which can also be considered a provisioning
method. Note, though, that we will not extract these provisioning methods as
commands in the context of this WL.
REMARKS
=======
. We will not provide support for different provisioning solutions in the
context of this work.
. We will not provide support for different high-availability solutions in the
context of this work. See WL#7392 for further details.
. This work depends on WL#7528, which provides the means to use/access optional
parameters in a command.
. We also took the opportunity to make the following changes in this patch:
  . Removed group.import_topology(...). It required servers to be started
    with the --report-host and --report-port options, which was not very
    useful.
  . Renamed group.check_group_availability(...) to group.health(...) and
    removed "is_master" from the returned value. Users can find the same
    information in the "status" value.
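Since "is_master" was dropped from group.health(...), a caller can derive it from the "status" value. The sketch below assumes, hypothetically, that the result maps each server UUID to a dict with a "status" key whose values match the status names used in this WL (e.g. "PRIMARY"); the real return format may differ.

```python
from typing import Dict, Optional

# Assumed (hypothetical) shape of a group.health(...) result:
# server UUID -> {"status": "PRIMARY" | "SECONDARY" | "SPARE" | "FAULTY", ...}
HealthReport = Dict[str, Dict[str, str]]


def is_master(report: HealthReport, server_uuid: str) -> bool:
    """Recover the removed "is_master" flag from the "status" value."""
    return report[server_uuid]["status"] == "PRIMARY"


def find_master(report: HealthReport) -> Optional[str]:
    """Return the UUID of the server whose status is PRIMARY, if any."""
    for uuid, info in report.items():
        if info["status"] == "PRIMARY":
            return uuid
    return None
```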
CHANGED COMMANDS
================
We shall add the --update_only parameter to the following commands:
. group.add(server, ..., update_only=False, ...)
  - This command adds a server to a group: updates the state store and
    configures replication.
. group.promote(group_id, slave_uuid=None, update_only=False, ...)
  - This command demotes the current master, if there is one, and promotes a
    new server to master. Only secondaries are automatically chosen to become
    primary. If users want to promote a spare to master, the --slave_uuid
    parameter must be provided.
. group.demote(group_id, slave_uuid=None, update_only=False, ...)
  - This command demotes the current master, if there is one.
. server.set_status(server, ..., update_only=False, ...)
  - This command changes a server's status: updates the state store and
    configures replication.
. sharding.move(shard_id, ..., update_only=False, ...)
  - This command moves a shard from one group to another: updates the state
    store, takes a backup of the source group, restores it to the destination
    group and synchronizes the source and destination groups.
. sharding.split(shard_id, ..., update_only=False, ...)
  - This command splits a shard between two groups: updates the state store,
    takes a backup of the source group, restores it to the destination group
    and synchronizes the source and destination groups.
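The candidate-selection rule described for group.promote can be sketched as follows. The function and the status strings are hypothetical: an explicit slave_uuid may name a secondary or a spare, while automatic selection considers secondaries only. Picking the lowest UUID is merely a deterministic placeholder for whatever criterion a real implementation uses to rank secondaries.

```python
from typing import Dict, Optional


def choose_new_primary(statuses: Dict[str, str],
                       slave_uuid: Optional[str] = None) -> str:
    """Pick the server to promote (hypothetical helper, not Fabric's code)."""
    if slave_uuid is not None:
        # An explicit choice may name a secondary or a spare.
        if statuses.get(slave_uuid) not in ("SECONDARY", "SPARE"):
            raise ValueError(f"{slave_uuid} cannot be promoted")
        return slave_uuid
    # Automatic choice: secondaries only; spares are never elected.
    candidates = sorted(u for u, s in statuses.items() if s == "SECONDARY")
    if not candidates:
        raise ValueError("no secondary available; pass slave_uuid to promote a spare")
    # Placeholder tie-break: lowest UUID.
    return candidates[0]
```

With statuses {"a": "PRIMARY", "b": "SECONDARY", "c": "SPARE"}, the automatic choice is "b", and "c" is promoted only when passed explicitly as slave_uuid.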
STATUS TRANSITIONS
==================
These are the possible server statuses:
. Primary denotes that a server may accept write transactions
and that secondaries connect to it to fetch updates.
. Secondary denotes that a server accepts read-only transactions
and connects to a primary to fetch updates.
. Spare is a secondary that is not automatically elected to
become a primary when one is needed.
. Faulty denotes a server that is not behaving as expected or
is unreachable.
+-----------+---------+-----------+-------+--------+
| FROM/TO   | PRIMARY | SECONDARY | SPARE | FAULTY |
|-----------|---------|-----------|-------|--------|
| PRIMARY   |         |     *     |       |   +    |
|-----------|---------|-----------|-------|--------|
| SECONDARY |    *    |           |   x   |   +    |
|-----------|---------|-----------|-------|--------|
| SPARE     |    *    |     x     |       |   +    |
|-----------|---------|-----------|-------|--------|
| FAULTY    |         |           |   x   |        |
+-----------+---------+-----------+-------+--------+
The operations that can change a server's status are the following:
. group.promote() and group.demote() denoted with *.
. threat.report_faulty() and threat.report_error() denoted
with +.
. server.set_status() denoted with x.
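The transition table can be encoded directly so that an operation is checked before it changes a server's status. This is a minimal sketch: the Status enum, the operation labels and check_transition are hypothetical names, and each (from, to) entry is copied from the table, keyed to the operation that may perform it.

```python
from enum import Enum
from typing import Dict, Tuple


class Status(Enum):
    PRIMARY = "PRIMARY"
    SECONDARY = "SECONDARY"
    SPARE = "SPARE"
    FAULTY = "FAULTY"


P, SE, SP, F = Status.PRIMARY, Status.SECONDARY, Status.SPARE, Status.FAULTY
PROMOTE_DEMOTE = "group.promote/group.demote"          # '*' in the table
THREAT = "threat.report_faulty/threat.report_error"    # '+' in the table
SET_STATUS = "server.set_status"                       # 'x' in the table

# (from, to) -> operation allowed to perform the transition.
TRANSITIONS: Dict[Tuple[Status, Status], str] = {
    (P, SE): PROMOTE_DEMOTE,
    (P, F): THREAT,
    (SE, P): PROMOTE_DEMOTE,
    (SE, SP): SET_STATUS,
    (SE, F): THREAT,
    (SP, P): PROMOTE_DEMOTE,
    (SP, SE): SET_STATUS,
    (SP, F): THREAT,
    (F, SP): SET_STATUS,
}


def check_transition(current: Status, new: Status, operation: str) -> None:
    """Raise ValueError unless `operation` may move a server from `current` to `new`."""
    if TRANSITIONS.get((current, new)) != operation:
        raise ValueError(f"{operation} cannot change {current.value} to {new.value}")
```

For example, check_transition(Status.FAULTY, Status.SPARE, SET_STATUS) passes, while a faulty server can never be promoted directly: it must first be set back to spare.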
Copyright (c) 2000, 2025, Oracle Corporation and/or its affiliates. All rights reserved.