WL#13177: New bootstrap option (--account) to allow reuse of an existing account

Affects: Server-8.0   —   Status: Complete

Abstract

Router needs a Server account, which it uses when it talks with the InnoDB cluster. Currently, this account is ALWAYS automatically created during bootstrapping (if previous account exists, it is wiped first and then recreated). This worklog adds an option to disable this behaviour and instead reuse an existing account.

Motivation

As the user grows his infrastructure, he may get to a point where he's using many Routers. Currently, each Router will create a separate Server account, leading to a situation where the user has to manage a lot of accounts, which is tedious. Therefore it is desirable to reuse accounts, such that many Routers share the same Server account to keep the number of accounts low, perhaps even down to just a single account.

Also, since such accounts are no longer private to just one Router, the user may want to choose meaningful account names (rather than Router autogenerating them) to ease administration.

Finally, the user might also want to manage the accounts via external tools instead of Router (e.g. MySQL Shell, manually by hand, etc), meaning that Router should to be able to work with such accounts.

User stories

  1. As a Cluster admin, I want to create one username for all routers of a InnoDB cluster and reuse for all other routers of the cluster to manage it remotely.
  2. As a Cluster admin, I want to automate the reuse of mysql-users for the router bootstrap.
  3. As a Cluster admin, I want to be sure that username that's created works for the current router.

Goal

  1. Allow bootstrap to reuse existing accounts
  2. Allow bootstrap to fail if account exists when it is supposed to be created.
  3. Keep existing behaviour for bootstrap and account creation as is.

Terminology Used In This Document

Bootstrap and Router

Currently, bootstrapping is done by running Router binary in a special bootstrap mode. However, to ease documentation, whenever discussion mentions Router, it means Router running in normal (non-bootstrap) mode. Whenever it mentions Bootstrap, it means Router running in bootstrap mode.

username, hostname and account

MySQL Server account identifiers consist of 3 parts: username, '@' and hostname (e.g. foo@bar). In this document we'll refer to the 1st and 3rd using terms username and hostname, and the entire identifier will be called account (and occasionally when disambiguation is needed, its more explicit form: account name). For the case of foo@bar, these would be:

username = foo
hostname = bar
account [name] = foo@bar

List of accounts to be created during bootstrap is determined by --account-host as follows:

For every --account-host <hostname> passed on command-line, bootstrap will create an account according to policies defined by --account and --account-create options (if passed). If no --account-host option is used, except for case described by FR10, behaviour will be equivalent to passing --account-host %. (This behaviour exists already, here we're just explicitly stating how it should work together with new --account-* options)

Functional Requirements

with --account

FR1
Adding to bootstrap command-line an option --account <username> and providing (optionally empty) password at the prompt MUST result in bootstrap using the provided username (instead of reusing the username specified in configuration file or auto-generating a new one) and password (instead of using password stored in keyring or autogenerating it).
FR2
If --account-create never is passed on command-line, bootstrap MUST NOT create new accounts (CREATE USER and GRANTs steps MUST be skipped, but all other setup steps still apply).
FR3
If --account-create always is passed on command-line, bootstrap MUST NOT reuse any accounts during course of bootstrap - it MUST try to create the accounts. If account creation fails, bootstrap MUST exit with pertinent error-message and make no changes to any files. The database MAY require manual cleanup by the user to revert back to its initial state.
FR4
If --account-create if-not-exists is passed on command-line, bootstrap MUST ensure that the account(s) are created if they don't exist yet.
FR5
If --account is specified and --account-create <arg> is omitted from command-line, it MUST default to --account-create if-not-exists.
FR6
Using --account without -B or --bootstrap MUST trigger program exit with a pertinent error message and make no changes to database or any files.
FR7
Using --account without an argument (account name) or with an empty argument ('') MUST trigger program exit with a pertinent error message and make no changes to database or any files.
FR8
Using --account-create without a valid argument (one of: always, if-not-exists, never) MUST trigger program exit with a pertinent error message and make no changes to database or any files.
FR9
Using --account-create without --account MUST trigger program exit with a pertinent error message and make no changes to database or any files.
FR10
Using --account-create never with --account-host <host> simultaneously, MUST trigger program exit with a pertinent error message and make no changes to database or any files.

without --account

FR11
Bootstrap running without --account MUST NOT recreate an existing account (this is a change of old behaviour, currently bootstrap will DROP USER and then CREATE [the same] USER).

common

FR12
Bootstrap MUST always ask a predictable set of questions, in predefined order. This set MAY depend on command-line options provided, but MUST NOT depend on initial state (i.e. config existence/contents, whether the account exists or not, whether keyring exists or has the password for the account, etc. The idea is to make it automation-friendly). However, it is permissible to ask less then a full set of questions when exiting with an error. The details for this requirement are described in section "User prompts".
FR13
The "effective-username" MUST be verified (account-verification) if they can log in into the mysql-server and execute the queries a Router would.
FR14
If account-verification fails and --strict is not provided, the reason MUST be shown to the user as warning and bootstrap should continue.
FR15
If account-verification fails and --strict is provided, the reason MUST be shown to the user as error, the bootstrap MUST fail and make no changes to any files.
FR16
Using --strict without -B or --bootstrap MUST trigger program exit with a pertinent error message and make no changes to database or any files.
FR17
If bootstrap fails for any reason after the account creation step, it MUST revert creation of any accounts that were created during bootstrap. If it is unable to determine which accounts it created[*], it MUST give user a message explaining this situation, which MUST include a list of all potential accounts that the user might have to remove by hand.

[*] When bootstrap calls CREATE USER IF NOT EXISTS with a list of accounts, it doesn't know which of those already exist. It might need to successfully query SHOW WARNINGS after, to find this out.


Non-functional requirements

NFR1
Password for the --account argument MUST NOT be provided from command-line. In particular, --account foo:bar will use foo:bar for username and expect the password to be still provided (rather than using foo for username and bar for password); such case will also not receive any special treatment, i.e. no warning will be issued advising a user that perhaps he meant to create an account with username foo.
NFR2
User is responsible for ensuring that the accounts reused by bootstrap have sufficient access rights to be used by Router. Bootstrap will not attempt to add missing grants to existing accounts.

SUMMARY OF CHANGES

THE NEW --account OPTION

New option --account changes several assumptions about what the user expects out of bootstrap. Below table outlines how the design assumptions compare (the reality may be different, but those are the assumptions):

without --account with --account
username and password are autogenerated; user does not care what they are username and password are specified by user; user may want control over what they are
each Router has its own accout multiple Routers may share an account
priority = ease of deployment priority = ease of management

CHANGES TO BOOTSTRAPPING WITHOUT --account OPTION

1. boostrap no longer does a DROP USER followed by CREATE USER on an existing user.  If the
   account needs to be created, it is, if it already exists, it's not modified.

2. username and router_id are no longer tied together.  Before, if configuration was missing
   router_id, or it was detected to belong to a different router, username got automatically
   reset to a new (autogenerated) one.  The new way is to keep the username.

NEW COMMAND-LINE OPTIONS DETAILS

--account <user>  Specify SQL user to use, should exist unless passing --account-create (always|if-not-exists)
                  Setting this option also triggers a password prompt for this account,
                  regardless of whether the password is available in the keyring.

--account-create  Specify account creation policy.  This option is useful for guarding against
                  accidentally bootstrapping using a wrong account.
                    'always'        - bootstrap only if account doesn't exist (create it)
                    'never'         - bootstrap only if account exists (reuse it)
                    'if-not-exists' - bootstrap either way (if account exists, reuse it,
                                                            if not, create it first)

                  If this option is omitted, it defaults to `if-not-exists`.

                  '--account-create' can only be used if --account is used.

                  '--account-create never' cannot be used together with `--account-host <host>`

BEHAVIOUR OF --account-host

BACKGROUND:

When running without --account option, --account-host is used to specify the hostname part of the account name (<username>@<hostname>) that's being created. If omitted, it defaults to '%'. It is possible to specify this option multiple times, allowing multiple accounts to be created. For example, running Bootstrap with --account-host foo --account-host bar will create two accounts named:

  • Router<id>_<random_alphanums>@foo
  • Router<id>_<random_alphanums>@bar

instead of the default:

  • Router<id>_<random_alphanums>@%

Note however that it's perfectly valid to pass --account-host % as well, thus --account-host foo --account-host bar --account-host % will yield three accounts:

  • Router<id>_<random_alphanums>@foo
  • Router<id>_<random_alphanums>@bar
  • Router<id>_<random_alphanums>@%

Naturally, the '%' account makes the other two unnecessary, but since the Server allows this and does not stand in the user's way to get what he asked for, neither do we. There is actually a valid reason why a user might want to have such a setup: he might have multiple Routers, currently online, that he wants to migrate to foo and bar domains in the future, but the network infrastructure is not set up yet. In anticipation of this, he also includes '%' to make the new Router work now, and he'll remove that account when it's no longer needed.

CHANGES TO FAILOVER DURING ACCOUNT CREATION

Currently, account creation consists of 4 queries (in this order), per each account:

1. CREATE USER `<username@hostname>`
2. GRANT SELECT ON mysql_innodb_cluster_metadata.* TO `<username@hostname>`
3. GRANT SELECT ON performance_schema.replication_group_members TO `<username@hostname>`
4. GRANT SELECT ON performance_schema.replication_group_member_stats TO `<username@hostname>`

All four queries write to database, and if any of these fail to execute due to connection loss or server being read-only, bootstrap will fail-over to next node and start over. While failing-over on the first query is ok, failing-over on subsequent queries enters a race between fail-over and replication which might automatically execute prior statments on the failed-over-to node. Therefore the new policy will be to exit with error, rather than fail-over, on any subsequent write operation error after 1st writing operation succeeds (i.e. in our case, only CREATE USER operation can be failed-over, all others must trigger exit with error).

ACCOUNT CREATION QUERIES

Account creation will execute 4 queries, with a possible 5th. They will be executed just once (rather than once per account, as is the case now). New queries will be:

1. CREATE USER [IF NOT EXISTS] `<comma-separated list of all username@hostname pairs>`
1a. SHOW WARNINGS
2. GRANT SELECT ON mysql_innodb_cluster_metadata.* TO `<comma-separated list of all username@hostname pairs>`
3. GRANT SELECT ON performance_schema.replication_group_members TO `<comma-separated list of all username@hostname pairs>`
4. GRANT SELECT ON performance_schema.replication_group_member_stats TO `<comma-separated list of all username@hostname pairs>`

[IF NOT EXISTS] in query #1 depends on whether --accout-create if-not-exists was used or not. Query #1a will execute only if CREATE USER IF NOT EXISTS runs and returns warnings. Through this query, bootstrap will learn which accounts already existed before query #1 ran.

ACCOUNT VERIFICATION

Bootstrap has no way to check if all accounts it sets up are actually usable by Router. The best it can do is to try to connect to Server using username and password, but which account (hostname) will be used to accomplish this is beyond its control, as that depends on what hostname the Server sees connecting to it.

Currently, Bootstrap performs no such check. Therefore it is possible to bootstrap into a configuration which will not work. For example:

$ bin/mysqlrouter -B 192.168.43.111:3310 --account-host foo
..
..
(succeeds)

Bootstrap succeeded. Yet when we try to run the Router, we're out of luck:

$ bin/mysqlrouter
..
..
..
2019-05-20 01:58:30 metadata_cache WARNING [7f7f8aa91700] Failed connecting with Metadata Server c18:3310: Access denied for user 'mysql_router1_r5o2n931offx'@'192.168.43.110' (using password: YES) (1045)
2019-05-20 01:58:30 metadata_cache ERROR [7f7f8aa91700] Failed to connect to metadata server
2019-05-20 01:58:30 metadata_cache WARNING [7f7f8aa91700] Failed connecting with Metadata Server c18:3320: Access denied for user 'mysql_router1_r5o2n931offx'@'192.168.43.110' (using password: YES) (1045)
2019-05-20 01:58:30 metadata_cache ERROR [7f7f8aa91700] Failed to connect to metadata server
2019-05-20 01:58:30 metadata_cache WARNING [7f7f8aa91700] Failed connecting with Metadata Server c18:3330: Access denied for user 'mysql_router1_r5o2n931offx'@'192.168.43.110' (using password: YES) (1045)
2019-05-20 01:58:30 metadata_cache ERROR [7f7f8aa91700] Failed to connect to metadata server

This WL introduces the needed validation, which will run regardess of whether --account switch is used. After all account setup is done and configuration file written, Bootstrap will now proceed to account validation. It will consist of logging in using the username and password we just stored in the keyring, and running all the SQL queries that Router uses during its operation. If this check fails, Bootstrap will still exit successfully, but with a warning message thrown on console advising of failed check. The reason why we don't want to error out, is because there are valid cases when the user might want the bootstrap to succeed (e.g. configuring "for the future", when the user will configure network to make Router work later). However, for the users that would like to see this warning turned into an error and fail bootstrap, we provide a new command-line switch: --strict.

ACCOUNT CREATION UNDO

Currently, if a fatal error occurrs after account(s) have been created, these accounts will remain in the database, causing accumulation of "dead accounts" over time. This WL introduces a clean up feature which will drop the account(s) just created in such case. Note that dropping of such accounts may fail if Bootstrap was trying to create multiple accounts with CREATE USER IF NOT EXISTS, but was unable to figure out which accounts already existed (it uses SHOW WARNINGS query to figure this out - which may fail just like any other query); in such case Bootstrap will inform the user of what happened, give the full list of accounts that it tried to create and suggest that the user removes the ones that did not exist before by hand. The actual drop of accounts can also fail; in such case Bootstrap will also present a message like in the previous case, but having successfully ran and parsed SHOW WARNINGS query, it will limit the list to just accounts that were actually created.

BEHAVIOURAL DETAILS

The details of expected behaviour are rather complicated, as there are many starting state dimensions to consider. Below we discuss just the dimensions that pertain to functionality changed by this WL.

PROBLEM DIMENSIONS

INPUT VARIABLES:

string cluster = name of the cluster to bootstrap

INDEPENDENT DIMENSIONS:

# config
bool C1 = config exists && is valid
bool C2 = [metadata_cache] exists && [metadata_cache].cluster == name reported by MD server
bool Cr = [metadata_cache].router_id exists && is owned by us
bool Cu = [metadata_cache].username  exists && !empty

# keyring
bool K = password(effective_username) exists in keyring

# command-line
bool --account           = --account <username> is passed from cmdline
enum --account-create    = (default = never), never, create, if-not-exists
bool --force             = --force              is passed from cmdline
// --force prevents bootstrap abort on:
//   1. router_id in config not owned by us
//   2. matching {host_id, router_name} already present in routers table
//      (see MySQLInnoDBClusterMetadata::register_router() at the end of cluster_metadata.cc)

// these three dimensions get simplified to Ee
# account state
bool Ec = [metadata_cache].username exists in DB
bool Ea = --account_username        exists in DB
bool Eg = autogenerated_username    exists in DB

ALIAS DIMENSIONS:

bool password_prompt = true if --account is passed from cmdline, false otherwise

OUTPUT VARIABLES:

string effective_username = username chosen by logic to bootstrap with, one of:
                            [metadata_cache].username, --account_username, autogenerated_usernamej

OUTPUT DIMENSIONS:

// This is true only if ALL <username>@<hostname> pairs exist in DB, false otherwise.
// <hostname>(s) are provided via --account-host args (if not given, defaults to '%')
bool Ee = effective_username exists in DB for all --account-host args

NOTE: If config has syntax errors (such as router_id=foo or router_id=-1), Bootstrap will throw (just as it will in many other cases such as I/O errors, bogus command-line, access violations, etc). A less-obvious config error is that Bootstrap will also throw if there exists more than one [metadata_cache] section, as this is not supported at the moment. To keep things simple, we assume none of this happens here.

TRUTH TABLES

The following tables summarise expected behaviour for all combinations of input state dimensions listed above.

effective_username computation

dimensions effective_username
--account C1 C2 Cr Cu --force before now
0 0 N/A N/A N/A * autogenerated_username = before
0 1 0 N/A N/A 0 error "Router already configured for cluster [name_from_config], use --force to replace" = before
0 1 0 N/A N/A 1 autogenerated_username = before
0 1 1 * 0 * autogenerated_username = before
0 1 1 0 1 * autogenerated_username [metadata_cache].username
0 1 1 1 1 * [metadata_cache].username = before
1 0 N/A N/A N/A * N/A --account_username
1 1 0 N/A N/A 0 N/A error "Router already configured for cluster [name_from_config], use --force to replace"
1 1 0 N/A N/A 1 N/A --account_username
1 1 1 * * * N/A --account_username

--account-create behaviour

hostnames hostnames after vs --account-create
--account-host args existing hosts never if-not-exists always
[none] [none] [none] % %
[none] % % % error "<username>@% already exists"
[none] % A % A % A error "<username>@% already exists"
[none] A A % A % A
% [none] error "cannot use --account-host with --account-create never" % %
% % error "cannot use --account-host with --account-create never" % error "<username>@% already exists"
% % A error "cannot use --account-host with --account-create never" % A error "<username>@% already exists"
% A % error "cannot use --account-host with --account-create never" % A error "<username>@% already exists"
% A error "cannot use --account-host with --account-create never" % A % A
A % error "cannot use --account-host with --account-create never" % A % A
A [none] error "cannot use --account-host with --account-create never" A A
A A error "cannot use --account-host with --account-create never" A error "<username>@A already exists"
A A B error "cannot use --account-host with --account-create never" A B error "<username>@A already exists"
A B A error "cannot use --account-host with --account-create never" A B error "<username>@A already exists"
A B error "cannot use --account-host with --account-create never" A B A B
1.2.3.4 1.2.3.4/0.0.0.0 error "cannot use --account-host with --account-create never" 1.2.3.4 1.2.3.4/0.0.0.0 1.2.3.4 1.2.3.4/0.0.0.0

account creation

NOTE: To avoid complicating the truth table, 2 dimensions were left out:

  1. whether account verification passes
  2. whether --strict is in effect

While they do not influence account creation itself, they would modify the outcome of entire bootstrap in the following way:

  • if account verification fails, a warning message like so will appear: "verification test: logging in as user <username> failed"
  • if such warning is issued, and bootstrap is running with --strict option, this warning will be upgraded to an error, preventing config generation and keyring update, and issuing DROP USER on accounts just created (only those which bootstrap just created, any that it reused would stay intact).
dimensions expected behaviour
--account/password_prompt --account-create effective_username effective_username exists in DB (Ee) before now
0 N/A autogenerated_username 0 create user autogenerated_username with autogenerated password, create config, add user to keyring = before
0 N/A autogenerated_username 1 drop user with autogenerated name, create user autogenerated_username with autogenerated password, create config, add user to keyring create config, add user to keyring
0 N/A [metadata_cache].username 0 create user autogenerated_username with autogenerated password, create config, add user to keyring = before
0 N/A [metadata_cache].username 1 drop user [metadata_cache].username, create user [metadata_cache].username with autogenerated password, create config, add user to keyring create config, add user to keyring
0 N/A --account_username * scenario not possible without --account = before
0 never/if-not-exists/always * * error "unknown option --account-create" error "missing option --account"
1 (omitted)/never/if-not-exists/always autogenerated_username * scenario not possible with --account = before
1 (omitted)/never/if-not-exists/always [metadata_cache].username * scenario not possible with --account = before
1 never --account_username * error "unknown option --account" prompt for user's password, create config, add user to keyring
1 (omitted)/if-not-exists --account_username * error "unknown option --account" prompt for user's password, create user (if doesn't exist yet) --account_username with entered password, create config, add user to keyring
1 always --account_username 0 error "unknown option --account" prompt for user's password, create user --account_username with entered password, create config, add user to keyring
1 always --account_username 1 error "unknown option --account" prompt for user's password, error "user <username>@<hostname> already exists, rerun with --account-create=never/if-not-exists"

USER PROMPTS

Since we want bootstrap be automation-friendly, we need to ensure that its user interaction is predictable, so that the script running the bootstrap can feed it the required data without having to do any parsing of output and reacting to it. In other words, bootstrap MUST ask the same set of questions (password prompts), in a predefined order.

The rules are simple:

  1. bootstrap will first always ask for bootstrap user password (bootstrap user = root, unless it is specified in --bootstrap|-B argument)
  2. if user supplies --account option, it will next ask for password for this account
  3. 2nd question, or both 1st and 2nd question may not be asked if bootstrap exists with an error
  4. each question can only be asked once (i.e. if the user supplies an incorrect password, bootstrap will end with an error rather than telling the user that the password is incorrect and ask for it again).

Atomic account creation

Currently, accounts are created in four steps, for each --account-host:

CREATE USER <account> IDENTIFIED BY <password>
GRANT SELECT ON mysql_innodb_cluster_metadata.* TO <account>
GRANT SELECT ON performance_schema.replication_group_members TO <account>
GRANT SELECT ON performance_schema.replication_group_member_stats TO <account>

If one of them fails, it makes database cleanup hard. To minimise the mess, these statements should be changed as follows:

  • user creation MUST be performed atomically for all users in one shot (i.e. Bootstrap MUST execute just one statement: "CREATE USER [IF NOT EXISTS] <list_of_users>" instead of one per user)
  • user grants MUST be performed atomically for all users in one shot (i.e. Bootstrap MUST execute just one statement per grant: "GRANT <grant> TO <list_of_users>" instead of one per user and per grant)

Not a perfect solution, because if one of GRANT statements fails, we're still left with incomplete setup. None-the-less, it's an improvement over what we have now. In the future, maybe we'll specify user ROLE during CREATE USER operation -we cannot use them now because Server 5.7 does not support this.


Pseudocode

The expected behaviour discussed in HLS can be achieved by the following pseudocode.

Pseudocode for command-line checks:

if --account-create && !--account:
  exit_with_error "--account-create can only be used in conjunction with --account"
if --account && !(-B || --bootstrap):
  exit_with_error "--account can only be used in conjunction with -B|--bootstrap"
if --account has no argument:
  exit_with_error "--account requires an argument"
if --account-create == "never":
  if any --account-host was provided:
    exit_with_error "cannot use --account-host with --account-create never"

Pseudocode for username and password computation:

router_id = 0
username = ""
if config_exists && config_is_valid:
  // can return empty username and/or 0 for router_id, if not found
  (username, router_id) = read_router_id_and_username_from_config()
if --account:
  username = --account value
  password = prompt("give password for user <username>")
elif username.empty():
  username = gen_random_username() // router_id is embedded in the name
  password = "" // will be autogenerated
else:
  if keyring_has_password(username):
    password = keyring_get_password(username)
  else:
    exit_with_error "password for user <username> not found in the keyring.  Add it to the
                     keyring using this command: <give the command>, and then try again"

Pseudocode for account creation:

// command-line arg parsing
if --account-create == "never":
  return  // bypass the rest of this section
assert(--account-create == "always" or "if-not-exists")
userhosts = []
foreach hostname in --account-host args:
  userhosts.append("<username>@<hostname>")
if --account-create == "if-not-exists":
  if_not_exists = true
else
  if_not_exists = false

// failover loop
accounts_created_ok = false
new_userhosts = []
foreach node in cluster:
  ok = connect_to_node(node)  // RAII will disconnect
  if !ok:
    next

  (ok, wrn_cnt) = do_CREATE_USER(if_not_exists, userhosts, password)
  if !ok:
    next

  if wrn_cnt > 0:
    (ok, userhosts_already_existing) = do_SHOW_WARNINGS()
    if !ok:
      exit_with_error("SHOW WARNINGS failed.  Is your DB version compliant?")
    new_userhosts = userhosts - userhosts_already_existing
  else:
    new_userhosts = userhosts

  ok = GRANT(new_userhosts)
  if !ok:
    exit_with_error("GRANT failed")

  accounts_created_ok = true
  break loop

if !accounts_created_ok:
  msg = cleaup_accounts()
  exit_with_error("Bootstrap failed because ..." + msg)
else:
  // do other stuff here, like write config and keyring
  // ensure that unless all operations succeed, the cleanup_accounts() will get called before exiting

function cleaup_accounts() {
  if !new_userhosts.empty():
    ok = do_DROP_USER_IF_EXISTS(new_userhosts)
    if !ok:
      return "\nBTW, accounts got created, but something later failed, tried dropping them, that failed too, here's the list of them if you want to clean them up yourself: " + new_userhosts
    else
      return ""
}

Pseudocode for account verification:

conn = mysql_connect(username, password)
if !conn:
  msg = "Account validation failed: couldn't connect to cluster using <username>"
  if --strict:
    exit_with_error(msg)
  else:
    exit_with_warning(msg)
foreach query in all_SQL_queries: // set of all SQL queries that Router can execute during its operation
  result = mysql_query(query)
  if !result:
    msg = "Account verification failed: couldn't execute query <query>"
    if --strict:
      exit_with_error(msg)
    else:
      exit_with_warning(msg)