WL#11512: Allow XCom's event horizon value to be configurable

Affects: Server-8.0   —   Status: Complete   —   Priority: Medium

# Executive summary
The current implementation of the Group Communication System (GCS) component is
XCom. The event horizon is one of XCom's internal parameters that govern its
behaviour. XCom decides several consensus instances concurrently, and the event
horizon value specifies the maximum number of outstanding consensus instances
at any time. Therefore, XCom's performance on any particular deployment depends
on the event horizon value.
This worklog's goal is to allow users to tune the GCS's performance for their
particular deployments. Specifically, to allow users to configure XCom's event
horizon value.

## 1 User stories
* As a MySQL user, I want to configure XCom's event horizon value so that I am
  able to tune performance on my particular deployment.
# Requirements

## 1 Definitions
*Newer instance*: a server version that implements this worklog.

*Older instance*: a server version that does **not** implement this worklog.

*Compile-time default*: the default event horizon value (10) defined at
compile-time on older instances.

## 2 Functional
1. It **must** be possible to change a group's event horizon through any member
   of the group *if* **all** members are running a newer instance.
2. It **must not** be possible to change the group's event horizon value *if*
   **any** member is running an older instance.
3. It **must** be possible to add an older instance to a group *if* the group's
   event horizon value is the compile-time default.
4. It **must not** be possible to add an older instance to a group *if* the
   group's event horizon value is **not** the compile-time default.
5. It **must** be possible to add a newer instance to a group regardless of the
   group's event horizon value.
6. It **must not** be possible for an instance that is not in a group to change
   or inspect the event horizon.
7. It **must** be possible to set the event horizon to a value inside the domain
   [10, 200].

## 3 Non-functional
1. All members of a group **must** agree on the event horizon value for every
   group configuration.
2. All members of a group **must** use the same event horizon value when
   processing a message.
3. It **should** be possible to change a group's event horizon value without
   having to stop the group's regular message flow, or the group itself.

## 4 Discussion
The functional requirements 1-4 and non-functional requirements 1 and 2 are
safety properties: they are required for correct behaviour.

The functional requirements 1, 5, and 6, and non-functional requirement 3, are
usability requirements.
# Interface specification

## 1 Configuration

### 1.1 Modifying the event horizon
We create a new UDF `group_replication_set_write_concurrency(new_event_horizon)`
to reconfigure the group's event horizon value to `new_event_horizon` at
runtime.

For example:

    mysql> SELECT group_replication_set_write_concurrency(42);

The UDF **must** fail with an appropriate error code and message if
`new_event_horizon` is outside the domain of possible event horizon values.

Since `group_replication_set_write_concurrency` is asynchronous, its final
outcome is only reflected in the server log.
If succeeding would violate functional requirement 2 or 6,
`group_replication_set_write_concurrency` fails, and that fact, together with
the reason, is logged in the server log.
Otherwise, `group_replication_set_write_concurrency` succeeds and that fact is
logged in the server log.

### 1.2 Inspecting the event horizon
We create a new UDF `group_replication_get_write_concurrency` to inspect the
group's event horizon value at runtime.

For example:

    mysql> SELECT group_replication_get_write_concurrency();
    +-------------------------------------------+
    | group_replication_get_write_concurrency() |
    +-------------------------------------------+
    |                                        42 |
    +-------------------------------------------+

The command **must** fail with an appropriate error code and message if the
event horizon cannot be inspected.
The command fails if succeeding would violate functional requirement 6.

## 2 Conclusion
A group is bootstrapped with the default event horizon value.

A user can inspect a group's event horizon value by connecting to a member and
executing the `group_replication_get_write_concurrency` UDF.

A user can reconfigure a group's event horizon value by connecting to a member
and executing the `group_replication_set_write_concurrency` UDF.
# Design specification

## 1 Preliminaries
Each consensus instance is uniquely identified by a `synode_no`. `synode_no`s
are totally ordered.

Each group configuration is stored in a `site_def`.

Each consensus instance is associated with a group configuration, i.e. there is
a mapping from `synode_no` to `site_def`.

Consider a group reconfiguration `r` decided in consensus instance `c`.
Currently, the reconfiguration `r` takes effect starting from consensus
instance `c + n + 1`, where `n` is the event horizon value.
For example, consider a group with 3 members `m1`, `m2`, and `m3`.
If the reconfiguration `add_member(m4)` is decided in consensus instance `42`
and the event horizon value is `10`, then `m4` becomes part of the group from
consensus instance `42 + 10 + 1 = 53` onwards.

Besides being used in the calculation of when a reconfiguration takes place,
the event horizon is also used as a "high water mark" on the number of
outstanding consensus instances.
Consider that consensus instance `c` is the latest whose command has been
executed.
Currently, the group can decide, but **not** execute, up to consensus instance
`c + n` before consensus instance `c + 1` is executed, where `n` is the event
horizon value.
For example, if consensus instance `41` is the latest executed instance and the
event horizon value is `10`, the group can decide up to instance `41 + 10 = 51`
before instance `42` is executed.
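
Both rules are simple arithmetic; here is a minimal sketch (in Python, with
illustrative function names rather than XCom's):

```python
EVENT_HORIZON = 10  # compile-time default

def reconfiguration_effective_from(decided_at, event_horizon=EVENT_HORIZON):
    """A reconfiguration decided at consensus instance `decided_at` takes
    effect starting from instance `decided_at + event_horizon + 1`."""
    return decided_at + event_horizon + 1

def highest_decidable_instance(last_executed, event_horizon=EVENT_HORIZON):
    """The group can decide, but not execute, up to this instance before
    instance `last_executed + 1` is executed."""
    return last_executed + event_horizon

# The examples from the text:
print(reconfiguration_effective_from(42))  # add_member(m4) decided at 42 -> 53
print(highest_decidable_instance(41))      # latest executed is 41 -> 51
```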

## 2 Objective
Currently the event horizon value is static, i.e. it is defined at compile-time.
Our objective is to have a dynamic event horizon value, i.e. that can be
changed at runtime.

## 3 Design
Make the event horizon a group parameter.
In practice this means that each `site_def` **must** now contain an event
horizon value.
It also means that changing the event horizon value requires a group
reconfiguration equivalent to adding a member to the group, for example.
Before this worklog, we had the relation `site_def(synode_no)` where each
consensus instance (`synode_no`) had an associated group configuration
(`site_def`).
We also had a static event horizon value `event_horizon`.
After this worklog, we change the event horizon from a static value to a
relation `event_horizon(site_def)` where each group configuration (`site_def`)
has an associated event horizon value (`event_horizon`).

In the normal case, when a group reconfiguration is decided on consensus
instance `c`, the reconfiguration takes effect starting from consensus
instance `c + event_horizon(site_def(c)) + 1`, i.e. after event-horizon
instances according to the group configuration of instance `c`.

Similarly, in the normal case, if consensus instance `c` is the latest whose
decision has been executed, then the group can decide, but **not** execute, up
to consensus instance `c + event_horizon(site_def(c))` before consensus
instance `c + 1` is executed.

### 3.1 Join
As far as we understand, a node receives the group configuration (via
`gcs_snapshot`) when it joins the group.
The joining node **must** receive the event horizon as part of the group
configuration.
Here is a gist of the flow:

1. Node asks seed to join group.
2. Seed, on behalf of node, proposes to add node to group.
3. Group agrees on when node becomes part of group.
4. Node will eventually receive *are you alive?* messages when it becomes part
   of the group.
5. Node replies to *are you alive?* with *yes, but need configuration*.
6. Node receives configuration.
7. Node is good to go and will in the meantime receive missed decisions.

### 3.2 Upgrade/downgrade
Older instances of the server, without this worklog implemented, have a static
event horizon value defined at compile-time (10).
Therefore, we need to ensure the invariant that a group with at least an older
instance as a member **must** use an event horizon value of 10.

#### 3.2.1 Join
When a node using an older instance of the server wants to join a group that is
using an event horizon value different from 10, we need to ensure our invariant
remains true.

One way to maintain correctness is to prevent the older instance from joining
the group.
This requires an explicit reconfiguration of the event horizon value before an
older instance can be added to a group, but it does not change settings from
under the user.
We will implement this.

#### 3.2.2 Event horizon reconfiguration
If at least one of the group's members is using an older instance of the server
then reconfigurations of the event horizon **must** be denied.
The denial **must** be reflected in the log.

#### 3.2.3 Concurrency between join and event horizon reconfiguration
There is also a possibility of breaking our invariant when there is a join
concurrent with a reconfiguration of the event horizon value.
We say a join and an event horizon reconfiguration are concurrent if one is
decided on consensus instance `i` and thus takes effect on instance `k`, and
the other is decided on consensus instance `j` such that `i < j < k`.
This means that reconfiguration `j` is decided in a state where the effects of
reconfiguration `i` have been scheduled but not yet applied.

For example, assume the event horizon value is `10`.
Consider the join of member `m` that uses an older instance of the server.
The join is decided at consensus instance `i` and thus takes effect from
instance `i + 11` onwards.
Now consider that someone issues a reconfiguration of the event horizon value
to `20`, which is decided at instance `i + 1` and takes effect from instance
`i + 12` onwards.
The invariant is broken from instance `i + 12` onwards because we have member
`m` using an older instance of the server while the group's event horizon value
is **not** `10`.
A similar issue can occur if it is the event horizon reconfiguration that is
decided before the join.

A possible solution is to validate reconfigurations at execution time.
Because reconfigurations are state machine commands, they are executed
sequentially in the same order on all members.
Before executing a reconfiguration `r` we can validate it against the active
configuration and all pending configurations.
The active and pending configurations at this point are the same on all members
because reconfigurations are state machine commands.
If reconfiguration `r` is incompatible with the active or pending configurations
then `r` **must** not be applied.
(E.g. because we are reconfiguring the event horizon and there are older
instances in the group, or because we are adding an older instance to a group
with a custom event horizon value.)
We will implement this.
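
The execution-time validation can be sketched as follows (a hedged Python
sketch; the `Config` representation and helper names are ours, not XCom's):

```python
DEFAULT_EVENT_HORIZON = 10

class Config:
    def __init__(self, start, event_horizon, members):
        self.start = start
        self.event_horizon = event_horizon
        self.members = members  # set of (name, is_old_instance) pairs

def has_old_instance(config):
    return any(is_old for _name, is_old in config.members)

def validate_reconfiguration(kind, arg, active, pending):
    """Validate reconfiguration `kind` against the active configuration and
    all pending ones; the outcome is identical on every member because
    reconfigurations are state machine commands executed in the same order."""
    configs = [active] + pending
    if kind == "set_event_horizon":
        # Deny a non-default event horizon while any current or scheduled
        # configuration contains an older instance.
        if arg != DEFAULT_EVENT_HORIZON and any(
                has_old_instance(c) for c in configs):
            return False
    elif kind == "add_member":
        _name, is_old = arg
        # Deny adding an older instance while any configuration uses a
        # non-default event horizon.
        if is_old and any(
                c.event_horizon != DEFAULT_EVENT_HORIZON for c in configs):
            return False
    return True

# A group of newer instances may raise the event horizon:
newer_only = Config(0, 10, {("m1", False), ("m2", False)})
print(validate_reconfiguration("set_event_horizon", 20, newer_only, []))  # True
# A group containing an older instance may not:
mixed = Config(0, 10, {("m1", False), ("m2", True)})
print(validate_reconfiguration("set_event_horizon", 20, mixed, []))       # False
```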

### 3.3 Relationship with the `pax_machine` cache
Currently XCom's code assumes that `#cache > event_horizon * #members`, i.e.
there is at least one code path that assumes that it is always possible to
obtain a free cache entry.
Custom event horizon values, particularly ones much greater than the previous
compile-time value (10), can break this invariant.
We **must** maintain this invariant.

We identify at least two possible solutions:

1. This proposal already validates reconfigurations to maintain correctness
   (§3.2.3).
   We can additionally validate that `#cache > event_horizon * #members` remains
   true if the reconfiguration takes place.
2. Dynamically grow the `pax_machine` cache so that the equation
   `#cache > event_horizon * #members` remains true when the reconfiguration
   takes place.

We will implement (1).
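
Option (1) amounts to one extra predicate in the validation step (again a
Python sketch under the same assumptions; the names are illustrative):

```python
def cache_invariant_holds(cache_size, event_horizon, n_members):
    """XCom assumes a free `pax_machine` cache entry is always obtainable,
    which requires #cache > event_horizon * #members."""
    return cache_size > event_horizon * n_members

# With the compile-time default (10) and 3 members, a 50-entry cache suffices:
print(cache_invariant_holds(50, 10, 3))   # True
# A reconfiguration to an event horizon of 200 must then be rejected:
print(cache_invariant_holds(50, 200, 3))  # False
```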

### 3.4 Logging
When a reconfiguration is applied, this fact **must** be logged.

When a reconfiguration is **not** applied, due to the reasons above (§3.2),
this fact and the reason **must** be logged.

### 3.5 Concurrency between event horizon reconfigurations
It is possible for two, or more, reconfigurations of the event horizon to occur
concurrently.
We say two reconfigurations are concurrent if one is decided on consensus
instance `i` and thus takes effect on instance `k`, and the other is decided on
consensus instance `j` such that `i < j < k`.
This means that reconfiguration `j` is decided in a state where the effects of
reconfiguration `i` have been scheduled but not yet applied.

Consider the following example, where `configs` is the map from instance to
configuration:

    configs = {
        a: start=0 eh=10
    }
    
    instance   40: cmd=... <--------------- latest executed
    instance   41: cmd=change eh to 1000 <- going to execute
    instance   42: cmd=change eh to 1
    ...
    instance   50: cmd=...

When we execute the configuration at instance `41`, we search for the latest
pending configuration that modifies the event horizon.
In this case there are none, so we install the new configuration at
`this_instance + event_horizon(active_config) + 1`, i.e. `41 + 10 + 1 = 52`.
This is what we already did before this WL.
From this point on we know that the system may start deciding up to
instance `52 + 1000 = 1052`:

    configs = {
        a: start=0  eh=10
        b: start=52 eh=1000
    }
    
    instance   40: cmd=...
    instance   41: cmd=change eh to 1000 <- latest executed
    instance   42: cmd=change eh to 1
    ...
    instance   50: cmd=...
    instance   51: cmd=...
    instance   52: cmd=... <--------------- start(b)
    ...

However, let us consider the case where there is a concurrent reconfiguration:

    configs = {
        a: start=0  eh=10
        b: start=52 eh=1000
    }
    
    instance   40: cmd=...
    instance   41: cmd=change eh to 1000 <- latest executed
    instance   42: cmd=change eh to 1 <---- going to execute
    ...
    instance   50: cmd=...
    instance   51: cmd=...
    instance   52: cmd=... <--------------- start(b)
    ...

Again, we search for the latest pending configuration that modifies the event
horizon, which is configuration `b`.
(Configuration `b` is pending because `this_instance < start(b)`, i.e.
`42 < 52`.)
We apply the new configuration `c` *after* `b`.
This means that the new configuration `c` with an event horizon of `1` will take
effect at instance `start(b) + event_horizon(b) + 1`, i.e.
`52 + 1000 + 1 = 1053`.
From this point on we know that the system may start deciding up to
instance `1053 + 1 = 1054`:

    configs = {
        a: start=0    eh=10
        b: start=52   eh=1000
        c: start=1053 eh=1
    }
    
    instance   40: cmd=...
    instance   41: cmd=change eh to 1000
    instance   42: cmd=change eh to 1 <---- latest executed
    ...
    instance   50: cmd=...
    instance   51: cmd=...
    instance   52: cmd=... <--------------- start(b)
    ...
    instance 1053: cmd=... <--------------- start(c)

Using the same rationale for multiple reconfigurations:

    configs = {
        a: start=0    eh=10
        b: start=52   eh=1000
        c: start=1053 eh=1
    }
    
    instance   40: cmd=...
    instance   41: cmd=change eh to 1000
    instance   42: cmd=change eh to 1 <---- latest executed
    instance   43: cmd=change eh to 100 <-- going to execute
    ...
    instance   50: cmd=...
    instance   51: cmd=...
    instance   52: cmd=... <--------------- start(b)
    ...
    instance 1053: cmd=... <--------------- start(c)

The latest pending event horizon reconfiguration is `c`, so we apply the new
configuration at `start(c) + event_horizon(c) + 1`, i.e. `1053 + 1 + 1 = 1055`:

    configs = {
        a: start=0    eh=10
        b: start=52   eh=1000
        c: start=1053 eh=1
        d: start=1055 eh=100
    }
    
    instance   40: cmd=...
    instance   41: cmd=change eh to 1000
    instance   42: cmd=change eh to 1
    instance   43: cmd=change eh to 100 <-- latest executed
    ...
    instance   50: cmd=...
    instance   51: cmd=...
    instance   52: cmd=... <--------------- start(b)
    ...
    instance 1053: cmd=... <--------------- start(c)
    ...
    instance 1055: cmd=... <--------------- start(d)

Succinctly, we linearize concurrent reconfigurations if one of them
reconfigures the event horizon.
Hence, for any given concurrent reconfiguration commands `x` and `y` s.t.
`x` reconfigures the event horizon and `instance(x) < instance(y)`, there are
always `event_horizon(x)` instances between `start(x)` and `start(y)`.
Looking at the same diagrams as above:

    configs = {
        a: start=0    eh=10
        b: start=52   eh=1000
        c: start=1053 eh=1
        d: start=1055 eh=100
    }
    
    instance   40: cmd=...
    instance   41: cmd=change eh to 1000
    instance   42: cmd=change eh to 1
    instance   43: cmd=change eh to 100 <-- latest executed
    ...
    instance   50: cmd=...
    instance   51: cmd=...
    instance   52: cmd=... <--------------- start(b)
    [there are event_horizon(b) instances in this window]
    instance 1053: cmd=... <--------------- start(c)
    [there are event_horizon(c) instances in this window]
    instance 1055: cmd=... <--------------- start(d)
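
The linearization rule illustrated above can be sketched in Python (the
representation is illustrative; all configurations here are event horizon
reconfigurations). It reproduces the start values 52, 1053, and 1055:

```python
class Config:
    def __init__(self, start, event_horizon):
        self.start = start
        self.event_horizon = event_horizon

def schedule_event_horizon_reconfig(configs, decided_at, new_event_horizon):
    """Compute the start instance of an event horizon reconfiguration
    decided at `decided_at`, linearizing it after any pending one."""
    pending = [c for c in configs if c.start > decided_at]
    if pending:
        # Apply after the latest pending event horizon reconfiguration.
        after = max(pending, key=lambda c: c.start)
        start = after.start + after.event_horizon + 1
    else:
        # Common case: one instance past the active configuration's horizon.
        active = max((c for c in configs if c.start <= decided_at),
                     key=lambda c: c.start)
        start = decided_at + active.event_horizon + 1
    configs.append(Config(start, new_event_horizon))
    return start

# a={start=0, eh=10}; reconfigurations decided at instances 41, 42, and 43:
configs = [Config(0, 10)]
print(schedule_event_horizon_reconfig(configs, 41, 1000))  # b starts at 52
print(schedule_event_horizon_reconfig(configs, 42, 1))     # c starts at 1053
print(schedule_event_horizon_reconfig(configs, 43, 100))   # d starts at 1055
```
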

### 3.6 Algorithms
#### 3.6.1 Reconfigure event horizon

    procedure apply_config(consensus_instance, new_config):
      // are event horizon reconfigurations scheduled?
      if event_horizon_reconfigurations_pending():
        // yes, apply this reconfiguration after the latest
        after_config := get_latest_event_horizon_reconfig()
        new_config.start(after_config.start + after_config.event_horizon + 1)
        install_config(new_config)
      else:
        // no event horizon reconfigurations scheduled, apply normally
        active_config := get_active_config()
        new_config.start(consensus_instance + active_config.event_horizon + 1)
        install_config(new_config)

    function reconfig_event_horizon(consensus_instance, event_horizon) -> boolean:
      success := yes
      new_config := create_new_config()
      new_config.event_horizon(event_horizon)
      latest_config := get_latest_config()
      // can we reconfigure?
      if no_old_servers(latest_config) or is_default(event_horizon):
        apply_config(consensus_instance, new_config)
      else:
        success := no
      return success

`void handle_config(app_data_ptr)` will implement this algorithm.

#### 3.6.2 Add new member

    function reconfig_new_member(consensus_instance, new_member) -> boolean:
      success := yes
      new_config := create_new_config()
      new_config.add_member(new_member)
      latest_config := get_latest_config()
      // is the configuration compatible with the new member?
      if is_compatible(latest_config, new_member):
        apply_config(consensus_instance, new_config) // see §3.6.1
      else:
        success := no
      return success

`void handle_config(app_data_ptr)` will implement this algorithm.

#### 3.6.3 Check if instance is inside event horizon
The function below shows the current (before this WL) algorithm:

    function is_instance_inside_event_horizon(consensus_instance) -> boolean:
      active_config := get_active_config()
      last_executed_instance := get_last_executed_instance()
      threshold := last_executed_instance + active_config.event_horizon
      return consensus_instance <= threshold

The function above may clash with `executor_task`'s exit logic if there are
reconfigurations that decrease the event horizon.
The rationale of that logic is as follows.
Consider a node `n` which is leaving the group.
A new reconfiguration command `c`, which removes `n` from the group, is decided
at consensus instance `decided(c)`.
When the time comes to execute instance `decided(c)`, the `executor_task`
performs two actions:

* Computes `start(c)`, the instance where configuration `c` takes effect; and
* Computes `can_leave(c) = start(c) + event_horizon(c)`, the instance that, when
  decided, ensures that all instances up to `start(c)` were executed on a
  majority.

The invariant that, when `can_leave(c)` is decided, `start(c) - 1` has been
executed by a majority of the remaining nodes in the new configuration can be
violated if we use `is_instance_inside_event_horizon` as defined above.

Consider the following example, where `configs` is the map from instance to
configuration:

    configs = {
        a: start=0 eh=10 nodes={n,...}
    }
               instance   40: cmd=... <------------ latest executed
             / instance   41: cmd=change eh to 2 <- going to execute
    event   |  instance   42: cmd=remove n
    horizon |  ...
             \ instance   50: cmd=...

We execute an event horizon reconfiguration from 10 to 2:

    configs = {
        a: start=0  eh=10 nodes={n,...}
        b: start=52 eh=2  nodes={n,...}
    }
               instance   40: cmd=...
               instance   41: cmd=change eh to 2 <- latest executed
             / instance   42: cmd=remove n <------- going to execute
    event   |  ...
    horizon |  instance   50: cmd=...
             \ instance   51: cmd=...
               instance   52: cmd=...

And now we execute the removal of `n`:

    configs = {
        a: start=0  eh=10 nodes={n,...}
        b: start=52 eh=2  nodes={n,...}
        c: start=55 eh=2  nodes={...}
    }
               instance   40: cmd=...
               instance   41: cmd=change eh to 2
               instance   42: cmd=remove n <------- latest executed
             / ...
    event   |  instance   50: cmd=...
    horizon |  instance   51: cmd=...
             \ instance   52: cmd=...
               instance   53: cmd=...
               instance   54: cmd=...
               instance   55: cmd=... <------------ start(c)
               instance   56: cmd=...
               instance   57: cmd=... <------------ can_leave(c)

If we now roll forward until we execute instance `47`:

    configs = {
        a: start=0  eh=10 nodes={n,...}
        b: start=52 eh=2  nodes={n,...}
        c: start=55 eh=2  nodes={...}
    }
               instance   40: cmd=...
               instance   41: cmd=change eh to 2
               instance   42: cmd=remove n
               ...
               instance   47: cmd=... <------------ latest executed
             / ...
    event   |  instance   50: cmd=...
    horizon |  instance   51: cmd=...
            |  instance   52: cmd=...
            |  instance   53: cmd=...
            |  instance   54: cmd=...
            |  instance   55: cmd=... <------------ start(c)
            |  instance   56: cmd=...
             \ instance   57: cmd=... <------------ can_leave(c)

It is apparent that after executing instance `47`, it is possible
to decide instance `can_leave(c) = 57`, which means we have violated the
invariant that `start(c) - 1` (54) was executed by a majority when
`can_leave(c)` (57) is decided.
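
A quick arithmetic check of this violation (a Python sketch of the pre-WL
threshold rule):

```python
def old_threshold(last_executed, active_event_horizon):
    # Pre-worklog rule: the group may decide up to
    # last_executed + event_horizon.
    return last_executed + active_event_horizon

# Configuration `a` (eh=10) is still active when instance 47 is the latest
# executed, so the group may decide up to instance 57, which equals
# can_leave(c) = start(c) + event_horizon(c) = 55 + 2, even though
# start(c) - 1 = 54 has not been executed yet.
print(old_threshold(47, 10))  # 57
```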

The core issue here is that `is_instance_inside_event_horizon` does not take
into account pending event horizon reconfigurations, which can violate the
invariant required by `executor_task`.
We solve this issue by *not* extending the event horizon until it stabilizes on
the reconfigured size:

    function is_instance_inside_event_horizon(consensus_instance) -> boolean:
      last_executed_instance := get_last_executed_instance()
      active_event_horizon := get_active_config().event_horizon
      // are event horizon reconfigurations scheduled?
      if event_horizon_reconfigurations_pending():
        // compute common-case threshold
        possibly_unsafe_threshold := last_executed_instance + active_event_horizon
        // compute threshold taking into account first event horizon reconfiguration
        new_config := get_first_event_horizon_reconfig()
        maximum_safe_threshold := new_config.start - 1 + new_config.event_horizon
        // use the minimum of both for safety
        threshold := min(possibly_unsafe_threshold, maximum_safe_threshold)
      else:
        // no event horizon reconfigurations scheduled, common case
        threshold := last_executed_instance + active_event_horizon
      return consensus_instance <= threshold

With this new function, let us go back to our example:

    configs = {
        a: start=0 eh=10 nodes={n,...}
    }
               instance   40: cmd=... <------------ latest executed
             / instance   41: cmd=change eh to 2 <- going to execute
    event   |  instance   42: cmd=remove n
    horizon |  ...
             \ instance   50: cmd=...

We execute an event horizon reconfiguration from 10 to 2:

    configs = {
        a: start=0  eh=10 nodes={n,...}
        b: start=52 eh=2  nodes={n,...}
    }
               instance   40: cmd=...
               instance   41: cmd=change eh to 2 <- latest executed
             / instance   42: cmd=remove n <------- going to execute
    event   |  ...
    horizon |  instance   50: cmd=...
             \ instance   51: cmd=...
               instance   52: cmd=... <------------ start(b)

And now we execute the removal of `n`:

    configs = {
        a: start=0  eh=10 nodes={n,...}
        b: start=52 eh=2  nodes={n,...}
        c: start=55 eh=2  nodes={...}
    }
               instance   40: cmd=...
               instance   41: cmd=change eh to 2
               instance   42: cmd=remove n <------- latest executed
             / ...
    event   |  instance   50: cmd=...
    horizon |  instance   51: cmd=...
             \ instance   52: cmd=... <------------ start(b)
               instance   53: cmd=...
               instance   54: cmd=...
               instance   55: cmd=... <------------ start(c)
               instance   56: cmd=...
               instance   57: cmd=... <------------ can_leave(c)

If we now roll forward until we execute instance `47`:

    configs = {
        a: start=0  eh=10 nodes={n,...}
        b: start=52 eh=2  nodes={n,...}
        c: start=55 eh=2  nodes={...}
    }
               instance   40: cmd=...
               instance   41: cmd=change eh to 2
               instance   42: cmd=remove n
               ...
               instance   47: cmd=... <------------ latest executed
             / ...
    event   |  instance   50: cmd=...
    horizon |  instance   51: cmd=...
            |  instance   52: cmd=... <------------ start(b)
             \ instance   53: cmd=...
               instance   54: cmd=...
               instance   55: cmd=... <------------ start(c)
               instance   56: cmd=...
               instance   57: cmd=... <------------ can_leave(c)

The event horizon stops extending beyond `53` (`52 - 1 + 2`) until
configuration `b` (that decreases the event horizon to `2`) takes effect, which
happens on instance `52`.
For example, when we execute instance `50`:

    configs = {
        a: start=0  eh=10 nodes={n,...}
        b: start=52 eh=2  nodes={n,...}
        c: start=55 eh=2  nodes={...}
    }
               instance   40: cmd=...
               instance   41: cmd=change eh to 2
               instance   42: cmd=remove n
               ...
               instance   47: cmd=...
               ...
               instance   50: cmd=... <------------ latest executed
    event    / instance   51: cmd=...
    horizon |  instance   52: cmd=... <------------ start(b)
             \ instance   53: cmd=...
               instance   54: cmd=...
               instance   55: cmd=... <------------ start(c)
               instance   56: cmd=...
               instance   57: cmd=... <------------ can_leave(c)

And then execute instance `51`:

    configs = {
        a: start=0  eh=10 nodes={n,...}
        b: start=52 eh=2  nodes={n,...}
        c: start=55 eh=2  nodes={...}
    }
               instance   40: cmd=...
               instance   41: cmd=change eh to 2
               instance   42: cmd=remove n
               ...
               instance   47: cmd=...
               ...
               instance   50: cmd=...
               instance   51: cmd=... <------------ latest executed
    event    / instance   52: cmd=... <------------ start(b)
    horizon  \ instance   53: cmd=...
               instance   54: cmd=...
               instance   55: cmd=... <------------ start(c)
               instance   56: cmd=...
               instance   57: cmd=... <------------ can_leave(c)

And finally execute instance `52`, when the event horizon reconfiguration takes
place:

    configs = {
        a: start=0  eh=10 nodes={n,...}
        b: start=52 eh=2  nodes={n,...}
        c: start=55 eh=2  nodes={...}
    }
               instance   40: cmd=...
               instance   41: cmd=change eh to 2
               instance   42: cmd=remove n
               ...
               instance   47: cmd=...
               ...
               instance   50: cmd=...
               instance   51: cmd=...
               instance   52: cmd=... <------------ latest executed, start(b)
    event    / instance   53: cmd=...
    horizon  \ instance   54: cmd=...
               instance   55: cmd=... <------------ start(c)
               instance   56: cmd=...
               instance   57: cmd=... <------------ can_leave(c)


And if we roll forward until `can_leave(c)` is within the event horizon:

    configs = {
        a: start=0  eh=10 nodes={n,...}
        b: start=52 eh=2  nodes={n,...}
        c: start=55 eh=2  nodes={...}
    }
               instance   40: cmd=...
               instance   41: cmd=change eh to 2
               instance   42: cmd=remove n
               ...
               instance   47: cmd=...
               ...
               instance   50: cmd=...
               instance   51: cmd=...
               instance   52: cmd=... <------------ start(b)
               instance   53: cmd=...
               instance   54: cmd=...
               instance   55: cmd=... <------------ latest executed, start(c)
    event    / instance   56: cmd=...
    horizon  \ instance   57: cmd=... <------------ can_leave(c)

We can see that the invariant is maintained, i.e. instance `start(c) - 1` (54)
has been executed by a majority when `can_leave(c)` (57) is decided.

`int too_far(synode_no)` will implement this algorithm.
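
As a sanity check, the corrected rule can be exercised with a Python sketch
(the configuration representation is ours) that reproduces the thresholds from
the walkthrough:

```python
class Config:
    def __init__(self, start, event_horizon):
        self.start = start
        self.event_horizon = event_horizon

def threshold(configs, last_executed):
    """Highest decidable instance under the corrected
    is_instance_inside_event_horizon (to be implemented by too_far)."""
    active = max((c for c in configs if c.start <= last_executed),
                 key=lambda c: c.start)
    pending = [c for c in configs if c.start > last_executed]
    possibly_unsafe = last_executed + active.event_horizon
    if not pending:
        return possibly_unsafe
    first = min(pending, key=lambda c: c.start)
    maximum_safe = first.start - 1 + first.event_horizon
    return min(possibly_unsafe, maximum_safe)

# a={start=0, eh=10}, b={start=52, eh=2}, c={start=55, eh=2}
configs = [Config(0, 10), Config(52, 2), Config(55, 2)]
print(threshold(configs, 47))  # 53: capped at start(b) - 1 + eh(b)
print(threshold(configs, 51))  # 53: still capped until b takes effect
print(threshold(configs, 52))  # 54: b is active, decide up to 52 + 2
print(threshold(configs, 55))  # 57: can_leave(c) decidable only now
```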