WL#10433: Group Replication: Member weight for automatic primary election on failover
Affects: Server-8.0
—
Status: Complete
PROBLEM STATEMENT
=================
Currently the user has no way to control the outcome of the primary election
algorithm in single-primary mode: the primary is always the member with the
lowest server UUID.

PROPOSED SOLUTIONS
==================
This worklog lets the user influence the primary election in single-primary
mode by assigning a member weight to each member. The election will use this
weight instead of the server UUID used so far (the UUID remains the
tie-breaker for equal weights). This allows users to:
- Select a specific primary for the next election (e.g. during planned
maintenance on the current primary).
- Ensure that the primary is always in the local/primary Data Center.
- Ensure that the primary is always on a "bigger machine profile".
FR1: The member with highest member weight will be selected as primary.
FR2: When two or more members have the same member weight, the member with
the lower UUID will be elected as primary.
FR3: All members broadcast their member weight whenever a new view gets
installed, so elections can be held on each member independently.
FR4: Elections are held only when there is no primary or when the current
primary leaves the group.
FR5: Changing the weight of an existing member, or adding a new member with
a higher weight, will not trigger a new primary election.
FR6: When a group contains members of two different versions, one that
elects the primary based on server UUID and another based on member weight,
then the older, UUID-based election algorithm will be used to elect the
primary for the whole group.
FR7: When the group contains members with different versions and all support
weight based election, only the members with the lowest version should
participate in the election.
In Group Replication single-primary mode, the member with the smallest server
UUID in the group is currently elected as primary. When this primary leaves,
the member with the next smallest UUID is elected. Elections are held only if
the group has no primary.
If users want to prioritize certain member(s) in the primary election, they
currently have no way to do so. Therefore a new plugin system variable,
'group_replication_member_weight', will be introduced, taking integer values
between 0 and 100. Its default value is 50, so that if a user adds new nodes
without considering their weight, those members can later be tuned to a
higher or lower value without changing the existing ones. The value of
group_replication_member_weight will be used when electing a new primary.
The election based on group_replication_member_weight variable will be as
follows:
1. The member with highest group_replication_member_weight will be selected
as primary.
2. When two or more members have the same group_replication_member_weight
value, the member with the lower UUID will be elected as primary.
3. When the primary goes down, the member with the next highest
group_replication_member_weight value will be elected as primary.
4. Because the election needs the group_replication_member_weight value of
every group member on every member, each member broadcasts its value to all
group members.
5. For a group containing members of two different versions:
- an older version, which elects the primary based on server UUID, and
- a newer version, which elects the primary based on member weight,
then the older primary election algorithm based on server UUID will be used.
6. For a group containing members of two different versions that both support
weight-based election, only the members with the lowest version are
candidates for the election.
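The election rules above can be illustrated with a small standalone sketch. It assumes a hypothetical `Member` struct holding only a UUID string and a weight, and hypothetical helpers `ranks_higher()`, `elect_primary()` and `primary_after_view_change()`; none of these names come from the plugin. It ignores member versions and member state, and only shows the ordering (highest weight first, lower UUID on ties) and the fact that an election runs only when there is no primary.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified member record: only what the election rules need.
struct Member {
  std::string uuid;
  unsigned weight;  // value of group_replication_member_weight
};

// Rules 1 and 2: higher weight ranks first; on equal weight, lower UUID wins.
bool ranks_higher(const Member &a, const Member &b) {
  if (a.weight != b.weight) return a.weight > b.weight;
  return a.uuid < b.uuid;
}

// Returns the UUID of the member that would be elected, or nothing for an
// empty group.
std::optional<std::string> elect_primary(const std::vector<Member> &members) {
  for (const Member &m : members) assert(m.weight <= 100);  // range [0, 100]
  if (members.empty()) return std::nullopt;
  // With ranks_higher as the ordering, the "smallest" element is the winner.
  auto best = std::min_element(members.begin(), members.end(), ranks_higher);
  return best->uuid;
}

// Rule 3 (FR4, FR5): keep the current primary while it is still in the view,
// even if a member with a higher weight has joined; elect only otherwise.
std::optional<std::string> primary_after_view_change(
    const std::optional<std::string> &current_primary,
    const std::vector<Member> &members) {
  if (current_primary) {
    bool still_present =
        std::any_of(members.begin(), members.end(), [&](const Member &m) {
          return m.uuid == *current_primary;
        });
    if (still_present) return current_primary;
  }
  return elect_primary(members);
}
```

For example, among `aaa` (weight 50), `bbb` (50) and `ccc` (70), `ccc` is elected. If `ddd` (90) then joins, `ccc` remains primary; once `ccc` leaves, `ddd` is elected.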
1. Introduce a new plugin system variable 'group_replication_member_weight'
- Name: group_replication_member_weight
- Input Values: [0, 100]
- Default: 50
- Description: This option influences the chance of a particular member
being elected as primary.
- Dynamic: yes
- Scope: Global
- Type: Integer
2. To share the member weight among group members:
1. A new enum value PIT_MEMBER_WEIGHT is added to the
enum_payload_item_type enumeration of the Group_member_info class.
2. The member weight is shared among members as a PIT_MEMBER_WEIGHT
payload item, which is encoded into the member info payload in
Group_member_info::encode_payload() and extracted and decoded in
Group_member_info::decode_payload().
3. A new function Group_member_info::get_member_weight() will be added to
return the member weight value.
3. The following changes will be made for the primary election:
1. A new macro PRIMARY_ELECTION_MEMBER_WEIGHT_VERSION will be introduced
in gcs_event_handlers.cc, set to the major server release version of this
WL (8):
#define PRIMARY_ELECTION_MEMBER_WEIGHT_VERSION 8
2. A new Plugin_gcs_events_handler member function
sort_members_for_election() will be introduced, declared as follows:
std::vector<Group_member_info *>::iterator
sort_members_for_election(
    std::vector<Group_member_info *> *all_members_info) const;
1. The input to sort_members_for_election() is a pointer to the vector of
all members' info (Group_member_info).
2. It sorts the input vector in ascending order of major member version
using comparator_group_member_version() and has_greater_version(), so the
members with the lowest major version come first.
std::sort(all_members_info->begin(), all_members_info->end(),
          Group_member_info::comparator_group_member_version);
bool
Group_member_info::comparator_group_member_version(
    Group_member_info *m1, Group_member_info *m2)
{
  // m1 is ordered before m2 if m2 has the greater major version, so the
  // vector ends up in ascending order of major version.
  return m2->has_greater_version(m1);
}
bool
Group_member_info::has_greater_version(Group_member_info *other)
{
  if (this->member_version->get_major_version() >
      other->member_version->get_major_version())
    return true;
  return false;
}
3. It then traverses the sorted vector and finds the first iterator position
where the major member version differs from that of the first member. Let's
call this position lowest_version_members_pos. Only the members preceding
lowest_version_members_pos, i.e. those with the lowest major version, will
be used for the primary election.
4. If the major version of the first member is equal to or greater than
PRIMARY_ELECTION_MEMBER_WEIGHT_VERSION, the input vector is sorted up to
the lowest_version_members_pos position saved in step 3.2.3 using
comparator_group_member_weight().
std::sort(all_members_info->begin(), lowest_version_members_pos,
          Group_member_info::comparator_group_member_weight);
bool
Group_member_info::comparator_group_member_weight(
    Group_member_info *m1, Group_member_info *m2)
{
  // m1 is ordered before m2 if it has the greater weight (descending).
  return m1->has_greater_weight(m2);
}
bool
Group_member_info::has_greater_weight(Group_member_info *other)
{
  if (this->get_member_weight() > other->get_member_weight())
    return true;
  // Equal weights: fall back to the UUID order (lower UUID ranks first).
  if (this->get_member_weight() == other->get_member_weight())
    return has_greater_uuid(other);
  return false;
}
5. Otherwise, if the major version of the first member is lower than
PRIMARY_ELECTION_MEMBER_WEIGHT_VERSION, the vector is sorted up to the
lowest_version_members_pos position saved in step 3.2.3 using
comparator_group_member_uuid(), since the group also contains members that
use the older, UUID-based election.
std::sort(all_members_info->begin(), lowest_version_members_pos,
          Group_member_info::comparator_group_member_uuid);
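bool
Group_member_info::comparator_group_member_uuid(
    Group_member_info *m1, Group_member_info *m2)
{
  // m1 is ordered before m2 if its UUID is lower (ascending UUID order).
  return m1->has_greater_uuid(m2);
}
bool
Group_member_info::has_greater_uuid(Group_member_info *other)
{
  // True if this member's UUID sorts lower, i.e. it ranks higher in the
  // election.
  return this->get_uuid().compare(other->get_uuid()) < 0;
}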
6. The lowest_version_members_pos position saved in step 3.2.3 is returned
from sort_members_for_election(), so that the primary can be selected only
from the members with the lowest major version.
3. sort_members_for_election() will be invoked from
Plugin_gcs_events_handler::handle_leader_election_if_needed(). If no primary
is present, the primary will be elected as the first online member of the
member info vector, sorted by either UUID or member weight as explained in
3.2, searching only up to the lowest_version_members_pos position returned by
sort_members_for_election(). If no valid primary is found among the
lowest-version members, the election fails.
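Step 2 shares the weight as a PIT_MEMBER_WEIGHT item in the member info payload. The sketch below shows one way such a type-length-value item could be encoded and decoded. The 2-byte type / 8-byte length layout, the little-endian byte order, the numeric code 16 and the helper names `put_le()`, `get_le()`, `encode_member_weight()` and `decode_member_weight()` are all assumptions made for illustration, not the plugin's actual encoding helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical item type code; the real enum_payload_item_type value may differ.
constexpr std::uint16_t PIT_MEMBER_WEIGHT = 16;

// Assumed item layout: 2-byte type, 8-byte length, then the value bytes.
constexpr std::size_t kItemHeaderSize = 2 + 8;

// Appends the low `bytes` bytes of `value` in little-endian order.
void put_le(std::vector<unsigned char> &buf, std::uint64_t value,
            std::size_t bytes) {
  for (std::size_t i = 0; i < bytes; ++i)
    buf.push_back(static_cast<unsigned char>(value >> (8 * i)));
}

// Reads `bytes` little-endian bytes starting at `p`.
std::uint64_t get_le(const unsigned char *p, std::size_t bytes) {
  std::uint64_t value = 0;
  for (std::size_t i = 0; i < bytes; ++i)
    value |= static_cast<std::uint64_t>(p[i]) << (8 * i);
  return value;
}

// Encoder side: append the weight as one 4-byte payload item.
void encode_member_weight(std::vector<unsigned char> &buf,
                          std::uint32_t weight) {
  assert(weight <= 100);  // group_replication_member_weight range is [0, 100]
  put_le(buf, PIT_MEMBER_WEIGHT, 2);
  put_le(buf, 4, 8);
  put_le(buf, weight, 4);
}

// Decoder side: walk the items and return the weight if present. Items of
// other (possibly unknown) types are skipped using their length field.
std::optional<std::uint32_t> decode_member_weight(
    const std::vector<unsigned char> &buf) {
  std::size_t pos = 0;
  while (pos + kItemHeaderSize <= buf.size()) {
    const auto type = static_cast<std::uint16_t>(get_le(&buf[pos], 2));
    const std::uint64_t length = get_le(&buf[pos + 2], 8);
    pos += kItemHeaderSize;
    if (length > buf.size() - pos) break;  // truncated item: stop parsing
    if (type == PIT_MEMBER_WEIGHT && length == 4)
      return static_cast<std::uint32_t>(get_le(&buf[pos], 4));
    pos += static_cast<std::size_t>(length);
  }
  return std::nullopt;
}
```

Skipping unrecognized item types by their length is the design choice that lets a new item such as the weight be appended to a payload without breaking the parsing of the items around it.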
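Steps 3.2 and 3.3 can be condensed into a standalone sketch. It assumes a hypothetical `Member_info` struct (UUID, major version, weight, online flag) in place of the real `Group_member_info`, uses lambdas instead of the `Group_member_info` comparators, and makes `sort_members_for_election()` a free function rather than a `Plugin_gcs_events_handler` member; `pick_primary()` is a hypothetical stand-in for the election done in `handle_leader_election_if_needed()`.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

#define PRIMARY_ELECTION_MEMBER_WEIGHT_VERSION 8

// Hypothetical stand-in for Group_member_info with only the fields used here.
struct Member_info {
  std::string uuid;
  unsigned major_version;
  unsigned weight;
  bool online;
};

// Sorts the members and returns lowest_version_members_pos: only the members
// before this iterator (those with the lowest major version) are candidates.
std::vector<Member_info *>::iterator sort_members_for_election(
    std::vector<Member_info *> *all_members_info) {
  if (all_members_info->empty()) return all_members_info->end();
  // Step 3.2.2: ascending major version, so the lowest version comes first.
  std::sort(all_members_info->begin(), all_members_info->end(),
            [](const Member_info *m1, const Member_info *m2) {
              return m1->major_version < m2->major_version;
            });
  // Step 3.2.3: first position whose major version differs from the first one.
  const unsigned lowest_version = all_members_info->front()->major_version;
  auto lowest_version_members_pos =
      std::find_if(all_members_info->begin(), all_members_info->end(),
                   [lowest_version](const Member_info *m) {
                     return m->major_version != lowest_version;
                   });
  if (lowest_version >= PRIMARY_ELECTION_MEMBER_WEIGHT_VERSION) {
    // Step 3.2.4: highest weight first, lower UUID first on equal weight.
    std::sort(all_members_info->begin(), lowest_version_members_pos,
              [](const Member_info *m1, const Member_info *m2) {
                if (m1->weight != m2->weight) return m1->weight > m2->weight;
                return m1->uuid < m2->uuid;
              });
  } else {
    // Step 3.2.5: older members present, so order by ascending UUID only.
    std::sort(all_members_info->begin(), lowest_version_members_pos,
              [](const Member_info *m1, const Member_info *m2) {
                return m1->uuid < m2->uuid;
              });
  }
  return lowest_version_members_pos;  // Step 3.2.6
}

// Hypothetical helper for step 3.3: the first online candidate becomes the
// primary; nullptr means the election failed.
Member_info *pick_primary(std::vector<Member_info *> *all_members_info) {
  assert(all_members_info != nullptr);
  auto candidates_end = sort_members_for_election(all_members_info);
  auto it = std::find_if(all_members_info->begin(), candidates_end,
                         [](const Member_info *m) { return m->online; });
  return it == candidates_end ? nullptr : *it;
}
```

Because only the lowest-version prefix is re-sorted and searched, a member with a higher major version can never be picked, even if it has the highest weight, which is the behavior FR7 asks for.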
Copyright (c) 2000, 2025, Oracle Corporation and/or its affiliates. All rights reserved.