WL#10433: Group Replication: Member weight for automatic primary election on failover
Affects: Server-8.0
—
Status: Complete
PROBLEM STATEMENT
=================
Currently the user has no way to control the outcome of the primary election algorithm in single-primary mode: the primary member is selected based on the lowest server uuid.

PROPOSED SOLUTIONS
==================
This worklog enables the user to influence the primary member election in single-primary mode by providing a member weight value for each member node. This member weight value will be used to elect the primary member instead of the member uuid, which has been used so far.

This allows users to:
- Select a specific primary for the next election (e.g. planned maintenance on the current primary).
- Ensure that primaries are always in the local/primary Data Center.
- Ensure that primaries are always on a "bigger machine profile".
FR1: The member with the highest member weight will be elected as primary.
FR2: When two or more members have the same member weight, the member with the lower uuid will be elected as primary.
FR3: All members broadcast their member weight whenever a new view is installed, so that elections can be held on each member independently.
FR4: Elections are held only when there is no primary or the current primary leaves the group.
FR5: A weight change on an existing member, or a new member joining with a higher weight, will not trigger a new primary election.
FR6: When a group contains members of two different versions, one that elects the primary based on server uuid and one that elects it based on member weight, then the older uuid-based primary election algorithm will be used to elect the primary member for the whole group.
FR7: When the group contains members of different versions that all support weight-based election, only the members with the lowest version participate in the election.
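The interaction of FR1, FR2, FR4 and FR5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the plugin code: `Member`, `outranks()` and `on_view_change()` are hypothetical names, the vector stands in for an installed view, and version handling (FR6, FR7) is deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified view of a group member (not Group_member_info).
struct Member {
  std::string uuid;
  int weight;  // group_replication_member_weight, 0..100
};

// FR1/FR2: higher weight wins; on equal weights the lower uuid wins.
bool outranks(const Member &a, const Member &b) {
  if (a.weight != b.weight) return a.weight > b.weight;
  return a.uuid < b.uuid;
}

// FR4/FR5: keep the current primary while it is still in the view and only
// hold an election when there is no primary or the primary has left.
std::string on_view_change(const std::vector<Member> &view,
                           const std::optional<std::string> &current_primary) {
  assert(!view.empty());
  if (current_primary) {
    for (const Member &m : view)
      if (m.uuid == *current_primary) return m.uuid;  // no new election
  }
  // min_element with "outranks" as the ordering yields the top-ranked member.
  return std::min_element(view.begin(), view.end(), outranks)->uuid;
}
```

For example, a member joining with weight 90 does not displace a primary with weight 50; it only wins the election that follows the current primary's departure.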
In group replication single-primary mode, the member with the smallest uuid in the group is currently elected as primary member, and when this primary member leaves, the member with the next smallest uuid is elected as primary member. Elections are held only if there is no primary defined for the group. A user who wants to prioritize certain member(s) in the primary member election currently has no way to do so.

So a new plugin system variable 'group_replication_member_weight' will be introduced, accepting integer values between 0 and 100. The default value of group_replication_member_weight is 50, in the middle of the range, so that nodes added without any consideration of weight can later be tuned to a higher or lower value without having to change the existing members.

The group_replication_member_weight value will be used for new primary leader elections as follows:
1. The member with the highest group_replication_member_weight will be elected as primary.
2. When two or more members have the same group_replication_member_weight value, the member with the lower uuid will be elected as primary.
3. When the primary goes down, the member with the next highest group_replication_member_weight value will be elected as primary.
4. The election requires the group_replication_member_weight value of every group member to be known on every member node, so each member's group_replication_member_weight value will be broadcast to all group members.
5. For a group containing members of two different versions:
   - an older version, which elects the primary based on server uuid, and
   - a newer version, which elects the primary based on member weight,
   the older uuid-based primary election algorithm will be used.
6. For a group containing members of two different versions that both support weight-based election, only the lowest-version members are candidates for the election.
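Item 4 only requires that each member's weight reach every other member. One way to picture the PIT_MEMBER_WEIGHT payload item is as a type-length-value entry in the member info payload. The sketch below assumes a 2-byte type code, a 4-byte big-endian length and a 1-byte value; the type code, `encode_weight_item()` and `decode_weight_item()` are hypothetical stand-ins and do not describe the plugin's actual wire format or Group_member_info::encode_payload().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical item-type code; the real PIT_MEMBER_WEIGHT value is not given.
constexpr uint16_t kPitMemberWeight = 0x00FF;

// Appends one type-length-value item carrying the member weight.
// Assumed layout: 2-byte type, 4-byte length, 1-byte value, big-endian.
void encode_weight_item(std::vector<uint8_t> &buf, uint8_t weight) {
  assert(weight <= 100);  // group_replication_member_weight range is [0, 100]
  buf.push_back(static_cast<uint8_t>(kPitMemberWeight >> 8));
  buf.push_back(static_cast<uint8_t>(kPitMemberWeight & 0xFF));
  const uint32_t len = 1;
  for (int shift = 24; shift >= 0; shift -= 8)
    buf.push_back(static_cast<uint8_t>(len >> shift));
  buf.push_back(weight);
}

// Walks the payload and returns the weight if a weight item is present.
// Items with other type codes are skipped using their length field.
std::optional<uint8_t> decode_weight_item(const std::vector<uint8_t> &buf) {
  std::size_t pos = 0;
  while (pos + 6 <= buf.size()) {
    const uint16_t type = static_cast<uint16_t>((buf[pos] << 8) | buf[pos + 1]);
    uint32_t len = 0;
    for (int i = 2; i < 6; ++i) len = (len << 8) | buf[pos + i];
    pos += 6;
    if (len > buf.size() - pos) break;  // truncated item: stop decoding
    if (type == kPitMemberWeight && len == 1) return buf[pos];
    pos += len;  // skip an item this decoder does not need
  }
  return std::nullopt;
}
```

A length-prefixed layout of this kind lets a decoder step over item types it does not recognise, which is one reason such payloads tolerate new fields being added.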
1. Introduce a new plugin system variable 'group_replication_member_weight':
   - Name: group_replication_member_weight
   - Input Values: [0, 100]
   - Default: 50
   - Description: This option influences the chance of a particular member being elected as primary member.
   - Dynamic: yes
   - Scope: Global
   - Type: Integer

2. To share the member weight among group members:
   1. A new enum value PIT_MEMBER_WEIGHT is added to enum_payload_item_type of the Group_member_info class.
   2. The member weight is shared among members as a PIT_MEMBER_WEIGHT payload item, which is encoded into the member info payload in Group_member_info::encode_payload() and decoded from it in Group_member_info::decode_payload().
   3. A new function Group_member_info::get_member_weight() will be added to return the member weight value.

3. The following changes will be made for primary member election:
   1. A new macro PRIMARY_ELECTION_MEMBER_WEIGHT_VERSION will be introduced in gcs_event_handlers.cc, set to the major server release version (8) of this WL:

        #define PRIMARY_ELECTION_MEMBER_WEIGHT_VERSION 8

   2. A new Plugin_gcs_events_handler member function sort_members_for_election() will be introduced, declared as follows:

        std::vector<Group_member_info *>::iterator sort_members_for_election(
            std::vector<Group_member_info *> *all_members_info) const;

      1. The input to sort_members_for_election() is a pointer to the vector of all members' info (Group_member_info).
      2. It sorts the input vector in ascending order of major member version (lowest version first), using comparator_group_member_version and has_greater_version():

            std::sort(all_members_info->begin(), all_members_info->end(),
                      Group_member_info::comparator_group_member_version);

            bool Group_member_info::comparator_group_member_version(
                Group_member_info *m1, Group_member_info *m2) {
              // true when m2 has the greater major version: ascending order
              return m2->has_greater_version(m1);
            }

            bool Group_member_info::has_greater_version(Group_member_info *other) {
              return this->member_version->get_major_version() >
                     other->member_version->get_major_version();
            }

      3. It traverses the sorted vector and finds the first iterator position at which the major member version differs from that of the first (lowest-version) member. Let us call this position lowest_version_members_pos. Only the members preceding lowest_version_members_pos will be used for primary member election.
      4. If the first member's major version is equal to or greater than PRIMARY_ELECTION_MEMBER_WEIGHT_VERSION, then the input vector is sorted up to the lowest_version_members_pos position saved in step 3.2.3 using comparator_group_member_weight:

            std::sort(all_members_info->begin(), lowest_version_members_pos,
                      Group_member_info::comparator_group_member_weight);

            bool Group_member_info::comparator_group_member_weight(
                Group_member_info *m1, Group_member_info *m2) {
              return m1->has_greater_weight(m2);
            }

            bool Group_member_info::has_greater_weight(Group_member_info *other) {
              if (this->get_member_weight() > other->get_member_weight())
                return true;
              // Equal weights: fall back to the uuid order (lower uuid first).
              if (this->get_member_weight() == other->get_member_weight())
                return has_greater_uuid(other);
              return false;
            }

      5. Otherwise, if the first member's major version is lower than PRIMARY_ELECTION_MEMBER_WEIGHT_VERSION, the vector is sorted up to the lowest_version_members_pos position saved in step 3.2.3 using comparator_group_member_uuid, because the group also contains members running the older, uuid-based version:

            std::sort(all_members_info->begin(), lowest_version_members_pos,
                      Group_member_info::comparator_group_member_uuid);

            bool Group_member_info::comparator_group_member_uuid(
                Group_member_info *m1, Group_member_info *m2) {
              // Ascending uuid order, so the lowest uuid comes first (FR2).
              return m1->has_greater_uuid(m2);
            }

            // Despite its name, returns true when this member's uuid sorts
            // lower than the other's, i.e. this member ranks ahead of it.
            bool Group_member_info::has_greater_uuid(Group_member_info *other) {
              return this->get_uuid().compare(other->get_uuid()) < 0;
            }

      6. The lowest_version_members_pos iterator saved in step 3.2.3 is returned from sort_members_for_election(), so that the primary leader can be selected only from the members having the lowest major version.
   3. sort_members_for_election() will be invoked from Plugin_gcs_events_handler::handle_leader_election_if_needed(). If no primary leader is present, the primary leader will be elected as the first online member of the sorted member info vector (sorted by either uuid or member weight, as explained in 3.2), up to the lowest_version_members_pos position returned by sort_members_for_election(). If no valid primary leader is found among the lowest-version members, the election fails.
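Steps 3.2 and 3.3 can be condensed into a self-contained sketch. It is not the plugin code: `MemberInfo` is a hypothetical stand-in for Group_member_info with plain fields, the vector is passed by reference instead of by pointer, the comparators are written as lambdas, the macro is replaced by a constexpr, only major versions are compared (as in has_greater_version()), and `elect_primary()` is a hypothetical name for the selection done in handle_leader_election_if_needed() when no primary is present.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for Group_member_info.
struct MemberInfo {
  std::string uuid;
  int major_version;
  int weight;
  bool online;
};

constexpr int PRIMARY_ELECTION_MEMBER_WEIGHT_VERSION = 8;

// Sorts by ascending major version, then orders the lowest-version prefix by
// weight (or by uuid when that version predates weight-based election).
// Returns the end of the lowest-version prefix.
std::vector<MemberInfo>::iterator sort_members_for_election(
    std::vector<MemberInfo> &members) {
  assert(!members.empty());
  std::sort(members.begin(), members.end(),
            [](const MemberInfo &a, const MemberInfo &b) {
              return a.major_version < b.major_version;
            });
  const int lowest = members.front().major_version;
  // First position whose major version differs from the lowest one.
  auto lowest_end = std::find_if(members.begin(), members.end(),
                                 [lowest](const MemberInfo &m) {
                                   return m.major_version != lowest;
                                 });
  if (lowest >= PRIMARY_ELECTION_MEMBER_WEIGHT_VERSION) {
    std::sort(members.begin(), lowest_end,
              [](const MemberInfo &a, const MemberInfo &b) {
                if (a.weight != b.weight) return a.weight > b.weight;
                return a.uuid < b.uuid;  // tie: lower uuid first
              });
  } else {
    std::sort(members.begin(), lowest_end,
              [](const MemberInfo &a, const MemberInfo &b) {
                return a.uuid < b.uuid;
              });
  }
  return lowest_end;
}

// Returns the uuid of the first online member in the lowest-version prefix,
// or an empty string when the election fails.
std::string elect_primary(std::vector<MemberInfo> &members) {
  auto end = sort_members_for_election(members);
  auto it = std::find_if(members.begin(), end,
                         [](const MemberInfo &m) { return m.online; });
  return it == end ? std::string() : it->uuid;
}
```

The returned iterator stays valid after the second sort because std::sort only permutes elements within the given range and does not reallocate, so it still marks the end of the lowest-version members.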
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.