Group Replication distributed recovery is the process through which a new server gets the data it is missing from an online server in the group, while also listening for events happening in the group. During recovery, the server listens for membership events and for the transactions that the group executes while recovery is in progress. This is a high-level summary. The following sections provide additional detail by describing the two phases of the procedure.
In the first phase, the joiner (the joining server) selects one of the online servers in the group to be the donor of the state that it is missing. The donor is responsible for providing the joiner with all the data it is missing up to the moment it joined the group. This is achieved by relying on a standard asynchronous replication channel established between the donor and the joiner. Through that channel, binary logs flow up to the point of the view change that occurred when the joiner became part of the group. The joiner applies the binary logs as it receives them from the donor.
Furthermore, while the binary log transfer is ongoing, the joiner also caches every transaction that is exchanged within the group. That is, it listens for transactions that happen after it joined the group and while it is applying the missing state from the donor. When the first phase ends and the replication channel to the donor is closed, the joiner starts phase two: the catch up.
In this phase, the joiner executes the cached transactions. When the number of transactions queued for execution finally reaches zero, the member is declared online.
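The two phases can be illustrated with a small toy model. This is only a sketch for intuition, not how the server implements recovery: the function `recover`, its parameters, and the state strings are hypothetical, transactions are plain strings, and the interleaving of donor transfer and group traffic is simplified to one cached transaction per applied one.

```python
from collections import deque


def recover(donor_log, applied_count, view_change_index, incoming_group_txns):
    """Toy model of the two recovery phases (all names are hypothetical).

    donor_log           -- transactions in the donor's binary log
    applied_count       -- how many of them the joiner already has
    view_change_index   -- position in donor_log where the joiner's
                           view change was logged (phase 1 stops here)
    incoming_group_txns -- transactions the group delivers while
                           phase 1 is still running
    """
    applied = list(donor_log[:applied_count])
    cache = deque()
    pending = deque(incoming_group_txns)

    # Phase 1: receive and apply the donor's binary log up to the view
    # change, while caching transactions exchanged within the group.
    for txn in donor_log[applied_count:view_change_index]:
        applied.append(txn)
        if pending:
            cache.append(pending.popleft())
    cache.extend(pending)  # anything still in flight is cached as well

    # Phase 2 (catch up): execute the cached transactions until the
    # queue is empty, then declare the member online.
    while cache:
        applied.append(cache.popleft())
    return "ONLINE", applied


state, applied = recover(["t1", "t2", "t3", "t4"], 1, 4, ["t5", "t6"])
print(state, applied)
```

In this model the joiner starts with only `t1`, obtains `t2` through `t4` from the donor, and then drains the cached `t5` and `t6` before it is reported as online.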
The recovery procedure withstands donor failures while the joiner is fetching binary logs from it. Whenever a donor fails during the first phase, the joiner fails over to a new donor and resumes from that one. When that happens, the joiner explicitly closes the connection to the failed donor and opens a connection to a new donor. This happens automatically.
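The failover behavior can be sketched in the same toy style. The function `fetch_with_failover` and the `(name, log, fails_after)` tuples are hypothetical, and the sketch assumes that every donor holds the same log and that the joiner resumes from the position it had already reached; it does not model how the server actually selects or connects to donors.

```python
def fetch_with_failover(donors, start, view_change_index):
    """Toy model of donor failover during phase 1 (names are hypothetical).

    donors            -- list of (name, log, fails_after) tuples, where
                         fails_after is the number of transactions the
                         donor serves before failing, or None if it
                         never fails
    start             -- position of the first transaction the joiner lacks
    view_change_index -- position at which phase 1 is complete
    """
    position = start
    received = []
    used = []
    for name, log, fails_after in donors:
        used.append(name)  # open a connection to this donor
        served = 0
        while position < view_change_index:
            if fails_after is not None and served >= fails_after:
                break  # donor failed: close the connection, try the next one
            received.append(log[position])
            position += 1
            served += 1
        if position >= view_change_index:
            return received, used
    raise RuntimeError("no donor could supply the missing transactions")


log = ["t1", "t2", "t3", "t4"]
received, used = fetch_with_failover([("s1", log, 1), ("s2", log, None)], 1, 4)
print(received, used)
```

Here the first donor fails after serving one transaction, and the joiner obtains the remaining ones from the second donor without refetching what it already received.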