WL#9850: MySQL GCS: Change XCom interface to queues instead of sockets

Affects: Server-8.0 — Status: Complete

Executive summary

XCom instances communicate with each other using TCP. Currently, the GCS layer of a MySQL instance also uses TCP to send requests to its local XCom instance. This approach keeps the code simple and uniform, but at the cost of performance. Examples of the overhead we incur by communicating with the local XCom via networking mechanisms include:

(De)serialization of requests, and
Memory copying throughout the network stack.

Goal

The goal of this WL is to change the way we send some requests to XCom, from the sockets that exist today, to a new, flexible, input mechanism decoupled from XCom itself. This will allow us to avoid the overheads described above, and to move us one step closer to the vision of "Modularize XCom."

User stories

As a MySQL user, I do not want to incur unnecessary performance overhead when GCS communicates with its local XCom instance.

Scope

This WL will not introduce any user-visible features to MySQL. The outcome of this WL will be:

The design and implementation of a new input mechanism in XCom, and
The integration of the new input mechanism in GCS.

Requirements

1 Functional

It must be possible to communicate with the local XCom instance using the new input mechanism.
It must be possible to communicate with any XCom instance using the original TCP input, i.e. both input mechanisms must coexist.

2 Non-functional

Performance regression must not occur.

Interface specification

This WL does not affect any externally-observable interfaces.

Design specification

1 Definitions

Throughout this section we will refer to the new producer-defined input mechanism as the input channel.

2 Objective

Currently an XCom instance only accepts requests via TCP. Our objective is to create an input channel from which XCom will consume requests. The input channel must be decoupled from XCom, i.e. the channel's details are implemented by the producer.

3 Design

XCom has a single thread running an event-driven loop. This restricts the way XCom consumes requests from the input channel. For example, it cannot block waiting for requests because that would block the entire XCom processing. But XCom's single-threaded nature also simplifies the requirements of the input channel. Specifically, it is sufficient for the channel to support the multiple-producer single-consumer model. While many user threads may concurrently produce requests, they will only be consumed by XCom's single thread.
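The multiple-producer single-consumer model described above can be illustrated with a minimal sketch. This is not XCom or GCS code: MpscChannel, try_pop, and produce_and_drain are hypothetical names, and a mutex-protected deque is only one of several ways a producer could implement such a channel. The point it shows is that the consumer side never blocks waiting for data.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical multiple-producer single-consumer channel (not XCom code).
template <typename T>
class MpscChannel {
 public:
  // May be called concurrently by any number of producer threads.
  void push(T value) {
    std::lock_guard<std::mutex> guard(mutex_);
    queue_.push_back(std::move(value));
  }

  // Called only by the single consumer thread. Never waits for data:
  // returns false immediately when the channel is empty.
  bool try_pop(T &out) {
    std::lock_guard<std::mutex> guard(mutex_);
    if (queue_.empty()) return false;
    out = std::move(queue_.front());
    queue_.pop_front();
    return true;
  }

 private:
  std::mutex mutex_;
  std::deque<T> queue_;
};

// Pushes per_producer items from each of `producers` threads, then drains
// the channel from the calling thread and returns how many items it saw.
int produce_and_drain(int producers, int per_producer) {
  assert(producers >= 0 && per_producer >= 0);
  MpscChannel<int> channel;
  std::vector<std::thread> threads;
  for (int p = 0; p < producers; ++p) {
    threads.emplace_back([&channel, per_producer] {
      for (int i = 0; i < per_producer; ++i) channel.push(i);
    });
  }
  for (auto &t : threads) t.join();

  int consumed = 0;
  int item = 0;
  while (channel.try_pop(item)) ++consumed;
  return consumed;
}
```

Because only one thread ever calls try_pop, the consumer side needs no coordination with other consumers; the lock exists solely to protect against concurrent producers.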

The input channel must integrate cleanly with XCom's current event-driven model. The design achieves this objective via two new API categories: the input API, and the notification API. The input API allows the producer to "hook up" the logic to consume requests from the channel, and the logic of how to reply to the request. The notification API allows the producer to, upon producing a request to the channel, generate an event that will make XCom consume the request from the channel. Here is a rough diagram:

Producer                              XCom
|                   +generates event  |
produces a request -----------------> +waits for event <--.
                                      |                   |
                                      *consumes a request |
                                      |                   |
                                      processes request   |
                                      |                   |
                                      *replies to request |
                                      |___________________|

The steps prefixed with * are the responsibility of the input API (§3.1), and the ones prefixed with + are the responsibility of the notification API (§3.2).

3.1 Input API

To produce a request to be consumed by XCom, the producer:

1. creates a new request
2. pushes the request to the input channel
3. notifies XCom that the input channel has requests to process

To create a request in step 1, XCom will expose functionality similar to the following:

type xcom_request_reply_cb: (void *, pax_msg *) -> void
function new_xcom_request(app_data_ptr a, xcom_request_reply_cb reply_function, void *reply_arg)

The producer specifies a function of type xcom_request_reply_cb that XCom will use to reply after processing the request. The first parameter of the function is a generic pointer that the producer can use to pass any necessary data to the reply function. The second parameter is a pax_msg with XCom's reply. The request will contain the reply function and its generic argument, and XCom will call the reply function accordingly after it processes the request. Note that a reply function that "does nothing" can be used to achieve fire-and-forget semantics for a request.
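As a rough illustration of the reply mechanism, the sketch below mirrors the callback shape (void *, pax_msg *) -> void using stand-in types. PaxMsgSketch, AppDataSketch, RequestSketch, new_request_sketch, and reply_to are all hypothetical; the real pax_msg and app_data_ptr definitions are not shown in this document. It shows one reply function that records the reply through the generic argument, and one that does nothing, giving fire-and-forget semantics.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Stand-ins for XCom's real types, which this document does not define.
struct PaxMsgSketch { int status; };
struct AppDataSketch { std::string payload; };

// Mirrors the shape (void *, pax_msg *) -> void from the text.
using ReplyCb = void (*)(void *reply_arg, PaxMsgSketch *reply);

// A request carries its payload, its reply function, and the generic argument.
struct RequestSketch {
  AppDataSketch data;
  ReplyCb reply_function;
  void *reply_arg;
};

RequestSketch new_request_sketch(AppDataSketch data, ReplyCb cb, void *arg) {
  return RequestSketch{std::move(data), cb, arg};
}

// Reply function that stores the reply status in the int behind reply_arg.
void store_status(void *reply_arg, PaxMsgSketch *reply) {
  *static_cast<int *>(reply_arg) = reply->status;
}

// Reply function that does nothing: fire-and-forget semantics.
void ignore_reply(void *, PaxMsgSketch *) {}

// What the consumer would do once it has processed a request.
void reply_to(const RequestSketch &request, int status) {
  PaxMsgSketch reply{status};
  request.reply_function(request.reply_arg, &reply);
}

// Usage: returns the status observed by the producer's reply function.
int demo_reply_status() {
  int observed = -1;
  RequestSketch wanted = new_request_sketch({"hello"}, store_status, &observed);
  RequestSketch forgotten = new_request_sketch({"bye"}, ignore_reply, nullptr);
  reply_to(forgotten, 99);  // no observable effect
  assert(observed == -1);
  reply_to(wanted, 7);
  return observed;
}
```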

The producer must implement step 3 via the notification API, described below.

To consume a request, XCom will expose functionality similar to the following:

type xcom_input_try_pop_cb: void -> app_data_ptr
procedure set_xcom_input_try_pop_cb(xcom_input_try_pop_cb consume_hook)

The producer registers a function of type xcom_input_try_pop_cb, which must encapsulate the necessary logic to consume and return a request from the input channel. XCom will react to the notification that the channel has requests and consume them.
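The decoupling this hook provides can be sketched as follows. The consumer side only ever sees a function pointer, while the container behind it belongs to the producer. All names here are hypothetical, and two things are assumptions rather than statements from this document: that the hook returns a null pointer when the channel is empty, and that the consumer drains the channel until that happens. The sketch is single-threaded and omits the locking a real producer would need.

```cpp
#include <cassert>
#include <deque>

// Stand-in for the request type; the real app_data_ptr is not shown here.
struct AppDataSketch { int id; };

// Mirrors the shape void -> app_data_ptr from the text.
using TryPopCb = AppDataSketch *(*)();

// --- Consumer side: knows nothing about how the channel is implemented. ---
static TryPopCb g_try_pop = nullptr;

void set_try_pop_cb_sketch(TryPopCb hook) { g_try_pop = hook; }

// Consumes until the hook reports an empty channel; returns the count.
int consume_all() {
  assert(g_try_pop != nullptr);
  int consumed = 0;
  while (g_try_pop() != nullptr) ++consumed;
  return consumed;
}

// --- Producer side: owns the channel and the logic to pop from it. ---
static std::deque<AppDataSketch *> g_channel;

// Assumption: nullptr signals "no request available right now".
AppDataSketch *producer_try_pop() {
  if (g_channel.empty()) return nullptr;
  AppDataSketch *front = g_channel.front();
  g_channel.pop_front();
  return front;
}

// Usage: the producer registers its hook, queues two requests, and the
// consumer drains them through the hook alone.
int demo_hook() {
  static AppDataSketch first{1};
  static AppDataSketch second{2};
  set_try_pop_cb_sketch(producer_try_pop);
  g_channel.push_back(&first);
  g_channel.push_back(&second);
  return consume_all();
}
```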

Note that these functions need to be thread-safe because they may be called concurrently. The concurrency matrix is the following:

                       Concurrent with | Pushing a request (producer) | xcom_input_try_pop_cb (XCom, consumer)
--------------------------------------------------------------------------------------------------------------
          Pushing a request (producer) | yes                          | yes
xcom_input_try_pop_cb (XCom, consumer) | yes                          | no

Requests may be pushed concurrently because user threads may produce requests concurrently. Pushing requests may be concurrent with xcom_input_try_pop_cb because user threads may produce requests while XCom is consuming. xcom_input_try_pop_cb is not concurrent with itself because XCom uses a single thread of execution.

3.2 Notification API

XCom will expose functionality similar to the following:

function xcom_input_notify() -> boolean

The producer must call this function as part of its logic to produce a request (§3.1). If this function returns false, it was not possible to notify XCom.
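A producer's "push, then notify" sequence might look like the sketch below. NotifierSketch stands in for the notification API and ProducerSketch for the producer's own logic; both are hypothetical. How the producer should treat a request that was pushed but could not be announced is not specified in this document, so the sketch simply reports the failure to the caller and leaves the request queued.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>

struct RequestSketch { int id; };

// Stand-in for the notification API: notify() returns false when the
// consumer cannot be notified.
class NotifierSketch {
 public:
  explicit NotifierSketch(bool reachable) : reachable_(reachable) {}

  bool notify() {
    if (!reachable_) return false;
    ++pending_events_;
    return true;
  }

  int pending_events() const { return pending_events_.load(); }

 private:
  bool reachable_;
  std::atomic<int> pending_events_{0};
};

class ProducerSketch {
 public:
  explicit ProducerSketch(NotifierSketch &notifier) : notifier_(notifier) {}

  // Pushes the request (step 2), then notifies the consumer (step 3).
  // Returns the outcome of the notification.
  bool send(RequestSketch request) {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      channel_.push_back(request);
    }
    return notifier_.notify();
  }

  std::size_t queued() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return channel_.size();
  }

 private:
  NotifierSketch &notifier_;
  mutable std::mutex mutex_;
  std::deque<RequestSketch> channel_;
};

// Usage: returns true if sending succeeds when the consumer is reachable
// and fails when it is not.
bool demo_notify() {
  NotifierSketch up(true);
  NotifierSketch down(false);
  ProducerSketch ok(up);
  ProducerSketch broken(down);
  bool sent = ok.send({1});
  bool failed = !broken.send({2});
  assert(up.pending_events() == 1);
  return sent && failed;
}
```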

3.3 XCom internals

Assuming that the producer has set up the hook from §3.1, and uses the function from §3.2 when producing requests, this WL adds the following additional logic to XCom:

while xcom_is_running:
    wait_for_notification_event()
    request := input_consume_hook()
    dispatch(request)
    reply(request)
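The loop above can be approximated by a single-threaded simulation. Everything in this sketch is hypothetical: the event counter replaces XCom's real event mechanism, dispatch just doubles an integer, and the loop returns when no events are pending instead of waiting for the next one. It is meant only to show the order of the steps: event, consume, dispatch, reply.

```cpp
#include <cassert>
#include <deque>
#include <vector>

struct RequestSketch {
  int payload;
  void (*reply)(void *reply_arg, int result);
  void *reply_arg;
};

// Channel plus a counter standing in for pending notification events.
struct ChannelSketch {
  std::deque<RequestSketch> requests;
  int pending_events = 0;

  // Producer side: push the request, then generate one event.
  void produce(RequestSketch request) {
    requests.push_back(request);
    ++pending_events;
  }
};

// Hypothetical processing step: doubles the payload.
int dispatch(const RequestSketch &request) { return request.payload * 2; }

// Handles one request per event and returns how many were handled.
// A real loop would wait for the next event instead of returning.
int run_loop_sketch(ChannelSketch &channel) {
  int handled = 0;
  while (channel.pending_events > 0) {  // wait_for_notification_event()
    --channel.pending_events;
    if (channel.requests.empty()) continue;  // event without a request
    RequestSketch request = channel.requests.front();  // consume hook
    channel.requests.pop_front();
    int result = dispatch(request);             // dispatch(request)
    request.reply(request.reply_arg, result);   // reply(request)
    ++handled;
  }
  return handled;
}

// Reply function that appends the result to the vector behind reply_arg.
void collect(void *reply_arg, int result) {
  static_cast<std::vector<int> *>(reply_arg)->push_back(result);
}

// Usage: two requests in, two replies out, in order.
int demo_loop() {
  ChannelSketch channel;
  std::vector<int> replies;
  channel.produce({1, collect, &replies});
  channel.produce({5, collect, &replies});
  int handled = run_loop_sketch(channel);
  assert(replies.size() == 2);
  return handled;
}
```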

4 GCS adaptation

With this WL, three ways to send requests to XCom will exist, each with different properties:

Via TCP, following a synchronous request-reply model
Via the new input mechanism, following an asynchronous fire-and-forget model
Via the new input mechanism, following an asynchronous request-reply model

4.1 TCP socket

Using the existing TCP socket method, GCS can communicate with remote XCom instances, and receives one of the following replies, depending on the request made:

I will process your request
I will not process your request
Here's the data you requested

4.2 New input mechanism

Using the new input mechanism, GCS communicates exclusively with the local XCom instance. There will be two ways to send a request to XCom, each with its own flavor.

4.2.1 Fire-and-forget

The first way has fire-and-forget semantics, i.e. we only receive feedback on whether the request was successfully pushed to the input channel or not.

4.2.2 Request-reply

The other way has request-reply semantics. The reply will come via a future, and the replies are equivalent to those of the TCP socket method.
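One way to deliver a reply through a future, assuming the callback-with-generic-argument shape from the input API, is to pass a heap-allocated std::promise as the generic argument and fulfil it inside the reply function. This is a sketch under that assumption, not the actual GCS implementation; ReplySketch, RequestSketch, make_request, and process are hypothetical names.

```cpp
#include <cassert>
#include <future>
#include <memory>

// Stand-in for XCom's reply message.
struct ReplySketch { int value; };

struct RequestSketch {
  void (*reply_function)(void *reply_arg, ReplySketch *reply);
  void *reply_arg;
};

// Reply function: takes ownership of the promise behind reply_arg, fulfils
// it, and frees it. It must be called exactly once per request; if it is
// never called, the promise leaks in this sketch.
void fulfil_promise(void *reply_arg, ReplySketch *reply) {
  std::unique_ptr<std::promise<ReplySketch>> promise(
      static_cast<std::promise<ReplySketch> *>(reply_arg));
  promise->set_value(*reply);
}

// Producer side: builds a request and returns the future for its reply.
std::future<ReplySketch> make_request(RequestSketch &out) {
  auto *promise = new std::promise<ReplySketch>();
  std::future<ReplySketch> reply = promise->get_future();
  out = RequestSketch{fulfil_promise, promise};
  return reply;
}

// Consumer side stand-in: "processes" the request and replies with value.
void process(const RequestSketch &request, int value) {
  ReplySketch reply{value};
  request.reply_function(request.reply_arg, &reply);
}

// Usage: the producer blocks on the future only when it needs the value.
int demo_request_reply() {
  RequestSketch request{};
  std::future<ReplySketch> reply = make_request(request);
  process(request, 10);
  int value = reply.get().value;
  assert(value == 10);
  return value;
}
```

The fire-and-forget flavor corresponds to not creating a promise at all and using a reply function that does nothing.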

4.3 Adaptation guidelines

GCS can decide which method to use for any request it needs to send to XCom. However, depending on the circumstances, a particular method may be necessary due to its properties. Below we outline some rules to aid the adaptation of the existing calls to XCom made by GCS.

4.3.1 Bootstrap

During the bootstrap process, GCS:

Communicates with the local XCom instance
Checks for success asynchronously

Therefore, GCS can use the new input mechanism's fire-and-forget model.

4.3.2 Join

During the join process, GCS:

Communicates with a remote XCom instance

Therefore, GCS must use the TCP socket.

4.3.3 Remove node(s)

During the process of removing node(s), GCS:

Communicates with the local XCom instance
Checks for success asynchronously

Therefore, GCS can use the new input mechanism's fire-and-forget model.

4.3.4 Send message

During the process of sending a message, GCS:

Communicates with the local XCom instance

Therefore, GCS can use the new input mechanism's fire-and-forget model. The implementation should use it, because it can bypass overheads that are proportional to the message size, such as copying and (de)serialization.

4.3.5 Force membership

During the process of forcing the membership, GCS:

Communicates with the local XCom instance

Therefore, GCS can use the new input mechanism's fire-and-forget model.

4.3.6 Inspect the event horizon

During the process of inspecting the event horizon, GCS:

Communicates with the local XCom instance
Requests the value of the event horizon synchronously

Therefore, GCS can use the new input mechanism's request-reply model.

4.3.7 Modify the event horizon

During the process of modifying the event horizon, GCS:

Communicates with the local XCom instance

Therefore, GCS can use the new input mechanism's fire-and-forget model.
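The guidelines in this section reduce to two questions: is the target XCom instance remote, and does the caller need data back from XCom? The sketch below encodes that reading as a small decision function. The enum and function names are hypothetical, and since GCS is free to choose a method per request, this is a summary of the rules above rather than a fixed policy.

```cpp
#include <cassert>

enum class Target { Local, Remote };
enum class Method { TcpSocket, InputFireAndForget, InputRequestReply };

// needs_reply: the caller needs data back from XCom, as when inspecting
// the event horizon.
Method choose_method(Target target, bool needs_reply) {
  // The new input mechanism reaches only the local XCom instance.
  if (target == Target::Remote) return Method::TcpSocket;
  return needs_reply ? Method::InputRequestReply : Method::InputFireAndForget;
}

// Usage: checks the function against the cases listed in this section.
bool matches_guidelines() {
  bool join = choose_method(Target::Remote, false) == Method::TcpSocket;
  bool send_message =
      choose_method(Target::Local, false) == Method::InputFireAndForget;
  bool inspect_event_horizon =
      choose_method(Target::Local, true) == Method::InputRequestReply;
  assert(join && send_message && inspect_event_horizon);
  return join && send_message && inspect_event_horizon;
}
```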