WL#12999: Make the client get a better error message on wait_timeout timeout

Affects: Server-8.0   —   Status: Complete

This is to track BUG#93240

From the bug:

> If clients are not using the connection within wait_timeout, then they will be closed by the server and the server can write a non-clear error message to the error log:
> [Note] Aborted connection <connection_id> to db: '<db>' user: '<user>' host: '<host>' (Got timeout reading communication packets)

> Which also is the same as if there is an error due to net_read_timeout.

> And the client is not informed of the reason for the closed connection, it typically sees:
> 2013: 'Lost connection to MySQL server during query'
> or
> 2006: 'MySQL server has gone away'

> These are very generic errors and does not give a clear hint that the connection was closed by the server due to wait_timeout.

This worklog is about the message the client receives. The improvement to the server error log message is handled in BUG#93240.

R1. The new feature must not impact old clients. If a timeout occurs and an old client gets a write error while trying to communicate with the server, it will close both read and write sockets and report that an unknown connection error occurred.

R2. In case of a server-induced timeout, the server must try to send an error message to the client before closing the connection.

R3. If the client suffers an error while trying to send to the server, it must try to read from the connection to see whether any error messages are waiting before closing the connection.

R4. If a new client tries to connect to an old server and experiences a write error, it will first try to read the socket; if there is no message, it will fall back on reporting that an unknown connection error occurred.


Notes on timeout implementation.

Server side

A client timeout (ER_CLIENT_INTERACTION_TIMEOUT, ER_NET_WAIT_TIMEOUT) is determined by the server. It applies when the timeout happens before the first packet is sent or read and vio_errno(vio) == SOCKET_ETIMEDOUT.

Timeout is set in execute_init_command(...) when my_net_set_read_timeout(...) is called with a parameter based on the wait_timeout or interactive_timeout SQL variables.

This in turn sets two states: One in net->read_timeout and one in vio_timeout(..)

The latter sets vio->read_timeout = timeout_ms; which means we have a redundancy where net->read_timeout and net->vio->read_timeout hold the same timeout.
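The duplicated state can be pictured with a minimal sketch. The SketchNet and SketchVio structs and both functions are hypothetical stand-ins, not the server's real definitions. The units (seconds on the net side, milliseconds on the vio side) and the "too large means no timeout" clamp are assumptions suggested by the name timeout_ms.

```cpp
#include <climits>

// Hypothetical stand-ins; only the fields discussed above are modelled.
struct SketchVio {
  int read_timeout_ms = -1;  // -1 means "no timeout" in this sketch
};

struct SketchNet {
  unsigned int read_timeout = 0;  // seconds (assumed unit)
  SketchVio *vio = nullptr;
};

// Stand-in for vio_timeout(..): keeps the value in milliseconds.
void sketch_vio_timeout(SketchVio *vio, unsigned int timeout_sec) {
  if (timeout_sec > static_cast<unsigned int>(INT_MAX) / 1000u) {
    vio->read_timeout_ms = -1;  // assumed: values too large to convert mean "wait forever"
  } else {
    vio->read_timeout_ms = static_cast<int>(timeout_sec * 1000u);
  }
}

// Stand-in for my_net_set_read_timeout(...): one call updates both copies.
void sketch_set_read_timeout(SketchNet *net, unsigned int timeout_sec) {
  net->read_timeout = timeout_sec;
  if (net->vio != nullptr) sketch_vio_timeout(net->vio, timeout_sec);
}
```

Because both copies are written in one place, they only drift apart if some other code path assigns one of them directly.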

If there's a vio->timeout function, then vio->timeout(vio, ..) is called, where vio->timeout = vio_socket_timeout(...).

In vio_socket_timeout(..) the code path diverges based on platform and on whether SSL is used. SSL wraps the receive and send calls and handles timeouts by itself.

If the server writes to the client and the timeout timer has triggered, then the server will try to push an error to the client, iff no other packets have been sent since the last interaction. This message will wait for an ACK from the client or time out.
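The condition above reduces to a small predicate. The struct and function names below are hypothetical, and the two flags are simplifications of state the server tracks elsewhere; this is a sketch of the rule as described, not the server's code.

```cpp
// Hypothetical summary of what the server knows when the timer has fired.
struct SketchTimeoutState {
  bool timed_out;                       // stands in for vio_errno(vio) == SOCKET_ETIMEDOUT
  bool packets_since_last_interaction;  // any packet sent since the last interaction
};

// True when the server should try to push a timeout error before closing.
bool should_push_timeout_error(const SketchTimeoutState &s) {
  return s.timed_out && !s.packets_since_last_interaction;
}
```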

Client side

Looking at cli_advanced_command() on the client side.

This client function calls net_write_command() [->net_write_buff() -> net_write_packet() ] which can fail or succeed. If it fails then net->last_errno and net->error will tell cli_advanced_command() what went wrong.

In case of a WRITE error, it will try to READ from the socket using cli_safe_read() to capture any error message packet.

cli_safe_read() is just a wrapper for cli_safe_read_with_ok() which has the following call chain: my_net_read(net) -> -> cli_safe_read_with_ok_complete(...). The last function will receive the error message the server sent (through the buffer read by my_net_read).
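The client-side behaviour (R3 and R4) can be sketched without real sockets. Everything here is hypothetical: SketchConn simulates the connection, sketch_advanced_command() only mirrors the shape of cli_advanced_command(), and the message strings are placeholders rather than real server or client texts.

```cpp
#include <optional>
#include <string>

// Hypothetical in-memory stand-in for a client connection.
struct SketchConn {
  bool write_fails = false;                  // simulates a failed net_write_command()
  std::optional<std::string> pending_error;  // error packet queued by the server, if any
};

struct SketchResult {
  bool ok;
  std::string message;
};

SketchResult sketch_advanced_command(SketchConn &conn) {
  if (!conn.write_fails) return {true, ""};

  // The write failed: read once to pick up an error packet the server
  // may have sent before closing (the cli_safe_read() step).
  if (conn.pending_error) return {false, *conn.pending_error};

  // Nothing waiting, for example an old server: fall back to the generic report.
  return {false, "unknown connection error (placeholder text)"};
}
```

With an old server pending_error stays empty, so the new client ends up on the same generic path an old client would take.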

The patch also sets the timeout for recv() to 0s. The reason is likely to avoid the client having to wait for two timeouts from the server. However, it opens up the slim possibility that we try to read before the server error message has arrived. The client will then not know why it was closed, and the server's error message send will have to wait until the connection times out, because it is in turn expecting an ACK from a connection which is closed for receiving. The risk that this happens seems very small, because the server sends the error message immediately after closing the connection to a client which hasn't done anything up until this point.
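The ordering issue can be reproduced on a local socket pair. This is a sketch under several assumptions: it uses a POSIX AF_UNIX socketpair() rather than TCP, MSG_DONTWAIT stands in for the 0s recv() timeout, and the function names and the "timeout" payload are made up for illustration.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <string>

// Returns whatever is readable right now, or an empty string if nothing
// has arrived (EAGAIN/EWOULDBLOCK) or the peer closed without sending.
std::string read_without_waiting(int fd) {
  char buf[256];
  ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);
  if (n > 0) return std::string(buf, static_cast<size_t>(n));
  return std::string();
}

// server_sends_first == true: the message is queued before the client looks.
// server_sends_first == false: the client looks before anything was sent.
std::string demo_zero_timeout_read(bool server_sends_first) {
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return "socketpair failed";

  const std::string msg = "timeout";
  if (server_sends_first) {
    if (send(fds[0], msg.data(), msg.size(), 0) < 0) {
      close(fds[0]);
      close(fds[1]);
      return "send failed";
    }
    close(fds[0]);  // server side closes after sending
  }

  std::string seen = read_without_waiting(fds[1]);

  if (!server_sends_first) close(fds[0]);
  close(fds[1]);
  return seen;
}
```

When the message is already queued, the non-blocking read still sees it after the peer has closed; when it is not, the read returns at once with nothing, which is the small window described above.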

Looking at this from the other way around: Could a previously received error message drop out of the TCP buffer before it is delivered to the client? TCP-level variables like TCP_USER_TIMEOUT might affect whether the client ever sees the server message before it is removed.

net->error states

Rules

NET_ERROR_UNSET -> NET_ERROR_UNUSABLE

 := If an error occurs that prevents reading or writing of data, for example if the compression protocol fails or packets arrive out of order.

NET_ERROR_UNSET -> NET_ERROR_SOCKET_NOT_READABLE

 := If error occurred while trying to READ from a socket.

NET_ERROR_UNSET -> NET_ERROR_SOCKET_NOT_WRITABLE

 := If error occurred while trying to WRITE to a socket.

NET_ERROR_UNSET -> NET_ERROR_SOCKET_RECOVERABLE

 := If the NET packet is larger than max_packet_size, or if we fail to allocate enough space for the packet.
    In both cases the socket can still be used to try again.

NET_ERROR_SOCKET_NOT_READABLE -> NET_ERROR_UNUSABLE

 := If we tried to WRITE to a socket in this state but failed.

NET_ERROR_SOCKET_NOT_WRITABLE -> NET_ERROR_UNUSABLE

 := If we tried to READ from a socket in this state but failed.
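
The rules can be written as a small transition function. This is a sketch, not the definition used in the source tree: the enum names are local re-declarations, the NetEvent type is hypothetical, and it assumes that a failed READ marks the socket not readable while a failed WRITE marks it not writable.

```cpp
// Local re-declaration of the net->error states named above.
enum class NetError {
  UNSET,
  SOCKET_RECOVERABLE,
  SOCKET_NOT_READABLE,
  SOCKET_NOT_WRITABLE,
  UNUSABLE
};

// Hypothetical event type describing what just went wrong.
enum class NetEvent {
  PROTOCOL_FAILURE,  // e.g. compression failure or unordered packets
  READ_FAILED,
  WRITE_FAILED,
  PACKET_TOO_LARGE,
  OUT_OF_MEMORY
};

NetError next_state(NetError state, NetEvent event) {
  switch (state) {
    case NetError::UNSET:
      switch (event) {
        case NetEvent::PROTOCOL_FAILURE: return NetError::UNUSABLE;
        case NetEvent::READ_FAILED:      return NetError::SOCKET_NOT_READABLE;
        case NetEvent::WRITE_FAILED:     return NetError::SOCKET_NOT_WRITABLE;
        case NetEvent::PACKET_TOO_LARGE:
        case NetEvent::OUT_OF_MEMORY:    return NetError::SOCKET_RECOVERABLE;
      }
      break;
    case NetError::SOCKET_NOT_READABLE:
      // The other direction was still allowed; losing it too ends the socket.
      if (event == NetEvent::WRITE_FAILED) return NetError::UNUSABLE;
      break;
    case NetError::SOCKET_NOT_WRITABLE:
      if (event == NetEvent::READ_FAILED) return NetError::UNUSABLE;
      break;
    default:
      break;  // transitions out of other states are not covered by the rules
  }
  return state;
}
```

A timeout on the client then follows the path UNSET -> SOCKET_NOT_WRITABLE when the write fails, and moves to UNUSABLE only if the follow-up read for the server's error message fails as well.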