to reduce the latency and CPU usage.
The client should decode the messages it receives from the server in a generic way and track the possible messages with a state machine.
    def getMessage(self, message):
        ## handle out-of-band message
        msg = messageFactory(message.type).fromString(message.payload)

        if message.type is Notification:
            notification_queue.add(msg)
            raise NoMessageError()

        if message.type is Notice:
            notice_queue.add(msg)
            raise NoMessageError()

        return msg
The client may send several messages to the server without waiting for a response for each message.
Instead of waiting for the response to each message before sending the next one, the client can generate its messages and send them to the server without waiting.
When pipelining messages, the client has to ensure that, in case of an error, the following messages also error out correctly.
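The pipelining idea can be illustrated with a minimal sketch. `FakeServer` and `run_pipelined` are hypothetical names, the server is an in-memory stand-in rather than a real connection, and the rule that every message after the first failure is rejected is an assumption made only for this illustration:

```python
from collections import deque


class FakeServer:
    """Hypothetical in-memory stand-in for a server that answers in order."""

    def __init__(self):
        self.failed = False
        self.responses = deque()

    def send(self, request):
        # Assumption: once one pipelined message fails, all following ones fail too.
        if self.failed or request == "bad":
            self.failed = True
            self.responses.append(("error", request))
        else:
            self.responses.append(("ok", request))

    def recv(self):
        return self.responses.popleft()


def run_pipelined(server, requests):
    # Send everything first, without waiting for individual responses ...
    for request in requests:
        server.send(request)
    # ... then collect one response per request, in the order they were sent.
    return [server.recv() for _ in requests]
```

The client still reads exactly one response per message it sent, so it can match each error to the request that caused it.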
In network programming it is common to prefix the message payload with a header:
- HTTP header + HTTP content
- a pipeline of messages
- message header + protobuf message
    import struct
    import socket

    s = socket.create_connection(("127.0.0.1", 33060))

    msg_type = 1
    msg_payload = b"abc"  # bytes, as sockets do not accept str
    msg_header = struct.pack(">I", len(msg_payload)) + struct.pack("B", msg_type)

    ## concat before send
    s.send(msg_header + msg_payload)

    ## multiple syscalls
    s.send(msg_header)
    s.send(msg_payload)

    ## vectored I/O
    s.sendmsg([msg_header, msg_payload])
- concat before send leads to wasteful reallocations and copy operations if the payload is huge.
- multiple syscalls is wasteful for small messages, as the whole machinery of copying data between user land and kernel land has to be started for only a few bytes.
- vectored I/O combines the best of both approaches: it sends multiple buffers to the OS in one syscall, and the OS can optimize sending multiple buffers in one TCP packet.
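A vectored send may transmit only part of the buffers it is given, so a caller usually loops until everything is written. The following is a minimal sketch of such a loop. `frame` and `send_vectored` are hypothetical helper names, the header layout is taken from the example above, and `socket.sendmsg()` is assumed to be available (it is not on Windows):

```python
import socket
import struct


def frame(msg_type, payload):
    """Return (header, payload) for one message, header layout as in the example above."""
    header = struct.pack(">I", len(payload)) + struct.pack("B", msg_type)
    return header, payload


def send_vectored(sock, buffers):
    """Send all buffers with as few sendmsg() calls as possible."""
    views = [memoryview(b) for b in buffers if len(b)]
    while views:
        sent = sock.sendmsg(views)
        # Drop the buffers that were sent completely ...
        while views and sent >= len(views[0]):
            sent -= len(views[0])
            views.pop(0)
        # ... and trim the one that was sent only partially.
        if views and sent:
            views[0] = views[0][sent:]
```

Using `memoryview` slices avoids copying the remaining payload when a send is cut short.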
On Unix this is handled by
Any good buffered iostream implementation should already make use of vectored I/O.
Known good implementation:
Further control over when data is actually sent to the other endpoint can be achieved with "corking":
They work in combination with
(aka Nagle's Algorithm).
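As one possible illustration of corking, the sketch below assumes Linux's `TCP_CORK` socket option, which Python exposes as `socket.TCP_CORK` only on platforms that support it. `send_corked` is a hypothetical helper name, and on platforms without the option the function simply sends without corking:

```python
import socket

# TCP_CORK is Linux-specific; the constant is missing on other platforms.
TCP_CORK = getattr(socket, "TCP_CORK", None)


def send_corked(sock, chunks):
    """Send several chunks while the socket is corked, then flush by uncorking."""
    if TCP_CORK is not None:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_CORK, 1)
    try:
        for chunk in chunks:
            sock.sendall(chunk)
    finally:
        if TCP_CORK is not None:
            # Removing the cork lets the kernel send any partial frame still queued.
            sock.setsockopt(socket.IPPROTO_TCP, TCP_CORK, 0)
```

Uncorking in a `finally` block ensures queued data is not held back if one of the sends raises.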
The protocol is structured in a way that the messages can be decoded completely without knowing the state of the message sequence.
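Because each message carries its own header, a receiver can split a byte stream into messages without tracking what came before. A minimal sketch, assuming the same header layout as in the sending example above (4-byte length followed by a 1-byte type); `decode_frames` is a hypothetical helper name:

```python
import struct

# Payload length + message type, matching the header built in the sending example.
HEADER = struct.Struct(">IB")


def decode_frames(buffer):
    """Split a byte buffer into (msg_type, payload) tuples plus the unconsumed rest."""
    frames = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        length, msg_type = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # payload not complete yet, wait for more data
        frames.append((msg_type, buffer[offset + HEADER.size:end]))
        offset = end
    return frames, buffer[offset:]
```

The returned rest is kept by the caller and prepended to the next chunk read from the network.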
If data is available on the network, the server has to:
- read the message
- decode the message
- execute the message
Instead of a synchronous read-execute cycle, the Reader and the Executor can be decoupled into separate threads, which hides the cost of decoding a message behind the execution of the previous message.
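The decoupling of reading and executing can be sketched with a bounded queue between two threads. This is only an illustration: `serve`, `read_message`, `execute` and `prefetch` are hypothetical names, and `read_message` is assumed to return an already decoded message or `None` at the end of the stream:

```python
import queue
import threading


def serve(read_message, execute, prefetch=4):
    """Decode messages in a reader thread while the calling thread executes them."""
    # The bounded queue limits how many decoded messages wait for execution.
    pending = queue.Queue(maxsize=prefetch)
    done = object()  # sentinel marking the end of the stream

    def reader():
        while True:
            message = read_message()  # read + decode; None at end of stream
            if message is None:
                pending.put(done)
                return
            pending.put(message)  # blocks while `prefetch` messages are queued

    threading.Thread(target=reader, daemon=True).start()

    results = []
    while True:
        message = pending.get()
        if message is done:
            return results
        results.append(execute(message))
```

A call such as `serve(read_message, execute, prefetch=2)` lets the reader run at most two messages ahead of the executor.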
The number of messages that are prefetched this way should be configurable to allow a trade-off between: