
Client and server implementations of the protocol should make use of the following techniques to reduce latency and CPU usage:

vectored I/O
pipelining

Client

Out-of-Band Messages

The client should decode the messages it receives from the server in a generic way and track the possible messages with a state machine.

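def getMessage(self, message):
    ## decode every message generically, based on its type
    msg = messageFactory(message.type).fromString(message.payload)

    ## handle out-of-band messages: queue them and tell the caller
    ## that no in-band message is available yet
    if message.type == Notification:
        self.notification_queue.append(msg)
        raise NoMessageError()

    if message.type == Notice:
        self.notice_queue.append(msg)
        raise NoMessageError()

    return msg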
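The state-machine tracking can be sketched as follows. This is a minimal illustration, not the protocol's actual message set: the message types `Row`, `Ok`, `Error` and `Notice`, the `Message` class and the single-state transition table are hypothetical simplifications. Out-of-band notices are queued wherever they appear and never change the state, while every in-band message must be allowed by the current state.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical, simplified message types for illustration only; they are
# not the protocol's real message names or numeric type ids.
ROW, OK, ERROR, NOTICE = "Row", "Ok", "Error", "Notice"


@dataclass
class Message:
    type: str
    payload: bytes = b""


class ProtocolError(Exception):
    """An in-band message arrived that the current state does not allow."""


# Result-set state machine: state -> {allowed message type: next state}
TRANSITIONS = {"RESULT": {ROW: "RESULT", OK: "DONE", ERROR: "DONE"}}


class Client:
    def __init__(self):
        self.notice_queue = deque()

    def read_result(self, messages):
        """Consume messages until the result ends; return (rows, final message)."""
        state, rows = "RESULT", []
        for msg in messages:
            if msg.type == NOTICE:
                # out-of-band: may arrive at any point and never changes the state
                self.notice_queue.append(msg)
                continue
            allowed = TRANSITIONS[state]
            if msg.type not in allowed:
                raise ProtocolError(f"unexpected {msg.type} in state {state}")
            state = allowed[msg.type]
            if msg.type == ROW:
                rows.append(msg)
            if state == "DONE":
                return rows, msg
        raise ProtocolError("stream ended in the middle of a result")
```

Keeping the allowed transitions in a table makes it easy to extend the sketch with further states (for example, column metadata before rows) without touching the out-of-band handling.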

Pipelining

The client may send several messages to the server without waiting for a response for each message.

Instead of waiting for the response to each message before sending the next one, as in:

Client Pipeline

the client can generate its messages and send them to the server without waiting:

Client Pipeline

When pipelining messages, the client has to ensure that, in case of an error, the following messages also error out correctly:

Client Pipeline
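One way a client might meet this requirement is sketched below, assuming responses arrive in the same order as the requests. The `send_all` and `receive` callbacks are hypothetical stand-ins for a real transport. Error responses from the server are passed through unchanged, and if the connection drops mid-pipeline, every request still pending is failed locally so that no caller is left waiting for a response that will never come.

```python
from collections import deque


def run_pipelined(requests, send_all, receive):
    """Send every request up front, then collect exactly one result per request.

    send_all(list) writes all requests without waiting for responses;
    receive() returns the next response in request order, or None once the
    connection is closed. Both callbacks are hypothetical stand-ins.
    """
    send_all(list(requests))  # pipelined: no round trip between requests
    pending = deque(requests)
    results = []
    while pending:
        request = pending.popleft()
        response = receive()
        if response is None:
            # connection lost: this request and every one queued behind it
            # must fail explicitly, or their callers would wait forever
            results.append(("error", request))
            results.extend(("error", r) for r in pending)
            break
        results.append(response)
    return results
```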

Vectored I/O

In network programming it is common to prefix the message payload with a header:

HTTP header + HTTP content
a pipeline of messages
message header + protobuf message

import socket
import struct

s = socket.create_connection(("127.0.0.1", 33060))

msg_type = 1
msg_payload = b"abc"  # sockets send bytes, not str
msg_header = (struct.pack(">I", len(msg_payload)) +
              struct.pack("B", msg_type))

## the three variants below are alternatives; each sends the same message

## concat before send
s.send(msg_header + msg_payload)

## multiple syscalls
s.send(msg_header)
s.send(msg_payload)

## vectored I/O (socket.sendmsg() is only available on Unix)
s.sendmsg([msg_header, msg_payload])

Concat before send leads to wasteful reallocations and copy operations if the payload is large.

Multiple syscalls are wasteful for small messages: for only a few bytes, the whole machinery of copying data between user land and kernel land has to be started.

Vectored I/O combines the best of both approaches: it passes multiple buffers to the OS in one syscall, and the OS can send those buffers in one TCP packet.

On Unix this is provided by writev(2) and sendmsg(2); on Windows, by WSASend().
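Like send(), a vectored write may accept only part of the data. The sketch below, assuming a Unix platform (`socket.sendmsg`) and fewer buffers than the OS per-call limit (IOV_MAX), keeps calling sendmsg() on the remaining buffers until everything is written, without ever concatenating them. The helper name `send_vectored` is hypothetical.

```python
import socket
import struct


def send_vectored(sock, buffers):
    """Write all `buffers` with sendmsg(), retrying after partial writes.

    Assumes a Unix platform (socket.sendmsg) and fewer buffers than the
    OS per-call limit (IOV_MAX).
    """
    views = [memoryview(b) for b in buffers if len(b)]
    while views:
        sent = sock.sendmsg(views)  # one syscall for all remaining buffers
        # drop fully sent buffers and trim the one that was sent partially
        while sent:
            first = len(views[0])
            if sent >= first:
                views.pop(0)
                sent -= first
            else:
                views[0] = views[0][sent:]
                sent = 0


if __name__ == "__main__":
    a, b = socket.socketpair()
    payload = b"abc"
    header = struct.pack(">I", len(payload)) + struct.pack("B", 1)
    send_vectored(a, [header, payload])
    print(b.recv(64))  # header and payload arrive as one byte stream
```

Trimming with memoryview slices avoids copying the unsent remainder, which keeps the advantage over concatenating before the send.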

Note: Any good buffered iostream implementation should already make use of vectored I/O.

Known good implementations:

Boost::ASIO
GIO's GBufferedIOStream

Corking

Further control over how and when data is actually sent to the other endpoint can be achieved with "corking":

linux: TCP_CORK http://linux.die.net/man/7/tcp
freebsd/macosx: TCP_NOPUSH https://www.freebsd.org/cgi/man.cgi?query=tcp&sektion=4&manpath=FreeBSD+9.0-RELEASE

They work in combination with TCP_NODELAY, which disables Nagle's algorithm.

http://stackoverflow.com/questions/3761276/when-should-i-use-tcp-nodelay-and-when-tcp-cork?rq=1
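A minimal sketch of corking on Linux, assuming the Python `socket` module exposes `TCP_CORK` (it does on Linux); the context-manager name `corked` is hypothetical, and the FreeBSD/macOS TCP_NOPUSH variant is not handled here. While the socket is corked, the kernel queues partial segments; clearing the option flushes them, and with TCP_NODELAY set the flushed data goes out immediately.

```python
import contextlib
import socket


@contextlib.contextmanager
def corked(sock):
    """Cork `sock` for the duration of the block (Linux TCP_CORK).

    While corked, the kernel queues partial segments instead of sending
    them; uncorking flushes the queue. Acts as a no-op where
    socket.TCP_CORK is not defined.
    """
    cork = getattr(socket, "TCP_CORK", None)
    if cork is None:
        yield sock
        return
    sock.setsockopt(socket.IPPROTO_TCP, cork, 1)
    try:
        yield sock
    finally:
        # clearing the option sends whatever is still queued
        sock.setsockopt(socket.IPPROTO_TCP, cork, 0)
```

A client could write a header and a payload with two calls inside `with corked(sock):` and still have them leave in as few packets as possible, while keeping TCP_NODELAY enabled for everything else.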

Server

Pipelining

The protocol is structured in a way that the messages can be decoded completely without knowing the state of the message sequence.
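Because every message carries its own length and type, a reader can split the incoming byte stream into complete messages without any knowledge of the message sequence. The sketch below assumes, for illustration, the header layout from the vectored I/O example above (4-byte big-endian payload length, then a 1-byte type); the helper name `split_frames` is hypothetical.

```python
import struct

# Assumed header layout, as in the vectored I/O example: payload length, type
HEADER = struct.Struct(">IB")


def split_frames(buffer):
    """Split complete frames off the front of `buffer`.

    Returns ([(msg_type, payload), ...], leftover_bytes). No knowledge of
    the message sequence is needed: each frame carries its own length and type.
    """
    frames, offset = [], 0
    while len(buffer) - offset >= HEADER.size:
        length, msg_type = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # incomplete frame: keep the bytes and wait for more data
        frames.append((msg_type, bytes(buffer[offset + HEADER.size:end])))
        offset = end
    return frames, buffer[offset:]
```

The leftover bytes are prepended to the next chunk read from the network.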

If data is available on the network, the server has to:

read the message
decode the message
execute the message

Instead of a synchronous read-execution cycle:

Server Pipeline

the Reader and the Executor can be decoupled into separate threads:

Separate Threads

which allows the cost of decoding a message to be hidden behind the execution of the previous message.

The number of messages that are prefetched this way should be configurable to allow a trade-off between:

resource usage
parallelism
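A minimal sketch of the decoupled reader and executor, assuming hypothetical `decode` and `execute` callables: a reader thread decodes messages into a bounded `queue.Queue` while the calling thread executes them in order. The queue's `maxsize` is the configurable prefetch limit; a larger value lets decoding run further ahead at the cost of holding more decoded messages in memory.

```python
import queue
import threading

_DONE = object()  # sentinel: the reader has consumed all input


def run_pipeline(raw_messages, decode, execute, prefetch=4):
    """Run decode() in a reader thread and execute() in the calling thread.

    `prefetch` bounds how many decoded messages may wait in the queue,
    trading resource usage against how far decoding runs ahead.
    """
    decoded = queue.Queue(maxsize=prefetch)

    def reader():
        try:
            for raw in raw_messages:
                decoded.put(decode(raw))  # blocks while the queue is full
            decoded.put(_DONE)
        except Exception as exc:
            decoded.put(exc)  # hand decode errors over to the executor

    # daemon: a reader blocked on a full queue won't keep the process alive
    # if execute() raises and the executor stops draining the queue
    threading.Thread(target=reader, daemon=True).start()
    results = []
    while True:
        item = decoded.get()
        if item is _DONE:
            return results
        if isinstance(item, Exception):
            raise item
        results.append(execute(item))  # reader decodes the next one meanwhile
```

Results keep the original message order because a single queue connects one reader to one executor.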