WL#10703: scale to 50k connections

Affects: Server-8.0   —   Status: Complete

Motivation

The Router's network core relies on a 1:1 thread-per-connection design to forward connections. This design limits the number of connections the Router can handle. OSes apply limits on:

  • number of threads per process
  • max memory used per process
  • number of open file-descriptors per process

which set the boundaries of what such a design can handle.

In WL#???? the limit of concurrent connections was raised from 500 to ~5000 by using poll() instead of select(), but the underlying 1-thread-per-connection design was kept intact.

To go beyond 5k concurrent connections, the 1:1 design of the routing core needs to be replaced.

System Limits

macOS

The number of threads per process is limited by the OS based on installed memory:

RAM   max_threads   max_taskthreads
16G   20480         4096
32G   40960         8192

The values can be retrieved via sysctl:

sysctl hw.memsize
sysctl kern.num_taskthreads
sysctl kern.num_threads

Goal

Refactor the routing plugin into an event-driven + IO-threadpool design which:

  • uses non-blocking IO

    • no thread-stack issues
    • low memory usage
    • no thundering herd
  • uses a low number of threads (~num of cores)

    • no more max-threads-per-process limits

FR1: router MUST handle 50k or more connections.

Configuration Options

section io

backend

IO backend which handles async operations. The generic poll backend is available on all platforms; individual platforms may provide faster, more scalable backends.

possible values
platform specific. On Linux: linux_epoll and poll; elsewhere: poll.
default
the best available platform-specific backend.

threads

number of IO threads which handle connections.

possible values
  • 0 == as many as available CPU cores/threads
  • 1..1024 == number of io-threads. At runtime the system may restrict the upper limit further.
default
as many as available CPU cores/threads

Example

[io]
backend=linux_epoll
threads=32

Implementation

Currently, the routing plugin spawns one thread per connection:

  1. waits for the listen-socket to become readable
  2. accepts a connection
  3. finds a valid backend according to the list of destinations
  4. spawns a thread
  5. forwards the client/server data as-is in that thread with blocking socket ops
  6. exits the thread when done

As the number of threads a system can handle is limited, the routing plugin is changed to:

  1. spawn io-threads
  2. wait for the listen-socket to become readable
  3. accept a connection
  4. find a valid backend according to the list of destinations
  5. assign the connection to an io-thread
  6. async-wait for the client-socket to become readable
  7. async-wait for the server-socket to become readable

Instead of running blocking socket operations in a dedicated thread, non-blocking IO is used and an io-thread only gets involved when a socket becomes ready, as sketched below.
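
A minimal sketch of the io-thread part, assuming the networking-ts shape of the API (io_context, make_work_guard(), run()); the names io_ctx and io_threads are illustrative only:

#include <thread>
#include <vector>

#include "net_ts/io_context.h"

int main() {
  net::io_context io_ctx;

  // keep run() from returning while no async operation is queued yet.
  auto work_guard = net::make_work_guard(io_ctx);

  // 1. spawn io-threads: each one executes completion-handlers of ready
  //    sockets by running the io_context's event loop.
  std::vector<std::thread> io_threads;
  for (unsigned ndx = 0; ndx < std::thread::hardware_concurrency(); ++ndx) {
    io_threads.emplace_back([&io_ctx]() { io_ctx.run(); });
  }

  // ... accept connections, assign them to the io_context and async-wait
  // for the client- and server-sockets to become readable ...

  work_guard.reset();  // allow run() to return once all work is done
  for (auto &thr : io_threads) thr.join();
}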

The implementation is based on the networking-ts, which provides:

  • portable socket layer
  • allows async-blocking socket ops
  • allows non-blocking socket-ops
  • allows running completion-handlers (callbacks) in a pool of worker threads

error-codes

To handle the failure of a socket operation like recv():

  • on Windows, WSAGetLastError() needs to be called
  • on POSIX, errno contains the value.

On Windows the error will be of the kind WSAEWOULDBLOCK, on POSIX EWOULDBLOCK; both mean the same thing, but use different error-codes.

To handle this, std::error_code is used in all places.

net::impl::socket::last_error_code() returns either:

  • std::error_code{WSAGetLastError(), std::system_category()} or
  • std::error_code{errno, std::generic_category()}

which can be compared with:

std::error_code ec = impl::socket::last_error_code();
if (ec == make_error_condition(std::errc::operation_would_block)) {
  // ...
}

Expected Return values

Contrary to the networking-ts, error-reporting of these functions relies on

stdx::expected<T, std::error_code>

instead of throwing an exception or passing in a std::error_code by reference.

// stdx::expected<size_t, std::error_code> recv(...) noexcept;
auto recv_res = impl::socket::recv(...);
if (!recv_res) {
  // recv failed, .error() contains the error_code
  auto ec = recv_res.error();
} else {
  size_t received = recv_res.value();
}

Low-Level abstractions

Router already had an abstraction for socket and poll operations, which was low-level and only partially covered portability; its main concern was mock-ability in tests.

To improve on that, the low-level socket/poll layer is replaced with:

  • portable (win32/posix) socket layer
  • portable (win32/posix) readiness layer
  • returns std::error_code

Implemented in:

  • net_ts/impl/socket.h
  • net_ts/impl/poll.h

Example: socketpair()

On POSIX, socketpair() returns two connected file handles for the requested address-family.

On Windows, no socketpair() call exists, but it can be emulated with an AF_INET socket that is accepted from a loopback listener on a randomly assigned port, as sketched below.
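
A minimal sketch of that emulation, POSIX-flavoured for brevity (on Windows the same call sequence applies, with SOCKET handles and closesocket()); this illustrates the technique and is not Router's actual implementation:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// build a connected AF_INET socket-pair from a loopback listener.
int inet_socketpair(int fds[2]) {
  int listener = ::socket(AF_INET, SOCK_STREAM, 0);
  if (listener < 0) return -1;

  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = 0;  // let the OS assign a random port

  socklen_t addr_len = sizeof(addr);
  if (::bind(listener, reinterpret_cast<sockaddr *>(&addr), addr_len) < 0 ||
      ::listen(listener, 1) < 0 ||
      ::getsockname(listener, reinterpret_cast<sockaddr *>(&addr),
                    &addr_len) < 0) {
    ::close(listener);
    return -1;
  }

  // one side connects to the random port, the other side is accept()ed.
  fds[0] = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fds[0] < 0 ||
      ::connect(fds[0], reinterpret_cast<sockaddr *>(&addr), addr_len) < 0) {
    if (fds[0] >= 0) ::close(fds[0]);
    ::close(listener);
    return -1;
  }
  fds[1] = ::accept(listener, nullptr, nullptr);
  ::close(listener);  // the listener is no longer needed

  return fds[1] < 0 ? -1 : 0;
}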

IO Readiness

An io_context owns the socket-descriptors that are waiting for readiness and the callbacks to call when a socket becomes ready or the wait is cancelled (see the sketch below).

  • net_ts/io_context.h

At the low level, two backends are supported:

  • poll
  • linux_epoll

implemented in:

  • net_ts/impl/linux_epoll.h
  • net_ts/impl/linux_epoll_io_service.h
  • net_ts/impl/poll.h
  • net_ts/impl/poll_io_service.h
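
A minimal sketch of registering read-interest with the io_context, assuming the networking-ts shape of async_wait(); the completion-handler runs on one of the io-threads once the socket becomes ready or the wait is cancelled:

#include <system_error>

#include "net_ts/internet.h"

void async_wait_readable(net::ip::tcp::socket &client_sock) {
  client_sock.async_wait(
      net::socket_base::wait_read, [&client_sock](std::error_code ec) {
        if (ec) return;  // wait was cancelled, e.g. connection closed

        // client_sock is readable: a non-blocking recv() won't block now.
        // ... forward the data, then async-wait for readability again ...
      });
}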

Buffers

Buffers are an abstraction over a memory-range (start-pointer and length) and have conversions for:

  • std::vector
  • std::array
  • char arr[N]

They can be passed as net::const_buffer to socket ops like send() to send a single buffer.

To avoid merging multiple buffers into a single buffer before sending, the socket layer exposes the syscalls sendmsg() and recvmsg().

They require a sequence of buffers, which is implemented as a:

ConstBufferSequence

which can be any type that is iterable and yields const-buffers; see the sketch below.

The application is free to provide:

  • std::list
  • std::vector
  • ...
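
A minimal sketch of both cases, assuming the networking-ts style net::buffer() factory (the net_ts/buffer.h header name is an assumption):

#include <array>
#include <vector>

#include "net_ts/buffer.h"

void buffer_usage() {
  char header[4] = {0x04, 0x00, 0x00, 0x00};
  std::vector<char> payload{'p', 'i', 'n', 'g'};

  // single buffer: a non-owning view over the vector's memory, no copy.
  net::const_buffer single = net::buffer(payload);

  // ConstBufferSequence: any iterable of const-buffers works; here a
  // std::array carries header + payload so a gathering send (sendmsg()
  // on POSIX) can transmit both without merging them into one buffer.
  std::array<net::const_buffer, 2> seq = {net::buffer(header),
                                          net::buffer(payload)};
  (void)single;
  (void)seq;
}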

Socket, Tcp

The socket-layer provides socket-operations in a typesafe manner (see the sketch below):

  • instead of passing sockaddr structs and requiring reinterpret-casts, an endpoint class ensures that types and sizes are correct.
  • only a SocketAcceptor can actually call accept()
  • ...

Implemented in:

  • net_ts/socket.h
  • net_ts/internet.h
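
A minimal sketch of the typesafe accept path; following the "Expected Return values" section the calls are assumed to return stdx::expected, and the port number is illustrative:

#include <utility>

#include "net_ts/internet.h"
#include "net_ts/io_context.h"

void accept_one(net::io_context &io_ctx) {
  // the endpoint class replaces raw sockaddr structs and reinterpret-casts.
  net::ip::tcp::endpoint ep(net::ip::address_v4::any(), 6446);

  // only the acceptor type exposes listen() and accept().
  net::ip::tcp::acceptor acceptor(io_ctx);
  acceptor.open(ep.protocol());
  acceptor.bind(ep);
  acceptor.listen(128);

  auto accept_res = acceptor.accept();  // stdx::expected<socket, error_code>
  if (!accept_res) return;              // .error() holds the failure reason

  net::ip::tcp::socket client_sock = std::move(accept_res.value());
  // ... assign client_sock to an io-thread ...
}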

Classic Protocol Codec

The old classic protocol tracker relied on a partial classic protocol implementation which handled IO itself.

  • rewritten to work with net::const_buffer instead of std::vector<uint8_t>
  • only works on buffers, does not do socket IO (see the sketch below)
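
As an illustration of that buffer-only style, a header-decode function could take the following shape; FrameHeader and decode_header() are hypothetical names, not the actual Codec interface:

#include <cstddef>
#include <cstdint>
#include <system_error>
#include <utility>

#include "net_ts/buffer.h"  // header name is an assumption
// stdx::expected as described in "Expected Return values"

struct FrameHeader {
  uint32_t payload_size;  // 3-byte payload-length of a classic protocol frame
  uint8_t seq_id;         // 1-byte sequence-id
};

// decodes from a memory-range only: no socket IO, no owning std::vector.
// returns bytes-consumed plus the header, or an error for short buffers.
stdx::expected<std::pair<std::size_t, FrameHeader>, std::error_code>
decode_header(net::const_buffer b);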

Classic Protocol Tracker

The routing plugin tracks the classic protocol's handshake to:

  • track if SSL is enabled on the connection
  • avoid max_connect_errors on the server if the client aborts the connection early, by sending a client::Greeting message

It needs to be rewritten to use the new Codec implementation.