WL#7299: Binlog_sender: do not reallocate the event buffer for every event sent

Affects: Server-5.7   —   Status: Complete

EXECUTIVE SUMMARY
-----------------

This worklog implements an optimization in the dump thread that
removes unnecessary reallocation of the send buffer. The user-visible
effect is that each dump thread spawned by the master uses less CPU.

MOTIVATION
----------

This work is motivated by several reasons:

  1. A high-profile MySQL replication user has made several recurring
     requests for it;

  2. It makes the MySQL server use hardware resources better
     (adaptive memory allocation by the dump thread and less CPU
     usage);

  3. The general direction is to improve replication performance
     and scalability, and this is one more step along that path.

REFERENCES
----------

MySQL BUG#31932

Functional Requirements
=======================

None.

Non-Functional Requirements
===========================


- NF1. The sender thread SHALL use less CPU (how much exactly depends
       on the workload).
- NF2. The buffer size SHALL grow automatically and dynamically,
       without the need for user intervention.
- NF3. The buffer size SHALL shrink if, over time, the allocated memory
       is not used.
- NF4. There SHALL NOT be any memory leak when the thread is killed.

There are no other user-visible changes, other than fewer reallocations
and lower CPU utilization.

PROBLEM STATEMENT
-----------------

For every connected slave, the master runs a binary log sender
thread, aka dump thread. The sender thread is responsible for reading
the binary log and pushing it to the slave's receiver thread, aka IO
thread. The unit of transmission is an event. For every event that the
sender thread reads from the binary log, it puts the event in a memory
buffer and then calls the network send primitive with the contents of
this buffer as a parameter.

However, for every event it reads, the sender thread frees the buffer
memory and then allocates memory again when it handles the next
event. This is sub-optimal and results in unnecessary CPU usage.
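The following toy C++ model, which is not MySQL code, makes the cost
concrete by counting heap allocations for a stream of event sizes
under two policies: freeing the buffer after every event (the current
behavior described above) and keeping it, reallocating only when an
event does not fit. Both function names are hypothetical, and only
capacities are tracked, not real memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Current behavior (modeled): the buffer is freed after every event and
// allocated again for the next one, so every event costs one allocation.
std::size_t count_allocs_free_each_event(const std::vector<std::size_t>& event_sizes)
{
  return event_sizes.size();
}

// Reuse policy (modeled): keep the buffer between events and reallocate
// only when the next event is larger than the current capacity.
std::size_t count_allocs_reuse(const std::vector<std::size_t>& event_sizes)
{
  std::size_t capacity = 0;
  std::size_t allocs = 0;
  for (std::size_t size : event_sizes)
  {
    if (size > capacity)
    {
      capacity = size;  // grow to exactly what this event needs
      ++allocs;
    }
  }
  return allocs;
}
```

For the sizes {100, 200, 150, 50, 300}, the first policy allocates five
times and the second three times; with a steady stream of similarly
sized events, the second count stays flat while the first grows with
every event.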

ANALYSIS
--------

The problem can be pinpointed by looking at the code in
mysql-trunk. In rpl_binlog_sender.cc, the buffer used is a String
buffer in THD called packet (THD::packet).

The contents of this buffer are sent by calling the member function
Binlog_sender::send_packet. Walking up the call graph, one finds that
this function is called from four major points:

  1. Binlog_sender::send_heartbeat_event
      Binlog_sender::send_packet_and_flush
       Binlog_sender::send_packet
     
  2. Binlog_sender::send_format_description_event
      Binlog_sender::send_packet
 
  3. Binlog_sender::fake_rotate_event
      Binlog_sender::send_packet
  
  4. Binlog_sender::send_events
      Binlog_sender::send_packet

There may be a 5th call to Binlog_sender::send_packet indirectly from
Binlog_sender::send_events, but in that case, the buffer used is a
temporary buffer:

  5. Binlog_sender::send_events
      Binlog_sender::send_heartbeat_event
       Binlog_sender::send_packet_and_flush
        Binlog_sender::send_packet

A temporary buffer is used here because the sender thread needs to
send a heartbeat before actually sending the data it has read from
the binary log. Since it cannot discard the data already read into
THD::packet just to reuse that buffer, the sender thread uses a
temporary local buffer.

The problem is that the buffer needs to be reset before sending an
event. This happens in the member function
Binlog_sender::reset_transmit_packet, which contains this code:

  packet->length(0);
  /*
    set() will free the original memory. It causes dump thread to free and
    reallocate memory for each sending event. It consumes a little bit more CPU
    resource. TODO: Use a shared send buffer to eliminate memory reallocating.
  */
  packet->set("\0", 1, &my_charset_bin);

As the comment says, set() frees the buffer memory. Later, this
memory is reallocated either explicitly, in stacks #1 and #3 above,
or implicitly by read_log_event when it calls String::append(...), in
stacks #2 and #4 above.

SOLUTION
--------

The solution to this problem is to not reallocate the event memory
unless really needed. This requires removing the reallocation calls
from stacks #1 and #3, and removing the buffer reset done with
String::set in Binlog_sender::reset_transmit_packet.

Furthermore, the buffer must be pre-allocated before it is actually
used. The size needed for the buffer is already known beforehand in
#1, #2, #3 and #5. In #4, we just need to read (peek at) the event
header and determine event_len before calling read_log_event.
Therefore, once the size of the event is known, we pass it to
reset_transmit_packet, which allocates the buffer if needed.
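A minimal sketch of this "know the size first, then reset" ordering,
assuming std::vector<char> as a stand-in for MySQL's String and a
hypothetical reset_for_event() in place of reset_transmit_packet:
clearing keeps the allocation, the reservation grows it only when
header plus event would not fit, and loading the event afterwards
therefore never reallocates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for reset_transmit_packet(packet, flags, min_buff_size).
void reset_for_event(std::vector<char>& packet, std::size_t event_len)
{
  packet.clear();                    // like packet->length(0): capacity is kept
  packet.reserve(1 + event_len);     // no-op unless header + event do not fit
  packet.push_back('\0');            // like packet->qs_append('\0')
}

// Copy the event bytes in after the reset; thanks to the reservation above
// this never triggers a reallocation.
void load_event(std::vector<char>& packet, const std::vector<char>& event)
{
  packet.insert(packet.end(), event.begin(), event.end());
}
```

Because std::vector guarantees no reallocation until the size exceeds
the reserved capacity, data() stays the same pointer across the reset
and the load, and a later, smaller event reuses the same allocation.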

Conversely, to prevent the buffer from growing too large and staying
that large, its size must be re-evaluated periodically. As such,
every N events a decision is taken whether to shrink the buffer or
keep its current size. The approach is detailed below.

LOW LEVEL IMPLEMENTATION CONSIDERATIONS
---------------------------------------

The proposed solution requires three main sets of changes:

1. Encapsulating the allocation and shrinkage of the buffer.

   - Growing the buffer

   To better encapsulate the logic that grows the buffer, we move the
   actual reallocation of the buffer into
   Binlog_sender::reset_transmit_packet. This function is called every
   time the dump thread loads an event into the buffer, right before
   sending it. This means:

   a) we can remove the calls to packet->realloc from
      Binlog_sender::fake_rotate_event and
      Binlog_sender::send_heartbeat_event.
  
   b) every time an event is to be sent, reset_transmit_packet needs
      to be called with the size of the event that is to be loaded
      into the buffer as input.
   
   Therefore, the function signature needs a new parameter that
   states how much buffer space the event will require. This allows
   reset_transmit_packet to decide whether to reallocate the buffer
   or not:

     inline int Binlog_sender::reset_transmit_packet(
          String *packet, 
          ushort flags, 
          uint32 min_buff_size)

   Now, inside the function, we need to remove this:
  
     packet->set("\0", 1, &my_charset_bin); 

   and replace it with:

     packet->qs_append('\0');

   Then, the reallocation is done, if needed, after the call to the
   hook:

     /* reserve and set default header */
     if (RUN_HOOK(binlog_transmit, reserve_header, (m_thd, flags, packet)))
     {
       set_unknow_error("Failed to run hook 'reserve_header'");
       DBUG_RETURN(1);
     }

     ulong cur_buffer_size= packet->alloced_length();
     ulong needed_buffer_size= packet->length() + min_buff_size;

     /* Resizes the buffer if needed. */
     this->grow_packet(cur_buffer_size, needed_buffer_size, packet);

   The realloc call is encapsulated in grow_packet, which also hides
   the check that decides whether to reallocate or not.

   - Shrinking the buffer

   The buffer can be shrunk after the event is sent, which happens in
   Binlog_sender::send_packet. There, we simply add a call to a new
   member function, shrink_packet:
   
     /* Shrink the packet if needed. */
     this->shrink_packet(packet);

   Inside this function we implement the logic to shrink the buffer
   size.

2. Logic to adjust the buffer size dynamically and online

   - Growing the buffer size

   If the buffer is too small, grow it to the required size, but at
   least by a factor K, without exceeding max_allowed_packet:
   new_size = min(max(needed_size, buffer_size * K), max_allowed_packet)

   Implementation could look like this:

     inline void grow_packet(ulong cur_buffer_size,
                             ulong needed_buffer_size,
                             String *packet)
     {
       /*
         Grow the buffer only if the next event does not fit. The new size
         is at least PACKET_GROWTH_FACTOR times the current one, so a run of
         growing events triggers few reallocations, and it never exceeds
         max_allowed_packet.
       */
       if (needed_buffer_size > cur_buffer_size)
       {
         ulong new_buffer_size= std::min(
                 std::max(static_cast<ulong>(cur_buffer_size * PACKET_GROWTH_FACTOR),
                          needed_buffer_size),
                 m_thd->variables.max_allowed_packet);
         packet->realloc(new_buffer_size);
       }
     }
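   To check the growth rule in isolation, here is a standalone C++
   version of the size computation done by grow_packet. K is set to 2
   as an assumption (the worklog does not fix PACKET_GROWTH_FACTOR),
   and the max_allowed_packet cap is passed in as a plain parameter.
   next_buffer_size is a hypothetical helper that only computes sizes
   and allocates nothing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Assumed value of K; the worklog leaves PACKET_GROWTH_FACTOR open.
const double PACKET_GROWTH_FACTOR = 2.0;

// new_size = min(max(needed, cur * K), max_allowed), applied only when the
// event does not fit; otherwise the current size is kept unchanged.
std::size_t next_buffer_size(std::size_t cur_size, std::size_t needed,
                             std::size_t max_allowed)
{
  if (needed <= cur_size)
    return cur_size;  // fits: no reallocation
  std::size_t grown = static_cast<std::size_t>(cur_size * PACKET_GROWTH_FACTOR);
  return std::min(std::max(grown, needed), max_allowed);
}
```

   Growing by at least a factor K means that a run of ever-larger
   events causes a number of reallocations logarithmic in the final
   size rather than one per event. When needed exceeds the cap, the
   result is smaller than needed; the sketch leaves detecting such an
   oversized event to the caller.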

   - Shrinking the buffer size

   We will shrink the buffer by a factor M if less than 1/M of the
   buffer has been used for the last N consecutive events. The
   implementation should look similar to this:

    inline void shrink_packet(String *packet)
    {
      ulong cur_buffer_size= packet->alloced_length();
      ulong buffer_used= packet->length();

      if (buffer_used <
          static_cast<ulong>(cur_buffer_size * PACKET_SHRINKAGE_FACTOR))
        this->m_half_buffer_size_req_counter++;
      else
        this->m_half_buffer_size_req_counter= 0;

      /* Check if we should shrink the buffer. */
      if (m_half_buffer_size_req_counter == PACKET_SHRINKING_COUNTER_THRESHOLD)
      {
        ulong new_buffer_size=
          static_cast<ulong>(cur_buffer_size * PACKET_SHRINKAGE_FACTOR);
        if (new_buffer_size >= PACKET_MINIMUM_SIZE &&
            new_buffer_size != cur_buffer_size)
        {
          /*
            The last PACKET_SHRINKING_COUNTER_THRESHOLD consecutive packets
            required less than half of the current buffer size. Let's shrink
            it to avoid wasting memory.
          */
          packet->shrink(new_buffer_size);
        }

        /* Reset the counter. */
        this->m_half_buffer_size_req_counter= 0;
      }

      DBUG_ASSERT(packet->alloced_length() >= PACKET_MINIMUM_SIZE);
    }
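   The shrink rule can be exercised without a real String using the
   toy model below. It tracks only a capacity number, and the constants
   (M = 2, N = 3, a 1024-byte minimum) are assumptions chosen for
   illustration, since the worklog leaves PACKET_SHRINKAGE_FACTOR,
   PACKET_SHRINKING_COUNTER_THRESHOLD and PACKET_MINIMUM_SIZE open.
   ShrinkPolicy and after_send are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Assumed values for illustration only.
const double PACKET_SHRINKAGE_FACTOR = 0.5;             // 1/M with M = 2
const unsigned PACKET_SHRINKING_COUNTER_THRESHOLD = 3;  // N
const std::size_t PACKET_MINIMUM_SIZE = 1024;

// Toy model of shrink_packet: tracks a capacity value, no real memory.
struct ShrinkPolicy
{
  std::size_t capacity;
  unsigned small_counter = 0;  // consecutive packets using < 1/M of the buffer

  explicit ShrinkPolicy(std::size_t initial) : capacity(initial) {}

  // Call after each packet is sent, with the number of bytes it used.
  void after_send(std::size_t used)
  {
    std::size_t threshold =
        static_cast<std::size_t>(capacity * PACKET_SHRINKAGE_FACTOR);
    if (used < threshold)
      ++small_counter;
    else
      small_counter = 0;  // one large packet breaks the streak

    if (small_counter == PACKET_SHRINKING_COUNTER_THRESHOLD)
    {
      // Shrink to 1/M of the capacity, but never below the minimum.
      if (threshold >= PACKET_MINIMUM_SIZE && threshold != capacity)
        capacity = threshold;
      small_counter = 0;
    }
    assert(capacity >= PACKET_MINIMUM_SIZE);
  }
};
```

   A single packet that uses at least 1/M of the buffer resets the
   streak, so the buffer only shrinks after N consecutive small
   packets, and it never drops below the minimum size.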

      
3. Logic to read the size of a log event from the binary log.

   Change Binlog_sender::read_event so that it reads the event header
   from the binary log before calling
   Binlog_sender::reset_transmit_packet. This way, we can load the
   event header, read the event length and calculate how much buffer
   space is needed before actually calling
   Log_event::read_log_event.

   Something like this before calling reset_transmit_packet:

     char header[LOG_EVENT_MINIMAL_HEADER_LEN];

     /*
       Proposed helper: peek at the event header so that read_log_event
       can still read the whole event afterwards.
     */
     int error= Log_event::peek_event_header(header, log_cache);
     if (error)
     {
       error= (error == LOG_READ_EOF) ? LOG_READ_IO : error;
       set_fatal_error(log_read_error_msg(error));
       DBUG_RETURN(1);
     }
     uint32 buffer_needed= uint4korr(header + EVENT_LEN_OFFSET);

     if (reset_transmit_packet(packet, 0, buffer_needed))
       DBUG_RETURN(1);
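
   The length decoding step can be illustrated with a self-contained
   C++ sketch. It assumes the binary log v4 common event header (19
   bytes: timestamp, type code, server id, event length, next
   position, flags), in which the event length is a little-endian
   4-byte integer at offset 9. read_uint4_le is a portable stand-in for
   uint4korr and event_len_from_header is a hypothetical helper;
   peeking at the header itself is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// v4 common header: timestamp(4) type_code(1) server_id(4)
//                   event_length(4) next_position(4) flags(2) = 19 bytes.
// These mirror LOG_EVENT_MINIMAL_HEADER_LEN and EVENT_LEN_OFFSET.
constexpr std::size_t kMinimalHeaderLen = 19;
constexpr std::size_t kEventLenOffset = 9;
static_assert(kEventLenOffset + 4 <= kMinimalHeaderLen,
              "event length must lie inside the minimal header");

// Portable little-endian 4-byte read, equivalent in effect to uint4korr.
std::uint32_t read_uint4_le(const unsigned char* p)
{
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Event length announced by a peeked header (at least kMinimalHeaderLen bytes).
std::uint32_t event_len_from_header(const unsigned char* header)
{
  return read_uint4_le(header + kEventLenOffset);
}
```

   The event length field covers the whole event, header included, so
   the decoded value is the full number of bytes the event occupies in
   the buffer.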