WL#11567: Robust event deserialization by libbinlogevents

Affects: Server-8.0   —   Status: Complete

Event deserialization in libbinlogevents must ensure that it will never read
outside the limits of the buffer being deserialized.

When an event deserialization tries to reach out of buffer positions, the event
shall be considered as invalid and cannot be applied (by replication
applier|worker threads) or displayed (by mysqlbinlog client program or SHOW
BINARY|RELAY LOG EVENTS).

This worklog aims to create a buffer reader interface and class at
libbinlogevents to ease and standardize events deserialization with respect to
avoiding reads out of buffer boundaries. This interface shall provide all buffer
access types libbinlogevents needs to deserialize events, for example:
- memcpy: copy a portion of the buffer to other memory buffer;
- get ordinary values: copy a value from the buffer, such as byte, integer,
  long, sometimes doing additional operations such as le16toh, le32toh, le64toh;
- strings: copy a string content from the buffer;
- get_field_length: get a packed field length from the buffer;

Use case: As a MySQL user, I want to know that my MySQL server is robust by
default and will not crash or leak data when replicating corrupt events,
regardless of the use of binary logging checksums.
Functional requirements
=======================

F-1: MySQL server SHALL NOT crash or leak data when replicating corrupt events
     regardless of CRC usage.

F-2: mysqlbinlog client program SHALL NOT crash or lead data when processing
     corrupt events regardless of CRC usage.

F-3: No data collected from corrupted events shall be displayed. Error
     messages about corrupted events should be limited to inform that the
     event was assumed as corrupted. Debug builds may give additional
     information about the event corruption in debug traces.

Non-Functional requirements
===========================

NF-1: Any regression in replication performance should not be greater than 5%.
Interface Specification
=======================

I-1: No new files
I-2: No changes in existing syntax
I-3: No new commands
I-4: No new tools
I-5: No impact on existing functionality


Problems
========

While investigating to implement this worklog, the following problems were
found:

P1) libbinlogevents has no standard way of tagging an event as invalid

Even though Log_event class has is_valid_param (false by default) to be set true
by children classes when assuming all event parameters are valid, some of event
parameters are collected at event deserialization in libbinlogevents.

Example: Gtid_log_event

        Binary_log_event ------> Log_event_header
               ^ no check          no check
               |
               |
Binary_log::Gtid_event   Log_event
  no check      \         / is_valid_param=false
                 \       /
                  \     /
                   \   /
               Gtid_log_event
                 is_valid_param=true|false

Gtid_log_event deserialization starts with calling Binary_log_event constructor,
that will create a Log_event_header and a Log_event_footer. The Log_event_header
will deserialize the event header. Then, Binary_log::Gtid_event constructor will
be called to do the event specific deserialization. Finally, the Log_event
constructor will be called, passing the header and footer objects to be used.

Currently, is_valid_parameter is only set at Gtid_log_event constructor.


P2) Event deserialization issues

Not all event buffer reads check for buffer boundaries. Not all buffer
boundaries checks are correct. There are also other issues related to reading
from the buffer and checking for errors, aborting event deserialization in the
middle of the process.

P2a) Avoid reading out of buffer boundaries

Some event constructors dealing with event deserialization check buffer
boundaries, but not all reads are verified.

Also, some checks are performed after reading from the buffer.

All reads from the event buffer shall be guaranteed to be withing buffer
boundaries.

P2b) Be careful when calculating sizes and boundaries: avoiding overflow

Many of buffer size, pointers and boundaries calculations are performed using
unsigned numeric variables.

Assume all values as 64 bit unsigned:
- size: size of the buffer;
- buffer: base pointer of the buffer;
- ptr: current position in the buffer to be read;
- wanted: the amount of bytes that is wanted to be read from the buffer;

Consider (((ptr - buffer) + wanted) <= size) as one logic to check buffer
boundaries.

Assume: buffer = x; ptr = 16 + x; size = 32. Buffer has only 16 bytes available
to be read.

The "wanted" part of the logic was read from the buffer (as the size of a field
about to be read).

An ordinary "wanted" (i.e. 8) would pass the logic:
  16 + 8 = 24; 24 <= 32: "wanted" is allowed to be read.

A not so big invalid "wanted" (i.e. 32) would also pass the logic:
  16 + 32 = 48; 48 > 32: "wanted" is not allowed to be read.

But notice that a very large "wanted" (2^64 - 8) may mislead the logic:
  16 + (2^64 - 8) = (overflowed) 8; 8 <= 32: "wanted" is allowed to be read.

The correct logic in this case should trust in non-deserialized values:
  ((size - (ptr - buffer)) >= wanted)
  wanted = 8: (32 - 16) >= 8: "wanted" is allowed to be read.
  wanted = 32: (32 - 16) >= 32: "wanted" is not allowed to be read.
  wanted = 2^64 - 8: (32 - 16) >= 2^64 - 8: "wanted" is not allowed to be read.


P2c) Pointers and extra buffers initialization and cleanup

Some event destructors rely on the fact that some pointers always point to
allocated buffers. Doing additional buffer boundary checks may lead to abort
event deserialization before allocating memory for some structures.

This worklog should make all event pointers initialized as nullptr by default,
and free everything that was allocated when pointers are not nullptr.

P2d) Some asserts should in fact be turned to error and be properly handled

Example: in Log_event_head constructor, there is a line with:
  BAPI_ASSERT(type_code < ENUM_END_EVENT || flags & LOG_EVENT_IGNORABLE_F);

This line is asserting that the event type being deserialized is supported by
the server format description or the event has the LOG_EVENT_IGNORABLE_F flag
set.

This logic holds only if the code (both server and mysqlbinlog codes) ensure
one of this conditions before trying to instantiate an event from a given
buffer.

Once having P1 solved, the Log_event_header can set the event as invalid if at
least one of the conditions are not true. This would allow developers to be able
to test event deserialization using invalid buffers and seeing the server (or
mysqlbinlog) handling the errors without controlled crashes.


Minor Problems
--------------

P3) The event object is not fully aware of CRC payload

All deserialized events have a "data_written" field on its reader with the
amount of bytes the event had when it was serialized. This field takes into
account also the CRC payload when the format description states that the events
have CRC.

When an event is being deserialized using a buffer and expecting a Log_event
child object in return (Log_event::read_log_event(char *, uint, ...)), is the
read_log_event function that adjusts the event_size to be passed to event
constructors, removing the CRC payload if necessary.

This leads to complex logic when, for example, the receiver thread is
deserializing a Rotate event it received from the master.


P4) There is almost no debugging information in libbinlogevents

Log_event classes use to have DBUG_ENTER/DBUG_VOID_RETURN to mark class
construction in debug traces. Some events also generate extra information about
what is being deserialized.

Unfortunately, libbinlogevents has almost no debugging information.


High Level Specification
========================

Below are listed the proposed solutions that should be accomplished by this
worklog.


S1) Moved is_valid_param into libbinlogevents common_header

The idea is to move is_valid_param functionality to the Log_event_header object,
so both Binary_log_event and Log_event children classes can use it.

By moving the is_valid_param logic to Log_event_header, all objects that read
from the buffer to deserialize the event (both header and event specific parts)
will have access to set the event as invalid and check it.

Example: Gtid_log_event

        Binary_log_event ------------------> Log_event_header
               ^ if (!is_valid) return         is_valid=true|false
               | ...
               | is_valid=true|false
               |
               |
    Binary_log::Gtid_event        Log_event
  if (!is_valid) return  \         / if (!is_valid) return
  ...                     \       /  ...
  is_valid=true|false      \     /   is_valid=true|false
                            \   /
                      Gtid_log_event
                        if (!is_valid) return
                        ...
                        is_valid_param=true|false

So, all *_event constructors inheriting from Binary_log_events and Log_event
should, before any other activity, check if a parent constructor set the event
as invalid (so no further action is necessary).


S2) A new class should be implemented to handle robust event buffer access

Adapting events deserialization to use this new class instead of directly
accessing the buffer should address all P2 issues. As this process will change
all events constructors, we can address P4 too.

The "Event_reader" class should be instantiated by passing as parameter the base
pointer to the buffer and the buffer size.

It should be part of the Log_event_header event, so all children classes will be
able to use it to deserialize event content from the buffer with proper error
handling.

The class should have at least the following variables:

  /* The buffer to read from */
  const char *m_buffer;

  /* The pointer to the current read position in the buffer */
  const char *m_ptr;

  /* The length of the buffer */
  unsigned long long m_length;

The class should provide some error handling. In theory, any error detected by
the Event_reader class would result in a single server error code:

ER_READ_LOG_EVENT_FAILED
eng "Error in Log_event::read_log_event(): '%s', data_len: %lu, event_type: %d."

Note that the I/O thread also deserialize some events (Rotate, FD and GTID) and
should also report about invalid events. Maybe we should create a new error type
for a failure detected while deserializing an event.

As read_log_event() error handling expects a string do give more information
about the failure, the Event_reader class should also have something like:

  /* The pointer to the current error message, or nullptr */
  const char *m_error;

In this way, m_error == nullptr means no error, and m_error != nullptr means
that an error happened while reading from the buffer and m_error points to the
error message.

The error message shall never contain any information obtained from the buffer,
or else we risk leaking information. Some example error messages are:

e1) Unable to read content from the event

    This should be used every time an event deserialization tries to read out of
    event buffer boundaries.

e2) Invalid checksum algorithm

    This should be used when deserializing a format description event trying to
    use and invalid checksum algorithm.

The class should support all type of operations performed by event
deserialization functions, such as:

f1) go_to(ulong)

    jump to a given buffer position;

f2) forward(ulong)

    advance the read position;

f3)  read_():

    read a  from the buffer and forwards sizeof();

f4)  read_and_letoh()

    read  from the buffer, storing them into a  variable, call
    letoh, forward /8 bytes;

f5) copy(var, length)

    copy the length from the buffer into var, forwarding length bytes;

All event classes constructors that deserialize events from a given buffer
will have to do some standard verification before starting deserializing.

Unfortunately, MySQL coding standard does not support exceptions.

In theory, an event constructor will have to follow more or less the following
pattern:

{
  BAPI_ENTER("Constructor");

  /* Check if event deserialization failed in parent objects */
  if (!header->is_valid()) BAPI_VOID_RETURN;

  Event_reader reader = header->get_event_reader();

  /* Read contents from the buffer */
  var1_size = reader->read_and_le64toh();
  if (reader->has_error()) goto err;
  var1_content = reader->ptr();
  reader->forward(var_size);
  if (reader->has_error()) goto err;
  var2_size = reader->read_and_le64toh();
  if (reader->has_error()) goto err;
  reader->copy(var2_content, var2_size);
  if (reader->has_error()) goto err;
  ...
  if ()
  {
    reader->set_error('Unexpected condition');
    goto err;
  }
  ...

err:
  if (reader->has_error()) header->set_is_valid(false);
  BAPI_VOID_RETURN;
}

The idea is to wrap the reader function with the check in a macro, maybe
wrapping also the initial check and the 'err:' label:

{
  BAPI_ENTER("Constructor");

  EVENT_READER_EVENT_CONSTRUCTOR_HEADER; // initial check, declare reader

  /* Read contents from the buffer */
  EVENT_READER_SET_AND_CHECK(var1_size, read_and_le64toh);
  EVENT_READER_SET(var1_content, ptr);
  EVENT_READER_CALL_AND_CHECK(forward, var1_size);
  EVENT_READER_SET_AND_CHECK(var2_size, read_and_le64toh);
  EVENT_READER_CALL_AND_CHECK(copy, var2_content, var2_size);
  ...
  if () EVENT_READER_THROW('Unexpected condition');
  ...

  EVENT_READER_EVENT_CONSTRUCTOR_FOOTER; // err: and call set_is_valid
  BAPI_VOID_RETURN;
}


S3) Move CRC logic to Log_event_footer

This should address P3.

This CRC logic will be moved to Log_event_footer, simplifying and standardizing
the CRC payload processing.

With this change, there will be no need to pass the event size to an event
constructor: the event header will know its size and the event footer will know
if it has CRC payload or not, and the CRC algorithm used.
Low Level Design
================

The worklog changes will be divided into 7 steps:


Step 1) Cover HLS' S1: move is_valid_param to Log_event_header
--------------------------------------------------------------

This step will not require specific test cases. The current MTR suites should be
enough to validate the changes as no change in behavior should be expected.


Step 2 to step 6) Cover HLS' S2: introduce the Event_reader class and use it
----------------------------------------------------------------------------

These steps will not require specific test cases too. The current MTR suites
should be enough to validate the changes as no changes in behavior are expected.
Maybe some test cases will have to be refactored as a result of invalidation of
events.

Step 2 will introduce the Event_reader class and should also cover HLS' S3.

Three new files will be added to libbinlogevents:
libbinlogevents/include/event_reader.h
libbinlogevents/include/event_reader_macros.h
libbinlogevents/src/event_reader.cpp

They will contain the Event_reader class to be used by any function requiring
event buffer deserialization.

They should also contain the macros to standardize and "simulate" the exception
behavior:

#define READER_CONSTRUCTOR_HEADER \
  if (!header()->is_valid()) BAPI_VOID_RETURN

#define READER_CONSTRUCTOR_FOOTER \
event_reader_footer:              \
  header()->set_is_valid(READER_CALL(has_error) == false)

#define READER_CHECK \
  if (header()->reader()->get_error()) goto event_reader_footer

#define READER_SET_AND_CHECK(var, func, ...) \
  var = header()->reader()->func(__VA_ARGS__);     \
  EVENT_READER_CHECK

#define READER_CALL_AND_CHECK(func, ...) \
  header()->reader()->func(__VA_ARGS__);       \
  EVENT_READER_CHECK

#define READER_THROW(message)         \
  {                                         \
    header()->reader()->set_error(message); \
    goto event_reader_footer;               \
  }

Step 3 will change control events.

Step 4 will change statement events.

Step 5 will change load data events.

Step 6 will change rows events.

Step 7) Related test cases and cleaning up
------------------------------------------

This step will introduce some test cases of related issues fixed by the worklog.

It will also remove code not used anymore by libbinlogevents after migrating all
the event deserialization code to the Event_reader class.