WL#7126: Refactoring of protocol class

Status: Complete

The goal of this WL is to take an initial step toward decoupling the core
server from protocol format parsing. Essentially, it is refactoring work.
The scope is the minimal effort needed to provide an API for protocol plugins.
This API will be exposed by a service for server command execution. Other teams
(Runtime, Replication, etc.) would need to move the protocol parsing code they
own to the Protocol class in the scope of separate WLs. This WL is independent
of those follow-up WLs and should be pushed separately.

Limitations.
This WL won't support Prepared Statements officially. Although this WL makes a
step toward such support, i.e. it allows plugins to run STMT_PREPARE and
STMT_EXECUTE commands, parameters for PS aren't parsed and are passed as is to
the server. This allows legacy protocols to work with PS as before, but doesn't
allow plugins to run such statements via a service or protocol plugins.
Protocol plugins won't be supported in the scope of this WL, although all legacy
code will work as before.

User Documentation
==================

Code refactoring. No user documentation required.
This worklog is primarily a refactoring, so no functionality should be
affected by the change. Basically, it groups most of the I/O interaction into
one single interface with different implementations.

F-1: All clients should work as before

F-2: There should be no test failures

Non-Functional requirements:

NF-1: There should be no performance impact from the changes made in this worklog

Aim of the WL
=============
1) Unbinding server and protocols implementations
2) Providing API for implementations of new protocols
3) Providing API to be exposed via a service

Current state
=============
Result/error sending is handled by the Protocol class and its descendants.
This part is well-defined and doesn't require any work on the server side.

Commands/queries are read with help of my_net_read function which isn't
wrapped in any class, but a part of NET interface. Incoming packets are
parsed by the code that handles each separate command, i.e. spread all over
the server (roughly). NET descriptor is public and is accessible from
everywhere.

New design
==========
The changes consist of several logical parts:
1) The Protocol class becomes an abstract parent that defines the API between
the server and the outer world. All extensions should use only methods/data
defined there. This parent class will contain methods for
  1.1) Sending data - this is already existing code and requires no change.

  1.2) Receiving data. This part consists of adding an API that allows one to:
    *) Read a command
    *) Parse last read command into server's internal data structures and handle
    parsing errors in a consistent way
    *) Create a command to allow code outside of server core to run SQL queries
    by calling a single function and without need to binary encode the query.
    In this WL this functionality is limited only to COM_QUERY commands. Other
    commands could be added when needed in scope of separate WL(s).

    This is mostly a refactoring part. Methods for this part are defined in the
    Protocol class, and the actual implementation that reads and parses some
    simple commands from the server is placed in the Protocol_classic class.
    During parsing, Protocol_classic fills the server's new internal structures
    with the arguments to commands. Each command now has a dedicated structure
    that holds its arguments. All of these structures are joined into a union
    to limit the space occupied. This union is stored in THD and is publicly
    accessible.

    Examples of new data structures:

    struct COM_INIT_DB_DATA
    {
      uchar *db_name;
      ulong length;
    };

    struct COM_REFRESH_DATA
    {
      uchar options;
    };

    struct COM_SHUTDOWN_DATA
    {
      enum mysql_enum_shutdown_level level;
    };

    and so on.

  1.3) Providing metadata of the result set. In order to limit exposure of
    internal structures, the metadata sending code was refactored. The
    following methods:

    bool send_result_set_metadata(List<Item> *list, uint flags);
    bool send_result_set_row(List<Item> *row_items);

    will be moved out of Protocol, as they tie the interface to server
    internals (Item), and will be placed outside the class. Three helper methods
    will be added to the Protocol class in order to accomplish this:
    bool start_result_metadata(uint num_cols, uint flags,
                               const CHARSET_INFO *resultcs)
    bool send_field_metadata(Send_field *field, const CHARSET_INFO *charset)
    end_result_metadata(uint num_cols, uint flags);

  1.4) Providing the result of the execution and the client's capabilities.

2) The current protocol implementation is moved to the Protocol_classic class;
Protocol_text and Protocol_binary are derived from it.

3) NET, packet and other legacy protocol related info is hidden behind
Protocol_classic. All code that isn't refactored to use the new API defined in
the Protocol class will have to cast it to Protocol_classic and access that data
via Protocol_classic's methods. Once all code has been converted to use the new
API (in the scope of various WLs implemented by the code-owning teams), the need
for those extensions will gradually fade away and the functions will be removed.


Overall workflow
================
Overall workflow schema is as follows:
1) MySQL server handles incoming connections as it does now
*) server calls protocol->read_command() instead of my_net_read() in
do_command().
**) protocol implementation handles all actual reading
**) if reading was successful do_command calls protocol->parse_command()
**) protocol implementation fills appropriate command data structure 
*) server works on the data provided by protocol, not messing with actual
packet format
*) after query execution server calls appropriate Protocol methods to send
result/error to the client
**) protocol implementation represents this data/error in whatever format it
needs and handles actual writing
*) goto 1
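
The loop above can be sketched as follows. This is only an illustration under
stated assumptions, not the server's code: Workflow_protocol, Mock_protocol,
Command_data_sketch and do_command_sketch are hypothetical names, the return
codes are simplified, and "packets" are plain strings held in memory.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical parsed-command structure (stand-in for the server's data).
struct Command_data_sketch {
  std::string query;
};

// Hypothetical interface mirroring the steps of the workflow.
class Workflow_protocol {
 public:
  virtual ~Workflow_protocol() = default;
  virtual int read_command() = 0;                           // 0 = ok
  virtual bool parse_command(Command_data_sketch *out) = 0; // true = malformed
  virtual void send_ok(const std::string &msg) = 0;
  virtual void send_error(const std::string &msg) = 0;
};

// In-memory implementation: the caller queues "packets" as strings.
class Mock_protocol : public Workflow_protocol {
 public:
  std::queue<std::string> incoming;
  std::vector<std::string> outgoing;

  int read_command() override {
    if (incoming.empty()) return -1;  // nothing left to read
    m_raw = incoming.front();
    incoming.pop();
    return 0;
  }
  bool parse_command(Command_data_sketch *out) override {
    if (m_raw.empty()) return true;  // treat an empty packet as malformed
    out->query = m_raw;
    return false;
  }
  void send_ok(const std::string &msg) override {
    outgoing.push_back("OK: " + msg);
  }
  void send_error(const std::string &msg) override {
    outgoing.push_back("ERR: " + msg);
  }

 private:
  std::string m_raw;
};

// One iteration of the loop; returns false when the connection should close.
bool do_command_sketch(Workflow_protocol *protocol) {
  if (protocol->read_command() != 0) return false;
  Command_data_sketch data;
  if (protocol->parse_command(&data)) {
    protocol->send_error("malformed packet");
    return true;
  }
  // The server works on parsed data only, never on the packet format.
  protocol->send_ok(data.query);
  return true;
}
```

The caller would invoke do_command_sketch() repeatedly until it returns false,
which corresponds to the "goto 1" step.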


Runtime review (Alik) notes
===========================

High-level notes
----------------

I reviewed only parts related to the protocol: plugins are beyond my
understanding.

The cumulative patch looks good to me.

The patch is also consistent with the latest edition of HLD (the HLD does
not talk any more about single protocol API). So, we have just a
refactoring WL, which makes the further development somewhat easier.
That's Ok with me.

Overview
--------

I reviewed the parts of the cumulative patch from git:mysql-trunk-wl7126,
which belong to the Runtime team.

In particular:
  - Field refactoring hasn't been reviewed;
  - Changes to the ACL code (plugins & co) haven't been reviewed;
  - Test changes haven't been reviewed.

Here is the list of the files, which have been reviewed, and for which
I don't have any specific notes (apart from the general notes):
  - libmysqld/lib_sql.cc
  - plugin/innodb_memcached/innodb_memcache/src/handler_api.cc
  - plugin/semisync/semisync_master_ack_receiver.cc
  - plugin/semisync/semisync_master_plugin.cc
  - sql/auth/sql_authorization.cc
  - sql/binlog.cc
  - sql/bootstrap.cc
  - sql/conn_handler/channel_info.cc
  - sql/conn_handler/connection_handler_one_thread.cc
  - sql/conn_handler/connection_handler_per_thread.cc
  - sql/conn_handler/socket_connection.cc
  - sql/sql_profile.cc
  - sql/sql_show.cc
  - sql/sql_table.cc
  - sql/sql_update.cc
  - sql/sql_yacc.yy
  - sql/sys_vars.cc
  - sql/xa.cc
  - storage/myisam/ha_myisam.cc
  - sql/events.cc
  - sql/ha_ndbcluster.cc
  - sql/ha_ndbcluster_binlog.cc
  - sql/ha_partition.cc
  - sql/handler.cc
  - sql/log_event.cc
  - sql/mf_iocache.cc
  - sql/mysqld.cc
  - sql/parse_tree_items.cc
  - sql/parse_tree_items.h
  - sql/rpl_binlog_sender.cc
  - sql/rpl_rli.cc
  - sql/sp_head.cc
  - sql/sp_instr.cc
  - sql/sp_rcontext.cc
  - sql/sql_admin.cc
  - sql/sql_cache.cc
  - sql/sql_connect.cc
  - sql/sql_error.cc
  - sql/sql_handler.cc
  - sql/sql_insert.cc
  - sql/sql_load.cc
  - sql/sql_parse.h
  - sql/sql_plugin.cc

Review notes to the protocol classes and other pieces of the patch
have been sent out by email.

List of identified changes
--------------------------

This is just for the record. Since the patch is "all-in-one",
special efforts had to be taken to identify the refactoring steps.
Here is a list of them in no particular order.

1. Split dispatch_command() to create_command() + dispatch_command();

2. memset() -> wipe_net()

3. end_statement()

4. Interface to work with client capabilities:

-  thd->client_capabilities= client_flag;
+ protocol->set_client_capabilities(client_flag);
+ protocol->add_client_capability(CLIENT_TRANSACTIONS);
+ protocol->has_client_capability(CLIENT_SSL))
+ protocol->remove_client_capability(CLIENT_MULTI_RESULTS);

5. Protocol::prepare_for_resend() -> start_row()

6. Protocol::write()              -> end_row()

7. Protocol::remove_last_row()    -> abort_row()

8. Split Protocol::send_result_set_metadata() into
    - ::send_result_metadata(THD, Protocol, ...)
    - Protocol::start_result_metadata()
    - Protocol::send_field_metadata()
    - Protocol::end_result_metadata()

9. send_result_set_row()

10. send_string_item()

11. my_net_init() -> Protocol::init_net()

12. net_end() -> Protocol::end_net()

13. vio_shutdown() -> Protocol::shutdown()

14. my_net_write() -> Protocol::write()
    net_flush() -> Protocol::flush_net()

15. Protocol::get_ssl() must return (SSL *) to avoid unnecessary casts.

16. my_net_read() -> Protocol::read_packet()

17. Protocol::get_last_error()

  Dangerous practice:
  + ((Protocol_classic *) thd->protocol)->get_last_error() ?
  +  ((Protocol_classic *) thd->protocol)->get_last_error() : "");

18. Field refactoring

19. ACL changes + Plugin

New Protocol class hierarchy
============================
The new hierarchy consists of 4 classes:
  Protocol
       |
  Protocol_classic
       |
  Protocol_text
       |
  Protocol_binary

Protocol is an abstract class that defines the new API. Protocol_classic is
the former Protocol class; it implements the core of both classic protocols,
text and binary. Protocol_text and Protocol_binary are the implementations of
the respective classic protocols.

Protocol's API
============== 
The API of the Protocol class consists of 4 logical parts:
1) Receiving data from outside. 
  This WL moves parsing of a limited set of commands. Commands and new data
  structures dedicated to them are as follows:

  struct COM_INIT_DB_DATA
  {
    uchar *db_name;
    ulong length;
  };

  struct COM_REFRESH_DATA
  {
    uchar options;
  };

  struct COM_SHUTDOWN_DATA
  {
    enum mysql_enum_shutdown_level level;
  };

  struct COM_KILL_DATA
  {
    ulong id;
  };

  struct COM_SET_OPTION_DATA
  {
    uint opt_command;
  };

  struct COM_STMT_EXECUTE_DATA
  {
    ulong stmt_id;
    ulong flags;
    uchar *params;
    ulong params_length;
  };

  struct COM_STMT_FETCH_DATA
  {
    ulong stmt_id;
    ulong num_rows;
  };

  struct COM_STMT_SEND_LONG_DATA_DATA
  {
    ulong stmt_id;
    uint param_number;
    uchar *longdata;
    ulong length;
  };

  struct COM_STMT_PREPARE_DATA
  {
    char *query;
    uint length;
  };

  struct COM_STMT_CLOSE_DATA
  {
    uint stmt_id;
  };

  struct COM_STMT_RESET_DATA
  {
    uint stmt_id;
  };

  struct COM_QUERY_DATA
  {
    char *query;
    uint length;
  };

  struct COM_FIELD_LIST_DATA
  {
    uchar *table_name;
    uint table_name_length;
    uchar *query;
    uint query_length;
  };

  These new structures are wrapped in a union:

  union COM_DATA {
    COM_INIT_DB_DATA com_init_db;
    COM_REFRESH_DATA com_refresh;
    COM_SHUTDOWN_DATA com_shutdown;
    COM_KILL_DATA com_kill; 
    COM_SET_OPTION_DATA com_set_option;
    COM_STMT_EXECUTE_DATA com_stmt_execute;
    COM_STMT_FETCH_DATA com_stmt_fetch;
    COM_STMT_SEND_LONG_DATA_DATA com_stmt_send_long_data;
    COM_STMT_PREPARE_DATA com_stmt_prepare;
    COM_STMT_CLOSE_DATA com_stmt_close;
    COM_STMT_RESET_DATA com_stmt_reset;
    COM_QUERY_DATA com_query;
    COM_FIELD_LIST_DATA com_field_list;
  };

  which is stored in the public part of THD.
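
  A minimal sketch of how such a union can be consumed is shown here. It is an
  illustration only: the union is reduced to three of the structures listed
  above, and COM_DATA_SKETCH, Sketch_command and describe() are hypothetical
  names. The point is that the union itself does not record which member is
  valid, so the command code has to be consulted before reading it.

```cpp
#include <cassert>
#include <string>

typedef unsigned char uchar;
typedef unsigned long ulong;
typedef unsigned int uint;

// Three of the structures listed above, copied as-is.
struct COM_INIT_DB_DATA { uchar *db_name; ulong length; };
struct COM_KILL_DATA { ulong id; };
struct COM_QUERY_DATA { char *query; uint length; };

// Reduced version of the union: only one member is valid at a time.
union COM_DATA_SKETCH {
  COM_INIT_DB_DATA com_init_db;
  COM_KILL_DATA com_kill;
  COM_QUERY_DATA com_query;
};

// Hypothetical tag telling which union member was filled by the parser.
enum Sketch_command { SKETCH_INIT_DB, SKETCH_KILL, SKETCH_QUERY };

// Reads only the member that matches the command.
std::string describe(Sketch_command cmd, const COM_DATA_SKETCH &data) {
  switch (cmd) {
    case SKETCH_INIT_DB:
      return "init_db:" +
             std::string(reinterpret_cast<const char *>(data.com_init_db.db_name),
                         data.com_init_db.length);
    case SKETCH_KILL:
      return "kill:" + std::to_string(data.com_kill.id);
    case SKETCH_QUERY:
      return "query:" +
             std::string(data.com_query.query, data.com_query.length);
  }
  return "unknown";
}
```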

  The new API that uses these structures:

  class Protocol
  {
    ...
    /*
      read packet from client
      returns
        -1  fatal error
         0  ok
         1 non-fatal error
    */
    int read_packet();
    /*
      Parse read packet and fill com_data union.
      returns 
        true  malformed packet, appropriate error is thrown
        false packet was successfully parsed
    */
    bool parse_packet(union COM_DATA *data);
    /*
      Creates a command. This call substitutes for the read_packet() +
      parse_packet() calls and has to be used when dispatch_command() needs to
      be called directly.

      @notes Data pointed to by cmd isn't expected to change before
      dispatch_command() returns. If it is changed, the behavior is
      undefined, from a wrong result to a crash.
    */
    void create_command(COM_DATA *cmd);
    /*
      Return command from last parsed packet
    */
    enum enum_server_command get_command() { return cmd; }
    ...
  }

  Most code changes for this part are done to dispatch_command() and the
  functions it calls. A typical change is passing parsed arguments instead of a
  pointer to the raw NET buffer and the packet's length. For example:
     case COM_STMT_FETCH:
     {
  -    mysqld_stmt_fetch(thd, packet, packet_length);
  +    mysqld_stmt_fetch(thd, thd->com_data.com_stmt_fetch.stmt_id,
  +                      thd->com_data.com_stmt_fetch.num_rows);
       break;

  The dispatch_command method will also check whether the protocol type is
  PROTOCOL_PLUGIN and whether the command it tries to invoke is available for
  this type of protocol, and raise an
  ER_PLUGGABLE_PROTOCOL_COMMAND_NOT_SUPPORTED error if it is not.

  The other change is that do_command() now calls thd->protocol->read_packet()
  and thd->protocol->parse_packet(&thd->com_data) instead of interacting with
  NET directly.
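
  To illustrate what moving the parsing into the protocol means for one
  command, here is a sketch of filling COM_STMT_FETCH_DATA from a raw payload.
  It assumes the classic protocol layout of a 4-byte statement id followed by
  a 4-byte row count, both little-endian, with the command byte already
  stripped. parse_stmt_fetch() and read_le32() are hypothetical helpers, not
  functions from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef unsigned char uchar;
typedef unsigned long ulong;

// Structure as listed above.
struct COM_STMT_FETCH_DATA {
  ulong stmt_id;
  ulong num_rows;
};

// Reads a 4-byte little-endian unsigned integer.
static std::uint32_t read_le32(const uchar *p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical helper. Returns true on a malformed packet and false on
// success, the same convention as parse_packet() above.
bool parse_stmt_fetch(const uchar *packet, std::size_t length,
                      COM_STMT_FETCH_DATA *out) {
  if (length < 8) return true;  // too short to hold both integers
  out->stmt_id = read_le32(packet);
  out->num_rows = read_le32(packet + 4);
  return false;
}
```

  With the arguments parsed this way, the command handler receives stmt_id and
  num_rows directly and no longer needs the packet pointer or its length.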

2) Providing metadata
  Originally metadata was sent by calling Protocol::send_result_set_metadata().
  To avoid sharing too many internal structures with plugins, this method is
  moved to THD::send_result_metadata() which uses these Protocol methods to
  actually send the metadata:
    - Protocol::start_result_metadata()
    - Protocol::send_field_metadata()
    - Protocol::end_result_metadata()
  Also, only for the embedded library an additional method called
  Protocol::send_string_metadata() is used. 
  Protocol::end_statement() is moved to THD as well.
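
  The call sequence can be sketched as below. The signatures are deliberately
  simplified (no flags or character sets), and Send_field_sketch,
  Metadata_protocol, Recording_protocol and send_result_metadata_sketch() are
  hypothetical names used only for this illustration. As elsewhere in the
  server, a return value of true means error.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, reduced stand-in for the server's Send_field.
struct Send_field_sketch {
  std::string col_name;
};

// Hypothetical interface with the three metadata calls.
class Metadata_protocol {
 public:
  virtual ~Metadata_protocol() = default;
  virtual bool start_result_metadata(unsigned num_cols) = 0;
  virtual bool send_field_metadata(const Send_field_sketch &field) = 0;
  virtual bool end_result_metadata() = 0;
};

// Records the calls so that the sequence can be inspected.
class Recording_protocol : public Metadata_protocol {
 public:
  std::vector<std::string> log;

  bool start_result_metadata(unsigned num_cols) override {
    log.push_back("start:" + std::to_string(num_cols));
    return false;
  }
  bool send_field_metadata(const Send_field_sketch &field) override {
    log.push_back("field:" + field.col_name);
    return false;
  }
  bool end_result_metadata() override {
    log.push_back("end");
    return false;
  }
};

// Driver that lives outside the protocol class: it knows about the fields,
// the protocol only knows how to encode them.
bool send_result_metadata_sketch(Metadata_protocol *protocol,
                                 const std::vector<Send_field_sketch> &fields) {
  if (protocol->start_result_metadata(static_cast<unsigned>(fields.size())))
    return true;
  for (const Send_field_sketch &field : fields) {
    if (protocol->send_field_metadata(field)) return true;
  }
  return protocol->end_result_metadata();
}
```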

3) Sending data
  Now the result is sent by THD::send_result_set_row accompanied by
  Protocol::start_row(), Protocol::end_row(), Protocol::abort_row() (in case
  of error). The latter were renamed to provide a more understandable API:
  Protocol::prepare_for_resend() -> start_row()
  Protocol::write()              -> end_row()
  Protocol::remove_last_row()    -> abort_row()
  The data itself is sent by Protocol::store(...) methods as before this WL.
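
  A sketch of the row lifecycle, assuming a protocol that buffers one row at a
  time: Row_protocol_sketch and send_result_set_row_sketch() are hypothetical,
  store() stands in for the Protocol::store(...) family, and the length limit
  exists only to provoke an error in the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Arbitrary limit used only to make store() fail in this sketch.
const std::size_t kMaxValueLengthSketch = 8;

// Hypothetical in-memory protocol that buffers one row at a time.
class Row_protocol_sketch {
 public:
  std::vector<std::vector<std::string>> sent_rows;

  void start_row() { m_current.clear(); }
  // Stand-in for Protocol::store(...); returns true on error.
  bool store(const std::string &value) {
    if (value.size() > kMaxValueLengthSketch) return true;
    m_current.push_back(value);
    return false;
  }
  // Hands the completed row over for sending.
  bool end_row() {
    sent_rows.push_back(m_current);
    m_current.clear();
    return false;
  }
  // Discards a partially built row.
  void abort_row() { m_current.clear(); }

 private:
  std::vector<std::string> m_current;
};

// Sends one row; on a store error the partial row is dropped.
bool send_result_set_row_sketch(Row_protocol_sketch *protocol,
                                const std::vector<std::string> &values) {
  protocol->start_row();
  for (const std::string &value : values) {
    if (protocol->store(value)) {
      protocol->abort_row();
      return true;
    }
  }
  return protocol->end_row();
}
```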
  
  A supportive change: to allow Protocol to send fields in text or binary
  forms, the Field class now has a parent:
  class Proto_field
  {
  public:
    virtual bool send_binary(Protocol *protocol)= 0;
    virtual bool send_text(Protocol *protocol)= 0;
  };
  
  This way, we have to share only the Proto_field class. The protocol
  implementation can decide in which format it would like to send the field and
  call the appropriate method. The actual format is defined by the field
  implementation, as before this WL.
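
  The division of labor can be sketched as follows: the protocol picks text or
  binary, and the field decides how its value is encoded. The interface has
  the same shape as Proto_field above, but Proto_field_sketch,
  Field_protocol_sketch and Tiny_field_sketch are hypothetical names, and the
  one-byte encoding is chosen only to keep the example short.

```cpp
#include <cassert>
#include <string>

class Field_protocol_sketch;  // forward declaration

// Same shape as Proto_field above, with a reduced protocol type.
class Proto_field_sketch {
 public:
  virtual ~Proto_field_sketch() = default;
  virtual bool send_binary(Field_protocol_sketch *protocol) = 0;
  virtual bool send_text(Field_protocol_sketch *protocol) = 0;
};

// Hypothetical protocol: it chooses the format, not the encoding.
class Field_protocol_sketch {
 public:
  explicit Field_protocol_sketch(bool binary) : m_binary(binary) {}

  bool send_field(Proto_field_sketch *field) {
    return m_binary ? field->send_binary(this) : field->send_text(this);
  }
  void append(const std::string &bytes) { buffer += bytes; }

  std::string buffer;

 private:
  bool m_binary;
};

// Hypothetical one-byte unsigned integer field.
class Tiny_field_sketch : public Proto_field_sketch {
 public:
  explicit Tiny_field_sketch(unsigned char value) : m_value(value) {}

  // Binary form: the raw byte.
  bool send_binary(Field_protocol_sketch *protocol) override {
    protocol->append(std::string(1, static_cast<char>(m_value)));
    return false;
  }
  // Text form: the decimal representation.
  bool send_text(Field_protocol_sketch *protocol) override {
    protocol->append(std::to_string(static_cast<unsigned>(m_value)));
    return false;
  }

 private:
  unsigned char m_value;
};
```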

4) Providing result of the execution and getting client's capabilities
  To check the client's capabilities, two methods exist:
  Protocol::get_client_capabilities()
  Protocol::has_client_capability()
  There are also other methods providing info about the connection:
  Protocol::get_ssl()
  Protocol::get_rw_status()
  Protocol::get_compression()

  Methods for sending status of the execution:
  Protocol::send_ok()
  Protocol::send_eof()
  Protocol::send_error()

Changes to dispatch_command()
-----------------------------
As parsing is now done elsewhere, dispatch_command() uses the API defined in
the Protocol class. However, not all commands handled by dispatch_command() were
refactored to be able to use the new API. To avoid crashes, dispatch_command()
now checks whether the command that's about to run supports pluggable protocols.
The new CF_ALLOW_PROTOCOL_PLUGIN flag, set for the appropriate commands in the
server_command_flags array, indicates that. If a command comes from a pluggable
protocol, but the server doesn't support pluggable protocols for this command,
the new error "Pluggable protocols isn't supported yet by this command" is
issued and command execution is aborted.
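
A minimal sketch of such a check is shown below. The command set, the bit
value of the flag, and the choice of which command carries the flag are all
placeholders picked for the example; only the idea of a per-command flag array
consulted before dispatch comes from the text above. All names ending in
_SKETCH or _sketch, and reject_command(), are hypothetical.

```cpp
#include <cassert>

// Hypothetical reduced command set (the server uses enum_server_command).
enum Cmd_sketch { CMD_A_SKETCH = 0, CMD_B_SKETCH = 1, CMD_END_SKETCH = 2 };

// Placeholder bit value, not the server's actual flag value.
const unsigned CF_ALLOW_PROTOCOL_PLUGIN_SKETCH = 1U << 0;

// Per-command flags, indexed by command code.
const unsigned server_command_flags_sketch[CMD_END_SKETCH] = {
    CF_ALLOW_PROTOCOL_PLUGIN_SKETCH,  // CMD_A_SKETCH: uses the new API
    0                                 // CMD_B_SKETCH: not refactored yet
};

enum Protocol_type_sketch { PROTOCOL_CLASSIC_SKETCH, PROTOCOL_PLUGIN_SKETCH };

// Returns true if the command must be rejected with the
// "not supported by pluggable protocols" error.
bool reject_command(Protocol_type_sketch type, Cmd_sketch cmd) {
  // Legacy protocols may run every command, refactored or not.
  if (type != PROTOCOL_PLUGIN_SKETCH) return false;
  return (server_command_flags_sketch[cmd] &
          CF_ALLOW_PROTOCOL_PLUGIN_SKETCH) == 0;
}
```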

Dealing with legacy (classic) protocols
=======================================
As the classic protocols (text and binary) are too hard-wired into the server
code and it's not possible to extract them in the given time frame, they're
moved into dedicated classes, which implement an extended API in addition to
the one provided by the Protocol class. This extended API is left only for
compatibility reasons and will be removed after additional refactoring (in the
scope of a separate WL). Thus this API shouldn't be used in any new work.

The basic idea behind these changes is to hide all protocol-related internals
behind the Protocol_classic class, e.g. NET, packet, raw_packet, etc., and make
all server code that works with these internals access them only via the
Protocol implementation.

To ensure that the correct protocol (pluggable Protocol or legacy
Protocol_classic) is used, THD now has two new methods: THD::get_protocol() and
THD::get_protocol_classic(). Both methods return the same THD::m_protocol, but
the latter asserts that it's either legacy text or binary protocol and casts
it to Protocol_classic. This ensures that old code won't get a pluggable
protocol which doesn't support legacy protocol API.
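
The two accessors can be sketched like this. The sketch assumes the protocol
object can report its own type; Protocol_kind_sketch, Protocol_base_sketch,
Protocol_classic_sketch and THD_sketch are hypothetical, reduced stand-ins for
the real classes.

```cpp
#include <cassert>

// Hypothetical type tag reported by each protocol implementation.
enum Protocol_kind_sketch { KIND_TEXT, KIND_BINARY, KIND_PLUGIN };

class Protocol_base_sketch {
 public:
  virtual ~Protocol_base_sketch() = default;
  virtual Protocol_kind_sketch type() const = 0;
};

// Legacy protocol with an extended API that only old code should use.
class Protocol_classic_sketch : public Protocol_base_sketch {
 public:
  Protocol_kind_sketch type() const override { return KIND_TEXT; }
  unsigned get_pkt_nr() const { return 0; }  // example of a legacy-only call
};

class THD_sketch {
 public:
  explicit THD_sketch(Protocol_base_sketch *protocol) : m_protocol(protocol) {}

  // New code: works with any protocol through the base API.
  Protocol_base_sketch *get_protocol() { return m_protocol; }

  // Old code: same object, but asserted to be a classic protocol before the
  // downcast, so a pluggable protocol is caught in debug builds.
  Protocol_classic_sketch *get_protocol_classic() {
    assert(m_protocol->type() == KIND_TEXT ||
           m_protocol->type() == KIND_BINARY);
    return static_cast<Protocol_classic_sketch *>(m_protocol);
  }

 private:
  Protocol_base_sketch *m_protocol;
};
```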

Protocol_classic allows server to set client's capabilities by providing two
methods: Protocol_classic::set_client_capabilities() and
Protocol_classic::add_client_capability().
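
A minimal sketch of such a capability interface, assuming the capabilities are
kept as a bitmask (as thd->client_capabilities was before this WL). The class
name Capabilities_sketch and the CAP_* bit values are placeholders, not the
real CLIENT_* constants; the method names follow the ones listed in the review
notes.

```cpp
#include <cassert>

// Placeholder capability bits, not the real CLIENT_* values.
const unsigned long CAP_A_SKETCH = 1UL << 0;
const unsigned long CAP_B_SKETCH = 1UL << 1;
const unsigned long CAP_C_SKETCH = 1UL << 2;

class Capabilities_sketch {
 public:
  // Replace the whole set, e.g. with the flags sent by the client.
  void set_client_capabilities(unsigned long flags) { m_capabilities = flags; }
  // Switch a single capability on or off.
  void add_client_capability(unsigned long flag) { m_capabilities |= flag; }
  void remove_client_capability(unsigned long flag) { m_capabilities &= ~flag; }
  // Query one capability or the whole set.
  bool has_client_capability(unsigned long flag) const {
    return (m_capabilities & flag) != 0;
  }
  unsigned long get_client_capabilities() const { return m_capabilities; }

 private:
  unsigned long m_capabilities = 0;
};
```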

Hiding NET
----------
As it's not possible to move NET from THD to Protocol (due to complex
interdependency between the text and binary protocols which can't be resolved
in the given time frame), NET is kept in THD but moved to private scope. The
same is done to THD::packet.

To keep the server functioning, Protocol_classic has a new API consisting of
two parts. The first one allows the server to interact with NET and VIO behind
the Protocol:

class Protocol_classic
{
  ...
  /* Initialize NET */
  bool init_net(Vio *vio);
  /* Deinitialize NET */
  void end_net();
  /* Flush NET buffer */
  bool flush_net();
  /* Write data to NET buffer */
  bool write(const uchar *ptr, size_t len);
  /* Return last error from NET */
  uchar get_error();
  /* Return last errno from NET */
  uint get_last_errno();
  /* Set NET errno to handled by caller */
  void set_last_errno(uint err);
  /* Return last error string */
  char *get_last_error();
  /* Set max allowed packet size */
  void set_max_packet_size(ulong max_packet_size);
  /* Return SSL descriptor, if any */
  void *get_ssl();
  /* Deinitialize VIO */
  int shutdown_vio();
  /* Wipe NET with zeros */
  void wipe_net();
  /* Check whether VIO is healthy */
  bool vio_ok();
  ...
}

The second part of this API is temporary and is needed to allow code that
still parses packets on its own to keep working. Basically, it returns NET or
various other things considered to be Protocol's internals. This part of the
API has to be removed when all packet parsing is moved to the Protocol class.

class Protocol_classic
{
  ...
  /* Return NET */
  NET *get_net();
  /* return VIO */
  Vio *get_vio();
  /* Set VIO */
  void set_vio(Vio *vio);
  /* Set packet number */
  void set_pkt_nr(uint pkt_nr);
  /* Return packet number */
  uint get_pkt_nr();
  /* return packet string */
  String *get_packet();
  /* return packet length */
  uint get_packet_length();
  /* return raw packet buffer */
  uchar *get_raw_packet();
  ...
}

Besides introducing the new API, existing code needs to start using it. Those
changes are fairly trivial replacements, e.g. replacing "&thd->net" with
"thd->get_protocol()->get_net()".

Some additional changes include:
-------------------------------
MPVIO_EXT:
*) removed NET* and added pointer to Protocol
*) removed client_capabilities and added it to Protocol
*) removed client_capabilities from thread and use the Protocol capabilities