WL#7126: Refactoring of protocol class
Status: Complete
The goal of this WL is to take an initial step toward decoupling the core server from the parsing of the protocol format. It is essentially refactoring work. The scope is the minimal effort needed to provide an API for protocol plugins. This API will be exposed by a service for executing server commands. Other teams (Runtime, Replication, etc.) will need to move the protocol parsing code they own into the Protocol class in the scope of separate WLs. This WL is independent of those follow-up WLs and should be pushed separately.

Limitations. This WL won't officially support Prepared Statements. Although it takes a step toward such support, i.e. it allows plugins to run STMT_PREPARE and STMT_EXECUTE commands, parameters for PS aren't parsed and are passed as is to the server. This allows legacy protocols to work with PS as before, but doesn't allow such statements to be run by plugins via a service or by protocol plugins.

Protocol plugins won't be supported in the scope of this WL, although all legacy code will work as before.

User Documentation
==================
Code refactoring. No user documentation required.
This worklog is primarily a refactoring, so no functionality should be affected by the change. It groups most of the I/O interaction into a single interface with different implementations.

F-1: All clients should work as before
F-2: There should be no test failures

Non-Functional requirements:

NF-1: There should be no performance impact from the changes made in this worklog
Aim of the WL
=============
1) Unbinding the server and protocol implementations
2) Providing an API for implementations of new protocols
3) Providing an API to be exposed via a service

Current state
=============
Result/error sending is handled by the Protocol class and its descendants. This part is well-defined and doesn't require any work on the server side.

Commands/queries are read with the help of the my_net_read function, which isn't wrapped in any class but is part of the NET interface. Incoming packets are parsed by the code that handles each separate command, i.e. parsing is spread (roughly) all over the server. The NET descriptor is public and is accessible from everywhere.

New design
==========
The changes consist of several logical parts:

1) The Protocol class becomes an abstract parent that defines the API between the server and the outer world. All extensions should use only the methods/data defined there. This parent class will contain methods for:

1.1) Sending data - this is already existing code and requires no change.

1.2) Receiving data. This part consists of adding an API that allows one to:
  *) Read a command
  *) Parse the last read command into the server's internal data structures and handle parsing errors in a consistent way
  *) Create a command, to allow code outside of the server core to run SQL queries by calling a single function and without the need to binary encode the query. In this WL this functionality is limited to COM_QUERY commands only. Other commands could be added when needed in the scope of separate WL(s).

This is mostly a refactoring part. Methods for this part are defined in the Protocol class, and the actual implementation that reads and parses some simple commands from server is placed in the Protocol_classic class. During parsing, Protocol_classic fills the server's new internal structures with the arguments to commands. Each command now has a dedicated structure which holds its arguments; all of them are joined into a union to limit occupied space.
This union is stored in THD and is publicly accessible. Examples of the new data structures:

  struct COM_INIT_DB_DATA
  {
    uchar *db_name;
    ulong length;
  };

  struct COM_REFRESH_DATA
  {
    uchar options;
  };

  struct COM_SHUTDOWN_DATA
  {
    enum mysql_enum_shutdown_level level;
  };

and so on.

1.3) Providing metadata of the result set. In order to limit exposure of internal structures, the metadata sending code was refactored. The following methods:

  bool send_result_set_metadata(List<Item> *list, uint flags);
  bool send_result_set_row(List<Item> *row_items);

will be moved out of the Protocol class, as they tie the interface to server internals (Item), and will be placed outside the class. Three helper methods will be added to the Protocol class in order to accomplish this:

  bool start_result_metadata(uint num_cols, uint flags, const CHARSET_INFO *resultcs)
  bool send_field_metadata(Send_field *field, const CHARSET_INFO *charset)
  end_result_metadata(uint num_cols, uint flags);

1.4) Providing the result of the execution and the client's capabilities.

2) The current protocol implementation is moved to the Protocol_classic class; Protocol_text and Protocol_binary are derived from it.

3) NET, packet and other legacy protocol related info is hidden behind Protocol_classic. All code that isn't refactored to use the new API defined in the Protocol class will have to cast to Protocol_classic and access that data via Protocol_classic's methods. Once all code has been converted to use the new API (in the scope of various WLs implemented by the code-owning teams), the need for those extensions will gradually fade away and the functions will be removed.

Overall workflow
================
The overall workflow is as follows:

1) MySQL server handles incoming connections as it does now
  *) server calls protocol->read_command() instead of my_net_read() in do_command().
    **) protocol implementation handles all actual reading
    **) if reading was successful, do_command() calls protocol->parse_command()
    **) protocol implementation fills the appropriate command data structure
  *) server works on the data provided by the protocol, without dealing with the actual packet format
  *) after query execution, server calls the appropriate Protocol methods to send the result/error to the client
    **) protocol implementation represents this data/error in whatever format it needs and handles the actual writing
  *) goto 1

Runtime review (Alik) notes
===========================

High-level notes
----------------
I reviewed only parts related to the protocol: plugins are beyond my understanding.
The cumulative patch looks good to me. The patch is also consistent with the latest edition of the HLD (the HLD no longer talks about a single protocol API). So, we have just a refactoring WL, which makes further development somewhat easier. That's OK with me.

Overview
--------
I reviewed the parts of the cumulative patch from git:mysql-trunk-wl7126 which belong to the Runtime team. In particular:
  - Field refactoring hasn't been reviewed;
  - Changes to the ACL code (plugins & co) haven't been reviewed;
  - Test changes haven't been reviewed.

Here is the list of the files which have been reviewed and for which I don't have any specific notes (apart from the general notes):
  - libmysqld/lib_sql.cc
  - plugin/innodb_memcached/innodb_memcache/src/handler_api.cc
  - plugin/semisync/semisync_master_ack_receiver.cc
  - plugin/semisync/semisync_master_plugin.cc
  - sql/auth/sql_authorization.cc
  - sql/binlog.cc
  - sql/bootstrap.cc
  - sql/conn_handler/channel_info.cc
  - sql/conn_handler/connection_handler_one_thread.cc
  - sql/conn_handler/connection_handler_per_thread.cc
  - sql/conn_handler/socket_connection.cc
  - sql/sql_profile.cc
  - sql/sql_show.cc
  - sql/sql_table.cc
  - sql/sql_update.cc
  - sql/sql_yacc.yy
  - sql/sys_vars.cc
  - sql/xa.cc
  - storage/myisam/ha_myisam.cc
  - sql/events.cc
  - sql/ha_ndbcluster.cc
  - sql/ha_ndbcluster_binlog.cc
  - sql/ha_partition.cc
  - sql/handler.cc
  - sql/log_event.cc
  - sql/mf_iocache.cc
  - sql/mysqld.cc
  - sql/parse_tree_items.cc
  - sql/parse_tree_items.h
  - sql/rpl_binlog_sender.cc
  - sql/rpl_rli.cc
  - sql/sp_head.cc
  - sql/sp_instr.cc
  - sql/sp_rcontext.cc
  - sql/sql_admin.cc
  - sql/sql_cache.cc
  - sql/sql_connect.cc
  - sql/sql_error.cc
  - sql/sql_handler.cc
  - sql/sql_insert.cc
  - sql/sql_load.cc
  - sql/sql_parse.h
  - sql/sql_plugin.cc

Review notes on the protocol classes and other pieces of the patch have been sent out by email.

List of identified changes
--------------------------
This is just for the record.
Since the patch is "all-in-one", special effort had to be made to identify the refactoring steps. Here is a list of them in no particular order.

1. Split dispatch_command() to create_command() + dispatch_command();
2. memset() -> wipe_net()
3. end_statement()
4. Interface to work with client capabilities:
     - thd->client_capabilities= client_flag;
     + protocol->set_client_capabilities(client_flag);
     + protocol->add_client_capability(CLIENT_TRANSACTIONS);
     + protocol->has_client_capability(CLIENT_SSL)
     + protocol->remove_client_capability(CLIENT_MULTI_RESULTS);
5. Protocol::prepare_for_resend() -> start_row()
6. Protocol::write() -> end_row()
7. Protocol::remove_last_row() -> abort_row()
8. Split Protocol::send_result_set_metadata() into
     - ::send_result_metadata(THD, Protocol, ...)
     - Protocol::start_result_metadata()
     - Protocol::send_field_metadata()
     - Protocol::end_result_metadata()
9. send_result_set_row()
10. send_string_item()
11. my_net_init() -> Protocol::init_net()
12. net_end() -> Protocol::end_net()
13. vio_shutdown() -> Protocol::shutdown()
14. my_net_write() -> Protocol::write()
    net_flush() -> Protocol::flush_net()
15. Protocol::get_ssl() must return (SSL *) to avoid unnecessary casts.
16. my_net_read() -> Protocol::read_packet()
17. Protocol::get_last_error()
    Dangerous practice:
     + ((Protocol_classic *) thd->protocol)->get_last_error() ?
     + ((Protocol_classic *) thd->protocol)->get_last_error() : "");
18. Field refactoring
19. ACL changes + Plugin
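Item 4 in the list above replaces direct writes to thd->client_capabilities with accessor calls on the protocol object. A minimal sketch of such a bitmask interface follows. The class name Capability_holder, its member name and the DEMO_* flag values are hypothetical stand-ins chosen for illustration; they are not the server's actual definitions.

```cpp
#include <cassert>

// Hypothetical flag values for illustration only; the real CLIENT_* constants
// come from the server headers and are not reproduced here.
const unsigned long DEMO_CLIENT_SSL = 1UL << 0;
const unsigned long DEMO_CLIENT_TRANSACTIONS = 1UL << 1;
const unsigned long DEMO_CLIENT_MULTI_RESULTS = 1UL << 2;

// Stand-in for the capability-related part of a protocol object.
class Capability_holder {
 public:
  Capability_holder() : m_client_capabilities(0) {}

  // Overwrite the whole capability mask.
  void set_client_capabilities(unsigned long flags) {
    m_client_capabilities = flags;
  }
  // Turn a single capability bit on.
  void add_client_capability(unsigned long flag) {
    m_client_capabilities |= flag;
  }
  // Turn a single capability bit off.
  void remove_client_capability(unsigned long flag) {
    m_client_capabilities &= ~flag;
  }
  // Test whether a capability bit is set.
  bool has_client_capability(unsigned long flag) const {
    return (m_client_capabilities & flag) != 0;
  }
  unsigned long get_client_capabilities() const {
    return m_client_capabilities;
  }

 private:
  unsigned long m_client_capabilities;  // no longer writable from outside
};
```

Keeping the mask private means callers can no longer assign to it directly, which is what allows the field to move from THD to the protocol without touching every call site again.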
New Protocol class hierarchy
============================
The new hierarchy consists of 4 classes:

      Protocol
         |
  Protocol_classic
         |
   Protocol_text
         |
  Protocol_binary

Protocol is an abstract class that defines the new API. Protocol_classic is the ex-Protocol class; it implements the core of both classic protocols - text and binary. Protocol_text and Protocol_binary are the implementations of the respective classic protocols.

Protocol's API
==============
The API of the Protocol class consists of 4 logical parts:

1) Receiving data from outside.

This WL moves the parsing of a limited set of commands. The commands and the new data structures dedicated to them are as follows:

  struct COM_INIT_DB_DATA
  {
    uchar *db_name;
    ulong length;
  };

  struct COM_REFRESH_DATA
  {
    uchar options;
  };

  struct COM_SHUTDOWN_DATA
  {
    enum mysql_enum_shutdown_level level;
  };

  struct COM_KILL_DATA
  {
    ulong id;
  };

  struct COM_SET_OPTION_DATA
  {
    uint opt_command;
  };

  struct COM_STMT_EXECUTE_DATA
  {
    ulong stmt_id;
    ulong flags;
    uchar *params;
    ulong params_length;
  };

  struct COM_STMT_FETCH_DATA
  {
    ulong stmt_id;
    ulong num_rows;
  };

  struct COM_STMT_SEND_LONG_DATA_DATA
  {
    ulong stmt_id;
    uint param_number;
    uchar *longdata;
    ulong length;
  };

  struct COM_STMT_PREPARE_DATA
  {
    char *query;
    uint length;
  };

  struct COM_STMT_CLOSE_DATA
  {
    uint stmt_id;
  };

  struct COM_STMT_RESET_DATA
  {
    uint stmt_id;
  };

  struct COM_QUERY_DATA
  {
    char *query;
    uint length;
  };

  struct COM_FIELD_LIST_DATA
  {
    uchar *table_name;
    uint table_name_length;
    uchar *query;
    uint query_length;
  };

These new structures are wrapped in a union:

  union COM_DATA
  {
    COM_INIT_DB_DATA com_init_db;
    COM_REFRESH_DATA com_refresh;
    COM_SHUTDOWN_DATA com_shutdown;
    COM_KILL_DATA com_kill;
    COM_SET_OPTION_DATA com_set_option;
    COM_STMT_EXECUTE_DATA com_stmt_execute;
    COM_STMT_FETCH_DATA com_stmt_fetch;
    COM_STMT_SEND_LONG_DATA_DATA com_stmt_send_long_data;
    COM_STMT_PREPARE_DATA com_stmt_prepare;
    COM_STMT_CLOSE_DATA com_stmt_close;
    COM_STMT_RESET_DATA com_stmt_reset;
    COM_QUERY_DATA com_query;
    COM_FIELD_LIST_DATA com_field_list;
  };

which is stored in the public part of THD. The new API that uses these structures:

  class Protocol
  {
    ...
    /*
      Read a packet from the client.

      returns
        -1  fatal error
         0  ok
         1  non-fatal error
    */
    int read_packet();

    /*
      Parse the read packet and fill the com_data union.

      returns
        true   malformed packet, appropriate error is thrown
        false  packet was successfully parsed
    */
    bool parse_packet(union COM_DATA *data);

    /*
      Creates a command. This call substitutes the read_packet() +
      parse_packet() calls and has to be used when dispatch_command()
      needs to be called directly.

      @notes
        Data pointed to by cmd isn't expected to change before
        dispatch_command() returns. If it is changed, the behavior is
        undefined, from a wrong result to a crash.
    */
    void create_command(COM_DATA *cmd);

    /* Return command from last parsed packet */
    enum enum_server_command get_command() { return cmd; }
    ...
  }

Most code changes for this part are made to dispatch_command() and the functions it calls. The typical change is passing parsed arguments instead of a pointer to the raw NET buffer and the packet's length. For example:

    case COM_STMT_FETCH:
    {
  -   mysqld_stmt_fetch(thd, packet, packet_length);
  +   mysqld_stmt_fetch(thd, thd->com_data.com_stmt_fetch.stmt_id,
  +                     thd->com_data.com_stmt_fetch.num_rows);
      break;

The dispatch_command() function will also check whether the protocol type is PROTOCOL_PLUGIN and whether the command it tries to invoke is available for this type of protocol, and raise an ER_PLUGGABLE_PROTOCOL_COMMAND_NOT_SUPPORTED error if it is not.

The other change is that do_command() now calls thd->protocol->read_packet() and thd->protocol->parse_packet(&thd->com_data) instead of interacting with NET directly.

2) Providing metadata

Originally metadata was sent by calling Protocol::send_result_set_metadata().
To avoid sharing too many internal structures with plugins, this method is moved to THD::send_result_metadata(), which uses these Protocol methods to actually send the metadata:
  - Protocol::start_result_metadata()
  - Protocol::send_field_metadata()
  - Protocol::end_result_metadata()

Also, for the embedded library only, an additional method called Protocol::send_string_metadata() is used.

Protocol::end_statement() is moved to THD as well.

3) Sending data

The result is now sent by THD::send_result_set_row, accompanied by Protocol::start_row(), Protocol::end_row() and Protocol::abort_row() (in case of error). These three were renamed to provide a more understandable API:

  Protocol::prepare_for_resend() -> start_row()
  Protocol::write()              -> end_row()
  Protocol::remove_last_row()    -> abort_row()

The data itself is sent by the Protocol::store(...) methods, as before this WL.

A supporting change: to allow Protocol to send fields in text or binary form, the Field class now has a parent:

  class Proto_field
  {
  public:
    virtual bool send_binary(Protocol *protocol)= 0;
    virtual bool send_text(Protocol *protocol)= 0;
  };

This way, we have to share only the Proto_field class. The protocol implementation can decide in which format it would like to send the field and call the appropriate method. The actual format is defined by the field implementation, as before this WL.

4) Providing the result of the execution and getting the client's capabilities

To check the client's capabilities, two methods exist:

  Protocol::get_client_capabilities()
  Protocol::has_client_capability()

There are also other methods providing info about the connection:

  Protocol::get_ssl()
  Protocol::get_rw_status()
  Protocol::get_compression()

Methods for sending the status of the execution:

  Protocol::send_ok()
  Protocol::send_eof()
  Protocol::send_error()

Changes to dispatch_command()
-----------------------------
As parsing is now done elsewhere, dispatch_command() uses the API defined in the Protocol class.
However, not all commands handled by dispatch_command() were refactored to use the new API. To avoid crashes, dispatch_command() now checks whether the command that's about to run supports pluggable protocols. The new CF_ALLOW_PROTOCOL_PLUGIN flag, set for the appropriate commands in the server_command_flags array, indicates that. If a command comes from a pluggable protocol but the server doesn't support pluggable protocols for this command, the new error "Pluggable protocols isn't supported yet by this command" is issued and command execution is aborted.

Dealing with legacy (classic) protocols
=======================================
As the classic protocols (text and binary) are too hard-wired into the server code and it's not possible to extract them in the given time frame, they're moved into dedicated classes which implement an extended API in addition to the one provided by the Protocol class. This extended API is kept only for compatibility reasons and will be removed after additional refactoring (in the scope of a separate WL). Thus this API shouldn't be used in any new work.

The basic idea behind these changes is to hide all protocol-related internals (e.g. NET, packet, raw_packet, etc.) behind the Protocol_classic class, and to make all server code that works with these internals do so only via the Protocol implementation.

To ensure that the correct protocol (pluggable Protocol or legacy Protocol_classic) is used, THD now has two new methods: THD::get_protocol() and THD::get_protocol_classic(). Both methods return the same THD::m_protocol, but the latter asserts that it's either the legacy text or binary protocol and casts it to Protocol_classic. This ensures that old code won't get a pluggable protocol which doesn't support the legacy protocol API.

Protocol_classic allows the server to set the client's capabilities by providing two methods: Protocol_classic::set_client_capabilities() and Protocol_classic::add_client_capability().
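The pair of accessors described above, one returning the generic protocol and one asserting and casting to the legacy class, can be sketched as follows. This is a simplified illustration under stated assumptions: all Demo_* names, the type() method and the legacy_packet_number() accessor are hypothetical and only mirror the described behavior; they are not the server's declarations.

```cpp
#include <cassert>

// Toy protocol type tags. PROTOCOL_PLUGIN is named in this worklog; the
// enumerator names used here are assumptions made for the sketch.
enum Demo_protocol_type {
  DEMO_PROTOCOL_TEXT,
  DEMO_PROTOCOL_BINARY,
  DEMO_PROTOCOL_PLUGIN
};

// Stand-in for the abstract Protocol class.
class Demo_protocol {
 public:
  virtual ~Demo_protocol() {}
  virtual Demo_protocol_type type() const = 0;
};

// Stand-in for Protocol_classic with one legacy-only accessor.
class Demo_protocol_classic : public Demo_protocol {
 public:
  Demo_protocol_type type() const override { return DEMO_PROTOCOL_TEXT; }
  // Hypothetical legacy accessor, standing in for get_net() and friends.
  int legacy_packet_number() const { return 7; }
};

// Stand-in for THD holding a single protocol pointer.
class Demo_thd {
 public:
  explicit Demo_thd(Demo_protocol *protocol) : m_protocol(protocol) {}

  // Generic accessor: safe for any protocol, exposes only the common API.
  Demo_protocol *get_protocol() const { return m_protocol; }

  // Legacy accessor: same pointer, but asserts that the protocol really is
  // a classic one before down-casting.
  Demo_protocol_classic *get_protocol_classic() const {
    assert(m_protocol->type() == DEMO_PROTOCOL_TEXT ||
           m_protocol->type() == DEMO_PROTOCOL_BINARY);
    return static_cast<Demo_protocol_classic *>(m_protocol);
  }

 private:
  Demo_protocol *m_protocol;
};
```

With this split, code that only needs the common API never sees the legacy methods, while code that still depends on them fails fast in debug builds if it is handed a pluggable protocol.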
Hiding NET
----------
As it's not possible to move NET from THD to Protocol (due to a complex interdependency between the text and binary protocols which can't be resolved in the given timeframe), NET is kept in THD but moved to private scope. The same is done for THD::packet. To keep the server functioning, Protocol has a new API consisting of two parts. The first one allows the server to interact with NET and VIO behind the Protocol:

  class Protocol_classic
  {
    ...
    /* Initialize NET */
    bool init_net(Vio *vio);
    /* Deinitialize NET */
    void end_net();
    /* Flush NET buffer */
    bool flush_net();
    /* Write data to NET buffer */
    bool write(const uchar *ptr, size_t len);
    /* Return last error from NET */
    uchar get_error();
    /* Return last errno from NET */
    uint get_last_errno();
    /* Set NET errno to handled by caller */
    void set_last_errno(uint err);
    /* Return last error string */
    char *get_last_error();
    /* Set max allowed packet size */
    void set_max_packet_size(ulong max_packet_size);
    /* Return SSL descriptor, if any */
    void *get_ssl();
    /* Deinitialize VIO */
    int shutdown_vio();
    /* Wipe NET with zeros */
    void wipe_net();
    /* Check whether VIO is healthy */
    bool vio_ok();
    ...
  }

The second part of this API is temporary and is needed to allow code that still parses packets on its own to keep working. Basically, it returns NET or various other things considered to be Protocol's internals. This part of the API has to be removed when all packet parsing is moved to the Protocol class.

  class Protocol_classic
  {
    ...
    /* Return NET */
    NET *get_net();
    /* Return VIO */
    Vio *get_vio();
    /* Set VIO */
    void set_vio(Vio *vio);
    /* Set packet number */
    void set_pkt_nr(uint pkt_nr);
    /* Return packet number */
    uint get_pkt_nr();
    /* Return packet string */
    String *get_packet();
    /* Return packet length */
    uint get_packet_length();
    /* Return raw packet buffer */
    uchar *get_raw_packet();
    ...
  }

Besides introducing the new API, the code needs to start using that API. Those changes are fairly trivial replacements, e.g. replace "&thd->net" with "thd->get_protocol_classic()->get_net()" (get_net() is a Protocol_classic method, so the legacy accessor is the one to use).

Some additional changes include:
-------------------------------
MPVIO_EXT:
  *) removed NET* and added a pointer to Protocol
  *) removed client_capabilities and added it to Protocol
  *) removed client_capabilities from the thread; the Protocol capabilities are used instead
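To tie the pieces of this design together, the read/parse/dispatch split described in this worklog can be illustrated with a toy model: the protocol object owns the raw bytes, fills a union of per-command structures, and the dispatcher only ever sees parsed arguments. Everything below is a sketch under stated assumptions. The Demo_* names, the two commands chosen and the byte layout (one command byte, then the payload, with a 4-byte little-endian id for the kill command) are invented for illustration and are not claimed to match the server's code or wire format.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical command codes for the toy format.
enum Demo_command { DEMO_COM_INIT_DB, DEMO_COM_KILL };

// Per-command argument structures, joined into a union to limit space.
struct Demo_init_db_data { const unsigned char *db_name; unsigned long length; };
struct Demo_kill_data { unsigned long id; };

union Demo_com_data {
  Demo_init_db_data com_init_db;
  Demo_kill_data com_kill;
};

// Toy protocol: the only place that knows the packet layout.
class Demo_packet_protocol {
 public:
  Demo_packet_protocol(const unsigned char *buf, size_t len)
      : m_buf(buf), m_len(len), m_cmd(DEMO_COM_INIT_DB) {}

  // Returns 0 on success, 1 on a non-fatal error (an empty packet here).
  int read_packet() { return m_len == 0 ? 1 : 0; }

  // Returns true on a malformed packet, false when parsing succeeded.
  bool parse_packet(Demo_com_data *data) {
    if (m_len < 1) return true;
    switch (m_buf[0]) {
      case DEMO_COM_INIT_DB:
        m_cmd = DEMO_COM_INIT_DB;
        data->com_init_db.db_name = m_buf + 1;
        data->com_init_db.length = static_cast<unsigned long>(m_len - 1);
        return false;
      case DEMO_COM_KILL: {
        if (m_len != 5) return true;  // command byte + 4-byte id expected
        unsigned long id = 0;
        for (int i = 0; i < 4; ++i)
          id |= static_cast<unsigned long>(m_buf[1 + i]) << (8 * i);
        m_cmd = DEMO_COM_KILL;
        data->com_kill.id = id;
        return false;
      }
      default:
        return true;  // unknown command byte
    }
  }

  // Command from the last successfully parsed packet.
  Demo_command get_command() const { return m_cmd; }

 private:
  const unsigned char *m_buf;
  size_t m_len;
  Demo_command m_cmd;
};

// Dispatcher: works on parsed arguments only, never on raw packet bytes.
std::string demo_dispatch(Demo_command cmd, const Demo_com_data &data) {
  switch (cmd) {
    case DEMO_COM_INIT_DB:
      return "use " + std::string(
                          reinterpret_cast<const char *>(data.com_init_db.db_name),
                          data.com_init_db.length);
    case DEMO_COM_KILL:
      return "kill " + std::to_string(data.com_kill.id);
  }
  return "";
}
```

Because the dispatcher takes only the command code and the filled union, a different protocol implementation could populate the same structures from a completely different encoding without any change to the dispatch side.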
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.