WL#12542: Query attributes

Affects: Server-8.0   —   Status: Complete

Often times applications want to be able to set per query metadata for their
queries, e.g:
* The URL of the page that produced the query
* Extra processing information to be passed with the query to e.g. an audit
plugin or a query rewrite plugin

Currently this is worked around with specially formatted comments. But not all
apps can change the query text to include these. 
Or with connection attributes. But sometimes these are not flexible enough.
Or with user variables. But these are not visible enough and require an extra
round trip.

This worklog is to introduce support for the clients to supply query attributes to the server.

This is based on contributions from facebook (BUG#92532/BUG#90420), but it's not taking the contribution verbatim. Just the requirements.

This is also the Minimum Viable Product. More extensions can optionally be added later as the need arises. 
=== C API requirements ===

* CFR1: It must be possible to supply query attributes for the next query through the C API
* CFR2: The supplied list of query attributes is only valid until the query is processed
* CFR3: both the blocking (mysql_[real_]query) and the async client (mysql_real_query_nonblocking) will send the accumulated query attributes and reset them
* CFR4: Supplying empty name (null or empty string)  or not supplying it at all for a query attribute name makes it invisible for the server when it searches by name.
* CFR5.1.: if an unsupported type is supplied an error is returned.
* CFR6: the attribute names are interpreted as being in the client character set for the connection
* CFR7: duplicate attribute names can be supplied. No check for uniqueness is done on the client side. Both are transferred to the server
* CFR8:  There's currently no way to specify named query attributes for mysql_stmt_execute().
* CFR9: mysql_reset_connection() will also reset the query attributes

== MySQL protocol requirements ===

* PFR1: The changes to the protocol should be backward compatible: i.e. older clients will work with newer servers and vice versa. Of course in these cases the query attributes will not be transferred to the other party that doesn't support them.

== Server requirements ===

* SFR1: The server will receive the accumulated list of query attributes from the client together with the query execution request itself.
* SFR2: The server will keep the received list of query attributes at least until the end of the execution of the current RPC command. 
* SFR3: each server/client session will have its own set of query attributes
* SFR4: the server will provide a read only iterator component service for plugins and components to read the accumulated query attributes.
* SFR5: a SQL function in a separate component will be supplied by the server to return the value of the attribute as a string (or NULL) given its name for the current session
** SFR5.1: all non-string parameter binds will be converted to string
** SFR5.2: the function will work in both MYSQL_STMT_EXECUTE (if names for the parameters are provided) and MYSQL_QUERY
* SFR6: The values of the parameters will be available to all commands of a multi-query COM_QUERY RPC command.
* SFR7: Query attributes are not propagated through replication topologies, i.e. INSERT INTO tbl VALUES (mysql_query_attribute_string("foo"));  will produce different results on the replica. 
* SRF8: query attributes are immutable to the server and cannot be altered.
* SFR9: PREPARE SQL command can't supply query attributes. 
* SFR10: it should be impossible to specify mysql_query_attribute_string() in a DDL statement, especially in a replication context. 

== Client tools requirements ==

* CLFR1: the mysql binary will have a new command to supply query attributes for the next query
* CLFR2: the mysqltest binary will have a new command to supply query attributes  for the next query
* CLFR3: mysql/mysqltest commands will apply only to the first query following these
* CLFR4: mysql/mysqltest will take a maximum of 32 parameters
* CLFR5: the maximum length of a name or a value mysql can process is 1024.
We will send the query attributes in an optional part of the COM_QUERY RPCs.
The format will be similar to the format used for query parameters in COM_STMT_EXECUTE (so a binary format), but will, in addition have the names next to the data types in the first row and the value rows will be prefixed by a count of value sets. 
Currently of course we will only support a single value set. Anything else will trigger an error. 

=== Use cases ===

Use case 1: query metadata. (FB/booking).

Apps can benefit from passing on app-defined data to back-end server plugins and components. E.g. allow logging of the application user in the audit log. Or log the page the query is coming from. Note that there's no specific need to do that via SQL. Any backend will do. 

Use case 2:  COM_QUERY with parameters. (booking.com)

Apps can use SQL positional parameters to avoid having to embed (and escape) constants into the query text. But for that they need to use prepared statements. And that means at least 2 round trips: COM_STMT_PREPARE and COM_STMT_EXECUTE. And some memory handling (allocate the statement, and deallocate it). This quickly adds up if you do LOTS of dynamically generated relatively simple and fast queries. 
Query attributes offer a way for an application to do the above in a single round trip by extending COM_QUERY. 
Also a side effect is that the data are passed in binary format. This can be beneficial if one is passing lots of e.g. integers vs their string representation. 
This story would be so much more complete if we had SQL syntax for named parameters in the fashion MS SQL has them, e.g. "SELECT :param1: + 12 FROM t1 WHERE a  < :param1"". But, since it's much easier to add then it is to extract, testing the waters with the UDF is a good and valid first step. 

=== Architectural considerations ===

Query attributes are very similar in functionality they provide to prepared statement parameters. The principle differences between the two are:

- Because of the presence of the prepare call the server knows exactly how many query parameters (and of what types) to expect. And the server's count and use of parameters is the all-important one. Not what it receives over the wire. The rest of the values are simply not processed and discarded. With query attributes we receive and store all of the supplied attributes and their data into the server's session in hope that the query execution (or supporting layers like e.g. auditing etc that get triggered by it) will request the values of these. 
- mysql prepared statements parameters do not have (and will not have even after this worklog is pushed) names for parameters. They rather do it by position. The rationale for this is that using query attributes brings very little value over using positional parameters that the prepared statements already have. We want to keep the door open and add the network bits to support this if we decide. But we don't want to do it right away unless there's future demand. Doing it will be easy: we need a new C API function to supply the names and we need to embed these names when sending the parameter values. 
- prepared statements can deal with long parameter data. I.e. you say at execution time that you will be supplying a stream of data post-execution and the server will expect that you do that in separate set of round-trips. Query attributes do not do that. They have a fixed maximum value size limit and are sent in their entirety (and cached into the server session) before the query processing starts.
- prepared statement parameter binds "survive" the execution and can be re-used for multiple executions. Query attributes are reset after each successful execution.

=== Typical life-cycle of the query attributes processing ===

1. the client app supplies zero or more query attributes by calling mysql_bind_param()
2. the client app executes a query via mysql_real_query(). 
3. mysql_real_query() collects the query and the parameter values, serializes them into the COM_QUERY command and sends that over the wire. Then it disposes of the binding data. 
4. the server reads the COM_QUERY data and copies the values and metadata of all  named parameters into the THD
5. Now server-side code (plugins, components and the server code itself) can now access the values through an iterator by using the component services suite. Optionally the query too can use the values supplied though the user defined function that works with the component services to retrieve the values and provide them (as read-only values) to the SQL execution engine. 
6. Once the query execution is complete the server, as part of cleaning things up for the next query, will dispose of the query attributes values and metadata cached inside the THD.

This worklog will not (at least at this point) provide a way to attach names to prepared statements parameters as supplied vi COM_STMT_EXECUTE. These will remain unnamed. But, for future-proofness, we will extend the COM_STMT_EXECUTE packet as well to allow adding them at a later point without having to add yet another capability flag. 

=== Implementation in bold strokes ===

For backward compatibility we will add a new capability flag called CLIENT_QUERY_ATTRIBUTES. Both the server and the client will have it on by default. If not on the network format is as it was before this worklog.

We will use the same MYSQL_BIND structures to bind the query attributes as we do for prepared statement parameters. 

To submit the query parameters binding we will add the following C API:

bool STDCALL mysql_bind_param(MYSQL *mysql, unsigned n_params,
                              MYSQL_BIND *binds, const char **names);

This function will take a MYSQL handle, a number of query attributes supplied and two arrays: one of the filled in MYSQL_BIND structures and one with the names of the query attributes.
Note that there's no uniqueness check done on the names. There can be several attributes with the same name. And they will all be serialized and sent over tho the server. In this case the order is important. 

It will copy the relevant bits into the MYSQL->extension structure and continue.
The next mysql_real_query() call will fetch the values from the MYSQL_BIND pointers and will amend the COM_QUERY RPC if the capability flag is on.

If the capability flag is on we will slightly extend (by adding the number of parameters to process and a length encoded name slot for each parameter) the COM_STMT_EXECUTE statement too to support named query attributes passed in the same batch as the prepared statement parameter values.

At the server side if the capability flag is on the server will fetch the query parameter binding info from COM_QUERY/COM_STMT_EXECUTE and will store it into the COM_QUERY/COM_STMT_EXECUTE packet data. Then dispatch_command() will store the query attributes values into the THD structure so they're accessible to the component service until cleaned up.  
After processing the command dispatch_command() will clean the THD query attributes data.

There will be a new component service to read the data from the THD and return it to callers. Note that we do no synchronization of the calls to this service thus all calls to it should be about the current thread the code is executing on. 

There will be a new, optional component implementing a SQL user defined function to return the query attribute value by name to SQL as a string.

A new mysql/mysqltest command line command "query_attributes" will be added taking pairs of strings representing name and value (up to 1024 symbols each in total). It will then store the values in the mysql global structure and will produce string binding calling mysql_bind_param() so that the next mysql_real_query() will pick these up.
The format of the string parameters is the same as the format documented for the delimiter command's argument (from https://dev.mysql.com/doc/refman/8.0/en/mysql-commands.html), namely:
The string can be specified as an unquoted or quoted argument on the command line. Quoting can be done with either single quote ('), double quote ("), or backtick (`) characters. To include a quote within a quoted string, either quote the string with a different quote character or escape the quote with a backslash (\) character. For an unquoted argument, the delimiter is read up to the first space or end of line. For a quoted argument, the delimiter is read up to the matching quote on the line. 
If you call the mysql command a second time without a SQL command inbetween the cumulative effect of the multiple query_attribute commands will be equal to the effect of just the last one.
If the query attributes are an odd count the client app will produce an error. 

After the query execution completes the global mysql structure will be cleaned up so the next query won't get it. 

If a reconnect occurs while mysql executes the query the query attributes will be re-bound again.

The names for the query attributes will be transmitted over the wire in the charset they're supplied in via the C API (the client charset). 

Format of COM_QUERY

int<1> 	command 	0x03: COM_QUERY
int 	parameter_count 	Number of parameters
int 	parameter_set_count 	Number of parameter sets. Currently always 1
if parameter_count > 0 {
binary 	null_bitmap 	NULL bitmap, length= (num_params + 7) / 8
int<1> 	new_params_bind_flag 	Always 1. Malformed packet error if not 1
if new_params_bind_flag, for each parameter {
int<2> 	param_type_and_flag 	Parameter type (2 bytes). The MSB is reserved for unsigned flag
string 	parameter name 	String
binary 	parameter_values 	value of each parameter: Binary Protocol Value
string 	query 	the text of the SQL query to execute 


COM_STMT_EXECUTE asks the server to execute a prepared statement as identified by statement_id.

It sends the values for the placeholders of the prepared statement (if it contained any) in Binary Protocol Value form. The type of each parameter is made up of two bytes:

    the type as in enum_field_types
    a flag byte which has the highest bit set if the type is unsigned [80]

The num_params used for this packet reffers to num_params of the COM_STMT_PREPARE_OK of the corresponsing prepared statement.

The server will use the first num_params (from prepare) parameter values to satisfy the positional anonymous question mark parameters in the statement executed regardless of whether they have names supplied or not. The rest num_remaining_attrs parameter values will just be stored into the THD and if they have a name they can later be accessed as query attributes. If any of the first num_params parameter values has a name supplied they could then be accessed as a query attribute too. If supplied, parameter_count will overwrite the num_params value by eventually adding a non-zero num_params_remaining value to the original num_params.


Payload Type	Name	Description
int<1> 	status 	[0x17] COM_STMT_EXECUTE
int<4> 	statement_id 	ID of the prepared statement to execute
int<1> 	flags 	Flags. See enum_cursor_type
int<4> 	iteration_count 	Number of times to execute the statement. Currently always 1.
int 	parameter_count 	The number of parameter metadata and values supplied. Overrrides the count coming from prepare (num_params) if present.
if num_params > 0 {
binary 	null_bitmap 	NULL bitmap, length= (num_params + 7) / 8
int<1> 	new_params_bind_flag 	Flag if parameters must be re-bound
if new_params_bind_flag, for each parameter {
int<2> 	parameter_type 	Type of the paremeter value. See enum_field_type
string 	parameter_name 	Name of the parameter or empty if not present
binary 	parameter_values 	value of each parameter 

The service definitions for the component services added



DEFINE_SERVICE_HANDLE(my_h_string);  // from mysql_string.h

#ifdef __cplusplus
class THD;
#define MYSQL_THD THD *
#define MYSQL_THD void *


  @ingroup group_components_services_inventory

  A service to fetch the query attributes for the current thread

  Use in conjuntion with all the related services that operate on thread ids
  @sa mysql_component_mysql_query_attributes_imp
  Creates iterator that iterates through all parameters supplied.
  The iterator will be positioned at the first parameter if any.

  @param thd The thread handle for the thread you want data for. It NULL it
             will read the current thread.
  @param the name of the parameter to position on. UTF8mb4. if non-NULL will
         position the iterator on the first occurance of a parameter with the
         specified name, if any.
  @param [out] out_iterator place to store the iterator handle.
  @return Status of performed operation
  @retval false success. Can read data.
  @retval true failure. Either failed to initialize iterator or no parameters
  to read.
DECLARE_BOOL_METHOD(create, (MYSQL_THD thd, const char *name,
                             mysqlh_query_attributes_iterator *out_iterator));
  Gets the type of element pointed to by the iterator.

  @param iterator query attributes iterator handle.
  @param [out] out_type place to store the parameter type.
  @return status of performed operation
  @retval false success
  @retval true Invalid iterator or not on an element
DECLARE_BOOL_METHOD(get_type, (mysqlh_query_attributes_iterator iter,
                               enum enum_field_types *out_type));
  Advances specified iterator to next element.

  @param iterator iterator handle to advance.
  @return Status of performed operation and validity of iterator after
  @retval false success
  @retval true Failure or no more elements
DECLARE_BOOL_METHOD(next, (mysqlh_query_attributes_iterator iter));
  Gets the name of the parameter.

  @param iterator query attributes iterator handle.
  @param[out] out_name_handle the name of the parameter if supplied.
              Otherwise a nullptr.
  @return status of operation
  @retval false Valid
  @retval true Invalid iterator or not on an element
DECLARE_BOOL_METHOD(get_name, (mysqlh_query_attributes_iterator iter,
                               my_h_string *out_name_handle));
  Releases the Service Implementations iterator. Releases read lock on the

  @param iterator Service Implementation iterator handle.
  @return Status of performed operation
  @retval false success
  @retval true failure
DECLARE_METHOD(void, release, (mysqlh_query_attributes_iterator iter));

  @ingroup group_components_services_inventory

  A service to fetch the query attribute value as a string

  Use in conjuntion with mysql_query_atrributes_iterator service

  @sa mysql_component_mysql_query_attributes_imp
  Gets the parameter as a string

  @param iterator query attributes iterator handle.
  @param[out] out_string_value the value.
  @return status of operation
  @retval false success. out_string_value valid
  @retval true Invalid iterator or a null value
DECLARE_BOOL_METHOD(get, (mysqlh_query_attributes_iterator iter,
                          my_h_string *out_string_value));

  @ingroup group_components_services_inventory

  A service to fetch the query attribute null flag

  Use in conjuntion with mysql_query_atrributes_iterator service

  @sa mysql_component_mysql_query_attributes_imp
  Checks if the parameter value is a null

  @param iterator query attributes iterator handle.
  @param[out] out_null set to true if the value is NULL. false otherwise.
  @return status of operation
  @retval false success. out_null valid
  @retval true Invalid iterator no current element
                    (mysqlh_query_attributes_iterator iter, bool *out_null));


UDF signature of the new query_attributes UDF (as implemented into the query_attributes component):

CREATE FUNCTION mysql_query_attribute_string(attribute_name varchar charset utf8mb4) returns varchar charset utf8mb4; -- can return NULL too.

Note that, for simplicity, the UDF will return the value of the first attribute with a given name. One can use the component service to get access to all values if need be.