WL#1803: Prepared Statements: overview of tasks

Affects: Server-7.1   —   Status: Assigned

This is a meta worklog task for design flaws, bugs, feature requests in prepared
statements. It's split into two parts: outstanding and closed issues.
An issue may have or have not a separate worklog task, depending on its size.
If a task exists, a reference to it is present. All issues are described
shortly, in order to allow ease of maintenance: once a work on a problem 
is started, it's provided with an elaborate description.

-------------------------------------------------------------------------------
Table of contents.
-------------------------------------------------------------------------------
1. Outstanding issues.
1a. References to open worklog tasks.
2. Solved issues.
2a. References to closed worklog tasks and design documents.

-------------------------------------------------------------------------------
1. Outstanding issues.
-------------------------------------------------------------------------------
Binary protocol without prepare
===============================

Add a way to execute any statement using the prepared statements
API and get results back in the binary format without having to
prepare  the statement first. In other words, the benefits of the
binary protocol should be available without having to prepare the
query. This is WL#4627 "Prepared Statements: add C API call
mysql_stmt_execute_immediate()"

Prepare any SQL
===============

Total list of SQL commands not supported in prepared statemnts mode is:

ALTER_DB, ANALYZE, ASSIGN_TO_KEYCACHE, BACKUP_TABLE, CHANGE_DB
CHANGE_MASTER, CHECK, CHECKSUM, COMMIT, CREATE_DB, CREATE_FUNCTION, 
CREATE_INDEX, DROP_DB, DROP_FUNCTION, DROP_INDEX, DROP_USER
FLUSH, GRANT, HA_CLOSE, HA_OPEN, HA_READ, HELP, KILL, LOAD
LOAD_MASTER_DATA, LOAD_MASTER_TABLE, PRELOAD_KEYS, PURGE,
PURGE_BEFORE, REPAIR, REPLACE_SELECT, RESET, REVOKE, REVOKE_ALL
SQLCOM_ROLLBACK, ROLLBACK_TO_SAVEPOINT, SAVEPOINT, SET_OPTION
SHOW_BINLOG_EVENTS, SHOW_BINLOGS, SHOW_CREATE, SHOW_ERRORS
SHOW_INNODB_STATUS, SHOW_MASTER_STAT, SHOW_NEW_MASTER, SHOW_SLAVE_HOSTS
SHOW_SLAVE_STAT, SHOW_WARNS, SLAVE_START, SLAVE_STOP, TRUNCATE, UPDATE_MULTI
RESTORE_TABLE

The problem is aggravated by the fact that most commands not supported
in prepared statements not function in stored procedures either.

This is WL#2871 Prepare any SQL

Named placeholders
==================
We need to support named placeholders:
SELECT :hello, :world

Insufficient result set metadata for SHOW WARNINGS/ERRORS
=========================================================
- Georg notes in 2003: we dont' send result set metadata for SHOW commands at
  prepare stage.

DECISION:
SQL_SHOW_ commands must be fixed.
Most of them were fixed with introduction of INFORMATION_SCHEMA.
SHOW WARNINGS/ERRORS and HELP are the remaining violators.
This issue also pops up in BUG#2812.

Piecewise blob retrieval
========================
We need to explicitly state that currently mysql_fetch_column doesn't
perform piece-by-piece LOB fetching and is just a stub for future
extensions. Probably we should provide user with pointer to internal LOB
storage to prevent excessive data copying.

Georg says: DbSl requires that we're able to send blobs bigger than
max_allowed_packet.

DECISION:
We plan to support piecewise data fetching at some point.

Array execution of statements
=============================
We need to have ability to bind array of columns (Mark Matthews,
Georg Richter, Guenter Drach), example:

stmt= mysql_stmt_prepare("insert into foo (id) values (?)");
int values[100];
bind.buffer= (char *) values;
mysql_execute(stmt, 100); // all values are sent in one packet

The same principle should be applicable for result set data.

DECISION:
MYSQL_BIND structure was designed with this in mind.
We need to extend it to support binding of arrays soon.
Monty proposed sample extensions to MYSQL_BIND structure and mysql_execute
call.
We added support in binary protocol for this item before MySQL 4.1.2 got out.

Support parameter markers universally in SQL grammar
====================================================
We need to support placeholders inside character set specifiers:
SELECT _utf8 ?


Each SQL type needs an own type code
====================================
VARBINARY datatype needs typecode in MySQL C API.

A more efficient binlogging for PS execute
==========================================
Currently prepared statements are always binlogged in text format,
with all parameter markers inlined (statement-based replication).
It is in theory possible to only binlog the statement id
and parameter data.

Increase the maximum number of parameter markers per PS from 65535
==================================================================
Client-server protocol wise, a prepared statement can have no more than
65535 placeholders (only two bytes are reserved for placeholder count).
This should be fixed by extending the protocol.

Rich column metadata
====================
Column metadata should support three distinct fields for each
text column: 
character length, display length and byte length


Connector/C: Prepared statement execute overwrites connection status
====================================================================
Each prepare or execute of a prepared statement overwrites the
global connection status in Connector/C.

DECISION:
Factor out the execution API from cli_advanced_command().

Connector/C: insufficient test coverage for errors
==================================================
We need to check that all sequences of new API calls are SIGSEGV robust
and reasonable error messages are given in case of misuse of API.

Connector/C: better tests for fetching variable-length types
============================================================
* for variable length types we need to be able to bind pointer to type
    length, instead of providing length by value - this is the only way
    values of variable length types can be reused without performing
    rebind between executions.

    Example:

    bind_array[position].buffer_type= MYSQL_TYPE_STRING;
    bind_array[position].buffer= (char *) value;
    bind_array[position].buffer_length= strlen(value);

    mysql_bind_param(stmt, bind_array);
    mysql_execute(stmt);

    /* value is changed and has a new string */
    /* oops, we need to rebind it
        because value length has chanded too! */

Monty explains:
This is why we have both 'length' and 'buffer_length' variables in MYSQL_BIND.
For character strings you should set length to the pointer, containing
length of the string, for fixed lenght types you can use buffer_length.
This explaination should be added to the manual.

TODO: add tests of this item into client_test.c

Set a bound column to a default value
=====================================
'set default' dedicated placeholder value in addition to 'set
null' in PS binary protocol

Metadata of SQL prepared statements
===================================
Support ALLOCATE DESCRIPTOR, DESCRIBE, GET DESCRIPTOR, DEALLOCATE DESCRIPTOR -
embedded to dynamic sql statements that allow you to describe dynamic sql
statement metadata;



-------------------------------------------------------------------------------
1a. References to open WL tasks.
-------------------------------------------------------------------------------

Other open worklog tasks for prepared statements:
WL#945  -- Extensions to server/client protocol
WL#1055 -- Prepared statements - Shared Global Cache
WL#1085 -- Prepared Statements and the General Query Log
WL#2661 -- Prepared Statements: Dynamic SQL in Stored Procedures
WL#2793 --Prepared Statements: convenient syntax of SQL prepared statements for
use in Dynamic SQL

Architecture bugs
=================
BUG#12567 Wrong result of mysql_real_query (EXPLAIN metadata is garbled,
compared to normal SELECT, in case of an error)
BUG#8550 -- needs a new system so that user requests like that one do not arise
BUG#2812 wrong value for stmt->field_count
BUG#20169 and others, it's confusing now to use
the same set of typecodes for host language types and for sql types
BUG#1382
BUG#23385
BUG#19621 Character set introducer before parameter substitution causes
syntax error


For the list of closed WL tasks, see the end of this document.


-----------------------------------------------------------------------------
2. Solved items.
-----------------------------------------------------------------------------
PROBLEM:
- we need to be able to bind output parameters of stored procedures in
prepared mode. BUG#17898
- stored procedures can produce several result set. We need to have support
for it in prepared mode.

DECISION:
Yes, we do (Saint-Petersbourg stored procedures session).


PROBLEM (SOLVED):
- We need to implement support of prepared statements in query cache.
DECISION:
This is WL#1569 "Query cache for prepared statements"

PROBLEM (SOLVED):
- mysql_stmt_bind_param(): why require is_null to be a pointer? why not use it
  as a boolean variable inside MYSQL_BIND structure?
Monty explains:
- if your data is always NULL, use MYSQL_TYPE_NULL to bind it.
- if your data is always NOT NULL, set is_null= (my_bool *) 0
- in all other cases you should provide pointer to a my_bool variable
and change value referenced by pointer between executions.
This explaination should be added to the manual.

(communicated to the docs team).

PROBLEM (SOLVED)
- we need to support placeholders in LIMIT clause
- there are other places in our grammar where we don't support placeholders
  unlike other RDBMS. One of them is explicitly requested by Guenther (SAP):
  LIKE ? ESCAPE ? clause.
DECISION:
 This is WL#1785 "Prepared statements: implement support for placeholders in
 LIMIT clause" which is in Konstantin's sprint now.

(WL#1785 is complete)


PROBLEM (SOLVED)
After Guilhem patch with implementation of SET NAMES commands family is
pushed we need to verify that character sets conversion works in prepared
statements.


PROBLEM (SOLVED)
There should be statistic variables for COM_PREPARE and COM_EXECUTE,
as there are for COM_QUERY.
See also:
https://intranet.mysql.com/secure/mailarchive/mail.php?folder=5&mail=28611

Solution: WL#2379 "Prepared Statements: add status variables", which
was implemented.

PROBLEM (SOLVED)
Add a variable that limits the total number of prepared statements in the
server. This is necessary to control statement leaks in application
and protect the server against DoS attack when statements are created
without being freed.

Solution: BUG#16365 which was fixed in 4.1.19 and 5.0.20.

PROBLEM (SOLVED)
Logging of prepared statements to the general query log should be extended.
Currently when a prepared statement is executed, the log gets only 'Com_execute'
message and nothing else (no statement id, text, placeholder values).
This is a security breach as the general query log is used for audit 
purposes.

An expanded query is now always written to the general and slow logs.
(WL#1085)


PROBLEM (SOLVED)
Client-server protocol wise, mysql_stmt_prepare() has no means to report
warnings. This should be fixed by extending the protocol.

Implemented by Monty in 5.0 by extending COM_STMT_PREPARE reply packet.

PROBLEM (SOLVED)
- logging of prepare and execute commands was added to the general
  query log (--log) in 4.1.9 and 5.0.3.
  This is mainly needed for audit and performance monitoring purposes.

PROBLEM (SOLVED):
We need to report truncations when data is converted from binary format
to client side buffers by the client library.
VERSION: 5.0

Conversion takes place when a value of one type is saved in a buffer
of another type. For any result set column a user can provide
buffer of any host language type, hence the client library has to support
any conversion sequence.
Possible host language buffer types are: unsigned and signed char, short, 
int, long, long long; float and double; char * and MYSQL_TIME.
Possible column types include all columns types allowed by the server,
where each column has its' native representation in a buffer of 
host language type. For example for MYSQL_TYPE_LONG the native
representation is a 32-bit signed integer.
When the client side buffer doesn't strictly correspond to the native type
of a column, a truncation is possible.
Reasons of truncation are:
- data is out of range of the target type. E.g. on attempt to save a long 
number in 16-bit integer buffer
- loss of precision when saving data. E.g. on attempt to save a long long 
value in a buffer of type 'float'.
- data is not conversion compatible. E.g. on attempt to save a string
in a numeric buffer, or convert a date to time.
- overflow or underflow when converting a string representation of 
a number to its binary representation.
All these truncations should be reported by the client library.
Note: the current exception is string -> floating point conversion, 
where we have no means to track down underflow or overflow.

Implementation.

MYSQL_BIND structure, which is used by the user to describe target buffers
of result set data, should be extended with my_bool *error member.
If at least one error pointer of the mysql_stmt_bind_result bind list is not 
0, the client library should report truncations happened during row fetch.
Reporting is performed from the client API call mysql_stmt_fetch(),
which gets additional return value MYSQL_DATA_TRUNCATED.
If no 'error' buffer of the bind list is set, mysql_stmt_fetch() behaves as
before (which is: returns 0 for successful fetch, MYSQL_NO_DATA for
 after-last position, and 1 otherwise).

UPDATE: The patch for this item has been submitted for review.
bk commit - 4.1 tree (konstantin:1.2154)
UPDATE: Pushed into 5.0 tree, is available since 5.0.3

PROBLEM (SOLVED)
We need call 'mysql_stmt_field_count'. 
Added in 4.1.3,  bk commit - 4.1 tree (konstantin:1.1895)

PROBLEM (SOLVED):
- some calls of new API have mysql_ prefix, while some have mysql_stmt_,
  The reason is that there are similar calls in the old API, but operating on
  the entire connection. There is no easy way to remember prefix for a call,
  and  now you have to look at docs each time to find out if a call doesn't
  have '_stmt_' in its name.  What if we rename all calls to have mysql_stmt_
  prefix? This will make clear distinction between old and new APIs and make
  all calls of new API easy to remember.
  Brian says: we probably should call mysql_fetch mysql_cursor_fetch, because
  in future it will be used to fetch from cursor.
  Georg says: lots of drivers already use our API.

DECISION:
We will rename all new calls to have mysql_stmt_ prefix, including
mysql_fetch. Konstantin will make patch for source and documentation trees
ASAP.
This was done and pushed into 4.1.2

PROBLEM (SOLVED):
- we need mysql_stmt_execute_direct() - with that call we can claim that the
  new prepared statement API is complete replacement of old API. This is a
  replacement of mysql_real_query, with such advantages as usage of binary
  protocol to send data to the client, and ability to supply output bind
  parameters without calling mysql_prepare.

  The problem is that with this call we still need to be able to set execution
  flags before statement execution, to open cursors. An example:

   stmt= mysql_prepare(mysql, query, query_length);
   mysql_stmt_set_attr(stmt, MYSQL_STMT_OPEN_CURSOR);
   stmt= mysql_execute(stmt);

   (prepare/execute/fetch of this and other statements)
   with mysql_stmt_execute_direct we don't have place to set statement
   attributes.

DECISION:
We split mysql_prepare call in two:
stmt= mysql_stmt_init(mysql);
mysql_stmt_prepare(stmt, query, query_length);
This was done and pushed into MySQL 4.1.2

PROBLEM (SOLVED):
- we need mysql_stmt_insert_id  (Georg Richter: we need to copy
  all data of 'OK' packet to the statement).
This was done and pushed into 4.1.2.
However better support in the client library is needed, this is not
yet described anywhere, but mysql_stmt_insert_id doesn't work as documented
and does not always behave in the same way as mysql_insert_id.

PROBLEM (SOLVED):
- we need to send types of placeholders to the client
  in reply to COM_PREPARE command. This is a requirement to be able to do
  strict type checking on client side. (Sun request)

  Georg Richter says: in PHP (probably also in Perl) we have the same problem
  - we have to specify types of marker bound variables explicitly.
  ODBC has calls to access this data too.

DECISION: we need to reserve place for this data in the protocol before
MySQL 4.1 goes beta. At the moment we will just send MYSQL_TYPE_UNKNOWN
for all placeholders.
It was verified that current client library will malfunction or get SIGSEGV if
server replies with packet in new format. The library needs to be fixed.

Support for this was added to the binary protocol. However types
of placeholders are not tracked now.

PROBLEM (SOLVED):
  * we should check for missing C types we can bind via mysql_bind_param:
    unsigned types and long double can't be bound without truncation at
    the moment.
    If unsigned types can be bound as if they were the same as their signed
    counterparts, it should be stated in the manual.

DECISION:

No decision's been made for 'long double' type yet.
For all unsigned types we added 'is_unsigned' member to MYSQL_BIND
structure.

PROBLEM (SOLVED):
  * do we really need to call mysql_stmt_reset just to reset long data
    state of the statement on server side? If we don't need
    mysql_stmt_reset for long data, do we need it at all, because if an
    error has happened on server side, we can just force the user to
    close this statement and create a new one?

DECISION:
We should not require calling mysql_stmt_reset just to reset long
data state of placeholders.
If long data for one of placeholders wasn't supplied, its values will
be taken from MYSQL_BIND array.
To implement it we will ensure that calls 'mysql_stmt_execute' and
'mysql_stmt_reset' reset long data state of placeholders on both client
and server sides.
We should keep mysql_stmt_reset because it's purpose is more generic
than to reset long data errors, though there is only this use of it
at the moment.
We should require mysql_stmt_reset to always send reply to the client.

All of the above was done.


- mysql_send_long_data:

PROBLEM:
  * we don't send any reply to this command. In case of error the report will
    be deferred till mysql_execute. This is not very convenient.

DECISION:
  We won't change it. Reply to each packet slow downs communication too much
  in case when packet size is small. Monty says: common size of packet in ODBC
  is 1K.


PROBLEM:
  * mysql_send_long_data should be able to send long
    data for any placeholder, if placeholder type is compatible with long
    data. An example to clarify the problem:

    INSERT INTO two_blobs_table (id, lob1, lob2) VALUES (?, ?, ?);

    In this case it should be possible to send long data both for lob1 and
    lob2.

DECISION:
Yes, it should be possible already. If it is not, it's a bug.

It was verified that it's possible.


PROBLEM:
  * mysql_stmt_send_long_data doesn't support character set conversions
    and doesn't have/make distinction between BLOBS and CLOBS.

DECISION:
This is a bug. We need to make conversion of long data to the server character
set at execution time. To distinguish BLOBs and CLOBs we will use parameter
type value sent to server in MYSQL_BIND array.
When discussing this item it was noted, that implementation of
logging of prepared statements is very inefficient and should
be rewritten.

PROBLEM
Query cache in prepared statements

Note added by Trudy Pelzer, 2007-05-25
At the Dev-MT Offsite meeting in Santa Cruz, Brian and Monty 
made the following time estimate for WL#1569:
- prepared statements (WL#1569 and others); 4+ mths
* This is 4 tasks:
** WL#1569 "Prepared Statements: implement support of Query Cache"
** WL#? include all SQL
** WL#? update server for max ps statements
** WL#? make ps able to recreate itself if it doesn't exist
* Brian: Prepared statements aren't useful until this 4-part
task is complete.
* Current estimate: Update for qcache: 1 mth. Update client library 
to recreate ps if it goes missing: 1 week. Update server to know 
about max number of statements it remembers: 2 weeks. Include all 
sql: 2 months (this includes needed discussion but not the extra
time QA needs to adjust mysqltest for the new feature).

On May 25th, Kostja and Guilhem agreed that sufficient work has been 
done (fixing BUG#735 and BUG#26842) that WL#1569 can be considered
complete. The other 3 tasks that Monty and Brian determined should
be done to make prepared statements fully useful will still be 
deferred so this note has been added to this  WL#1803; it now contains
a complete list of work that still needs to be done to make prepared 
statements full-featured.


-------------------------------------------------------------------------------
2a. References to closed worklog tasks and design documents.
-------------------------------------------------------------------------------

A very early spec for prepared statements on Wiki:
https://intranet.mysql.com/secure/wiki/Client-Server-Protocol-Enhancement

There also exists several other WL entries for prepared statements:
WL#1564 -- Intensive test of prepared statements via 'mysql' [Client-Sprint]
 - complete
WL#1014 -- Port client_test.c to normal test library [Server-Sprint]
(Matthias) - complete
WL#1622 -- SQL Syntax for Prepared Statements [Server-Sprint]
(SergeyP) - complete
WL#1781   -- Testing and profiling prepared statements [Benchmarks-Sprint]
(Walrus)
WL#1569 -- Prepared statements and query cache

WL#4435: Support OUT-parameters in prepared statements
BUG#17898 No straightforward way to deal with output parameters