WL#3565: Functions call with IGNORE_SPACE

Affects: Server-5.1   —   Status: Complete

This Work Log is to track an incompatible change introduced by the fix for
BUG#21114.

The current implementation of SQL_MODE=IGNORE_SPACE is causing undesirable
side effects in the parser, as noted in BUG#21114.

To fix the bug correctly, the parser need to *not* honor IGNORE_SPACE,
but to rely on context so that the following syntaxes :
- "foo" "("
- "foo"  "("
will both refer to the same function when a function call is expected,
or will both refer to the proper construct (like "CREATE TABLE foo (..."
or "REFECENCES foo (" when a function call is not expected,
regardless of the SQL_MODE used (with or without IGNORE_SPACE)

In the long term, the SQL_MODE IGNORE_SPACE needs to be obsoleted,
but this is not the scope of the present Work Log.

For the functions impacted by the implementation of BUG#21114,
the fix introduces the following incompatible change:
for code that :
- would run only with IGNORE_SPACE=FALSE
- would use the "foo"  "(" syntax to refer to a stored function foo,
  as opposed to the native MySQL function foo
- would fail to run in IGNORE_SPACE=TRUE mode (ANSI, TRADITIONAL, all
  the other vendors)

Such code will need to be changed to use a db.foo() syntax to refer to the
stored function, and foo() to refer to the native function (unless SQL PATH is
implemented, which is another Work Log).
Fixing this work log consist of limiting the effect of the implementation
of SQL_MODE=IGNORE_SPACE.

There is impact on:
- the lexical parser
- the syntactic parser
- the item creation process used by the parser (item_create.cc)

In sql/lex.h, the array :
static SYMBOL sql_functions[] = {
  { "ADDDATE",          SYM(ADDDATE_SYM)},
  { "BIT_AND",          SYM(BIT_AND)},
  ...
must be reduced.

The lexical parser must *not* return different tokens for functions,
and must return an "identifier" token instead.

The bison grammar must be changed to treat UDF, Stored Functions and Native
functions with the *same* rule, since the *syntax* of the call is identical.
The logic involved on the action is responsible for creating the correct item
based on the function name.

Today, the action must resolve functions in this order:
- Native functions
- User Defined Functions
- Stored Functions
to maintain the current behavior.
SQL PATH (WL#2128) may change that name resolution when implemented.

BUG#21114 fixes *most* functions but not *all* of them.

Fixed by Bug21114: 191 native functions
Remaining issues : 34 entries in sql_functions[] in lex.h

Documentation impact for WL#3565
================================

1) Incompatible change

All the functions listed in
sql/item_create.cc, in the following array:

static Native_func_registry func_array[] = ...
(191 entries)

do *not* honor IGNORE_SPACE,

For any user running with SQL_MODE=IGNORE_SPACE, which is the default
in most cases, no changes are needed.

For any user running with the SQL_MODE IGNORE_SPACE *not* set,
any user code that would declare a stored function using a name in
this list (example: ABS, X, Y), and call this stored function using the
syntax "select abs (...)" need to:
- either rename the stored function, and change all calls
- or change the invocation to use the syntax "select db.abs(...)"

2) SQL_MODE=IGNORE_SPACE

This mode now only affect a very limited list of functions names.

The functions affected are listed in
sql/lex.h, in the following array :

static SYMBOL sql_functions[] = ...
(34 entries)

Note that the previous documentation for SQL_MODE=IGNORE space was
incorrect, since some functions even before the change were not honoring
this mode in the parser.

3) Sections in the doc

This list does not intend to be complete,
only the known area affected are listed.

http://dev.mysql.com/doc/refman/5.1/en/reserved-words.html
http://dev.mysql.com/doc/refman/5.1/en/server-sql-mode.html
The following describes the changes introduced by BUG#21114

Before the change,
A) MySQL native functions
Depending on the exact name of the function, 1 or more of the following could
happen for a given name:
- "foo" "(" was always interpreted by the parser as a function call
- "foo"  "(", depending on the sql_mode (IGNORE_SPACE), was:
  - sometime a native function call,
  - sometime a UDF,
  - sometime a stored function in the current  database.
For example, see connection_id()

- "foo" "(" or "foo"  "(", regardless of the sql_mode, was interpreted
as a function call.
For example, see database()

B) Pollution of the DDL (Data Definition Language)
Each identifier name recognized at the lexical level (sql_lex.h) would be
returned to the parser as a function (FUNC_ARG0, ...), regardless of the
parsing context. This in particular caused incompatibilities when a valid
identifier followed by a '(' would collide with a MySQL *non* reserved
function name, in a context where no functions calls are expected.
In particular,
   CREATE TABLE foo ( ...
   REFERENCES FOREIGN KEY foo (...
would not parse for valid table names.

C) Pollution of the grammar
Some functions, using *non* reserved names, were implemented as a token
in the grammar. Doing so in effect makes the name reserved, and can break
user code.
See for example benchmark()

D) Risk of regressions and incompatibilities
Each time a new function is implemented for a feature, a risk exist that
implementing the function following the existing design pattern will
introduce new incompatibilities, and break otherwise valid user code.

After the change,
A) MySQL native functions
- spaces are not significant, so that "foo" "(" and "foo"  "("
always refer to the same object, regardless of the current sql_mode.

B) Non pollution of the DDL (Data Definition Language)
- an identifier followed by '(' is a function call *only* if it's
found when parsing what is expected to be an expression.

C) Pollution of the grammar
- some tokens have simply been removed, preventing any pollution.

D) Risk of regressions and incompatibilities
This risk has been reduced.
The current design enforces that new function names are introduced
in item_create.cc, which does not affect the overall grammar or the lexical
parsing.

The change consist of
1) removing any CREATE_FUNC builder from sql/lex.h sql_functions[]
2) documenting reserved / non reserved keywords in sql/sql_yacc.yy
3) reorganizing the grammar to better classify and document rules
4) change the CREATE_FUNC factory/builder pattern from a C-style
function pointer call back to a C++ virtual function call.
This is critical since the pattern has been generalized to take into
account functions with a variable number or arguments,
which in turn is critical to avoid even naming a function like EXPORT_SET
in the bison parser (see point C).

The builder pattern works with the following class structure:
Create_func
  - Create_func_arg0
    - Create_func_connection_id, ...
  - Create_func_arg1
    - Create_func_cos, ...
  - Create_func_arg2
    - Create_func_addtime, ...
  - Create_func_arg3
    - Create_func_maketime, ...
  - Create_func_export_set, ...

Impact on the bison parser
The complexity of the generated code has decreased significantly.
The net change (in BEFORE --> AFTER (DELTA) format) is:

/* YYLAST -- Last index in YYTABLE.  */
47613 --> 42003 (-5610)

/* YYNTOKENS -- Number of terminals.  */
612 --> 570 (-42)

/* YYNNTS -- Number of nonterminals.  */
814 --> 818 (+4)

/* YYNRULES -- Number of rules.  */
2344 --> 2284 (-60)

/* YYNRULES -- Number of states.  */
4183 --> 3918 (-265)

Incompatible change
Previous user code that would:
- run only in IGNORE_SPACE=false sql_mode
- define stored functions with a name that collide with a MySQL built in
function
- call the stored function using "foo"  "("
will now parse differently, interpreting the call as a call to a native
function.
Note that this historical behavior is not standard (see below),
and that the same user code would execute differently or fail
with sql_mode IGNORE_SPACE enabled.