WL#6030: The BEFORE triggers are not processed for NOT NULL columns

Affects: Server-5.7   —   Status: Complete

This WorkLog originated from BUG#6295 (TRIGGERS ARE NOT PROCESSED
FOR NOT NULL COLUMNS).

The problem is that if a column is declared as NOT NULL, it is not
possible to do INSERT NULL (or UPDATE to NULL) even though
there is associated trigger, setting NOT-NULL value.

For example:

  - Table 't1' with a NOT NULL column 'c1';

  - The table has BEFORE INSERT trigger which sets the 'c1' column
    to not null value (SET NEW.c1 = 1);

  - INSERT INTO t1 VALUES(NULL)
    or
    UPDATE t1 SET c1 = NULL
    fail with the following error:
    ERROR 1048 (23000): Column '' cannot be null 

The cause for the current (wrong) behavior is that the column constraints
are checked before triggers are executed. 

This is against The Standard, which requires that column constraints 
are checked at the end of an SQL statement.
Background:

  - Records are stored as row buffers

  - The row format is defined by a collection of Field objects

  - Every Field object specifies
    - meta-data for its column
    - a pointer to the data in the row buffer (record buffer)
    - a pointer to the bit (a pointer and a bitmask) representing
      NULL value;

  - NOT NULL columns do not have ability to store NULL value;

  - The record format (row buffer format) is tightly linked to
    Storage Engines. The server can not change the row format without
    affecting SE API.

The general idea of the WL task is pretty straightforward:

  1. allow NOT NULL columns to be NULL temporary.
     That means, every NOT NULL column must be able to store NULL value;

  2. at the end of INSERT / UPDATE statement (after executions of all
     associated triggers) check every NOT NULL in the record to ensure
     the NOT NULL constraint is valid;

  3. if there is NULL among NOT-NULL fields, report an error.
     Otherwise, pass the record to SE.

The main difficulty is (1). There is currently close to no abstractions in
dealing with the record buffer - code all over the place depend on how data
is represented internally. For example, Field::null_ptr, Field::null_bit
and (record - table->record[0]). So there's very little flexibility in
changing data representations without breaking stuff here and there.

The following things should be also considered:

  - it should be possible to check nullability of a NOT-NULL field inside
    trigger body, and it should work as if the field is nullable.

    For example, it should be possible to have expressions like

      (NEW.col IS NULL)

  - a NOT-NULL field can be a part of complex SELECT statement, in this
    case it still should be treated as NOT NULL, otherwise the optimizer
    might choose wrong plan.

  - a NOT-NULL field can be a part of the SET and WHERE parts of an UPDATE
    statement, and it should be treateda as nullable in the SET part, and
    as NOT NULL in the WHERE part.

The following approaches have been considered:

1. Keep everything in Field
---------------------------

  - Introduce two new attributes into Field:
    - Field::is_tmp_nullable -- a flag, indicating if this particular field
      can accept NULL even though it is declared as NOT NULL;
    - Field::is_tmp_null -- a place to store NULL value in case of NOT NULL
      field.

  - Keep the record [buffers] intact;

  - Do not introduce extra Field objects;

  - Introduce a new Field::is_tmp_nullable() and use it in rare cases of
    expression evaluation

  - Change Field::is_null() so that it takes into account is_tmp_nullable
    and is_tmp_null

  The main problem with this approach is that Field is starting to store
  "more data" than it used to be, and that Field is unable to handle
  tmp-null-flag for any record.

  The later is not a problem in this specific case because:
    - we get into tmp-nullable mode only when we're executing a trigger;
    - in that mode, we deal with only one row.

2. Store tmp-NULL-flag in row (record buffer)
---------------------------------------------

  - When we allocate memory for record buffer, allocate extra bytes to
    store tmp-NULL-values of NOT NULL fields;

  - Introduce new attribute Field::tmp_null_ptr, pointing at those extra
    bytes in the record buffer.

  - Do not introduce extra Field objects.

  It's hard to expect that this approach can be successful. The thing is
  that it's expected that a record has specific length, and that
  expectation/knowledge is spread out in the server. So, adding extra bytes
  to the row leads to various mysterious glitches. Also, we're allocating
  record buffers not in a single place, meaning all those places should be
  found and updated.

3. Extra row + extra Field objects
----------------------------------

  The idea here is:

    - To have a separate set of Field objects which are nullable + buffer
      for the associated row, and use this Fields/buffer for evaluation of 
      new values for columns + during trigger execution.

    - Then at some point, before passing anything to engine, we should have
      copied data from these Field's/buffer to Field's/buffer which are used
      for writing/reading into the storage engine.

  This was the idea of the original patch back in 2010/2011.

  Unfortunately, it is got complicated because:

    - We tried to optimize and to avoid unnecessary copy.

    - Making decision about which set of Field objects/buffer to use in
      various parts of statement turned out to be non-trivial.

Functional requirements
=======================

1. Attempt to INSERT NULL value into NOT NULL column must be successful if:
  a. there is a BEFORE INSERT trigger for the table
  b. after the trigger execution the column is NOT NULL, i.e. the column
     was set to NOT NULL using pseudo column NEW.
2. Attempt to assign NULL value to NOT NULL column during execution of UPDATE
statement must be successful if:
  a. there is a BEFORE UPDATE trigger for the table
  b. after the trigger execution the column is NOT NULL, i.e. the column 
     was set to NOT NULL value using pseudo column NEW.
3. Attempt to INSERT NOT NULL value into NOT NULL column must fail if:
  a. there is a BEFORE INSERT trigger for the table
  b. after the trigger execution the column is NULL, i.e. the column was set
     to NULL value using pseudo column NEW.
4. Attempt to assign NOT NULL value to NOT NULL column during execution of
UPDATE statement must fail if:
  a. there is a BEFORE UPDATE trigger
  b. after the trigger execution the column is NULL, i.e. the column was set
     to NULL value using pseudo column NEW.
5. During execution of BEFORE INSERT trigger the predicate IS NULL must return
correct result for pseudo column NEW.c (even if c is declared as NOT NULL),
i.e.:
  a. if column was set to NULL inside trigger body using pseudo column NEW
     then IS NULL must return TRUE
  b. if column was set to NOT NULL inside trigger body using pseudo column NEW
     then IS NULL must return FALSE
  c. if statement INSERT sets column to NULL then IS NULL must return TRUE
     in case when column isn't set to NOT NULL inside trigger body
  d. if statement INSERT sets column to NOT NULL then IS NULL must return FALSE
     in case when column isn't set to NULL inside trigger body.
6. During execution of BEFORE UPDATE trigger the predicate IS NULL must return
correct result for pseudo column NEW.c (even if c is declared as NOT NULL),
i.e.:
  a. if column was set to NULL inside trigger body using pseudo column NEW
     then IS NULL must return TRUE
  b. if column was set to NOT NULL inside trigger body using pseudo column NEW
     then IS NULL must return FALSE
  c. if statement UPDATE sets column to NULL then IS NULL must return TRUE
     in case when column isn't set to NOT NULL inside trigger body
  d. if statement UPDATE sets column to NOT NULL then IS NULL must return FALSE
     in case when column isn't set to NULL inside trigger body.
7. During execution of BEFORE INSERT trigger the NULL value could be assigned
to NOT NULL column temporary using alias NEW. It means that for the some
NOT NULL column it is possible to make assignment NEW.column = NULL and latter
reset this column to some NOT NULL value.
8. During execution of BEFORE UPDATE trigger the NULL value could be assigned
to NOT NULL column temporary using alias NEW. It means that for the some
NOT NULL column it is possible to make assignment NEW.column = NULL and latter
reset this column to some NOT NULL value.

Changes in handling of warnings
===============================
  This worklog also introduces the dynamic checking that during execution of 
INSERT/REPLACE/INSERT SELECT/CREATE SELECT all table's columns are set
either explicitly in SQL-statement or as part of execution of
BEFORE-INSERT trigger or implicitly since column has default value.
  Additionally, for the strict sql_mode this WL adds grouping by column
the warning messages for the following error code:
  ER_BAD_NULL_ERROR, ER_NO_DEFAULT_FOR_FIELD, ER_NO_DEFAULT_FOR_VIEW_FIELD.
  It means that any such warning message will be shown at most once for
every column during handling of statement INSERT/REPLACE/INSERT SELECT/
CREATE SELECT. In the strict sql_mode server reports an error for first
occurrence of correspond violation and terminates statement execution.
  Before this WL server wrote warning message conrresponded to error codes
mentioned above for every row that provoked this error. So, for example,
if user executed the statement INSERT SELECT for a table that had NOT NULL
column and for this column user set value NULL then server generated
as many warnings as there were rows that violated NOT NULL constraint.
  For LOAD STATEMENT the behaviour left the same as before since it makes sense
to know what lines of imported file generates each line of warning.