WL#6614: Define and reimplement IGNORE

Affects: Server-5.7   —   Status: Complete   —   Priority: Medium

This worklog aims to define what the IGNORE keyword actually should mean
and to reimplement it so that it is consistent across the supported
statements and more easy to maintain.

The current IGNORE implementation has resulted in quite a number of bugs:
BUG#30191, BUG#49539, BUG#55421, BUG#54543, BUG#54106, BUG#49534, BUG#47788,
BUG#46539, and BUG#46425.

IGNORE is supported for the following statements:

DELETE - http://dev.mysql.com/doc/refman/5.6/en/delete.html
"The IGNORE keyword causes MySQL to ignore all errors during the process of
deleting rows. (Errors encountered during the parsing stage are processed in the
usual manner.) Errors that are ignored due to the use of IGNORE are returned as
warnings."

INSERT - http://dev.mysql.com/doc/refman/5.6/en/insert.html
"If you use the IGNORE keyword, errors that occur while executing the INSERT
statement are treated as warnings instead. For example, without IGNORE, a row
that duplicates an existing UNIQUE index or PRIMARY KEY value in the table
causes a duplicate-key error and the statement is aborted. With IGNORE, the row
still is not inserted, but no error is issued.

IGNORE has a similar effect on inserts into partitioned tables where no
partition matching a given value is found. Without IGNORE, such INSERT
statements are aborted with an error; however, when INSERT IGNORE is used, the
insert operation fails silently for the row containing the unmatched value, but
any rows that are matched are inserted. For an example, see Section 17.2.2,
“LIST Partitioning”.

Data conversions that would trigger errors abort the statement if IGNORE is not
specified. With IGNORE, invalid values are adjusted to the closest values and
inserted; warnings are produced but the statement does not abort. You can
determine with the mysql_info() C API function how many rows were actually
inserted into the table."

UPDATE - http://dev.mysql.com/doc/refman/5.6/en/update.html
"With the IGNORE keyword, the update statement does not abort even if errors
occur during the update. Rows for which duplicate-key conflicts occur are not
updated. Rows for which columns are updated to values that would cause data
conversion errors are updated to the closest valid values instead."

LOAD DATA INFILE - http://dev.mysql.com/doc/refman/5.6/en/load-data.html
"If you specify IGNORE, input rows that duplicate an existing row on a unique
key value are skipped. If you do not specify either option, the behavior depends
on whether the LOCAL keyword is specified. Without LOCAL, an error occurs when a
duplicate key value is found, and the rest of the text file is ignored. With
LOCAL, the default behavior is the same as if IGNORE is specified; this is
because the server has no way to stop transmission of the file in the middle of
the operation."

LOAD XML - http://dev.mysql.com/doc/refman/5.6/en/load-xml.html
(not explicitly documented)

ALTER TABLE - http://dev.mysql.com/doc/refman/5.6/en/alter-table.html
"IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works
if there are duplicates on unique keys in the new table or if warnings occur
when strict mode is enabled. If IGNORE is not specified, the copy is aborted and
rolled back if duplicate-key errors occur. If IGNORE is specified, only the
first row is used of rows with duplicates on a unique key. The other conflicting
rows are deleted. Incorrect values are truncated to the closest matching
acceptable value."

It has been decided to depricate "ALTER IGNORE" in 5.6 and remove it in 5.7.
See WL#7395 for more details.

CREATE TABLE ... IGNORE SELECT ... -
http://dev.mysql.com/doc/refman/5.6/en/create-table.html
(not explicitly documented)
Functional requirements:
------------------------

F-1: IGNORE keyword should affect only specified set of errors.
     The list of errors is :

      ER_SUBQUERY_NO_1_ROW
      ER_ROW_IS_REFERENCED_2
      ER_NO_REFERENCED_ROW_2
      ER_BAD_NULL_ERROR
      ER_DUP_ENTRY
      ER_DUP_ENTRY_WITH_KEY_NAME
      ER_DUP_KEY
      ER_VIEW_CHECK_FAILED
      ER_NO_PARTITION_FOR_GIVEN_VALUE
      ER_NO_PARTITION_FOR_GIVEN_VALUE_SILENT
      ER_ROW_DOES_NOT_MATCH_GIVEN_PARTITION_SET

      These errors occur for each record.
     
F-2: The errors affected by IGNORE keyword should be downgraded
     from error to warning when IGNORE keyword is used in the
     supported statements.

F-3: Changing (Downgrading) an error to warning will cause the statement to 
     skip to the next row instead of aborting the statement. 

F-4: Warning message should be displayed when an error is downgraded to warning.
This worklog consist of two parts:

1) Define what IGNORE should mean
2) Re-implement IGNORE


1) Define IGNORE
================
As a basis, consider Kostja's quote in BUG#30191:
"The purpose of this clause is to insert as many records as possible, skipping
records that violate various constraints (i.e. taking no action on those
records).The same principle applies to DELETE IGNORE, UPDATE IGNORE."
 
IGNORE should work by downgrading error to warning, causing a record to be
skipped without aborting the statement. It should be defined exactly which
errors (and/or classes of errors) are affected. Based on the study of the 
existing implementation and existing behavior,  the errors affected by IGNORE
are listed below:

ER_SUBQUERY_NO_1_ROW                       1242
  "Subquery returns more than 1 row"
ER_ROW_IS_REFERENCED_2                     1451
  "Cannot delete or update a parent row: a foreign key
   constraint fails (%.192s)"
ER_NO_REFERENCED_ROW_2                     1452 
  "Cannot add or update a child row: a foreign key constraint fails (%.192s)"
ER_BAD_NULL_ERROR                          1048
  "Column \'%-.192s\' cannot be null"
ER_DUP_ENTRY                               1062
  "Duplicate entry \'%-.192s\' for key %d"
ER_DUP_ENTRY_WITH_KEY_NAME                 1586
  "Duplicate entry \'%-.64s\' for key \'%-.192s\'"
ER_DUP_KEY                                 1022
  "Can\'t write; duplicate key in table \'%-.192s\'"
ER_VIEW_CHECK_FAILED                       1369
  "CHECK OPTION failed \'%-.192s.%-.192s\'"
ER_NO_PARTITION_FOR_GIVEN_VALUE            1526
  "Table has no partition for value %-.64s"
ER_NO_PARTITION_FOR_GIVEN_VALUE_SILENT     1591
  "Table has no partition for some existing values" 
ER_ROW_DOES_NOT_MATCH_GIVEN_PARTITION_SET  1748
  "Found a row not matching the given partition set"

These errors occur for each record. 


2) Re-implement IGNORE
======================

Existing implementation:
-----------------------
The current implementation of IGNORE sets the ignore flag in the object of
LEX class for the statements which support IGNORE keyword. We set it to true if
we want to downgrade errors to warnings for specific statements. From the object
of LEX class, 'ignore' flag is passed as bool argument in some functions.

This 'ignore' flag is used to:
- Downgrade an error to warning if 'ignore' flag is set.
- pass to a lower level function, which will again do the same.
- Set flags like COPY_INFO->ignore, SELECT_LEX->no_error.
 
New Implementation :
--------------------
In order to get consistent behavior of IGNORE across different 
statements which support IGNORE keyword, IGNORE should be re-implemented using
Internal_error_handler.
- A class Ignore_error_handler should be created which inherits from
  Internal_error_handler class.
- This error handler for IGNORE will change the severity level of specified set 
  of errors to warning level in case IGNORE keyword is specified in the
  statement.
- The Ignore_error_handler object should be pushed only once for one statement.
- The error handler for IGNORE should be pushed if IGNORE keyword is used in 
  the supported statement.
- Ignore flag should be eliminated from all possible places in the code.

Behaviour Changes
------------------

There are some behavior changes that should be done in this worklog to make the
behavior of the IGNORE more consistent.

a. In the current behaviour, some handler errors ( returned from storage 
   engines) are ignored by IGNORE keyword  but no warning is shown instead. 
   This is because Ignore flag is used to skip print_error() which maps handler 
   errors to MySQL errors.

   The errors are :

   HA_ERR_FOUND_DUPP_KEY -> ER_DUP_ENTRY 
                         -> ER_DUP_ENTRY_WITH_KEY_NAME
     When UNIQUE KEY constraint is violated.

   HA_ERR_ROW_IS_REFERENCED -> ER_ROW_IS_REFERENCED_2
      When Foreign key constraint on parent table fails.
 
   HA_ERR_NO_REFERENCED_ROW -> ER_NO_REFERENCED_ROW_2
      When Foreign key constraint on child table fails on insert / update. 

   HA_ERR_NOT_IN_LOCK_PARTITIONS -> ER_ROW_DOES_NOT_MATCH_GIVEN_PARTITION_SET
     When a row does not matches the given partition set.

   HA_ERR_NO_PARTITION_FOUND: -> ER_NO_PARTITION_FOR_GIVEN_VALUE 
                                 ER_NO_PARTITION_FOR_GIVEN_VALUE_SILENT
     When a table has no partition for a given row.

   A warning message should be displayed when above errors are ignored.
   In STRICT mode, errors will not be downgraded to warnings.

   So, we have behavior changes for handler errors mentioned above as:
   
     Current behavior : 
     ..................              

     Default                    -Error. 
     in SRICT mode              -Error (same as Default behavior). 
     With IGNORE                -No Error / Warning.
                                 Statement skips to the next row instead of 
                                 aborting.
     With IGNORE in STRICT Mode -No Error / Warning.
                                 Statement skips to the next row instead of 
                                 aborting.

     New Behavior :
     ..............

     Default                    -Error. 
     in SRICT mode              -Error (same as Default behavior). 
     With IGNORE                -Warning.
                                 Statement skips to the next row instead of 
                                 aborting.
     With IGNORE in STRICT Mode -Warning.
                                 Statement skips to the next row instead of 
                                 aborting.

b. For a view with CHECK constraint, when the CHECK constraint is violated it
   should give an error. Using IGNORE keyword should downgrade errors to 
   warnings. In current behavior, inserting single row which violates CHECK 
   constraint resulted in error even if IGNORE keyword is used. Inserting 
   multiple rows with IGNORE, in which one or more rows violate CHECK 
   constraint results in warnings.

   This behaviour was introduced by the fix of bug : 
   http://bugs.mysql.com/bug.php?id=5995
   This is inconsistent behaviour. If a statement affects single record and 
   IGNORE is used, errors should be downgraded to the warnings.

c. The proposed behavior change in IGNORE + STRICT mode is mentioned in the 
   section below along with the current behavior of IGNORE + STRICT.

Behavior with STRICT mode
--------------------------

Following behavior should be followed for the statements :

a. When the default behavior of the statement is error. 
   Default                    : Error
   in SRICT mode              : Error (same as Default behavior). 
   With IGNORE                : Warning
   With IGNORE in STRICT mode : Warning

  This behavior is same as the behavior in the trunk. It should be followed for 
  all the errors mentioned above.
  ER_BAD_NULL_ERROR error behaves differently which is described below. 

b. When the default behavior of the statement is warning.
   Default                    : Warning
   With IGNORE                : Warning (same as Default behavior).
   in STRICT mode             : Error
   With IGNORE in STRICT mode : Warning
  
   This behavior should be followed for the following list of errors :

   ER_TRUNCATED_WRONG_VALUE
   ER_WRONG_VALUE_FOR_TYPE
   ER_WARN_DATA_OUT_OF_RANGE
   ER_DIVISION_BY_ZERO 
   ER_TRUNCATED_WRONG_VALUE_FOR_FIELD
   WARN_DATA_TRUNCATED
   ER_DATA_TOO_LONG
   ER_BAD_NULL_ERROR
   ER_NO_DEFAULT_FOR_FIELD
   ER_NO_DEFAULT_FOR_VIEW_FIELD
   ER_CUT_VALUE_GROUP_CONCAT
   ER_DATETIME_FUNCTION_OVERFLOW
   ER_WARN_TOO_FEW_RECORDS
   ER_WARN_NULL_TO_NOTNULL
   ER_INVALID_ARGUMENT_FOR_LOGARITHM
   ER_WRONG_ARGUMENTS

   Note : - This list is based on the current behavior checked on trunk.
          - This list of errors is compiled from the errors mentioned in 
            WL#6891.
          - WL#6891 may change some behavior to make STRICT mode more 
            consistent which should be documented there.
          - It is a subset of the errors which are affected by STRICT mode as
            in the current behavior, IGNORE keyword does not downgrade all 
            errors (which were warning by default but upgraded to error because 
            of STRICT mode) to warnings in the STRICT mode.
          
c. The case with ER_BAD_NULL_ERROR.

   There will be no behavior change in the worklog for NOT NULL constraint  
   violation.
    
Effect of behavior changes on Replication:
------------------------------------------

   If we have a scenario where an error which is being converted to warning now 
   but will remain error after this worklog, the replication may break for the 
   case :  Old master server --> New slave server(with reimplemented IGNORE)
 
   Besides this, New implementation of IGNORE and above behavior changes should 
   not affect replication.

Effect of behavior changes on Upgrade/Downgrade:
-------------------------------------------------

This worklog will not affect upgrade / downgrade scenarios.
 

Places in the code to remove ignore flag check for errors
---------------------------------------------------------

a. From the definition of functions 

   mysql_alter_table()
   mysql_insert()
   mysql_update()
   mysql_multi_update()
   mysql_load()
   view_check_option()
   fill_record_n_invoke_before_triggers()
   fill_record()
   check_record()
   select_create()
   select_insert()
   multi_update()
   mysql_delete()
   JOIN::exec()
   read_fixed_length()
   read_sep_field()
   read_xml_field()
   write_execute_load_query_log_event()
   copy_data_between_tables()
   
b. From the Class variable
   
   COPY_INFO->ignore
   Load_log_event->ignore

   Note: From the discussion with Replication team, Load_log_event is pre 5.1 
         GA. It is kept to maintain backward compatibility in replication.

Other use cases of Ignore Flag :
--------------------------------

I.   To decide if Internal_error_handler for IGNORE and STRICT should be pushed.
II.  To print the Info about the number of rows affected and number of 
     duplicates found.
III. Pass HA_EXTRA_IGNORE_DUP_KEY flag to handler.
IV.  Set IGNORE_FLAG for Load_log_event object.
V.   To use row based logging in case of unsafe statements. Unsafe statements 
     are:
     - UPDATE IGNORE.
     - CREATE TABLE...IGNORE/REPLACE SELECT...
     - INSERT...SELECT...ON DUPLICATE KEY UPDATE/REPLACE SELECT/
       INSERT...IGNORE...SELECT , unless ORDER BY PRIMARY KEY
       clause is used in SELECT statement.
VI.  To set ignore flag of class Alter_inplace_info. It is later used by 
     storage engine. ( to be removed in 5.7 as ALTER IGNORE will be removed by 
     WL#7395).
VII. To check if check_read_removal() function should be called.
VIII.Return VIEW_CHECK_SKIP or VIEW_CHECK_ERROR in view_check_option() 
     depending on the ignore flag.
IX.  To push suitable warning message based on IGNORE keyword and STRICT mode.