WL#3915: Fewer columns on slave

Affects: Server-5.1 — Status: Complete


Requirement
===========

It should be possible to replicate a table that has more columns on the master
than on the slave, i.e., there are M columns in the table on the master and N
columns in the table on the slave and M > N.

When replicating in this manner, the contents of the extra columns on the master
are lost, but replication accepts the situation and keeps running.


Implementation
==============

The slave needs to discard columns that are not part of the slave table
definition.


Restrictions
============

All extra columns on the master must be at the end of the table.


Related work
============

This work is related to WL#3259 (RBR with more columns on slave than on master)
in that implementation of this worklog is necessary to support on-line table
definition changes (a.k.a., "schema upgrade") when using circular replication.

----------------------------------------------------------------------------
Below is for reference only, not part of this WL.

CONFIGURATION
- Add "CHANGE MASTER SET SLAVE_IGNORE_EXTRA_COLUMNS=true"
  (and internally add corresponding variable to mi)

OFFICIAL DESCRIPTION
Ability to introduce new columns without impacting ongoing traffic. This
is required to introduce new attributes to existing tables.

There will only be the ability to add columns in this first release.
Add column is only possible on “dynamic tables”.  Starting with
NDB-6.1.7, tables are dynamic by default (NDB-6.1.6 tables are
not). Please note that it is only possible to add columns at the end
of a table and still maintain compatibility. Online disk-based columns
are not supported. New columns must have default values, or the
replication stream needs to be synchronized by the DBA with the
change.

The effects and behavior of this feature on replication in multi-master mode
must be considered. DDL changes will not be replicated; instead, they are
applied individually on each server. Replicating to and from tables with
more or fewer columns will be supported.

- Master (T) replicating to Slave (T+1)
- Master (T+1) replicating to Slave (T)

NOT NULL columns should have a default value assigned to them when
replicating.

For restore using ndb_restore, schemas must be the same; ndb_restore
cannot handle added columns.

The basic constraint of this work is that the additional columns on the master
may appear only at the end of the table definition. Applications must respect
this, because the slave server's replication code does not check whether it
holds.

This feature does not affect statement-based replication at all.

Master side
===========

The master should deliver the necessary info about the extra fields that the
slave is to skip when it parses a row. For efficiency, that info should be sent
to the slave on a per-table basis.

The master should also deliver the info the slave needs to decide whether a
nullable extra column carries data or is NULL, in which case the value is
represented only by a null bit that is turned ON.



Slave side
==========

The slave must be able to discard all data corresponding to the extra columns
and, in keeping with the basic constraint, must reach exactly the end of the
event's data once the skipping algorithm has run.

The skipping algorithm must be robust enough to work with any type of
replication event on any type of table, including NDB, whose events carry only
the modified columns.

Changes to ``unpack_row()``
===========================

The changes in unpack_row() should make it stop unpacking fields when the
slave's version of the record has been filled "prematurely", that is, before
the master's row is exhausted. In other words, the continuation condition of
the main loop that unpacks regular fields should be refined. No work is needed
for filling extra columns on the slave (by the definition of this task).
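
As an illustration of the refined continuation condition, here is a minimal
sketch. It is not the server code: the function name
``unpack_common_columns`` is hypothetical, and the sketch assumes that every
column is a non-NULL, fixed-width 32-bit integer so that the loop bound is the
only interesting part.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper, not part of the server. Assumes every column is a
// non-NULL 32-bit integer stored in the host's byte order.
// Copies the columns common to master and slave into slave_record and
// returns the number of bytes consumed from the master's row.
std::size_t unpack_common_columns(const std::uint8_t *row,
                                  std::size_t master_cols,
                                  std::vector<std::uint32_t> &slave_record) {
  assert(row != nullptr);
  std::size_t offset = 0;
  // Refined continuation condition: stop as soon as either the master's row
  // or the slave's record runs out of columns.
  for (std::size_t i = 0; i < master_cols && i < slave_record.size(); ++i) {
    std::memcpy(&slave_record[i], row + offset, sizeof(std::uint32_t));
    offset += sizeof(std::uint32_t);
  }
  return offset;
}
```

With three columns in the master's row and two fields in the slave's record,
the loop stops after two columns and leaves the third untouched in the row
buffer.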

Such a simple approach depends on one of two things: the feasibility of
instantiating a Field class from the field type delivered with the Table map
event, or information, which does not exist yet, on the size of an individual
row in a Rows_log_event chunk. In the former case, the extra master columns
would be skipped by calling Field::unpack(); in the latter, skipping would
simply mean ignoring the trailing part of the row, from the pointer as advanced
by unpacking the last common field up to the row size.


                       last common        master's row
                     field unpacking         length
                            |                   |
       +--------------------+-------------------+
       |                    | Extra fields      |
       |                    | on master         |
       +--------------------+-------------------+


There is a third alternative: to exploit ``field_length_from_packed``. That
would be a lightweight equivalent of the instantiation.

This third solution was chosen for implementation, with the function above
replaced by

     table_def::get_field_size()

provided by WL#3228.

To skip a nullable extra column whose value is NULL, it is necessary
to know the value of Field::maybe_null() for that column.

That info is per-table and is included as part of additional information to be
sent by the master for WL#3228.

The algorithm for skipping the extra columns is as follows.

At the end of the loop that unpacks the regular columns, the last field counter
is stored. A second loop, starting from counter + 1, processes all the extra
columns and advances pack_ptr in unpack_row() for each column that is present
in the Rows_log_event m_cols bitmap, taking into account that a nullable column
may carry no value:

    if (!((null_bits & null_mask) && tabledef->maybe_null(i)))
         pack_ptr+= tabledef->get_field_size(i, (uchar *) pack_ptr);

The caller has a guard that checks whether the final pack_ptr value coincides
with the end of the event's data.
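
The skipping loop and the caller's guard can be sketched as below. This is an
illustration under stated assumptions, not the server implementation:
``TableDefSketch``, ``MasterColumn`` and ``skip_extra_columns`` are
hypothetical names, field sizes are fixed per column rather than derived from
the packed data, and the null bitmap is assumed to hold one bit per column
present in the event, least significant bit first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for table_def: per master column, whether it is
// nullable and how many bytes its packed value occupies (fixed here; in the
// server the size can depend on type metadata and on the data itself).
struct MasterColumn {
  bool nullable;
  std::size_t packed_size;
};

struct TableDefSketch {
  std::vector<MasterColumn> cols;

  bool maybe_null(std::size_t i) const {
    assert(i < cols.size());
    return cols[i].nullable;
  }
  std::size_t get_field_size(std::size_t i, const std::uint8_t * /*data*/) const {
    assert(i < cols.size());
    return cols[i].packed_size;
  }
};

// Advances pack_ptr past the master-only columns [first_extra, cols.size()).
// present        mirrors the m_cols bitmap (assumption: one flag per column).
// null_bits      points at the row's null bitmap (assumption: one bit per
//                present column, least significant bit first).
// null_bit_index is the bitmap position of the first present extra column.
const std::uint8_t *skip_extra_columns(const TableDefSketch &def,
                                       const std::vector<bool> &present,
                                       const std::uint8_t *null_bits,
                                       std::size_t null_bit_index,
                                       std::size_t first_extra,
                                       const std::uint8_t *pack_ptr) {
  for (std::size_t i = first_extra; i < def.cols.size(); ++i) {
    if (!present[i]) continue;  // column absent from the event: nothing to skip
    const bool null_bit_set =
        (null_bits[null_bit_index / 8] >> (null_bit_index % 8)) & 1u;
    ++null_bit_index;
    // A nullable column whose null bit is set carries no packed value.
    if (!(null_bit_set && def.maybe_null(i)))
      pack_ptr += def.get_field_size(i, pack_ptr);
  }
  return pack_ptr;
}
```

A caller would compare the returned pointer with the end of the row data and
treat any mismatch as an error, which is the role of the guard described above.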