WL#3629: Replication of Invocation and Invoked Features

Affects: Server-5.1   —   Status: Complete

Replication of invocations and invoked features such as Events, User-Defined 
Functions (UDFs), and Stored Procedures must be redesigned. Specifically, 
replication of these features requires modification to the code and a unified 
design for handling the creation, modification, and deletion of the features 
and the effects of the invocations themselves. The basic principles are:

• The effects of the features should always be replicated. 
• Replicate the CREATE, ALTER, and DROP statements (using SBR) but set the 
event to SLAVESIDE_DISABLED on the slave regardless of the state specified 
(except for DROP).
• The feature implementation must reside on the slave in a renewable state so 
that if the master fails, the slave can be used as the master without loss of 
event processing.

This worklog can satisfy, in whole or in part, the following bug reports:

• BUG#16421: Events: Replication of effects doesn't work
• BUG#17857: Events: Binary logging of events does not work (duplicate of 16421)
• BUG#17671: UDF: CREATE FUNCTION binlogging problems
• BUG#20384: Events: CREATE EVENT should be blocked on slave 

Suggested plan to move forward
1. Document triggers, SP, SF, UDF, Events, views in a table.  How they
   are replicated, when they are disabled on slave or not, is the trigger
   executed on the slave or not?  Before pushing WL#3629, check what
   Kostja, Lars thinks about this design.

2. Think about definition of what is a server id.  What things are
   controlled today by the server id?  Should it be allowed 
   to change it on slave when it becomes new master.  I think not.

3. Can we use server id or do we need hostname, ip.  I think
   originating server id in the event table.

4. Write up recommended procedure for doing fail-over, i.e. how the
   slave should be setup as the new master.

5. Slaveside_disabled.  document that it means auto-disabled.

See also WL#3727.

RELATED BUGS: BUG#16421, BUG#17671, BUG#20384
In studying this problem, I asked myself what behavior I would expect if I were 
setting up replication for a server that used invoked features. By way of 
example, I considered several cases of using invoked features. I will examine 
each of these cases and present a use case for the behavior of replication 
under the stated conditions. While most of the examples are events, the use 
cases can apply to UDFs and Stored Procedures. In all cases, I assume the 
master is executing the invoked features successfully.

Case A: Features the Generate Data that is Saved in the Database
The data in the database is partially generated by some condition or external 
event. An example of this is shown below. The example shows an event that 
stores the row count of another table along with a timestamp. This event could 
be used to monitor the growth of a table over time. While somewhat trivial, the 
event shows that it is possible for users to store data created at specific 
time intervals.


  INSERT INTO t1 VALUES (Now(), Get_Count());

Replication of the data generated by the event is needed in order to keep the 
master and slave tables in synch. However, replication of the execution of this 
event is not necessary or desired. In this case, the data inserted into table 
t1 can be replicated using row-based replication (RBR). Since it is the data 
that is important, the execution of the event is not necessary. Furthermore, as 
stated in the MySQL documentation, the use of time data in events can result in 
different values being generated on the slave. If the latency of replication 
(e.g., under heavy load) is 3 seconds, it is possible for the slave to execute 
an event 3 seconds later than the master. This would clearly generate different 
values for the timestamps.

This use case applies to UDFs and Stored Procedures. However, there are three 
kinds of UDFs; 1) UDFs that are created using the normal CREATE FUNCTION syntax 
that include an implementation clause, 2) UDFs that are compiled in a loadable 
module (library) and instantiated using the CREATE FUNCTION with the SONAME 
option, and 3) UDFs that are compiled in the server code (called a native 
function). UDFs that meet the first type can be replicated as described above. 
UDFs that meet the second type can be replicated, but if the loadable library 
is not on the slave, replication will stop. It may be possible to implement a 
mechanism that the user can override this behavior, but the default should be 
to stop replication and produce an error. UDFs that meet the third type cannot 
be replicated and are the responsibility of the user to ensure both master and 
slave contain the native functions.

Case B:  The events trigger some performance and/or maintenance routines
This is a case of using events to trigger stored procedures to handle routine 
maintenance of a database. For example, if you had a table used for lookups you 
could gain a considerable performance increase by using the HEAP storage 
engine. However, you may also have updates to this table that you want to make 
permanent. So instead of trusting the hardware not to fail or saving the data 
using triggers which fire for every insert or delete, you could periodically 
write the data to disk using an event (there are other methods to do this, but 
let’s stay with the event concept). This can save some processing time by doing 
the save as a single pass. It also gives the user the ability to schedule the 
event when the server is less busy.

CREATE PROCEDURE Save_Lookup_Table ()

  CALL Save_Lookup_Table()

Replication of this operation can be achieved using statement-based replication 
(SBR). If SBR is used, the queries inside the stored procedure will be written 
to the binary log and consequently copied to the relay log on the slave and 
then executed.

It is interesting to note that while the stored procedure should also be 
replicated to the slave, it should not be disabled on the slave. If you 
consider that the stored procedure requires a user to invoke it, having it 
enabled on the slave is not a concern.

Once again, the argument can be made that one should not want the event to 
execute on the slave. This would create a copy of the m1_saved table on the 
slave that may be out of synch (too many or too few rows) with the master.

This case applies to UDFs (types 1 and 2) and Stored Procedures. 

Case C: I Want My Slave(s) to Respond to Queries (SELECT) Returning the Results 
of Data Generated by the Features
This is where a user creates a slave not as a backup but rather as a read-only 
server in a multi-server environment. Users who use this technique are 
primarily interested in increasing query efficiency for a large number of 
connections. Thus replication is for distributed processing rather than data 
protection (it should be noted that it is possible to do both).

Replication in this scenario, provided the effects of the features executed on 
the master are replicated, has little bearing on whether the events need to be 
enabled. In fact, I cannot think of a reason a user would want or need the 
events enabled on the slave in this scenario. 

Case D: The Master Crashes and I Want to Change My Slave to the New Master
This situation is why we replicate! It’s all about the data. The slave should 
have a recoverable copy of all of the data on the master (recoverable since it 
is possible for the slave to fall behind the master during heavy use). This 
copy should include everything necessary to make a slave into a master should 
the need arise. I believe most enterprise customers will use replication for 
this purpose. While it is possible to have several slaves each replicating a 
portion (e.g., several databases) of the data on the master, it is more likely 
users will setup replication for hot swapping or backup capabilities.

Switching a slave to a master is a simple process. If we consider disabling 
events on the slave, an additional step will have to be added to enable the 
features on the slave. Since this operation (converting a slave to a master) is 
a rare occurrence rather than a daily activity, the need for automation isn’t a 
high priority. Therefore it would be acceptable to create an easy way to ensure 
the replicated features on the slave can be enabled when needed. 

However, if the user has created a great number of features where some are 
disabled by choice, it may be difficult for the user to determine which should 
be enabled if the master is no longer accessible. There are several ways that 
this could be enhanced. The server could keep track of the features that are 
replicated (and thereby disabled). When the user converts a slave to a master, 
she can issue a query against the appropriate table and use that information to 
enable all of the disabled replicated features. Another method would be to add 
another state to the Status column for features. A SLAVESIDE_DISABLED state 
could be added thereby making it very easy for the user to enable all of the 
replicated features. This new state indicates the feature is disabled and was 
replicated from the master.

Note: it can be argued that this third state is an internal-use only state and 
that it should not be settable by the user. However, it can also be argued that 
through the due course of troubleshooting and testing users may need to be able 
to set an event as SLAVESIDE_DISABLED. Thus, the sql_yacc.yy file (and 
associated token declarations) will also have to be changed to accommodate the 
changes to the CREATE EVENT and ALTER EVENT commands.

In addition, A small change to RBR behavior was added. Replication of event 
timing changes are not replicated to the master. Replication of event timing 
changes was being binlogged whenever the master updated the timing of an event. 
This caused (in RBR only) a row update for the mysql.event table to be written 
to the binary log. Since the master had a status of 'ENABLED' for the row, the 
processing of the binlog on the slave was overwriting the status 
of 'SLAVESIDE_DISABLED' on the slave. The latest patch disables RBR of the next 
event in the Event_queue_element::update_timing_fields() method. The code used 
to do this is:

  if (thd->current_stmt_binlog_row_based)

This allows the master to update the timing field values and not write those to 
the binlog.

The above monologue was necessary to explain the premise for the following 
design. The following changes shall be made to the server:

• The effects of features shall be replicated using RBR. Note: this may not 
require code changes to ensure this.
• The possible status values for features (located in several source files) 
shall be modified to include the value SLAVESIDE_DISABLED. This may require 
changing several code files as the ENABLED enumeration is duplicated across the 
source code (i.e., see event_data_objects.* and sql_show.cc). This will be not 
be written to the binlog, rather it will be handled by the server code.
• All features created on the master and replicated to the slave shall be set 
to SLAVESIDE_DISABLED and shall not execute on the slave. This change can be 
effected by adding code to the SQL thread on the slave.
• The CREATE, ALTER, and DROP commands shall be replicated using SBR. Minor 
changes may be necessary to implement this.

The SLAVESIDE_DISABLED enum will require modifying the mysql.event table as 

ALTER TABLE mysql.event MODIFY COLUMN status 

If this is not accomplished, the server will issue errors on startup:

[ERROR] (event) Expected field status at position 12 to have type enum
[ERROR] SCHEDULER: Table mysql.event is damaged. Can not open