WL#3776: Online backup: Consistent snapshot backup for MVCC

Affects: Server-6.0   —   Status: Complete

SUMMARY
-------
Implement the same kind of backup as mysqldump is using to create a
consistent snapshot.

This will be used to backup InnoDB and probably Falcon and NDB (when
they are not using their native backup driver).

This is a logical backup and has the benefits of being easy to
implement/generic and the drawback of consuming a lot of resources
during restore.

OPEN ISSUES
-----------
On Thu, Oct 26, 2006 at 03:19:50PM +0200, Guilhem Bichot wrote:
> By the way, Lars mentioned a
> "generic snapshot scan as we will develop for InnoDB". I don't know
> exactly what it is, but if it is a consistent-read based snapshot
> (just starting a read-only transaction on InnoDB like mysqldump does),
> it will have a problem with prepared transactions.
> Binlog/InnoDB interaction works like this when executing COMMIT:
> 
> - prepare in InnoDB
> - write transaction to binlog (== commit in binlog)
> - instant T
> - commit in InnoDB
> 
> If the backup starts at instant T, that is, binlog position is read at
> T and InnoDB's consistent read is started at T, the consistent read
> will not see the committed data, but the binlog's position will be
> after the transaction. So the backup's data will be broken.
> A physical backup of data+logs would not have this problem (because
> the prepared transaction is in the engine's log at T).
> With a consistent-read backup, the problem can be solved by ensuring
> that no prepared transactions exist in the relevant engine.
The consistent snapshot driver will be implemented to mimic the behavior of the 
consistent read mechanism. For example, the drive shall backup data using a 
similar mechanism as the following SQL statements:

START TRANSACTION WITH CONSISTENT SNAPSHOT;
SELECT * FROM t1;
COMMIT;

Thus, the driver is a logical read-based algorithm that reads a row at-a-time 
and returns it to the backup kernel via the backup kernel interface.
When designing the algorithm for this worklog, it became clear that there are 
several ways to initiate a consistent read, For example, one could execute the 
SQL statements shown in the high level specification or the following SQL 
commands:

SET AUTOCOMMIT = 0;
SELECT * FROM t1;
SET AUTOCOMMIT = 1;

When researching how to implement the algorithm, three different methods of 
implementation were explored. These included:

1) Use handler-level methods via the TABLE object
2) Use API-level methods similar to the ones used in do_select()
3) Use SQL commands inside the driver and intercept the output

The design implemented is a combination of 1 & 2. The following pseudo code 
describes the basic algorithm for implementing the consistent snapshot 
algorithm.

     thd->lex->sql_command= SQLCOM_SELECT;
     thd->lex->start_transaction_opt|= MYSQL_START_TRANS_OPT_WITH_CONS_SNAPSHOT;
     begin_trans(thd);
     if (open_and_lock_tables(thd, all_tables))
     {
       DBUG_RETURN(backup::ERROR); 
     }
     TABLE *table= all_tables->table;
     table->file->ha_rnd_init(1);
     do
     {
       res= table->file->rnd_next(table->record[0]);
     } while ((res != HA_ERR_END_OF_FILE) && (res == 0));
     table->file->ha_rnd_end();
     close_thread_tables(thd);
     end_active_trans(thd);

The first line of code sets the command type to mimic that of a SELECT command 
so as to not confuse the locking methods. The next line of code turns on the 
option for a consistent snapshot and the call to begin_trans() initiates 
the "START TRANSACTION WITH CONSISTENT SNAPSHOT" action. The middle portions of 
the code uses the handler-level calls for a table scan. The last line of code 
ends the transaction turning off the consistent snapshot.