BACKUP BOOT CAMP: PRIORITY = P1 DATE: BLOCK: Was partially blocked by BUG#36571, a duplicate of BUG#36624 (now fixed) RATIONALE --------- Test of online backup with large database. DESCRIPTION ----------- Perhaps something like this (can be modified): 1. Create generator of data, perhaps 10GB of data in 30,000 tables. 2. Do backup 3. Compute some checkum of the data 4. Shutdown system 5. Clear database 6. Start system 7. Restore data 8. Verify that checksum is correct
NOTE: as much of this worklog overlaps with WL#4406, the scope of this worklog is reduced to touch on 'basic' scenarios that will on one side prevent overlap with WL#4406 and on the other will provide a 'faster' turnaround. As such, for this 'reduced' phase of implementing this task, the flghtstat database will be used with the following permutations: - Database loaded to the different engines (MyISAM, InnoDB, Falcn) - Backup/Restore of idle system - Backup while activity is performed on the database (concurrency might be limited in the case of MyISAM as getting a read lock has a lower priority (see [30 Apr 16:08] Chuck Bell in BUG#44568)
Phase 1 ------- - Tests will be performed on the 'Flight Stat' database (10 tables. total of 40M rows). The Flight Stat' database (http://flightstats.us/) is a collection of flight information collected by the FAA over a number of years until 2005. - The schema will be augmented with triggers/SP in order to increase concurrent load on the system while backups are being taken (the focus is not to try and scale n the number of objects - this will be done as part of WL#4406 - 'ontime' table (37M rows) will be partitioned (24 partitions) - Database schema will be created using MyISAM. Innodb and Falcon databases - Backups will be taken when the system is idle and when there is activity against the server - Activity will be generated using the mysql-stress-test framework and will include up to 100 concurrent users performing random insert/update/delete scenarios using direct sql and stored procedures. * Insert: A random row will be selected from ontime_1week and will be inserted into 'online_all' table using a stored procedure * Update: A random row will be selected from 'ontime_all' and its distance value will be incremented by one using direct sql. * Delete: A random row will be deleted from the 'online_all' table - Additional insert activity will be generated using triggers on the 'ontime_all' table that will insert rows to a log table following each insert/update/delete operation on 'ontime_all'. - Backup integrity will be checked by comparing a restore of the the backup snapshot on a separate server and comparing it to a slave replica taken from the master to the file/position of when the backup was taken: - System running with binlogging and IUDS activity - Backup initiated (binlog file and position recorded) - Slave replicating form the master up to the file and position of the backup - Backup restored on a new server - Restored database compared to Slave database Overview of test: 1) Start a server 2) Load base FlightStat database 3) Make a copy of the database to be used for the slave (see ) 4) Start client lad activity 5) After a short wait start a backup 6) Start replication to a slave stopping at the file/position of the Backup 7) Restore Backup on new system 8) Compare content of the 'Flight Stat' database in the slave replica and the restored Backup (although the HLD mentioned a checksum comparison, comparing dumps of both databases is a valid comparison as well). . As running these tests involves large about of disk space and are currently limited to a specific machine (ml-370.mysql.com), a lower priority is given to writing the tests in a portable way. Phase #2 -------- (This phase is written after a set of email discussions with development (Rafal) following feedback/questions from tests done in phase 1) - Tests will be performed on an existing copy of the TPC-H database (160G - MyIsam / 360G) Innodb - Some of the table structure will be altered / duplicated to allow for a combination of both myisam and innodb table in a single database - Backup / Restore / verification of restore will be performed on an 'idle' database. - Work running backup under activity will continue under WL#4922 Overview of test: 1) One time setup 1.1) Setup a vardir directory with existing innodb TPC- database 1.2 Start the server 1.3) Create 'myisam' tables from some fo the existing tables 1.4) Backup the vardir for future use 2) Running 2.1) Run a backup 2.2) Start a second server 2.3) Restore the backup file to the second server 2.4) Compare the content of the original database backed up with the restored one (although the HLD mentioned a checksum comparison, comparing dumps of both databases is a valid comparison as well) Note: It is not clear how long the restore of the TPC-H database will take. Based on initial backup TPC-H databse (47 min) and backup restore time of flight-stat database in phase #1 (4 min / 9.5 hours), the restore is estimated to take around five days (~114 hours) Implementation details ----------------------- IMPORTANT NOTE: Only phase #2 (backup/restore on an idle database) has been implemented. The script has been pushed into mysql-test-extra tree: mysql-test-extra/mysql-test/test_tools/scripts/backup/backup_restore.pl Usage: ./backup_restore.pl --datadir=<stored database> --backup-file=<backup file destination> --vardir=<vardir for restore, tmp for backup will also be placed there> --mysql-basedir=<MySQL server location> --mtr-build-thread=<integer> --data-volume=<integer>[K|B|G] datadir: (MANDATORY) location of the datadir for the 1st server (where the database is stored locally) backup-file: (MANDATORY) target location for the backup file -- if the file already exists, it will be deleted vardir: (MANDATORY) location of the datadir for the 2nd server (which is empty at the startup, but grows as the restore is being performed). If the folder exists, all contents will be removed! mysql-basedir: (MANDATORY) location of MySQL binaries directory (containing mysql-test, bin, etc.) mtr-build-thread: (OPTIONAL) MTR_BUILD_THREAD value defining port number for servers, default 300 (port 13000) data-volume: (OPTIONAL) approximate volume of the datadir; it is used to tune innodb* parameters; if not set, some medium values will be used, which might be not optimal help: print usage and exit How to invoke: $> perl ./backup_restore.pl --datadir=/data1/backup_data \ --backup-file=/data0/20090923.backup --vardir=/data0/var --mysql-basedir=/data0/mysql-6.0 \ --mtr-build-thread=10 --data-volume=500G Please make sure that you have enough space for the backup file and server vardir. Since creating a database of desirable size takes too long to be done each time, some manual work is required for running the test. The database should be pre-created and stored locally. The script takes the database location as a parameter, starts a server with the defined datadir, runs BACKUP, stores table checksums, stops the server, starts a server with an empty datadir, runs RESTORE, compares new checksums with the stored ones, and stops the server. The test passes if BACKUP and RESTORE finished successfully, and all corresponding checksums are equal. Note: while starting servers, the script attempts to adjust some server parameters to optimize restore performance. If --data-volume option is set incorrectly or is missing, the values might be totally wrong.