MySQL Blog Archive
For the latest blogs go to blogs.oracle.com/mysql
MySQL NDB Cluster Backup/Restore Challenge

Hey, dolphins! Ready to test your NDB backup and restore skills?

Q1: You have a large database which takes 3 hours to back up. Insert/update/delete traffic will run during the backup. How do you run a backup so that none of the inserts/updates/deletes which are executed after the start of the backup are reflected in the backup files?

Q2: Your backup fileset contains 10 tables, t0 to t9. You wish to restore them to a database which already contains a table named t0. How do you restore all the data minus the contents of t0? Without modifying the database, of course.

Q3: You have the same schema as above, but you do want to restore t0 from the backup to the database. The table definitions are identical except for the datatypes of one column – the backup table has a VARCHAR and the database table has a TEXT. What do you do? Again, without modifying the database.

Q4: You want to start a backup, but you also want to handle read/write traffic while the backup is running. How do you limit the backup parallelism so as to leave the data nodes with ample capacity to handle queries?

Q5. How do you set up your restore to run as fast as possible, if you have the hardware resources to handle a higher load?

Q6: You follow excellent security practices, so you’ve decided to encrypt your backup. How do you set the backup encryption password?

Q7: You have a backup with a table t0 that contains 100 tuples, with ‘id’ column values from 1 to 100. Your database also has an identical table t0, but with ‘id’ values from 1 to 1000. How do you append the backup table t0’s tuples to the database table t0 without overwriting the first 1000 rows?

Q8: You have taken a backup from a 4-node cluster. You want to restore this backup to an 8-node cluster. What extra steps do you need to take to restore this backup so that the tuples in the backup are evenly distributed among the 8 nodes?

Scroll down for answers

. . .

. . .

. . .

. . .

. . .

A1: start backup snapshotstart

Every backup has a startGCP and a stopGCP. The startGCP establishes a point-in-time where the backup has just started, but not yet written any tuples to file. The stopGCP establishes a point-in-time where the backup has finished writing all the data and is ready to report completion. A SNAPSHOTSTART backup will capture the data in the cluster at the startGCP. The default option is SNAPSHOTEND, which captures the data at the stopGCP.

A2: ndb_restore –exclude-tables

ndb_restore has options to exclude individual tables with –exclude-tables, and also to exclude databases using –exclude-databases. If you prefer to specify tables/databases to include instead, the –include-tables and –include-databases options can be used instead.

https://dev.mysql.com/doc/refman/8.0/en/mysql-cluster-programs-ndb-restore.html#option_ndb_restore_exclude-tables

A3: ndb_restore –promote-attributes

ndb_restore can perform type conversions while restoring. These type conversions may be promotions from a smaller data type in the backup to a larger data type in the restored DB, or conversions which can result in data loss, i.e. from a larger data type in the backup to a smaller data type in the restored DB. Promotions can be enabled by adding the –promote-attributes flag to ndb_restore, while data-loss conversions can be enabled by adding –lossy-conversions.

https://dev.mysql.com/doc/refman/8.0/en/mysql-cluster-programs-ndb-restore.html#option_ndb_restore_promote-attributes

A4: EnableMultithreadedBackup=0 in config.ini

Backup is normally run in parallel across nodes. Within a node, the backup is multi-threaded (versions 8.0.16 and up). The multi-threading can be switched off in the configuration by setting the configuration parameter EnableMultithreadedBackup=0. Don’t forget to do a rolling restart after setting the parameter, so that all the nodes pick it up!

A5: ndb_restore –parallelism=<higher_value>

On 8.0.16 and up, ndb_restore runs a multithreaded restore to restore data.

The –parallelism parameter controls the load placed upon each ndb_restore thread. By default, this is throttled to consume the same resources as a single-threaded restore. You can increase the parallelism by specifying a higher value than the default value 128.

https://dev.mysql.com/doc/refman/8.0/en/mysql-cluster-programs-ndb-restore.html#option_ndb_restore_parallelism

A6: ndb_mgm -e ‘START BACKUP ENCRYPT PASSWORD=”<password>”’

Backup encryption is supported on 8.0.22 and up. The password can be included in the START BACKUP command, provided the ENCRYPT keyword is used.

https://dev.mysql.com/doc/mysql-cluster-excerpt/8.0/en/mysql-cluster-backup-using-management-client.html

A7: ndb_restore –remap-column=test.t0.id:offset:1000

The –remap-column option can be used to specify an offset value. In the above example, the backup tuples with values 1 to 100 will be offset by 1000 so that they are restored as values 1001 to 1100.

https://dev.mysql.com/doc/refman/5.7/en/mysql-cluster-programs-ndb-restore.html#option_ndb_restore_remap-column

A8: None!

The standard restore procedure can be used to restore data from a different-sized cluster. No extra steps needed.

So how many questions did you get right?

6-8: Congratulations, you aced the challenge – your backup/restore knowledge is impressive!

3-5: Well done – check out this deep dive to improve your already solid skills!

http://messagepassing.blogspot.com/2020/11/mysql-cluster-backup-and-restore.html

0-2: That was a valiant effort, don’t give up just yet – try this beginner-friendly introduction to backup and restore!

https://clustertesting.blogspot.com/2021/01/mysql-ndb-cluster-backup-restore-in.html