WL#2354: Online backup: NDB native driver
Affects: Server-6.x — Status: Un-Assigned — Priority: Low
The generic backup interface in WL#1613 (Online Backup, all storage engines) will need to use the ndbcluster native backup.

--

Some ideas from Stewart (copied from an email):

Okay, we should really decide what we want to do regarding NDB in:
- the initial implementation
- the final, complete implementation

Bear in mind that some work may have to be done in NDB.

I don't think anybody is really going to mind (at least initially) if NDB is not consistent with other storage engines, as long as it's consistent within itself (which it is). However, we should be able to make NDB consistent with the others, at least in the future.

The lock on the binlog is there so that transactions don't complete, right? As in, you expect running transactions to wait on that mutex before being committed? This isn't true for NDB, because committed rows come back from the storage nodes asynchronously via the event API. We can, however, use the cluster GCP (global checkpoint) as a synchronisation point. The GCP is the point that the NDB online backup synchronises to and the point we restore to after a system failure. It is also (at least currently) a single, identifiable transaction in the binlog.

In the future, to support replication with higher transaction loads for Cluster, we may end up with several masters and several slaves replicating different parts of the db. How we integrate this with backup will be fun. We may have to push the "are we all at the same point in binlogging" decision down to the engine, so it can coordinate with the other masters.

Does mysqlbackup just start transactions and dump, or does it lock and copy table files? The former will work for something like Federated; the latter won't. Should federated tables be backed up locally at all, though? I'd argue it should be an option... but everybody is welcome to tell me I'm being stupid (diplomatically).

DDL is also a storage engine issue, e.g. with Cluster (and probably Federated).
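Using the GCP as the synchronisation point can be illustrated with a small sketch. If each GCP appears in the binlog as one identifiable transaction, a restore first loads the backup taken at some GCP and then replays only the binlog transactions from later GCPs. This is a minimal model under stated assumptions, not NDB's real binlog format. `BinlogTxn`, `split_at_gcp`, and `backup_gcp` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class BinlogTxn:
    """Hypothetical model of one NDB binlog transaction (one GCP)."""
    gcp: int       # global checkpoint this binlog transaction covers
    payload: str   # stand-in for the row events it contains


def split_at_gcp(
    txns: Iterable[BinlogTxn], backup_gcp: int
) -> Tuple[List[BinlogTxn], List[BinlogTxn]]:
    """Partition binlog transactions around the GCP a backup synchronised to.

    Transactions with gcp <= backup_gcp are already contained in the backup;
    later ones must be replayed after the backup is restored.
    """
    covered: List[BinlogTxn] = []
    to_replay: List[BinlogTxn] = []
    for txn in txns:
        if txn.gcp <= backup_gcp:
            covered.append(txn)
        else:
            to_replay.append(txn)
    return covered, to_replay


# Usage: a backup that synchronised to GCP 11.
log = [BinlogTxn(10, "a"), BinlogTxn(11, "b"), BinlogTxn(12, "c")]
covered, to_replay = split_at_gcp(log, backup_gcp=11)
print([t.gcp for t in covered], [t.gcp for t in to_replay])  # [10, 11] [12]
```

Partitioning on a single GCP number is only possible because each GCP is one binlog transaction. With several masters, as discussed above, there would be no single such number to split on.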
We lock out DICT operations during backup, but it's possible (I'd have to think about it more) that we need to hold that lock for longer. If so, we need some NDB code. That shouldn't be too hard, though, as long as you're willing to lose that lock on node failure.

Also, NDB can abort the backup due to node failure, so the backup may need to be restarted for NDB. I'm guessing other engines could have this too (e.g. aborting the backup under extreme load)?

An interesting (and good) feature of the way NDB currently does backup is that backups are written to the local disk of each node. This means we don't generate interfering network traffic, and collectively we have more disk bandwidth. We may want a way to configure which network interface (or just which address) is used to fetch backup data from the cluster.

For NDB, arguably the best approach is to back up to local disk FIRST and then pull those contents over the wire into the MySQL Streaming Backup. This way the cluster backup completes at the maximum rate (smaller logs, shorter replay time, faster restore). We can then control how much we interfere with network traffic, either by transferring at a well-defined rate or by using a separate link. Either way requires some NDB code, but nothing too invasive. By contrast, writing the backup directly to the MySQL server instead of to local disk on all nodes could mean that the backup never completes on any reasonably loaded cluster.

There is almost no doubt that people WILL (want to) USE this generic backup interface just to get the cluster backup in one place on disk.

Replication currently doesn't work on these either. It could be a good idea to put global metadata next to the server-id. In theory, we could then back up the (possibly different) metadata for all MySQL servers connected to the cluster. By running restore for each server-id, each server would restore its own set of metadata (and any local tables), while the cluster tables would be restored only once.
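The "back up to local disk first, then pull at a well-defined rate" idea comes down to a rate-limited copy. The sketch below uses simple pacing: after each chunk, the copy sleeps until the bytes sent so far are within the target average rate. This is a minimal illustration of the technique, not NDB or MySQL Streaming Backup code. `throttled_copy`, its parameters, and `SimulatedClock` are hypothetical. The clock and sleep functions are injectable so the pacing can be checked without real waiting.

```python
import io
import time
from typing import BinaryIO, Callable


def throttled_copy(
    src: BinaryIO,
    dst: BinaryIO,
    rate_bytes_per_sec: float,
    chunk_size: int = 64 * 1024,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Copy src to dst without exceeding rate_bytes_per_sec on average.

    Returns the number of bytes copied.
    """
    if rate_bytes_per_sec <= 0:
        raise ValueError("rate_bytes_per_sec must be positive")
    start = clock()
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
        # Earliest time at which `copied` bytes may have been sent at the target rate.
        allowed_at = start + copied / rate_bytes_per_sec
        delay = allowed_at - clock()
        if delay > 0:
            sleep(delay)
    return copied


class SimulatedClock:
    """Fake clock: sleeping just advances simulated time."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


# Usage: pull a 1000-byte "local backup file" at 100 bytes/second.
sim = SimulatedClock()
dst = io.BytesIO()
copied = throttled_copy(io.BytesIO(b"x" * 1000), dst, rate_bytes_per_sec=100.0,
                        chunk_size=100, clock=sim.time, sleep=sim.sleep)
print(copied, sim.now)  # 1000 10.0
```

Because the cluster backup has already finished writing to local disk, the transfer can be made as slow as needed without lengthening the backup itself.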
I imagine the implementation of this would use the backup streaming mechanism to talk to the other MySQL servers in the cluster (something for the future, though).

Personally, I think per-table checksums could be useful too. No doubt we'll get somebody with a partially corrupted backup who just wants to get *something* out of it. Maybe we can then look like miracle workers :)

Regarding parallelism: NDB can work in parallel even with a single thread, since all the data nodes can be doing work in the background and blocks can simply be received whenever they arrive. So it's possible to interleave this with other engines.

Regarding log backup: treating logs as an engine sounds good. Seeing as logs in CSV tables are around... could be neat.
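Per-table checksums make partial recovery possible. If each table's section of the backup image carries its own checksum, a restore can still load the intact tables and report only the damaged ones, instead of rejecting the whole image. The sketch assumes a hypothetical image layout (a mapping of table name to serialized bytes) and uses CRC32 from Python's `zlib`. `checksum_tables` and `salvage` are illustrative names, not an existing backup API.

```python
import zlib
from typing import Dict, List, Tuple


def checksum_tables(tables: Dict[str, bytes]) -> Dict[str, int]:
    """At backup time: compute one CRC32 per table section."""
    return {name: zlib.crc32(data) for name, data in tables.items()}


def salvage(
    tables: Dict[str, bytes], checksums: Dict[str, int]
) -> Tuple[List[str], List[str]]:
    """At restore time: split tables into restorable and corrupted ones.

    A table is restorable only if its stored checksum exists and matches.
    """
    restorable: List[str] = []
    corrupted: List[str] = []
    for name, data in tables.items():
        if checksums.get(name) == zlib.crc32(data):
            restorable.append(name)
        else:
            corrupted.append(name)
    return restorable, corrupted


# Usage: checksum at backup time, then damage one table's bytes.
image = {"t1": b"row1;row2", "t2": b"row3"}
sums = checksum_tables(image)
image["t2"] = b"row?"  # simulate corruption in one table's section
restorable, corrupted = salvage(image, sums)
print(restorable, corrupted)  # ['t1'] ['t2']
```

CRC32 catches accidental damage such as bit flips or truncated writes, which is the partial-corruption case described above. It is not a defence against deliberate tampering.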
Copyright (c) 2000, 2018, Oracle Corporation and/or its affiliates. All rights reserved.