WL#4005: checkpoint and backup to Amazon s3 from Cluster
Affects: Server-5.2
Status: In-Progress
Amazon S3 is a fault-tolerant distributed storage service. When applications are deployed in the Amazon EC2 compute environment, S3 is the only persistent storage. However, S3 operates as a web service, not as a filesystem. Because of this, normal database usage on EC2 is fraught with peril. Most databases expect to be able to write to a local file, and to consider their work done once that file is written. An enterprising admin could take frequent dumps and inject them into S3, but the lag inherent in that approach might be unacceptable. There is also a FUSE implementation that can mount S3 as a filesystem, but there the latency attached to every disk write would likely also be unacceptable.

NDB divorces individual transaction latency from disk latency and already has a concept of asynchronous writes to disk. Adding the capability to the Ndbfs implementation to write directly to and read from S3 could allow for interesting deployments on EC2, and perhaps elsewhere as well. The current implementation will focus on adding behavior to the AsyncFile object so that, based on file path information, it either writes files to S3 or to the local filesystem.
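As a rough sketch of that dispatch decision (illustrative only, not existing Ndbfs code; the FileLocation type and parse_location function are hypothetical names), the back end could be selected from the configured location string:

  // Hypothetical sketch: choose the storage back end from the file location.
  #include <cstdio>
  #include <string>

  enum Backend { BACKEND_LOCAL, BACKEND_S3 };

  struct FileLocation {
    Backend backend;
    std::string bucket;  // only meaningful for BACKEND_S3
    std::string path;    // object key for S3, plain path for local files
  };

  // "s3://bucket/key" selects S3; "file:///dir" or a bare path stays local.
  FileLocation parse_location(const std::string& uri) {
    FileLocation loc;
    const std::string s3 = "s3://", file = "file://";
    if (uri.compare(0, s3.size(), s3) == 0) {
      loc.backend = BACKEND_S3;
      std::string rest = uri.substr(s3.size());
      std::string::size_type slash = rest.find('/');
      loc.bucket = rest.substr(0, slash);
      loc.path = (slash == std::string::npos) ? "" : rest.substr(slash + 1);
    } else if (uri.compare(0, file.size(), file) == 0) {
      loc.backend = BACKEND_LOCAL;
      loc.path = uri.substr(file.size());
    } else {
      loc.backend = BACKEND_LOCAL;
      loc.path = uri;
    }
    return loc;
  }

  int main() {
    FileLocation a = parse_location("s3://mybucketname/ndb_1_fs/D10/DBLQH/S22.FragLog");
    FileLocation b = parse_location("/var/lib/mysql-cluster/ndb_1_fs/D10/DBLQH/S22.FragLog");
    std::printf("%d %s %s\n", a.backend, a.bucket.c_str(), a.path.c_str());
    std::printf("%d %s\n", b.backend, b.path.c_str());
    return 0;
  }

AsyncFile itself would then hand S3 locations to an HTTP-based read/write path and everything else to the existing local file code.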
S3 specifies storage in terms of buckets and objects. Information about S3 itself can be found at http://docs.amazonwebservices.com/AmazonS3/2006-03-01/gsg/

A bucket is similar to a directory or namespace. An individual S3 user may have up to 100 buckets, each with a name of up to 255 bytes. The bucket namespace is global, so care must be taken to create a unique bucket name for each cluster. The bucket will be chosen by the user and added to the cluster configuration file. If the bucket does not exist during an initial restart, it will be created; if it does not exist during a normal restart, the result is node failure.

Objects are placed within buckets. The object namespace per bucket is flat, but object keys can contain any UTF-8 character, so while there cannot be true "subdirectories", nothing prevents an object from being named "ndb_1_fs/D10/DBLQH/S22.FragLog". An object can be up to 5 GB in size, and a bucket has no limit on the number of objects it stores. Objects support only GET, PUT and DELETE; they do not support file-like seeking or appending, and the data stored in an object may only be read or written in total (reading uses HTTP GET, writing uses HTTP PUT). To support the buffered reading, writing and seeking that Ndbfs performs, each NDB file will be split into one object per block, stored under names of the form "ndb_1_fs/D10/DBLQH/S22.FragLog.1", with an additional object "ndb_1_fs/D10/DBLQH/S22.FragLog" that records how many block objects have been stored for that file (see the sketch below).

File storage locations will be extended to accept URI-form locations, so one could choose to store backups on S3 and data files locally, or any combination thereof. Local file storage would look like DataDir=/var/lib/mysql-cluster or DataDir=file:///var/lib/mysql-cluster, while S3 storage would be DataDir=s3://mybucketname.

Authentication to S3 is via a shared secret in the form of a Secret Key and a Key ID. Configuration options will be added to contain the AWS Secret Access Key and the AWS Access Key ID.
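The sketch referenced above is given here: a minimal, hypothetical illustration of the per-block object layout. The 256 KiB block size, the helper names and the main() driver are assumptions for illustration only; the real block size would be chosen by Ndbfs.

  #include <cstdio>
  #include <string>

  // Assumed per-object block size; illustrative only.
  static const unsigned BLOCK_SIZE = 256 * 1024;

  // "ndb_1_fs/D10/DBLQH/S22.FragLog" + block 3 -> "ndb_1_fs/D10/DBLQH/S22.FragLog.3"
  std::string block_object_name(const std::string& file_key, unsigned block_no) {
    char suffix[16];
    std::snprintf(suffix, sizeof(suffix), ".%u", block_no);
    return file_key + suffix;
  }

  // Map a byte offset in the logical file to (object name, offset inside that object).
  void locate(const std::string& file_key, unsigned long long file_offset,
              std::string& object_name, unsigned& offset_in_object) {
    unsigned block_no = (unsigned)(file_offset / BLOCK_SIZE);
    object_name = block_object_name(file_key, block_no);
    offset_in_object = (unsigned)(file_offset % BLOCK_SIZE);
  }

  int main() {
    std::string obj;
    unsigned off;
    locate("ndb_1_fs/D10/DBLQH/S22.FragLog", 300 * 1024, obj, off);
    std::printf("offset 300 KiB -> object %s, offset %u\n", obj.c_str(), off);
    return 0;
  }

Since an object can only be read or written in total, a write within a block would presumably be a read-modify-write of that one block object, and the bare "ndb_1_fs/D10/DBLQH/S22.FragLog" metadata object is consulted to learn how many block objects make up the file.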
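Pulling the URI-form storage locations and the authentication options together, a cluster configuration fragment might look like the following. The option names AwsAccessKeyId and AwsSecretAccessKey are hypothetical placeholders, since this worklog only states that options for the AWS Access Key ID and the AWS Secret Access Key will be added.

  [ndbd default]
  # Data files in S3, backups on local disk (any combination is possible).
  DataDir=s3://mybucketname
  BackupDataDir=file:///var/lib/mysql-cluster/backups

  # Hypothetical option names for the shared-secret credentials.
  AwsAccessKeyId=<your AWS Access Key ID>
  AwsSecretAccessKey=<your AWS Secret Access Key>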