WL#4005: checkpoint and backup to Amazon s3 from Cluster
Affects: Server-5.2
Status: In-Progress
Amazon S3 is a fault-tolerant distributed storage service. When applications are deployed in the Amazon EC2 compute environment, S3 is the only persistent storage. However, S3 operates as a web service, not as a filesystem. Because of this, normal database usage on EC2 is fraught with peril. Most databases expect to be able to write to a local file, and to consider their work done once that file is written. An enterprising admin could take frequent dumps and inject them into S3, but the lag inherent in that approach might be unacceptable. There is also a FUSE implementation that can mount S3 as a filesystem, but there the latency attached to every disk write would likely also be unacceptable.

NDB divorces individual transaction latency from disk latency and already has a concept of asynchronous writes to disk. Adding the capability to the Ndbfs implementation to write directly to and read from S3 could allow for interesting deployments on EC2, and perhaps elsewhere as well. The current implementation will focus on adding behavior to the AsyncFile object so that, based on file path information, it either writes files to S3 or to the local filesystem.
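As a rough sketch of that dispatch decision (illustrative only, not existing Ndbfs code; the FileLocation type and parse_location function are hypothetical names), the back end could be selected from the configured location string:

  // Hypothetical sketch: choose the storage back end from the file location.
  #include <cstdio>
  #include <string>

  enum Backend { BACKEND_LOCAL, BACKEND_S3 };

  struct FileLocation {
    Backend backend;
    std::string bucket;  // only meaningful for BACKEND_S3
    std::string path;    // object key for S3, plain path for local files
  };

  // "s3://bucket/key" selects S3; "file:///dir" or a bare path stays local.
  FileLocation parse_location(const std::string& uri) {
    FileLocation loc;
    const std::string s3 = "s3://", file = "file://";
    if (uri.compare(0, s3.size(), s3) == 0) {
      loc.backend = BACKEND_S3;
      std::string rest = uri.substr(s3.size());
      std::string::size_type slash = rest.find('/');
      loc.bucket = rest.substr(0, slash);
      loc.path = (slash == std::string::npos) ? "" : rest.substr(slash + 1);
    } else if (uri.compare(0, file.size(), file) == 0) {
      loc.backend = BACKEND_LOCAL;
      loc.path = uri.substr(file.size());
    } else {
      loc.backend = BACKEND_LOCAL;
      loc.path = uri;
    }
    return loc;
  }

  int main() {
    FileLocation a = parse_location("s3://mybucketname/ndb_1_fs/D10/DBLQH/S22.FragLog");
    FileLocation b = parse_location("/var/lib/mysql-cluster/ndb_1_fs/D10/DBLQH/S22.FragLog");
    std::printf("%d %s %s\n", a.backend, a.bucket.c_str(), a.path.c_str());
    std::printf("%d %s\n", b.backend, b.path.c_str());
    return 0;
  }

AsyncFile itself would then hand S3 locations to an HTTP-based read/write path and everything else to the existing local file code.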
S3 specifies storage in terms of buckets and objects. Information about S3 itself can be found at http://docs.amazonwebservices.com/AmazonS3/2006-03-01/gsg/

A bucket is similar to a directory or namespace. An individual S3 user may have up to 100 buckets, each with a name of up to 255 bytes. The bucket namespace is global, so care must be taken to create a unique bucket name for each cluster. The bucket will be chosen by the user and added to the cluster configuration file. If the bucket does not exist during an initial restart, it will be created; if it does not exist during a normal restart, the result is node failure.

Objects are placed within buckets. The object namespace per bucket is flat, but object keys can contain any UTF-8 character, so while there cannot be true "subdirectories", nothing prevents an object from being named "ndb_1_fs/D10/DBLQH/S22.FragLog". An object can be up to 5 GB in size, and a bucket has no limit on the number of objects it stores. Objects support only GET, PUT and DELETE; they do not support file-like seeking or appending, and the data stored in an object may only be read or written in total (reading uses HTTP GET, writing uses HTTP PUT). To support the buffered reading, writing and seeking that Ndbfs performs, each NDB file will be split into one object per block, stored under names of the form "ndb_1_fs/D10/DBLQH/S22.FragLog.1", with an additional object "ndb_1_fs/D10/DBLQH/S22.FragLog" that records how many block objects have been stored for that file (see the sketch below).

File storage locations will be extended to accept URI-form locations, so one could choose to store backups on S3 and data files locally, or any combination thereof. Local file storage would look like DataDir=/var/lib/mysql-cluster or DataDir=file:///var/lib/mysql-cluster, while S3 storage would be DataDir=s3://mybucketname.

Authentication to S3 is via a shared secret in the form of a Secret Key and a Key ID. Configuration options will be added to contain the AWS Secret Access Key and the AWS Access Key ID.
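The sketch referenced above is given here: a minimal, hypothetical illustration of the per-block object layout. The 256 KiB block size, the helper names and the main() driver are assumptions for illustration only; the real block size would be chosen by Ndbfs.

  #include <cstdio>
  #include <string>

  // Assumed per-object block size; illustrative only.
  static const unsigned BLOCK_SIZE = 256 * 1024;

  // "ndb_1_fs/D10/DBLQH/S22.FragLog" + block 3 -> "ndb_1_fs/D10/DBLQH/S22.FragLog.3"
  std::string block_object_name(const std::string& file_key, unsigned block_no) {
    char suffix[16];
    std::snprintf(suffix, sizeof(suffix), ".%u", block_no);
    return file_key + suffix;
  }

  // Map a byte offset in the logical file to (object name, offset inside that object).
  void locate(const std::string& file_key, unsigned long long file_offset,
              std::string& object_name, unsigned& offset_in_object) {
    unsigned block_no = (unsigned)(file_offset / BLOCK_SIZE);
    object_name = block_object_name(file_key, block_no);
    offset_in_object = (unsigned)(file_offset % BLOCK_SIZE);
  }

  int main() {
    std::string obj;
    unsigned off;
    locate("ndb_1_fs/D10/DBLQH/S22.FragLog", 300 * 1024, obj, off);
    std::printf("offset 300 KiB -> object %s, offset %u\n", obj.c_str(), off);
    return 0;
  }

Since an object can only be read or written in total, a write within a block would presumably be a read-modify-write of that one block object, and the bare "ndb_1_fs/D10/DBLQH/S22.FragLog" metadata object is consulted to learn how many block objects make up the file.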
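Pulling the URI-form storage locations and the authentication options together, a cluster configuration fragment might look like the following. The option names AwsAccessKeyId and AwsSecretAccessKey are hypothetical placeholders, since this worklog only states that options for the AWS Access Key ID and the AWS Secret Access Key will be added.

  [ndbd default]
  # Data files in S3, backups on local disk (any combination is possible).
  DataDir=s3://mybucketname
  BackupDataDir=file:///var/lib/mysql-cluster/backups

  # Hypothetical option names for the shared-secret credentials.
  AwsAccessKeyId=<your AWS Access Key ID>
  AwsSecretAccessKey=<your AWS Secret Access Key>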