WL#4630: ST: MySQL Backup Client Program - Milestone 1

Affects: Server-6.0   —   Status: Complete   —   Priority: Medium

Milestone 1:
* Display the metadata contained in the backup image (i.e., the SQL statements).
* List objects contained in the backup image.
* Display statistics about the backup image (e.g., compression algorithm, etc.).
* The client should be a platform-independent command-line utility
  where features are selected using options and parameters
  (like mysqldump, mysql, etc.).
* Search the backup image for data and display the object if found. 
  Note: This may be limited to certain field and object types.
* Search the backup image for a given object and display its metadata.
* If reading the image fails, provide as much information as
  possible, e.g. the position of the failure.

The client program shall be called "mysqlbackup".

For milestone 1 it shall have the following usage:

Usage: mysqlbackup [options] backup-image-file

  -?, --help          Display this help and exit.
  -#, --debug[=name]  Output debug log.
  -V, --version       Print version and exit.

  --catalog-summary   Print summary from the database objects catalog.
  --catalog-details   Print details from the database objects catalog.
  --metadata-statements 
                      Print SQL statements to create the database objects.
  --metadata-extra    Print extra meta data for the database objects.
  --snapshots         Print information about snapshots contained in the backup
                      image.
  --data-chunks       Print length of every data chunk contained in the backup
                      image.
  --data-totals       Print length of data contained in the backup image for
                      each object.
  --summary           Print summary information from end of the backup image.
  --all               Print everything except snapshots and data-chunks.
  --exact             Print exact number of bytes instead of human readable
                      form.
  --search=name       Search object in the backup image. Name can be object or
                      database.object. Quoting of database and/or object with
                      ", ', or ` is allowed. Wildcards % and _ are available.
                      Use with --metadata-* options to see meta data. Plain
                      name finds global objects, name1.name2 finds per db
                      objects.
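
The wildcard rule of --search can be illustrated with a small matcher.
This is a minimal sketch, not code from mysqlbackup.cc. It assumes the
usual SQL LIKE semantics ('%' matches any sequence of characters,
including none; '_' matches exactly one character), a case-sensitive
comparison and no escape character. The function name wild_match() is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
  Hypothetical helper, not taken from mysqlbackup.cc.

  Return 1 if 'name' matches 'pattern', 0 otherwise.
  Assumed semantics: '%' matches any sequence of characters (including
  none), '_' matches exactly one character, comparison is case sensitive.
*/
static int wild_match(const char *pattern, const char *name)
{
  assert(pattern != NULL && name != NULL);
  for (; *pattern; pattern++, name++)
  {
    if (*pattern == '%')
    {
      /* Consecutive '%' characters are equivalent to a single one. */
      while (*pattern == '%')
        pattern++;
      if (!*pattern)
        return 1;                       /* trailing '%' matches the rest */
      /* Try every remaining start position for the rest of the pattern. */
      for (; *name; name++)
        if (wild_match(pattern, name))
          return 1;
      return 0;
    }
    if (!*name || (*pattern != '_' && *pattern != *name))
      return 0;
  }
  /* The pattern is used up; the name must be used up too. */
  return *name == '\0';
}
```

With this sketch, the pattern t_ matches t1 but not t10, and mysqltest%
matches all three databases of the sample image.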


The object types to be recognized by milestone 1 are:

  Character sets
  Users
  Tablespaces
  Databases
  Tables
  Views
  Stored procedures
  Stored functions
  Events
  Triggers
  Privileges

Information that is not requested by options shall not be read from the
image to save reading time. If seeking on the image is not possible, or
the distance is not known, reading shall stop when no further
information is requested.

Times shall be printed in ISO format: YYYY-mm-dd HH:MM:SS
Numbers shall be printed in human-readable form by default, that is,
using multipliers like KB, MB, GB, ...
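
The two formatting rules can be sketched as follows. This is an
illustration only: the helper names format_size() and format_time() are
hypothetical, and the rule for switching to the next multiplier (scale by
1024 while the value is 10000 or more, truncating the remainder) is an
assumption of this sketch, not part of the specification.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
  Hypothetical helper, not taken from mysqlbackup.cc.

  Format a byte count. With 'exact' set, print the plain number of bytes.
  Otherwise scale by 1024 while the value is 10000 or more. The threshold
  and the truncating division are assumptions of this sketch.
*/
static void format_size(unsigned long long bytes, int exact,
                        char *buf, size_t buflen)
{
  static const char *units[]= { "bytes", "KB", "MB", "GB", "TB" };
  size_t            unit= 0;

  assert(buf != NULL && buflen > 0);
  if (!exact)
  {
    while (bytes >= 10000 && unit + 1 < sizeof(units) / sizeof(units[0]))
    {
      bytes/= 1024;
      unit++;
    }
  }
  snprintf(buf, buflen, "%llu %s", bytes, units[unit]);
}

/*
  Hypothetical helper: format a UTC time stamp as YYYY-mm-dd HH:MM:SS.
  Return 0 on success, 1 on error.
*/
static int format_time(time_t when, char *buf, size_t buflen)
{
  struct tm *tm_utc= gmtime(&when);

  assert(buf != NULL);
  if (!tm_utc || !strftime(buf, buflen, "%Y-%m-%d %H:%M:%S", tm_utc))
    return 1;
  return 0;
}
```

Under these assumptions a size of 8660992 bytes is printed as "8458 KB"
by default and as "8660992 bytes" when the exact form is requested.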

The output shall be easy for humans to read. If possible, it should also
be parsable. When in doubt, human readability has priority.

Sample output:
==============

mysqlbackup --catalog-summary --catalog-details --data-totals --summary

Image path:          'mysql-test/mysqlbackup-test.bak'
Image size:          8458 KB
Image compression:   none
Image version:       1
Creation time:       2008-11-16 11:51:40 UTC
Server version:      6.0.9 (6.0.9-alpha)
Server byte order:   little-endian
Server charset:      'utf8'

Catalog summary:

  Databases:              3
  Tables:                 9
  Other per db objects:   63

Catalog details:

  Tablespace 'mysqltest_ts1'
  Tablespace 'mysqltest_ts2'
  Database  'mysqltest1'
    Table     'mysqltest1'.'t1'
    Table     'mysqltest1'.'t2'
    Table     'mysqltest1'.'t3'
    Sproc     'mysqltest1'.'p1'
    Sproc     'mysqltest1'.'p2'
    Sfunc     'mysqltest1'.'f1'
    Sfunc     'mysqltest1'.'f2'
    View      'mysqltest1'.'v1'
    View      'mysqltest1'.'v2'
    Event     'mysqltest1'.'e1'
    Event     'mysqltest1'.'e2'
    Privilege 'mysqltest1'.''bup_user1'@'%' 00000008'
    Privilege 'mysqltest1'.''bup_user2'@'%' 00000009'
    Privilege 'mysqltest1'.''no_user'@'%' 00000010'
    Privilege 'mysqltest1'.''no_user'@'%' 00000011'
    Privilege 'mysqltest1'.''no_user'@'%' 00000012'
    Privilege 'mysqltest1'.''no_user'@'%' 00000013'
    Privilege 'mysqltest1'.''no_user'@'%' 00000014'
    Privilege 'mysqltest1'.''no_user'@'%' 00000015'
    Privilege 'mysqltest1'.''no_user'@'%' 00000016'
    Privilege 'mysqltest1'.''no_user'@'%' 00000017'
    Privilege 'mysqltest1'.''no_user'@'%' 00000018'
    Privilege 'mysqltest1'.''no_user'@'%' 00000019'
    Privilege 'mysqltest1'.''no_user'@'%' 00000020'
    Privilege 'mysqltest1'.''no_user'@'%' 00000021'
    Privilege 'mysqltest1'.''no_user'@'%' 00000022'
    Privilege 'mysqltest1'.''no_user'@'%' 00000023'
    Privilege 'mysqltest1'.''no_user'@'%' 00000024'
    Privilege 'mysqltest1'.''no_user'@'%' 00000025'
    Privilege 'mysqltest1'.''no_user'@'%' 00000026'
    Privilege 'mysqltest1'.''no_user'@'%' 00000027'
    Privilege 'mysqltest1'.''bup_user2'@'%' 00000028'
    Privilege 'mysqltest1'.''bup_user2'@'%' 00000029'
    Privilege 'mysqltest1'.''bup_user1'@'%' 00000030'
  Database  'mysqltest2'
    Table     'mysqltest2'.'t1'
    Table     'mysqltest2'.'t2'
    Table     'mysqltest2'.'t3'
    Sproc     'mysqltest2'.'p1'
    Sproc     'mysqltest2'.'p2'
    View      'mysqltest2'.'v1'
    View      'mysqltest2'.'v2'
    Event     'mysqltest2'.'e1'
    Event     'mysqltest2'.'e2'
    Trigger   'mysqltest2'.'r1'
    Trigger   'mysqltest2'.'r2'
    Privilege 'mysqltest2'.''bup_user1'@'%' 00000008'
    Privilege 'mysqltest2'.''bup_user1'@'%' 00000009'
    Privilege 'mysqltest2'.''bup_user1'@'%' 00000010'
    Privilege 'mysqltest2'.''bup_user1'@'%' 00000011'
    Privilege 'mysqltest2'.''bup_user1'@'%' 00000012'
    Privilege 'mysqltest2'.''bup_user1'@'%' 00000013'
    Privilege 'mysqltest2'.''bup_user1'@'%' 00000014'
    Privilege 'mysqltest2'.''bup_user1'@'%' 00000015'
    Privilege 'mysqltest2'.''bup_user1'@'%' 00000016'
    Privilege 'mysqltest2'.''bup_user1'@'%' 00000017'
    Privilege 'mysqltest2'.''bup_user1'@'%' 00000018'
    Privilege 'mysqltest2'.''bup_user1'@'%' 00000019'
    Privilege 'mysqltest2'.''bup_user1'@'%' 00000020'
    Privilege 'mysqltest2'.''bup_user1'@'%' 00000021'
    Privilege 'mysqltest2'.''bup_user1'@'%' 00000022'
    Privilege 'mysqltest2'.''bup_user1'@'%' 00000023'
    Privilege 'mysqltest2'.''bup_user1'@'%' 00000024'
    Privilege 'mysqltest2'.''bup_user1'@'%' 00000025'
  Database  'mysqltest3'
    Table     'mysqltest3'.'t1'
    Table     'mysqltest3'.'t2'
    Table     'mysqltest3'.'t3'
    Sfunc     'mysqltest3'.'f1'
    Sfunc     'mysqltest3'.'f2'
    View      'mysqltest3'.'v1'
    View      'mysqltest3'.'v2'
    Trigger   'mysqltest3'.'r1'
    Trigger   'mysqltest3'.'r2'

Data totals:

  Backup has 939 KB for table 'mysqltest1'.'t1'
  Backup has 937 KB for table 'mysqltest1'.'t2'
  Backup has 937 KB for table 'mysqltest1'.'t3'
  Backup has 937 KB for table 'mysqltest2'.'t1'
  Backup has 937 KB for table 'mysqltest2'.'t2'
  Backup has 938 KB for table 'mysqltest2'.'t3'
  Backup has 937 KB for table 'mysqltest3'.'t1'
  Backup has 938 KB for table 'mysqltest3'.'t2'
  Backup has 937 KB for table 'mysqltest3'.'t3'

Summary:

Creation time:       2008-11-16 11:51:40 UTC
Validity time:       2008-11-16 11:51:41 UTC
Finish   time:       1900-01-00 00:00:00 UTC
No binlog information

mysqlbackup will consist of two modules:

- The main program
- The backup stream reader

The main program will be written in C++: mysqlbackup.cc. Its tasks will be:

- Initialize mysys components
- Load default values from configuration files
- Parse the command line
- Drive the backup stream reader module
- Provide an error message report function to the stream reader module
- Search functionality
- Print information retrieved from the stream,
  depending on the command line options
- Free resources and return status

The backup stream reader will be written in C: backup_stream.c.
Its tasks will be:

- Drive the backup stream library
- Provide call-back functions for the stream library (see below)
- Build an item catalog from the items received by the call-backs
- Provide a convenient interface for the main program (see below)

The structure of the catalog and the functions to drive its build are
specified in backup_stream.h:

/*
  Catalog.

  The dynamic arrays hold pointers to items of the following types:

  struct st_backup_charset              cat_charsets
  struct st_backup_database             cat_databases
  struct st_backup_snapshot             cat_snapshots

  note: cat_header must be first element in st_backup_catalog.
*/
struct st_backup_catalog
{
  struct st_bstream_image_header        cat_header;     /* must be 1st */
  const char                            *cat_zalgo;
  const char                            *cat_image_path;
  my_off_t                              cat_image_size;
  DYNAMIC_ARRAY                         cat_charsets;
  DYNAMIC_ARRAY                         cat_databases;
  DYNAMIC_ARRAY                         cat_snapshots;
};

/*
  Meta data.
*/
struct st_backup_metadata
{
  struct st_blob                        md_query;
  struct st_blob                        md_data;
};

/*
  Character set.

  note: cs_item must be first element in st_backup_charset.
*/
struct st_backup_charset
{
  struct st_bstream_item_info           cs_item;        /* must be 1st */
};

/*
  Per database objects, e.g. views.

  note: perdb_item must be first element in st_backup_perdb.
*/
struct st_backup_perdb
{
  struct st_bstream_dbitem_info         perdb_item;       /* must be 1st */
  struct st_backup_metadata             perdb_metadata;
};

/*
  Table.

  note: tbl_item must be first element in st_backup_table.
*/
struct st_backup_table
{
  struct st_bstream_table_info          tbl_item;       /* must be 1st */
  struct st_backup_metadata             tbl_metadata;
  ulonglong                             tbl_data_size;
};

/*
  Database.

  The dynamic arrays hold pointers to items of the following types:

  struct st_backup_table                db_tables
  struct st_backup_perdb                db_perdbs

  note: db_item must be first element in st_backup_database.
*/
struct st_backup_database
{
  struct st_bstream_db_info             db_item;        /* must be 1st */
  struct st_backup_metadata             db_metadata;
  DYNAMIC_ARRAY                         db_tables;
  DYNAMIC_ARRAY                         db_perdbs;
};

/*
  Snapshot.

  Tables belong to databases. But in the table data chunks they are
  numbered by snapshot number and table number. The table number is
  relative to the snapshot. To find the table item within its database
  we need an index from the table number (pos) within the snapshot
  to the table item.

  For every snapshot there is a struct st_backup_snapshot with an
  array that has a reference per table of that snapshot.

  The dynamic array holds pointers to items of the following type:

  struct st_backup_table                snap_index_pos_to_table
*/
struct st_backup_snapshot
{
  DYNAMIC_ARRAY                         snap_index_pos_to_table;
};

The dynamic arrays store only pointers to catalog items, not the items
themselves. Some items reference others. If the items were stored in the
arrays directly, these references would become invalid whenever an array
is reallocated on insertion of a new element. Therefore each item is
allocated separately before its pointer is inserted into an array.
Before an array is deleted, all of its elements must be freed.
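
The effect can be shown with a minimal sketch. It does not use the mysys
DYNAMIC_ARRAY. Instead, struct item, struct ptr_array, array_add() and
array_free() are hypothetical stand-ins built on plain realloc(). The
point is that a reallocation moves only the pointer slots, so references
between items stay valid.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a catalog item and a growable pointer array. */
struct item
{
  int           pos;                    /* position in the array */
  struct item   *owner;                 /* reference to another item */
};

struct ptr_array
{
  struct item   **elems;
  size_t        count;
  size_t        capacity;
};

/* Allocate an item, then append its pointer. Return NULL on error. */
static struct item *array_add(struct ptr_array *arr, struct item *owner)
{
  struct item *it;

  assert(arr != NULL);
  /* Allocate the item first; its address never depends on the array. */
  if (!(it= malloc(sizeof(*it))))
    return NULL;
  if (arr->count == arr->capacity)
  {
    size_t      new_cap= arr->capacity ? arr->capacity * 2 : 4;
    struct item **grown= realloc(arr->elems, new_cap * sizeof(*grown));

    if (!grown)
    {
      free(it);
      return NULL;
    }
    /* Only the pointer slots may have moved; the items stay in place. */
    arr->elems= grown;
    arr->capacity= new_cap;
  }
  it->pos= (int) arr->count;
  it->owner= owner;
  arr->elems[arr->count++]= it;
  return it;
}

/* Free every element before releasing the array itself. */
static void array_free(struct ptr_array *arr)
{
  size_t idx;

  assert(arr != NULL);
  for (idx= 0; idx < arr->count; idx++)
    free(arr->elems[idx]);
  free(arr->elems);
  arr->elems= NULL;
  arr->count= arr->capacity= 0;
}
```

A pointer returned by array_add() can be kept as a reference in later
items, no matter how often the array grows afterwards.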

The simplified workflow as used by the mysqlbackup client looks like this:

  backup_catalog_allocate()        // initialize catalog
  backup_image_open()              // open image and read header
  backup_read_catalog()            // read and build catalog
  backup_read_metadata()           // read meta data and add to catalog
  do {
       backup_read_snapshot()      // read a table data chunk
  } while more table data follows
  backup_read_summary()            // read summary section
  backup_image_close()             // close image
  backup_catalog_free()            // free all catalog resources

The steps from backup_read_catalog() through backup_read_summary() can be
skipped when the user does not request information from them. But if
information from a later section is required, all preceding sections
need to be read too.
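
One way to apply this rule is to compute the last section that any
requested option needs and to stop reading after it. The following is a
sketch under assumptions: enum section, struct options and
last_section_needed() are hypothetical names, and the mapping of options
to sections (catalog options and --search need the catalog, metadata
options need the meta data, data options need the table data) is inferred
from the option descriptions, not taken from the implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical section identifiers, in the order they appear in the image. */
enum section
{
  SEC_HEADER= 0,                /* read by backup_image_open() */
  SEC_CATALOG,                  /* backup_read_catalog() */
  SEC_METADATA,                 /* backup_read_metadata() */
  SEC_TABLE_DATA,               /* backup_read_snapshot() loop */
  SEC_SUMMARY                   /* backup_read_summary() */
};

/* Hypothetical option flags, modeled on the command line options. */
struct options
{
  int catalog_summary;
  int catalog_details;
  int search;
  int metadata_statements;
  int metadata_extra;
  int data_chunks;
  int data_totals;
  int summary;
};

/*
  Return the last section that must be read. All sections before it
  have to be read as well; all sections after it can be skipped.
*/
static enum section last_section_needed(const struct options *opt)
{
  assert(opt != NULL);
  if (opt->summary)
    return SEC_SUMMARY;
  if (opt->data_chunks || opt->data_totals)
    return SEC_TABLE_DATA;
  if (opt->metadata_statements || opt->metadata_extra)
    return SEC_METADATA;
  /* Assumption: searching for an object needs the catalog only. */
  if (opt->catalog_summary || opt->catalog_details || opt->search)
    return SEC_CATALOG;
  return SEC_HEADER;
}
```

The main program could then call each backup_read_*() step only while
its section is not beyond the returned value.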

Call-backs for the stream library that must be implemented for reading
backup streams:

/**
  Allocate given amount of memory and return pointer to it.

  @param[in]    size            amount of memory to allocate

  @return       pointer to allocated memory
*/
bstream_byte* bstream_alloc(unsigned long int size);


/**
  Free previously allocated memory.

  @param[in]    ptr             pointer to allocated memory
*/
void bstream_free(bstream_byte *ptr);


/**
  Read from the stream/image.

  @param[in,out]    strm        stream handle, updating position
  @param[in,out]    data        data container, updating contents and ptrs
  @param            envelope    not used

  @return       status
    @retval     BSTREAM_OK      ok
    @retval     BSTREAM_EOS     end of stream
    @retval     otherwise       error

  @note The return value is specified as 'int' in stream_v1.h
  though only values from enum_bstream_ret_codes are expected.
*/
static int
str_read(struct st_stream *strm, struct st_blob *data,
         struct st_blob envelope __attribute__((unused)));


/**
  Skip part of the stream/image.

  @param[in,out]    strm        stream handle, updating position
  @param[in,out]    len         number of bytes to skip; on return, bytes skipped

  @return       status
    @retval     BSTREAM_OK      ok
    @retval     otherwise       error

  @note The return value is specified as 'int' in stream_v1.h
  though only values from enum_bstream_ret_codes are expected.
*/
static int
str_forward(struct st_stream *strm, size_t *len);
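
How such read and skip call-backs might behave can be sketched on top of
stdio. This is not the implementation from backup_stream.c. The types
struct blob and struct stream and the return codes are hypothetical
stand-ins for the definitions in stream_v1.h, and the convention that a
read advances the 'begin' pointer past the bytes delivered is an
assumption. Skipping is done by reading and discarding, which also works
when seeking on the image is not possible.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins; the real definitions come from stream_v1.h. */
enum ret_code { RET_OK= 0, RET_EOS, RET_ERROR };

struct blob
{
  unsigned char         *begin;         /* first byte still to be filled */
  unsigned char         *end;           /* one past the last byte */
};

struct stream
{
  FILE                  *file;
  unsigned long long    pos;            /* current position in the image */
};

/* Fill [begin, end) as far as possible; advance 'begin' and 'pos'. */
static int sketch_read(struct stream *strm, struct blob *data)
{
  size_t want, got;

  assert(strm != NULL && data != NULL && data->begin <= data->end);
  want= (size_t) (data->end - data->begin);
  got= fread(data->begin, 1, want, strm->file);
  data->begin+= got;
  strm->pos+= got;
  if (got == 0 && want > 0)
    return ferror(strm->file) ? RET_ERROR : RET_EOS;
  return RET_OK;
}

/* Skip '*len' bytes by reading and discarding; store the count skipped. */
static int sketch_forward(struct stream *strm, size_t *len)
{
  unsigned char scratch[4096];
  size_t        left;

  assert(strm != NULL && len != NULL);
  left= *len;
  while (left > 0)
  {
    size_t chunk= left < sizeof(scratch) ? left : sizeof(scratch);
    size_t got= fread(scratch, 1, chunk, strm->file);

    strm->pos+= got;
    left-= got;
    if (got < chunk)
      break;                            /* end of file or read error */
  }
  *len-= left;
  return ferror(strm->file) ? RET_ERROR : RET_OK;
}
```

Keeping the position in the stream handle up to date is what allows the
client to report the position of a failure.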


/**
  Clear catalogue and prepare it for populating with items.

  @param[in]    hdr             catalog reference

  @return       status
    @retval     BSTREAM_OK      ok
    @retval     otherwise       error

  @note The return value is specified as 'int' in stream_v1.h
  though only values from enum_bstream_ret_codes are expected.

  @note This is empty because backup_catalog_allocate() initializes
  the catalog properly.
*/
int
bcat_reset(struct st_bstream_image_header *hdr __attribute__((unused)));


/**
  Close catalogue after all items have been added to it.

  This allows for finalizing operations. It is not meant for
  deletion of the catalog. There is no "open" action. The
  approximate counterpart to bcat_close() is bcat_reset().

  @param[in]    hdr             catalog reference

  @return       status
    @retval     BSTREAM_OK      ok
    @retval     otherwise       error

  @note The return value is specified as 'int' in stream_v1.h
  though only values from enum_bstream_ret_codes are expected.

  @note This is empty because there is no finalization required.
*/
int
bcat_close(struct st_bstream_image_header *hdr __attribute__((unused)));


/**
  Add item to the catalog.

  For items that belong to a database, the base.db element points
  to the database's catalog item. The stream library evaluates
  that pointer using an iterator provided by bcat_iterator_get().

  The item name is allocated by the stream library and must be freed
  by the application later.

  @param[in,out]    hdr         catalog ref, updating catalog
  @param[in]        item        item reference

  @return       status
    @retval     BSTREAM_OK      ok
    @retval     otherwise       error

  @note item->pos should be set to indicate position of the item in the
  catalogue. This is a global position per item type. Items that belong
  to a database are not numbered relative to the database.

  @note The return value is specified as 'int' in stream_v1.h
  though only values from enum_bstream_ret_codes are expected.

  @note Global, per-table and per-database items can have independent
  address spaces. Thus an item belonging to a database is identified by
  its position inside that database's item list. The same applies to
  items belonging to tables.
*/
int
bcat_add_item(struct st_bstream_image_header *hdr,
              struct st_bstream_item_info *item);


/**
  Create global iterator of a given type.

  Possible iterator types:

  - BSTREAM_IT_CHARSET: all charsets
  - BSTREAM_IT_USER:    all users
  - BSTREAM_IT_DB:      all databases

  The following types of iterators iterate only over items for which
  some meta-data should be saved in the image.

  - BSTREAM_IT_GLOBAL: all global items in create-dependency order
  - BSTREAM_IT_PERDB: all per-db items except tables which are enumerated by
                      a table iterator (see below)
  - BSTREAM_IT_PERTABLE: all per-table items in create-dependency order.

  @param[in]    hdr             catalog reference
  @param[in]    it_type         iterator type

  @return       pointer to the iterator
    @retval     NULL            error
*/
void*
bcat_iterator_get(struct st_bstream_image_header *hdr, unsigned int it_type);


/**
  Return the next item pointed to by the iterator.

  @param[in]    hdr             catalog reference
  @param[in]    iter_arg        iterator reference

  @return       pointer to catalog item
    @retval     NULL            error
*/
struct st_bstream_item_info*
bcat_iterator_next(struct st_bstream_image_header *hdr __attribute__((unused)),
                   void *iter_arg);


/**
  Free iterator resources.

  @param[in]    hdr             catalog reference
  @param[in]    iter_arg        iterator reference

  @note
  The iterator cannot be used after a call to this function.
*/
void
bcat_iterator_free(struct st_bstream_image_header *hdr __attribute__((unused)),
                   void *iter_arg);
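
The get/next/free triple follows the common pattern of an opaque
iterator handle passed around as void*. A minimal sketch of that pattern
over an array of item pointers is given below. The names struct item,
struct item_iterator and iterator_get/next/free are hypothetical and do
not come from backup_stream.c. In this sketch a NULL return from
iterator_next() also marks the end of the sequence, which is an
assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a catalog item. */
struct item
{
  int type;
};

/* Hypothetical iterator state, hidden behind a void pointer. */
struct item_iterator
{
  struct item   **elems;        /* array being iterated, not owned */
  size_t        count;          /* number of elements in the array */
  size_t        next;           /* index of the next element to return */
};

/* Create an iterator over 'count' item pointers. Return NULL on error. */
static void *iterator_get(struct item **elems, size_t count)
{
  struct item_iterator *iter= malloc(sizeof(*iter));

  if (!iter)
    return NULL;
  iter->elems= elems;
  iter->count= count;
  iter->next= 0;
  return iter;
}

/* Return the next item, or NULL when the sequence is exhausted. */
static struct item *iterator_next(void *iter_arg)
{
  struct item_iterator *iter= iter_arg;

  assert(iter != NULL);
  if (iter->next >= iter->count)
    return NULL;
  return iter->elems[iter->next++];
}

/* Release the iterator. The items themselves are left untouched. */
static void iterator_free(void *iter_arg)
{
  free(iter_arg);
}
```

Because the iterator holds only a position, freeing it does not affect
the catalog items it has returned.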


/**
  Create database object from its meta-data.

  When the meta-data section of a backup image is read, items can be
  created as their meta-data is read (so that there is no need to store
  the meta-data). This function stores the meta-data in the catalog
  instead of creating database objects, so that the application can make
  different use of the data.

  @param[in]    hdr             catalog reference
  @param[in]    item            item reference
  @param[in]    query           query string
  @param[in]    data            data blob

  @note Only the 'type' element of the item is set. Neither an item name
  nor a catalog position is provided, let alone a reference to a database.

  @note Either query or data or both can be empty, depending
  on what was stored in the image.

  @note The blob provided by query and/or data is not guaranteed to
  exist after the call. It must be copied to become part of the catalog.

  @return       status
    @retval     BSTREAM_OK      ok
    @retval     otherwise       error

  @note The return value is specified as 'int' in stream_v1.h
  though only values from enum_bstream_ret_codes are expected.
*/
int
bcat_create_item(struct st_bstream_image_header *hdr,
                 struct st_bstream_item_info *item,
                 struct st_blob query,
                 struct st_blob data);
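
Because the query and data blobs are not guaranteed to exist after the
call, they have to be copied before they are attached to a catalog item.
A minimal sketch of such a copy follows. struct blob is a hypothetical
stand-in for struct st_blob, assumed here to consist of a begin pointer
and an end pointer one past the last byte, and blob_copy() is a
hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct st_blob (layout is an assumption). */
struct blob
{
  unsigned char *begin;         /* first byte of the blob */
  unsigned char *end;           /* one past the last byte of the blob */
};

/*
  Copy 'src' into newly allocated memory owned by 'dst'.
  An empty source yields an empty destination.
  Return 0 on success, 1 if out of memory.
*/
static int blob_copy(struct blob *dst, const struct blob *src)
{
  size_t len;

  assert(dst != NULL && src != NULL);
  dst->begin= dst->end= NULL;
  if (!src->begin || src->end <= src->begin)
    return 0;                           /* nothing stored in the image */
  len= (size_t) (src->end - src->begin);
  if (!(dst->begin= malloc(len)))
    return 1;
  memcpy(dst->begin, src->begin, len);
  dst->end= dst->begin + len;
  return 0;
}
```

The copy must later be released together with its catalog item, for
example when the catalog is freed.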


========
The following call-back functions must be present for linkage, but will
be empty:


/**
  Create iterator for items belonging to a given database.

  @param[in]    hdr             catalog reference
  @param[in]    db              database item reference

  @return       pointer to the iterator
    @retval     NULL            error

  @note Not used when reading a backup stream.
*/
void*
bcat_db_iterator_get(struct st_bstream_image_header *hdr
                     __attribute__((unused)),
                     struct st_bstream_db_info *db
                     __attribute__((unused)));


/**
  Return next item from database items iterator

  @param[in]    hdr             catalog reference
  @param[in]    db              database item reference
  @param[in]    iter_arg        iterator reference

  @return       pointer to catalog item
    @retval     NULL            error

  @note Not used when reading a backup stream.
*/
struct st_bstream_dbitem_info*
bcat_db_iterator_next(struct st_bstream_image_header *hdr
                      __attribute__((unused)),
                      struct st_bstream_db_info *db
                      __attribute__((unused)),
                      void *iter_arg
                      __attribute__((unused)));


/**
  Free database items iterator resources

  @param[in]    hdr             catalog reference
  @param[in]    db              database item reference
  @param[in]    iter_arg        iterator reference

  @note Not used when reading a backup stream.
*/
void
bcat_db_iterator_free(struct st_bstream_image_header *hdr
                      __attribute__((unused)),
                      struct st_bstream_db_info *db
                      __attribute__((unused)),
                      void *iter_arg
                      __attribute__((unused)));


/**
  Produce CREATE statement for a given item.

  Backup stream library calls this function when saving item's
  meta-data. If function successfully produces the statement, it becomes
  part of meta-data.

  @param[in]    hdr             catalog reference
  @param[in]    item            item reference
  @param[out]   query           query string

  @return       status
    @retval     BSTREAM_OK      ok
    @retval     otherwise       error

  @note The return value is specified as 'int' in stream_v1.h
  though only values from enum_bstream_ret_codes are expected.

  @note Not used when reading a backup stream.
*/
int
bcat_get_item_create_query(struct st_bstream_image_header *hdr
                           __attribute__((unused)),
                           struct st_bstream_item_info *item
                           __attribute__((unused)),
                           bstream_blob *query
                           __attribute__((unused)));


/**
  Return meta-data (other than CREATE statement) for a given item.

  Backup stream library calls this function when saving item's
  meta-data. If function returns successfully, the bytes returned become
  part of meta-data.

  @param[in]    hdr             catalog reference
  @param[in]    item            item reference
  @param[out]   data            data blob

  @return       status
    @retval     BSTREAM_OK      ok
    @retval     otherwise       error

  @note The return value is specified as 'int' in stream_v1.h
  though only values from enum_bstream_ret_codes are expected.

  @note Not used when reading a backup stream.
*/
int
bcat_get_item_create_data(struct st_bstream_image_header *hdr
                          __attribute__((unused)),
                          struct st_bstream_item_info *item
                          __attribute__((unused)),
                          struct st_blob *data
                          __attribute__((unused)));


========
The program will have a catalog reader module that hides the stream
library details, including its call-back functions, from the
application. It will provide the following functions:


/**
  Allocate a backup catalog.

  @return       catalog reference
    @retval     NULL            error
*/
struct st_backup_catalog*
backup_catalog_allocate(void);


/**
  Free a backup catalog.

  @param[in]    bup_catalog             catalog reference
*/
void
backup_catalog_free(struct st_backup_catalog *bup_catalog);


/**
  Open a backup image.

  @param[in]    filename                file name
  @param[in]    bup_catalog             catalog reference

  @return       image handle reference
    @retval     NULL                    error
*/
void*
backup_image_open(const char *filename, struct st_backup_catalog *bup_catalog);


/**
  Close a backup image.

  @param[in]    image_handle            image handle reference
*/
void
backup_image_close(void* image_handle);


/**
  Read backup image catalog.

  @param[in]    image_handle            image handle reference
  @param[in]    bup_catalog             catalog reference

  @return       status
    @retval     BSTREAM_OK              ok
    @retval     != BSTREAM_OK           error
*/
enum enum_bstream_ret_codes
backup_read_catalog(void* image_handle, struct st_backup_catalog *bup_catalog);


/**
  Read backup image meta data.

  @param[in]    image_handle            image handle reference
  @param[in]    bup_catalog             catalog reference

  @return       status
    @retval     BSTREAM_OK              ok
    @retval     != BSTREAM_OK           error
*/
enum enum_bstream_ret_codes
backup_read_metadata(void *image_handle, struct st_backup_catalog *bup_catalog);


/**
  Read backup image table data.

  @param[in]    image_handle            image handle reference
  @param[in]    bup_catalog             catalog reference
  @param        snapshot                data chunk reference

  @return       status
    @retval     BSTREAM_OK              ok
    @retval     != BSTREAM_OK           error
*/
enum enum_bstream_ret_codes
backup_read_snapshot(void *image_handle,
                     struct st_backup_catalog *bup_catalog
                      __attribute__((unused)),
                     struct st_bstream_data_chunk *snapshot);


/**
  Read backup image summary.

  @param[in]    image_handle            image handle reference
  @param[in]    bup_catalog             catalog reference

  @return       status
    @retval     BSTREAM_OK              ok
    @retval     != BSTREAM_OK           error
*/
enum enum_bstream_ret_codes
backup_read_summary(void* image_handle, struct st_backup_catalog *bup_catalog);


/**
  Locate a database object by catalog coordinates.

  Catalog coordinates for databases are:

      pos           database position in catalog

  @param[in]    bup_catalog     catalog reference
  @param[in]    pos             position in catalog's database array

  @return       database reference
*/
struct st_backup_database*
backup_locate_database(struct st_backup_catalog *bup_catalog,
                       uint pos);


/**
  Locate a table object by catalog coordinates.

  Catalog coordinates for tables are:

      snap_num      snapshot position in catalog
      pos           table position in snapshot

  @param[in]    bup_catalog     catalog reference
  @param[in]    snap_num        position in catalog's snapshot array
  @param[in]    pos             position in snapshot's table index array

  @return       table reference
*/
struct st_backup_table*
backup_locate_table(struct st_backup_catalog *bup_catalog,
                    uint snap_num, uint pos);


/**
  Locate a perdb object by catalog coordinates.

  Catalog coordinates for perdb items are:

      db_pos        database position in catalog
      pos           perdb item position in database

  @param[in]    bup_catalog     catalog reference
  @param[in]    db_pos          position in catalog's database array
  @param[in]    pos             position in database's perdb array

  @return       perdb item reference
*/
struct st_backup_perdb*
backup_locate_perdb(struct st_backup_catalog *bup_catalog,
                    uint db_pos, uint pos);
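
A lookup by catalog coordinates amounts to two bounds-checked array
accesses. The sketch below shows this for tables. It uses simplified,
hypothetical stand-ins (struct table, struct snapshot, struct catalog,
locate_table()) instead of the DYNAMIC_ARRAY based structures above, and
returning NULL for coordinates that are out of range is an assumption,
since the specification does not state the error behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the catalog structures. */
struct table
{
  const char    *name;
};

struct snapshot
{
  struct table  **index_pos_to_table;   /* table number -> table item */
  size_t        count;
};

struct catalog
{
  struct snapshot **snapshots;          /* snapshot number -> snapshot */
  size_t          count;
};

/*
  Locate a table by snapshot number and table position within the
  snapshot. Return NULL if either coordinate is out of range.
*/
static struct table *locate_table(const struct catalog *cat,
                                  size_t snap_num, size_t pos)
{
  const struct snapshot *snap;

  assert(cat != NULL);
  if (snap_num >= cat->count)
    return NULL;
  snap= cat->snapshots[snap_num];
  if (!snap || pos >= snap->count)
    return NULL;
  return snap->index_pos_to_table[pos];
}
```

The same two-step access applies to per-database items, with the
database position taking the place of the snapshot number.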