WL#7524: Import from serialized meta data files

Affects: Server-8.0   —   Status: Complete

Introduction

This WL covers import for non-transactional storage engines that do not store dictionary information separately. This means e.g. MyISAM. The current (as of 5.7) main way to do export/import (disregarding backup/restore) is by copying .frm files along with the data files (e.g. .MYD and .MYI for MyISAM). The mechanisms available in 5.7 will be replaced by new or adapted SQL statement(s) which use the DD, and provide better usability and uniformity.

Contents


Functional and Non-Functional Requirements

Restrictions and Limitations

EXPORT

  1. Export shall be supported only for non-temporary user tables with a name visible at the SQL layer.
  2. FLUSH TABLES ... WITH READ LOCK must be used to ensure a consistent snapshot.
  3. User must manually copy relevant files (e.g. .MYD, .MYI and .sdi).
  4. UNLOCK TABLES must be used to unlock the exported tables when they have been copied.
  5. Access to the exported tables is read-only while the tables are being exported.
  6. Subject to all restrictions for FLUSH TABLES ... WITH READ LOCK

IMPORT

  1. Import shall be supported only for non-temporary user tables. Generally, all information stored in the set of closely connected components (see WL#7069), and which are serialized together with the table and placed within its json object, is restored by an import. Note! Triggers are not serialized and will not be restored by import!
  2. The .sdi files refer to the schema by name, so the schema name must exist in the DD prior to import. Import of schema .sdi files is not supported. If the schema in which tables will be imported does not exist, the user must create it with CREATE SCHEMA|DATABASE with the desired default collation id. Alternatively, the .sdi file could be edited to reference a different schema name.
  3. The data and index files must be located in the file system directory corresponding to the schema into which the tables are to be imported, unless the table(s) being imported was created with the DATA DIRECTORY or INDEX DIRECTORY table option, in which case the data and index files must be placed in the specified locations, and the necessary symbolic links created. Alternatively the the .sdi file can be edited manually. If the data and index files cannot be found, import will be aborted.
  4. The .sdi files listed in the IMPORT-statement is subject to the same limitations as LOAD DATA INFILE ... (*), except that the semantics implied by the LOCAL keyword (with its security implications) is not supported.
  5. If the .sdi file is stored in the schema directory it needs to be writable by the server, and will be replaced by a new file written as part of import. If it is stored in the secure_file_priv directory it needs only be readable by the server and will not be modified as part of import.
  6. Character set and collation ids are imported verbatim and must exist and have the same meaning in the importing DD.
  7. The IMPORT statement shall assume all files and file system directories to be endpoints, not symbolic links. There is no explicit check for this. Relative paths are resolved relative to datadir as described in (*). Absolute paths are vetted against secure_file_priv but could point back into datadir. If they do they will not be removed after successful import.
  8. Import will not try to open the imported table to check for corruption.

(*) If LOCAL is not specified, the file must be located on the server host and is read directly by the server. The server uses the following rules to locate the file:

  • If the file name is an absolute path name, the server uses it as given.
  • If the file name is a relative path name with one or more leading components, the server searches for the file relative to the server's data directory.
  • If a file name with no leading components is given, the server looks for the file in the database directory of the default database.

In the non-LOCAL case, these rules mean that a file named as ./myfile.txt is read from the server's data directory, whereas the file named as myfile.txt is read from the database directory of the default database. [...]

For security reasons, when reading text files located on the server, the files must either reside in the database directory or be readable by the user account used to run the server. Also, to use LOAD DATA INFILE on server files, you must have the FILE privilege. See Section 7.2.1, “Privileges Provided by MySQL”. For non-LOCAL load operations, if the secure_file_priv system variable is set to a nonempty directory name, the file to be loaded must be located in that directory.

Requirements

EXPORT

EX-1
Syntax (no change from existing implementation):
 > FLUSH TABLES ... WITH READ LOCK
 $ cp hr.sdi $BACKUPDIR 
 $ cp hr/employees.* $BACKUPDIR
 $ cp hr/rules* $BACKUPDIR 
 > UNLOCK TABLES

IMPORT

IM-1
Syntax:
 $ mysqld ... --secure-file-priv=/tmp/import ...
 > CREATE SCHEMA hr COLLATE X;
 # Copy sdi files to privileged dir
 $ cp $EXPORTDIR/hr/employees.sdi /tmp/import
 $ cp $EXPORTDIR/hr/rules*.sdi /tmp/import
 
 # Copy index and data files into schema dir
 $ cp $EXPORTDIR/hr/employees.* $DATADIR/hr
 $ cp $EXPORTDIR/hr/rules* $DATADIR/hr
 > IMPORT TABLE FROM '/tmp/import/employees.sdi', '/tmp/import/rules*.sdi';
IM-POS-2
Import shall be supported from one or more table .sdi files, where the filename (basename) part can be a wildcard pattern (?*).
IM-POS-3
It must be possible to edit the .sdi files manually prior to IMPORT. Such an edit may be required, e.g. in order to import a table into a schema with a different name. Note! The user is responsible for only making changes that are compatible with the contents of the .MYD and .MYI files. When in doubt the user should perform a test import.
IM-POS-4
Import into a schema with a different collation id must be possible. The imported table retains its existing collation id, even if imported into a schema with a different default collation id. The schema's default collation id is not affected by this, so additional tables created in this schema after the import will get the same default collation id as before.
IM-POS-5
Verify that concurrent DROP SCHEMA|DATABASE, after the data and index files to import have been placed in the schema dir, fails because it is unable to delete the schema directory. Verify that manual intervention can fix the problem.
IM-POS-6
Verfy that it is possible, through editing the SDI file, to import a table into a server started with --lower-case-table-names.
IM-NEG-1
Verify that the absence of the schema referenced in the SDI file is detected and diagnosed appropriately.
IM-NEG-2
Verify that the prior existence of the table being imported is detected and diagnosed appropriately.
IM-NEG-3
Verify that a non-matching pattern for the SDI file is detected and diagnosed appropriately.
IM-NEG-4
Must obey the secure_file_priv setting, e.g. for .sdi files outside the schema directory.
IM-NEG-5
Must detect and diagnose corrupted SDI files appropriately.
IM-NEG-6
Must protect against concurrent modification of affected shared resources, which include the schema record, object and directory and the table records, objects and files. E.g. by obtaining proper MDL for the schema and table names.
IM-NEG-7
Verify that SDI version mismatch is detected and diagnosed appropriately. Import will adhere to the following compatibility rules:
  1. .sdi files from the same server version are always importable.
  2. If an attribute is removed from a DD-object, older .sdi file can still be imported.
  3. If an attribute is added to a DD-object, an older .sdi file can be imported iff the attribute has a suitable default, (the attribute will not be assigned a value during import and will have the default value).
  4. If an incompatible change is introduced the SDI version number must be increased.
  5. If the SDI version number does not match, import will fail and an (yet to be decided) error message will be returned.

Note that the question of whether to always support import from within the same major version, really is a question about what kind of DD changes we will allow within a major version.

IM-NEG-8
Must ensure that the user has the privileges required to add tables, modify meta data and access the file system.
IM-NEG-9
Failure to place the data and index files for the imported table in the correct locations must be detected and diagnosed.
IM-NEG-10
Import must try to open the newly imported table, but all resulting errors must be reported as warnings and not cause import to fail.

Contents


Interfaces

EXPORT

The existing FLUSH TABLES ... WITH READ LOCK statement must be used to ensure a consistent snapshot. After issuing the statement, it is the user's responsibility to copy the files (.sdi, index and data files) to a different location. When done, the user must use the UNLOCK TABLES statement to release locks so that normal operation can resume.

IMPORT statement

A new statement is needed for import. It will have the following syntax:

 IMPORT TABLE FROM <sdi_files>

Where <sdi_files> may be a list of file name patterns.

SDI json format

Currently the top-level JSON object is defined by the following rapidjson code:

 w.StartObject();
 w.String(STRING_WITH_LEN("sdi_version"));
 w.Uint64(sdi_version);
 w.String(STRING_WITH_LEN("dd_version"));
 w.Uint(Dictionary_impl::get_target_dd_version());
 w.String(STRING_WITH_LEN("dd_object_type"));
 w.String(dd_object_type, dd_object_type_size);
 w.String(STRING_WITH_LEN("dd_object"));
 dd_obj.serialize(&wctx, &w);
 w.EndObject();

Which would result in JSON object looking something like:

 {
   "sdi_version": 42,
   "dd_version": 800000,
   "dd_object_type": "Table",
   "dd_object": { ... }
 }

By changing dd_object to be a list [ ... ] it would be possible to group the SDI for multiple DD objects in a single JSON string which could be used during import. Adding the MySQL version number is another possibility. Both to be done as separate bug fixes.

Use Cases

Plain snapshot/restore

The example in the Requirements section shows how to create a consistent snapshot using FLUSH TABLES ... WITH READ LOCK and UNLOCK TABLES and how this snapshot can be imported into another server instance.

Import modified SDI

By editing the SDI it is possible to alter the metadata of a table, such as the names of columns, before it is imported. Great care must be taken when doing this - changes which are not compatible with what is stored in the .MYD and .MYI files, like changing the data type of a column, would likely be disastrous and must be avoided.

Import from older versions

It is not possible to import tables from older (<8.0) versions of MySQL as there will be no SDI files available. The safest thing to do in this case would be to upgrade the server and then export the tables. For cases where this is not possible, one could try the following approach:

  1. Extract the table definition using SHOW CREATE on the 5.7 server.
  2. Create a dummy table with this definition on the 8.0 server.
  3. Back up the resulting SDI file.
  4. Drop the dummy table.
  5. Import the table using the backed up SDI together with the 5.7 .MYD and .MYI files.

Disaster recovery

It is possible to try to import a table that has not been properly exported, and which may be corrupt. Import will try to open the table, but reports any errors as warnings and commits the meta data changes regardless. The user must then attempt to repair the table using the REPAIR TABLE statement.

Contents


EXPORT

The existing FLUSH TABLES ... WITH READ LOCK () provides the protection needed to create a snapshot which can later be used with IMPORT. Note that FLUSH TABLES ... FOR EXPORT only works for transactional storage engines, and will report an error if used with MyISAM tables.

UNLOCK TABLES Command

The existing UNLOCK TABLES will be used. New syntax matching a possible new EXPORT command may be added by Import for Innodb.

IMPORT Command

Parser Changes

The import statement must be added to the parser specification file sql_yacc.cc. By moving the IMPORT token from role_or_label_keyword to role_or_ident_keyword we avoid shift/reduce conficts. Many thanks to Gleb Schepa for providing the necessary the parser changes.

SQLCOM_IMPORT enum value

Extend the list of command codes and related data structures. This will require an ABI-checker change as well as a number of result file updates. Modify:

  • include/my_sqlcommand.h
  • include/mysql/plugin_audit.h.pp (ABI checker)
  • sql/mysqld.cc (SHOW_VAR com_status_vars[])
  • sql/sql_parse.cc ( sql_command_flags[SQLCOM_IMPORT]|= CF_NEEDS_AUTOCOMMIT_OFF;)

New SQL Command Class

Add a new class which extends Sql_command to handle the import command:

class Sql_cmd_import_table : public Sql_cmd
{
  const std::pair<const char*, size_t> m_sdi_filename; // Need to be a list to support a list of SDI files

public:
  Sql_cmd_import_table(const char *sdi_filename_str /* make this a list*/,
                       size_t sdi_filename_len);

  virtual bool execute(THD *thd);
  virtual enum_sql_command sql_command_code() const;
};

An instance of this class will carry information from the parser (the list of .sdi file names) and the execute() override will perform the import.

SQL Layer Changes

Extending command switch in mysql_execute_command(THD *thd, bool first_level) so that it handles the new code:

  case SQLCOM_IMPORT:
    res= lex->m_sql_cmd->execute(thd);
    break;

Sql_cmd_import_table::execute() will perform non-dd related parts of import before calling functions inside the dd namespace.

Privilege Checking

Of the currently available ACL privileges available in MySQL, it seems reasonable to have the IMPORT command require:

FILE_ACL
in order to legally access the SDI or tablespace file on the filestystem
CREATE_ACL
to "create" the table being imported in the DD.

Note that ALTER TABLE ... DISCARD/IMPORT ... does:

 if (check_access(thd, ALTER_ACL, table_list->db,
                  &table_list->grant.privilege,
                  &table_list->grant.m_internal,
                  0, 0))
   return true;
 
 if (check_grant(thd, ALTER_ACL, table_list, false, UINT_MAX, false))
   return true;

While a normal create checks privileges in

/**
 CREATE TABLE query pre-check.

 @param thd			Thread handler
 @param tables		Global table list
 @param create_table	        Table which will be created

 @retval
   FALSE   OK
 @retval
   TRUE   Error
*/

bool create_table_precheck(THD *thd, TABLE_LIST *tables,
                          TABLE_LIST *create_table)

All privileges for all tables being imported are checked before proceeding with import of the first table.

File Access Through Mysys

When the filename refers to an SDI file, that file must be opened, read and closed by the Runtime layer by using existing mysys functions:

  • my_dir() - to list files
  • my_wildcmp() - to compare file names to a wildcard pattern
  • mysql_file_open() - to open .SDI file
  • mysql_file_close() - to close .SDI
  • mysql_file_fstat() - to obtain size of SDI string
  • mysql_file_read() - to read SDI string into memory

DD Layer Changes

WL#7069 already provides a function for deserializing a dd object from the SDI string in the dd namespace. Note that the deserialization assumes that external objects referenced in the SDI already exist in the DD. Otherwise the deserialization will fail. For non-tablespace based SEs the only such external reference is the schema. There are no references to other tables. The object's own Object_id is not included in the SDI, and is consequently not restored during derserialization. Which means that the first store of the deserialized object will generate a new Object_id for it.

In addition a dd function which performs the actual import into the DD and DD cache is needed. This function will need to:

  • Invoke dd::Dictionary_client::store() on the deserialized object. This will add the new object to the DD and cache.
  • Invoke store_sdi() to store the new SDI for the imported table.

Resolving external references

References to the parent object, and to other objects in the SDI will be resolved during deserialization. References to other DD object not included in the SDI are resolved by name, meaning that they will refer to whatever object has that name in the DD being imported into. Character set and collation ids are not resolved, they retain whatever numeric value they had when the table was serialized.

Schema SDIs

Currently the dd::Schema_impl objects are serialized separately and their SDI stored in a file alongside the directory for the schema (database). The dd::Table_impl objects refer to the schema by its Object_id. During serialization this id is converted to the schema name. During deserialization this name is mapped to a new id in the DD doing the deserialization. Since the only relevant information in the schema .sdi is the default collation id, there is no import of schema SDIs, as it is just as easy to create the schema with CREATE SCHEMA prior to import. This could be revisited later if the SCHEMATA table is extended with more information.

Handling lower case table names

The issue here is how to import an uppercase table name, created on a server supporting this, into a server with lower_case_table_names != false. In this case the user must:

  • Be sure to rename the data and index files so that they have names which can be used on the new server.
  • Edit the .sdi so that its content match the new names given to the data and index files.

Handling invalidated views

Currently views are marked as invalid whenever one of the tables which the view depends on are deleted. When a table is created all invalid views are checked to see if the creation of the new table makes possible to make them valid again. The same check must be performed when importing a table, by calling:

/**
 Method to update metadata of views referecing "table" being renamed
 and views referecing(if there any) new table name "new_db.new_table_name".

 @param      thd                     Thread handle.
 @param      table                   Update metadata of views referencing this
                                     table.
 @param      new_db                  New db name set in the rename operation.
 @param      new_table_name          New table name set in the rename
                                     operation.

 @retval     false                   Success.
 @retval     true                    Failure.
*/

bool update_referencing_views_metadata(THD *thd, const TABLE_LIST *table,
                                       const char *new_db,
                                       const char *new_table_name);

MDL

The object being imported does not exist in the DD (or cache), so no MDLs are needed for it while the .sdi file is being read and its content deserialized. The deserialization will acquire an IX lock on the schema name in order to resolve the name to its id. But since the user must place the data and index files in the schema directory before this lock is taken, it is theoretically possible that another thread has started dropping the schema at this point. The drop schema statement would see a warning about not being able to delete the schema directory. When attempting the import the deserialization would fail to resolve the now dropped schema name.

When the object has been successfully deserialized it is necessary to obtain MDL_EXCLUSIVE for its name before it can be inserted into the DD. MDL will be acquired for all tables together before proceeding with import of the first table, to reduce the chance of deadlock.

Check for Existing Table

Before storing a table object an attempt at acquiring the same name is done to check if an object with that name already exists. If it does import fails and the changes to the DD are rolled back.

Transactions

All operations which modify the DD and cache must also trigger a rollback in case of failure:

 // Store info in DD tables.
 if (thd->dd_client()->store(tab_obj.get())
 {
   trans_rollback_stmt(thd);
   // Full rollback in case we have THD::transaction_rollback_request.
   trans_rollback(thd);
   return true;
 }
 
 return trans_commit_stmt(thd) ||
        trans_commit(thd);

Error Handling

Operations which could fail during import:

  • Syntax errors - handled by the parser
  • Lack of required privileges
  • File IO errors
  • JSON parsing errors
  • Reference resolving errors
  • MDL errors
  • DD interaction errors

Dmitry L reviewed the HLS and LLD.