WL#3713: Online backup: Object list (Document only)
Affects: Server-6.0 — Status: Complete — Priority: Low
SUMMARY ------ Make it possible to select what parts of meta data should be backed up or replicated using a new language. DESCRIPTION -------------- All the instance configuration data is stored in the mysql database. At the time of writing, the data is replicated by replicating the (administrative) statements that are used to manipulate the data, even under row-based replication. For statement that change tables in the database directly, they will be replicated row-based under row-based replication (and maybe under mixed mode as well). For both replication and for backup it is, however, essential to be able to control what configuration data is replicated and/or restored from backup, meaning that we need to develop a vocabulary for what configuration data should be replicated, e.g., through options to mysqld. Typical use cases are: - Replicate everything except privileges - Replicate only privileges - Replicate users Special situations: - Replicating privileges but not users does not make sense Note that instance configuration data is only one part of meta-data stored in the server. Other parts include: - table definitions (obviously should be always logged/replicated), - triggers (can be considered part of table definition), - stored procedures and functions. A usefull side effect of this worklog should be clarification of the definitions and compiling a complete list of meta-data stored in the server. Thinking about the whole data stored in a database instance one could think about the following levels of it: 1. The actual data stored inside tables. 2. Per database meta-data: table definitions, table triggers, table partitions?, ... 3. Per instance meta-data: db definitions, users, privileges, binlog state, replication state,... 4. Per host meta-data: shared libraries, external files, ... Where the borderline between data and meta-data lies should be clarified. NOTE The same problem needs to be solved in the backup system. A user might want to choose which parts of meta-data to restore from a backup image. Also possible to have options for including/excluding parts of meta-data when creating backup.
SUMMARY ------ This is a document describing the different server objects that we may or may not want to backup. DESCRIPTION ----------- All the instance configuration data is stored in the mysql database. At the time of writing, the data is replicated by replicating the (administrative) statements that are used to manipulate the data, even under row-based replication. For statements that change tables in the database directly, they will be replicated row-based under row-based replication (and maybe under mixed mode as well). For both replication and for backup it is, however, essential to be able to control what configuration data is replicated and/or restored from backup, meaning that we need to develop a vocabulary for what configuration data should be replicated, e.g., through options to mysqld. Typical use cases are: - Replicate everything except privileges - Replicate only privileges - Replicate users Special situations: - Replicating privileges but not users does not make sense Note that instance configuration data is only one part of meta-data stored in the server. Other parts include: - table definitions (obviously should be always logged/replicated), - triggers (can be considered part of table definition), - stored procedures and functions. A useful side effect of this worklog should be clarification of the definitions and compiling a complete list of metadata stored in the server. Thinking about the whole data stored in a database instance one could think about the following levels of it: 1. The actual data stored inside tables. 2. Per database metadata: table definitions, table triggers, table partitions?, ... 3. Per instance metadata: db definitions, users, privileges, binlog state, replication state,... 4. Per host metadata: shared libraries, external files, ... Where the borderline between data and metadata lies should be clarified. NOTE The same problem needs to be solved in the backup system. A user might want to choose which parts of meta-data to restore from a backup image. Also possible to have options for including/excluding parts of meta-data when creating backup. High-Level Specification The rule -------- BACKUP will back up all changeable but persistent data that belongs to a MySQL Server installation. Data is not "changeable" if it can't or won't be changed. Data is not "persistent" if it won't survive end-of-session. Data does not "belong to" a MySQL Server installation if it belongs exclusively to another MySQL Server installation, belongs to another package such as the operating system or PHP, or is not ordinarily controllable by any MySQL DBA. That's the whole rule. There are no exceptions or extensions. There are explanations and excuses, though, and that's why this document doesn't end here. Not Persistent = Not Backed Up ------------------------------ Some objects do not survive end-of-session, because they are created solely for the session's temporary objectives (e.g. session variables), or because they are transient will-o'-the-wisps that may change value during any operation at all. In this category are: Session variables. Non-metadata information in information_schema tables. Including:  GLOBAL_STATUS  PROCESSLIST  SESSION_STATUS  SESSION_VARIABLES,  all FALCON_* tables (if we can get rid of FALCON_TABLES soon),  all tables made for 'community',  and partial contents of other tables containing columns that are strictly for 'monitoring' purposes.  Temporary tables. NB: In standard SQL the descriptor of a temporary table, i.e. the metadata, *is* persistent. Someday MySQL might do WL#934 Temporary Tables In The Standard Way. At that point we must remember to backup temporary-table metadata too.  Prepared statements.  Caches. The reasons that BACKUP ignores "not persistent" data are; (a) the objective is solely to allow restoring data, not sessions (b) we don't expect that anybody would want to have BACKUP do that. Not Changeable (also known as 'static') = Not Backed Up ------------------------------------------------------- BACKUP will not back up the MySQL Server itself, or accompanying files or directories that are installed as part of the server installation, and ordinarily stay the same during the 'life' of the server. We define the server 'life' as: until the next upgrade or re-installation. In this category are:  Static files that come as part of the MySQL installation. So (Windows) mysqld-nt.exe is out, as well as (Linux) mysqld and anything on /usr/local/mysql/bin (also known as 'basedir') and /usr/local/mysql/libexec. NB: this means that users should do a final BACKUP before upgrading or re-installing MySQL. It is troublesome to restore "over" a re-install.  User programs.  Operating system shared libraries. For example windows.dll (I wonder if I remember the name right.) The reasons that BACKUP ignores "not changeable" data are; (a) it's easy to restore from elsewhere (b) since it doesn't change, only one backup is needed when it's put in. Not belonging = Not backed up ----------------------------- There are a few items that are used by MySQL, but belong exclusively to another package -- you can determine such things by asking yourself "if I was backing up (package X), would I include this object in package X or in MySQL?". In this category are:  Scripts for starting MySQL server when booting operating system  Environment initializations for $MYSQL_HOST or similar variables The reasons that MySQL ignores "not belonging" data are: (a) often such data fits in the "not changeable' category anyway (b) the UN charter says one country shouldn't invade another (c) two packages could back up the same objects, redundantly. Belonging but not exclusively belonging --------------------------------------- Some objects belong to a MySQL intallation, but might be shared by another MySQL installation, or might be "used exclusively for MySQL but maintained outside the DBMS". In this category are:  Option files, also known as configuration files. Typically the option files are named 'my.cnf' or 'my.ini'. Examples: (on Linux) /etc/my.cnf, /etc/[datadir]/my.cnf /etc/mysql-service-agent.ini /etc/init.d (on Windows) C:\Program Files\MySQL\MySQL Server 5.0\my.ini 6.2. The my.cnf/my.ini File Typically it's on $MYSQL_HOME/my.cnf /etc/my.cnf /etc/mysql-service-agent.ini /var/lib/mysql-cluster/config.ini /etc/init.d Saving the option files helps to guarantee that the server will 'start up the same' after RESTORE, unless the server startup uses options on the command line.  Plugins. The plugins are all the files that can be read in the directory that you see when you say SHOW VARIABLES LIKE 'plugin_dir'; for example /usr/local/mysql/lib/mysql.  Files in the basedir 'shared' directory. These files only change if users customize, for example by adding a new collation or translating an error message. Examples: /usr/local/mysql/share/mysql/charsets/*.xml /usr/local/mysql/share/mysql/charsets/*.conf /usr/local/mysql/share/mysql/english/errmsg.txt If an object is in this category, it is subject to BACKUP -- but RESTORE won't restore it! It shouldn't, because: (a) restoring will cause a change for every server instance, which might not be what's desired (b) files outside the basedir might be specially protected. Therefore the only thing that RESTORE will do is *read* the current files, if any. If a discrepancy is detected in either the date or the contents, then there will be a warning and the user doing the RESTORE will have to go to more trouble. Reasons for the belonging-but-not-exclusively-belonging exception are: (a) BACKUP and RESTORE privileges are for the instance of MySQL Server that the BACKUP is running in, no more. (b) The server might not have the necessary operating-system privilege to write to the object. (c) Oracle says that for a "whole database backup" they back up configuration information (server parameter file as well as archived redo logs), so there's a precedent for this. Admittedly, Oracle has an easier time, because it does configuration adjustments under DBMS control. But that's our own fault. (d) Lars said to Peter: "List all objects of the server that should be backed up. Consider that we want to take a so called "full" backup of the server and that the restore operations should bring it back to exactly the same." Well, we can't "bring it back exactly the same' if we don't RESTORE everything that it depends on. Global variables ---------------- There are currently 301 "global variables". Not all of them are changeable, for example 'version_compile_os' is always going to be the same for the life of the server. All of them are persistent. It will be simplest to copy them all during BACKUP, and restore them all during RESTORE. MySQL will lock all global variables during synchronization and save them, so the variables will be a consistent set. MySQL will not lock global variables while the backup is going on (that is, other sessions may change them during the copying). Since some changes do not take immediate effect, the settings that are 'restored to' are not necessarily the settings that were in effect at the time that the BACKUP began. Since some variables are read-only, RESTORE will fail to set them to new values. MySQL could silently ignore such cases, because it copies the option files that the server starts with. But Lars's decision is: RESTORE returns an error if any global variable value differs from its value at BACKUP time, except if: * global variable no longer exists or * there is a new global variable. This happens only with BACKUP ALL. There are also persistent global 'variables' visible with SHOW GLOBAL STATUS, for example Ssl_version, Ssl_cipher. MySQL will regard this as a bug, and will not back up global-status variables. But that's okay, the critical ones are not changeable anyway. In this category are:  Global variables except global-status variables Reasons for backing up global variables are: (a) Lars's instruction (something about backing up the whole thing so we'd be able to recover to the exact situation). Changeable and Persistent and Exclusively Belonging (datadir) --------------------------------------------------- The objects that fit all the backup criteria perfectly, including the objects that we usually call a 'database', will almost always be the largest and most important part of the backup. These objects usually exist in the datadir directory, for example /usr/local/mysql/var, or some subdirectory thereof. We'll call these the 'datadir' objects, and they'll be the subject of the next few sections. Datadir objects outside databases --------------------------------- These are sometimes called 'global' objects. I do not believe "global" is a good word, it makes me think of "global variables". Maybe "server"? "environment"? configuration"? I don't like "system" because that would include clients, tools, connectors. I don't like "server instance" though I guess, yes, technically it's not the server but the instance of the server. The standard also has "catalog" (container of schemas) and "cluster" (container of catalogs). For now I'll prefer the term 'server' objects. Objects in this category are:  *.info files for replication For example: [20-a] master.info [20-b] relay-log.info  Tablespaces. Tablespaces are not currently in databases. This causes a problem: if a tablespace has tables from database#1 and database#2, and BACKUP DATABASE database#1 is the statement, then we're being asked to back up 'half an object'. I have suggested that such tablespaces should be inside databases, watch email thread "Tablespace namespace" starting here: https://intranet.mysql.com/secure/mailarchive/mail.php?folder=4&mail=21419  Logs. Some storage engines have their own log files. MySQL will not back them up, and will not restore them. The plan is that BACKUP will take place 'as of' a moment of validity. Uncommitted data at the validity-moment time is not restorable. Additional note: Driver developer is responsible for logging everything related to their functionality. For example, if an engine can support flashback, then the driver must back up everything in the logs that is necessary for it." The consistent snapshot driver will not support this in any case.  Binary log. Reasons for backing up *info files are: (a) MySQL supports replication. Datadir objects in mysql database --------------------------------- The mysql database is special. If a user says BACKUP DATABASE mysql; then MySQL must back up all data in all tables in the mysql database. But if a user says BACKUP DATABASE mysql,database1; then there is redundancy, because mysql has 'metadata' related to database1. The decision is that there will never be a need for a backup of mysql, because the contents are restorable in other ways provided the user does not change mysql tables directly. (The following paragraph is obsolete and will be deleted unless a change of opinion takes place.) This should be okay because, provided that the same 'validity moment' applies for all databases that appear in the same BACKUP DATABASE statement, the metadata in mysql and the metadata for database1 is the same. On the other hand, suppose that one backs up the mysql database but not database1, or vice versa. Then the metadata restoration will be unsynchronized. Therefore 'BACKUP DATABASE mysql...' should only be done if one is backing up all databases at the same time. "BACKUP DATABASE mysql;" is necessary for all the objects that are not in databases. I'll list them here, but I think there's nothing magic about them. Just back up the mysql database to save them. The items in this category are:  Users and global-level privileges Everything that's in the table mysql.user.  Plugins Everything that's in the table mysql.plugin. This does not include code. The code is in a different category, 'Belonging but not exclusively belonging'.  Connections for federated tables Everything that's in the table mysql.servers.  Help [27-a] Everything that's in the table help_category. [27-b] Everything that's in the table help_keyword. [27-c] Everything that's in the table help_relation. [27-d] Everything that's in the table help_topic.  Time zone [28-a] Everything that's in the table time_zone. [28-b] Everything that's in the table time_zone_leap_second. [28-c] Everything that's in the table time_zone_name. [28-d] Everything that's in the table time_zone_transition. [28-e] Everything that's in the table time_zone_transition_type.  Logs [29-a] general_log [29-b] slow_log  UDFs  LOGFILE GROUPS  mysql.host  mysql.db  mysql.host  mysql.ndb_binlog_index  mysql.online_backup Reasons for backing up mysql objects: (a) MySQL might support replication. (b) It's changeable and persistent data. datadir objects inside databases -------------------------------- Items in this category are: Database. "Databases" should be in the "per-database" category. In the standard a "schema" is not a "schema object" so a database is not a "database object", but BACKUP DATABASE will back up the database metadata along with the database objects, so the database itself is per-database. Persistent base table (metadata + contents)  View.  Stored Procedure.  Function.  Trigger.  Event.  Privilege (on databases, base tables, views, routines, ...)  Constraint (i.e. foreign-key or primary-key now, check constraints later) These objects may refer to other objects in other databases. That will cause a difficulty if we try to re-create the objects using CREATE statements. Reasons for backing up database objects: (a) It's changeable and persistent data. No exceptions for storage engines --------------------------------- In MySQL 5.1 the BACKUP TABLE statement fails with ARCHIVE, BLACKHOLE, FEDERATED, MEMORY, MERGE, or partitioned tables. (One gets the note "The storage engine for the table doesn't support backup"). But mysqldump won't complain. So what should BACKUP DATABASE do? (1) Back up metadata and all persistent contents. (2) Back up nothing, ignore the table. (3) Back up nothing, return an error if the database contains an unbackupable table. (4) Back up metadata only. The decision is (1). This should not be a problem for BLACKHOLE (it has no persistent data). Maybe there's room for argument for MERGE or FEDERATED (they can be looked at as mere descriptors). But MEMORY data is persistent, so backup is compulsory. Relative file names ------------------- For file names: what if a path is relative, that is, the user entered 't' instead of '/usr/local/mysql/var/t'? Should MySQL store the long form, which is known to the server at BACKUP DATABASE time, or should MySQL store the short form, which might make some RESTOREs easier? Answer: MySQL stores the short form. How would one RESTORE @@datadir? It shouldn't be possible for one statement to change a read-only global variable. But some statements use relative file names. So BACKUP DATABASE should change CREATE PROCEDURE p () BACKUP DATABASE d TO 'x'; to CREATE PROCEDURE p () BACKUP DATABASE d TO 'path+x'; But Lars's decision is: it won't. We will restore statements exactly as they are. This might be a feature. Dependencies #1 (Peter's comments on "strategy") ------------------------------------------------ This section is obsolete. We have decided to go about it without using ALTER. If that works, we'll remove this section. See next section "Dependencies #2" for reality. Some objects depend on other objects. That doesn't always matter -- for example, when we create a procedure, we don't check for the existence of all objects that the procedure operates on. But sometimes we do check. Example#1: CREATE VIEW v AS SELECT * FROM db.t; /* fails if there is no table t */ Example#2: CREATE TABLE t(a int, FOREIGN KEY (a) REFERENCES db.t(b)) ENGINE=INNODB; /* fails if there is no table db.t */ The full list of dependencies will have to come later, after the full list of objects is clear. I want though to comment on strategy. 1. Creations can be done with multiple CREATE and ALTER statements. 2. Some objects are unlikely to have dependencies. Therefore the recreation should be a series of ordered steps: All non-datadir objects All datadir-but-no-database objects, except users All CREATE USER statements All CREATE DATABASE statements All CREATE TABLESPACE statements All CREATE TABLE statements (with no partitions or constraints) ALTER TABLE statements to add partitions ALTER TABLE statements to add constraints All CREATE PROCEDURE and CREATE FUNCTION statements All CREATE VIEW statements (with only dummy SELECT clauses) ALTER VIEW statements to add real SELECT clauses All CREATE PROCEDURE and CREATE FUNCTION statements -- alternative position All CREATE EVENT statements All CREATE TRIGGER statements Doing it this way you avoid, for example, a view that references a table that doesn't exist -- impossible because all tables are created first. Doing it this way you avoid, for example, recursion difficulty caused by CREATE TABLE t1 (... REFERENCES t2) CREATE TABLE t2 (... REFERENCES t1) since they would become CREATE TABLE t1 CREATE TABLE t2 ALTER TABLE t1 REFERENCES t2 ALTER TABLE t2 REFERENCES t1 I realize you want to take care of that particular difficulty with @@foreign_key_checks=0, but I only illustrate a principle. Doing it this way you avoid, for example, recursion difficulty caused by CREATE TABLE t1 (...) CREATE VIEW v1 AS SELECT s1 FROM t1 CREATE VIEW v2 AS SELECT s1 FROM v1 since they would become CREATE TABLE t1 CREATE VIEW v1 AS SELECT 1 AS s1 CREATE VIEW v2 AS SELECT 1 AS s1 ALTER VIEW v1 AS SELECT s1 FROM t1 ALTER VIEW v2 AS SELECT s1 FROM t2 Notice that it makes no difference which view I make first. Dependencies #2 (Rafal's reality) --------------------------------- This is what we actually decided is the order, without need for ALTER (dependencies will be discovered first and CREATEs will happen in a dependency-free order). Ordering is approximate, since (for example) views may reference routines. There is also a January 12-14 email thread "List of database object dependencies" https://intranet.mysql.com/secure/mailarchive/mail.php?folder=230&mail=505 and a slightly different list in WL#4211. tablespaces databases tables i.e. base tables stored routines i.e. procedures or functions views triggers events Rafal gave some reasons for the order in the email exchange "Re: Handling object dependencies in backup." https://intranet.mysql.com/secure/mailarchive/mail.php?folder=230&mail=870 Future Objects -------------- There should be a mention of these objects which might be in some future MySQL version. They present no challenge for this task, but the implementor should spend a few seconds on each one deciding whether it's a distant-future problem.  Catalog WL#2073 or WL#942  Character Set ("dynamic") WL#744  Collation ("dynamic") WL#744  Constraint (CHECK) WL#929  Domain WL#933  Module WL#3416  Profile (as in Oracle) WL#3824  Role WL#988  Sequence WL#827  Privilege (new ones) WL#2227  Registry (of plugins) WL#4102  Repository (of backups) WL#3329  Synonym WL#1048  Trigger (DDL) WL#2418  Type (distinct) WL#933  Type (structured) WL#3862  Encryption Key WL#3556 Terminology ----------- Definition of "object": The standard-SQL definition is: "object (as in 'x object'): Any thing. An X object is a component of, or is otherwise associated with, some X, and cannot exist independently of that X. For example, an SQL object is an object that exists only in the context of SQL; an SQL-schema object is an object that exists in some SQL-schema." We only care about "server objects". We only care about "objects" that are not "session objects". We only care about "persistent" objects, maybe, that is, I see that there may be a desire to exclude MEMORY and "temporary". Definition of "metadata": I'm happy enough with http://en.wikipedia.org/wiki/Metadata but emphasize it's "data about data" i.e. "data about SQL-data" e.g. "data about data in the database" -- PERSISTENT data. Definition of "ephemeral": Data that I earlier referred to as will-o'-the-wisp. For example there are these information_schema views: GLOBAL_STATUS GLOBAL_VARIABLES PROCESSLIST PROFILING (community only) SESSION_STATUS SESSION_VARIABLES It was a mistake to put these tables in information_schema, they should be thrown away or shifted to performance_schema, and it should be emphasized that performance_schema data is ephemeral -- won't be backed up, won't be replicated, etc. Therefore this WL#4116 statement is wrong: "It has been decided that the information_schema views shall not be included in the first release and shall be added in a later release." Some "information_schema views" that aren't metadata don't belong in information_schema, so they should never be backed up in any release. References ---------- Private emails (Lars + Peter) "Re: Fwd: Backup of all objects" on 2007-11-15. dev-backup email thread WL#3713 Online backup: Selective rpl/backup of metadata https://intranet.mysql.com/secure/mailarchive/mail.php?folder=230&mail=412 dev-backup thread "Backup: metadata items list" https://intranet.mysql.com/secure/mailarchive/mail.php?folder=230&mail=184 Oracle11g list of schema objects: http://download.oracle.com/docs/cd/B28359_01/server.111/b28318/schema.htm#CNCPT111
Copyright (c) 2000, 2015, Oracle Corporation and/or its affiliates. All rights reserved.