WL#3713: Online backup: Object list (Document only)

Affects: Server-6.0   —   Status: Complete

SUMMARY
------
Make it possible to select what parts of meta data should be backed up 
or replicated using a new language.

DESCRIPTION
--------------

All the instance configuration data is stored in the mysql database. At the time
of writing, the data is replicated by replicating the (administrative)
statements that are used to manipulate the data, even under row-based
replication. For statement that change tables in the database directly, they
will be replicated row-based under row-based replication (and maybe under mixed
mode as well).

For both replication and for backup it is, however, essential to be able to
control what configuration data is replicated and/or restored from backup,
meaning that we need to develop a vocabulary for what configuration data should
be replicated, e.g., through options to mysqld.

Typical use cases are:

- Replicate everything except privileges
- Replicate only privileges
- Replicate users

Special situations:
- Replicating privileges but not users does not make sense

Note that instance configuration data is only one part of meta-data stored in
the server. Other parts include:

- table definitions (obviously should be always logged/replicated),
- triggers (can be considered part of table definition),
- stored procedures and functions.

A usefull side effect of this worklog should be clarification of the definitions
and compiling a complete list of meta-data stored in the server. 

Thinking about the whole data stored in a database instance one could think
about the following levels of it:

1. The actual data stored inside tables.
2. Per database meta-data: table definitions, table triggers, table partitions?, ...
3. Per instance meta-data: db definitions, users, privileges, binlog state,
replication state,...
4. Per host meta-data: shared libraries, external files, ...

Where the borderline between data and meta-data lies should be clarified.

NOTE

The same problem needs to be solved in the backup system. A user might want to
choose which parts of meta-data to restore from a backup image. Also possible to
have options for including/excluding parts of meta-data when creating backup. 
SUMMARY
------
This is a document describing the different server objects
that we may or may not want to backup.


DESCRIPTION
-----------

All the instance configuration data is stored in the mysql database. At the time
of writing, the data is replicated by replicating the (administrative)
statements that are used to manipulate the data, even under row-based
replication. For statements that change tables in the database directly, they
will be replicated row-based under row-based replication (and maybe under mixed
mode as well).

For both replication and for backup it is, however, essential to be able to
control what configuration data is replicated and/or restored from backup,
meaning that we need to develop a vocabulary for what configuration data should
be replicated, e.g., through options to mysqld.

Typical use cases are:

- Replicate everything except privileges
- Replicate only privileges
- Replicate users

Special situations:
- Replicating privileges but not users does not make sense

Note that instance configuration data is only one part of meta-data stored in
the server. Other parts include:

- table definitions (obviously should be always logged/replicated),
- triggers (can be considered part of table definition),
- stored procedures and functions.

A useful side effect of this worklog should be clarification of the definitions
and compiling a complete list of metadata stored in the server. 

Thinking about the whole data stored in a database instance one could think
about the following levels of it:

1. The actual data stored inside tables.
2. Per database metadata: table definitions, table triggers, table partitions?, ...
3. Per instance metadata: db definitions, users, privileges, binlog state,
replication state,...
4. Per host metadata: shared libraries, external files, ...

Where the borderline between data and metadata lies should be clarified.

NOTE

The same problem needs to be solved in the backup system. A user might want to
choose which parts of meta-data to restore from a backup image. Also possible to
have options for including/excluding parts of meta-data when creating backup. 




High-Level Specification

The rule
--------

BACKUP will back up all changeable but persistent data
that belongs to a MySQL Server installation.

Data is not "changeable" if it can't or won't be changed.

Data is not "persistent" if it won't survive end-of-session.

Data does not "belong to" a MySQL Server installation if it
belongs exclusively to another MySQL Server installation,
belongs to another package such as the operating system or
PHP, or is not ordinarily controllable by any MySQL DBA.

That's the whole rule. There are no exceptions or extensions.
There are explanations and excuses, though, and that's why
this document doesn't end here.

Not Persistent = Not Backed Up
------------------------------

Some objects do not survive end-of-session, because they
are created solely for the session's temporary objectives
(e.g. session variables), or because they are transient
will-o'-the-wisps that may change value during any
operation at all. In this category are:

Session variables.

Non-metadata information in information_schema tables.
Including:
[1] GLOBAL_STATUS
[2] PROCESSLIST
[3] SESSION_STATUS
[4] SESSION_VARIABLES,
[5] all FALCON_* tables (if we can get rid of FALCON_TABLES soon),
[6] all tables made for 'community',
[7] and partial contents of other tables containing columns
that are strictly for 'monitoring' purposes.

[8] Temporary tables.
NB: In standard SQL the descriptor of a temporary table,
i.e. the metadata, *is* persistent. Someday MySQL might do
WL#934 Temporary Tables In The Standard Way. At that point
we must remember to backup temporary-table metadata too.

[9] Prepared statements.

[10] Caches.

The reasons that BACKUP ignores "not persistent" data are;
(a) the objective is solely to allow restoring data, not sessions
(b) we don't expect that anybody would want to have BACKUP do that.

Not Changeable (also known as 'static') = Not Backed Up
-------------------------------------------------------

BACKUP will not back up the MySQL Server itself, or accompanying files
or directories that are installed as part of the server installation,
and ordinarily stay the same during the 'life' of the server.
We define the server 'life' as: until the next upgrade or
re-installation. In this category are:

[11] Static files that come as part of the MySQL installation.
So (Windows) mysqld-nt.exe is out, as well as (Linux) mysqld and anything
on /usr/local/mysql/bin (also known as 'basedir') and /usr/local/mysql/libexec.
NB: this means that users should do a final BACKUP before upgrading or
re-installing MySQL. It is troublesome to restore "over" a re-install.

[12] User programs.

[13] Operating system shared libraries.
For example windows.dll (I wonder if I remember the name right.)

The reasons that BACKUP ignores "not changeable" data are;
(a) it's easy to restore from elsewhere
(b) since it doesn't change, only one backup is needed when it's put in.

Not belonging = Not backed up
-----------------------------

There are a few items that are used by MySQL, but belong
exclusively to another package -- you can determine such
things by asking yourself "if I was backing up (package X),
would I include this object in package X or in MySQL?".
In this category are:

[14] Scripts for starting MySQL server when booting operating system

[15] Environment initializations for $MYSQL_HOST or similar variables

The reasons that MySQL ignores "not belonging" data are:
(a) often such data fits in the "not changeable' category anyway
(b) the UN charter says one country shouldn't invade another
(c) two packages could back up the same objects, redundantly.

Belonging but not exclusively belonging
---------------------------------------

Some objects belong to a MySQL intallation, but might be
shared by another MySQL installation, or might be "used
exclusively for MySQL but maintained outside the DBMS".
In this category are:

[16] Option files, also known as configuration files.
Typically the option files are named 'my.cnf' or 'my.ini'.
Examples:
  (on Linux) /etc/my.cnf, /etc/[datadir]/my.cnf
             /etc/mysql-service-agent.ini
             /etc/init.d
  (on Windows) C:\Program Files\MySQL\MySQL Server 5.0\my.ini
  6.2. The my.cnf/my.ini File
  Typically it's on 
  $MYSQL_HOME/my.cnf
  /etc/my.cnf
  /etc/mysql-service-agent.ini
  /var/lib/mysql-cluster/config.ini
  /etc/init.d
Saving the option files helps to guarantee that the server
will 'start up the same' after RESTORE, unless the server
startup uses options on the command line.

[17] Plugins.
The plugins are all the files that can be read in the
directory that you see when you say
SHOW VARIABLES LIKE 'plugin_dir';
for example /usr/local/mysql/lib/mysql.

[18] Files in the basedir 'shared' directory.
These files only change if users customize, for example
by adding a new collation or translating an error message.
Examples:
/usr/local/mysql/share/mysql/charsets/*.xml
/usr/local/mysql/share/mysql/charsets/*.conf
/usr/local/mysql/share/mysql/english/errmsg.txt

If an object is in this category, it is subject to BACKUP
-- but RESTORE won't restore it! It shouldn't, because:
(a) restoring will cause a change for every server instance,
    which might not be what's desired
(b) files outside the basedir might be specially protected.
Therefore the only thing that RESTORE will do is *read* the
current files, if any. If a discrepancy is detected in either
the date or the contents, then there will be a warning and
the user doing the RESTORE will have to go to more trouble.

Reasons for the belonging-but-not-exclusively-belonging exception are:
(a) BACKUP and RESTORE privileges are for the instance of MySQL Server
    that the BACKUP is running in, no more.
(b) The server might not have the necessary operating-system
    privilege to write to the object.
(c) Oracle says that for a "whole database backup" they back up configuration
    information (server parameter file as well as archived redo logs), so
    there's a precedent for this.
    Admittedly, Oracle has an easier time, because it does
    configuration adjustments under DBMS control.
    But that's our own fault.
(d) Lars said to Peter:
    "List all objects of the server that should be backed up.
    Consider that we want to take a so called "full" backup of
    the server and that the restore operations should bring it
    back to exactly the same."
    Well, we can't "bring it back exactly the same' if we don't
    RESTORE everything that it depends on.

Global variables
----------------

There are currently 301 "global variables". Not all of them
are changeable, for example 'version_compile_os' is always
going to be the same for the life of the server. All of them
are persistent. It will be simplest to copy them all during
BACKUP, and restore them all during RESTORE.

MySQL will lock all global variables during synchronization
and save them, so the variables will be a consistent set.
MySQL will not lock global variables while the backup is going
on (that is, other sessions may change them during the copying).
Since some changes do not take immediate effect, the settings that
are 'restored to' are not necessarily the settings that were
in effect at the time that the BACKUP began.

Since some variables are read-only, RESTORE will fail to set them to
new values. MySQL could silently ignore such cases,
because it copies the option files that the server starts with.
But Lars's decision is:
RESTORE returns an error if any global variable value differs
from its value at BACKUP time, except if:
* global variable no longer exists
or
* there is a new global variable.
This happens only with BACKUP ALL.

There are also persistent global 'variables' visible with
SHOW GLOBAL STATUS, for example Ssl_version, Ssl_cipher.
MySQL will regard this as a bug, and will not back up
global-status variables. But that's okay, the critical ones
are not changeable anyway.

In this category are:
[19] Global variables except global-status variables

Reasons for backing up global variables are:
(a) Lars's instruction (something about backing up the whole
    thing so we'd be able to recover to the exact situation).

Changeable and Persistent and Exclusively Belonging (datadir)
---------------------------------------------------

The objects that fit all the backup criteria perfectly,
including the objects that we usually call a 'database',
will almost always be the largest and most important part of
the backup. These objects usually exist in the datadir
directory, for example /usr/local/mysql/var, or some
subdirectory thereof.

We'll call these the 'datadir' objects, and they'll be
the subject of the next few sections.

Datadir objects outside databases
---------------------------------

These are sometimes called 'global' objects.
I do not believe "global" is a good word, it makes me think of
"global variables". Maybe "server"? "environment"? configuration"?
I don't like "system" because that would include clients, tools,
connectors.
I don't like "server instance" though I guess, yes, technically
it's not the server but the instance of the server.
The standard also has "catalog" (container of schemas) and
"cluster" (container of catalogs).
For now I'll prefer the term 'server' objects.

Objects in this category are:

[20] *.info files for replication
For example:
[20-a] master.info
[20-b] relay-log.info

[21] Tablespaces.
Tablespaces are not currently in databases.
This causes a problem: if a tablespace has
tables from database#1 and database#2, and
BACKUP DATABASE database#1
is the statement, then we're being asked to
back up 'half an object'.
I have suggested that such tablespaces should be inside
databases, watch email thread "Tablespace namespace" starting here:
https://intranet.mysql.com/secure/mailarchive/mail.php?folder=4&mail=21419

[22] Logs.
Some storage engines have their own log
files. MySQL will not back them up, and
will not restore them. The plan is that
BACKUP will take place 'as of' a moment
of validity. Uncommitted data at the
validity-moment time is not restorable.
Additional note: Driver developer is responsible for
logging everything related to their functionality. For example,
if an engine can support flashback, then the driver must back up
everything in the logs that is necessary for it." The consistent
snapshot driver will not support this in any case.

[23] Binary log.

Reasons for backing up *info files are:
(a) MySQL supports replication.

Datadir objects in mysql database
---------------------------------

The mysql database is special. If a user says
BACKUP DATABASE mysql;
then MySQL must back up all data in all tables
in the mysql database. But if a user says
BACKUP DATABASE mysql,database1;
then there is redundancy, because mysql has
'metadata' related to database1.

The decision is that there will never be
a need for a backup of mysql, because the
contents are restorable in other ways provided
the user does not change mysql tables directly.

(The following paragraph is obsolete and will be
deleted unless a change of opinion takes place.)
This should be okay because, provided that the
same 'validity moment' applies for all databases
that appear in the same BACKUP DATABASE statement,
the metadata in mysql and the metadata for database1
is the same.
On the other hand, suppose that one backs up the
mysql database but not database1, or vice versa.
Then the metadata restoration will be unsynchronized.
Therefore 'BACKUP DATABASE mysql...' should only be
done if one is backing up all databases at the same
time.
"BACKUP DATABASE mysql;" is necessary for all the
objects that are not in databases. I'll list them
here, but I think there's nothing magic about them.
Just back up the mysql database to save them.

The items in this category are:

[24] Users and global-level privileges
Everything that's in the table mysql.user.

[25] Plugins
Everything that's in the table mysql.plugin.
This does not include code. The code is in a different
category, 'Belonging but not exclusively belonging'.

[26] Connections for federated tables
Everything that's in the table mysql.servers.

[27] Help
[27-a] Everything that's in the table help_category.
[27-b] Everything that's in the table help_keyword.
[27-c] Everything that's in the table help_relation.
[27-d] Everything that's in the table help_topic.

[28] Time zone
[28-a] Everything that's in the table time_zone.
[28-b] Everything that's in the table time_zone_leap_second.
[28-c] Everything that's in the table time_zone_name.
[28-d] Everything that's in the table time_zone_transition.
[28-e] Everything that's in the table time_zone_transition_type.

[29] Logs
[29-a] general_log
[29-b] slow_log

[30] UDFs
[31] LOGFILE GROUPS

[32] mysql.host
[33] mysql.db
[34] mysql.host
[35] mysql.ndb_binlog_index
[36] mysql.online_backup

Reasons for backing up mysql objects:
(a) MySQL might support replication.
(b) It's changeable and persistent data.

datadir objects inside databases
--------------------------------

Items in this category are:

Database.
"Databases" should be in the "per-database" category.
In the standard a "schema" is not a "schema object" so a
database is not a "database object", but BACKUP DATABASE
will back up the database metadata along with the database
objects, so the database itself is per-database.

Persistent base table (metadata + contents)
[37] View.
[38] Stored Procedure.
[39] Function.
[40] Trigger.
[41] Event.
[42] Privilege (on databases, base tables, views, routines, ...)
[43] Constraint (i.e. foreign-key or primary-key now, check constraints later)

These objects may refer to other objects in other databases.
That will cause a difficulty if we try to re-create the
objects using CREATE statements.

Reasons for backing up database objects:
(a) It's changeable and persistent data.

No exceptions for storage engines
---------------------------------

In MySQL 5.1 the BACKUP TABLE statement fails with
ARCHIVE, BLACKHOLE, FEDERATED, MEMORY, MERGE, or
partitioned tables. (One gets the note "The storage
engine for the table doesn't support backup").
But mysqldump won't complain.
So what should BACKUP DATABASE do?

(1) Back up metadata and all persistent contents.

(2) Back up nothing, ignore the table.

(3) Back up nothing, return an error if the database
contains an unbackupable table.

(4) Back up metadata only.

The decision is (1). This should not be a
problem for BLACKHOLE (it has no persistent data).
Maybe there's room for argument for MERGE or FEDERATED
(they can be looked at as mere descriptors). But
MEMORY data is persistent, so backup is compulsory.

Relative file names
-------------------

For file names: what if a path is relative, that is, the user entered
't' instead of '/usr/local/mysql/var/t'? Should MySQL store the long
form, which is known to the server at BACKUP DATABASE time, or should
MySQL store the short form, which might make some RESTOREs easier?

Answer: MySQL stores the short form.

How would one RESTORE @@datadir? It shouldn't be
possible for one statement to change a read-only
global variable. But some statements use relative
file names. So BACKUP DATABASE should change
CREATE PROCEDURE p () BACKUP DATABASE d TO 'x';
to
CREATE PROCEDURE p () BACKUP DATABASE d TO 'path+x';
But Lars's decision is: it won't. We will restore
statements exactly as they are. This might be a feature.

Dependencies #1 (Peter's comments on "strategy")
------------------------------------------------

This section is obsolete. We have decided to go about it
without using ALTER. If that works, we'll remove this
section. See next section "Dependencies #2" for reality.

Some objects depend on other objects. That doesn't
always matter -- for example, when we create a
procedure, we don't check for the existence of all
objects that the procedure operates on. But sometimes
we do check.

Example#1: CREATE VIEW v AS SELECT * FROM db.t;
           /* fails if there is no table t */
Example#2: CREATE TABLE t(a int,
           FOREIGN KEY (a) REFERENCES db.t(b))
           ENGINE=INNODB;
           /* fails if there is no table db.t */

The full list of dependencies will have to come later,
after the full list of objects is clear. I want though
to comment on strategy.

1. Creations can be done with multiple CREATE and ALTER statements.
2. Some objects are unlikely to have dependencies. 

Therefore the recreation should be a series of ordered steps:
All non-datadir objects
All datadir-but-no-database objects, except users
All CREATE USER statements
All CREATE DATABASE statements
All CREATE TABLESPACE statements
All CREATE TABLE statements (with no partitions or constraints)
ALTER TABLE statements to add partitions
ALTER TABLE statements to add constraints
All CREATE PROCEDURE and CREATE FUNCTION statements
All CREATE VIEW statements (with only dummy SELECT clauses)
ALTER VIEW statements to add real SELECT clauses
All CREATE PROCEDURE and CREATE FUNCTION statements -- alternative position
All CREATE EVENT statements
All CREATE TRIGGER statements

Doing it this way you avoid, for example, a view that
references a table that doesn't exist -- impossible because
all tables are created first.

Doing it this way you avoid, for example, recursion difficulty
caused by
CREATE TABLE t1 (... REFERENCES t2)
CREATE TABLE t2 (... REFERENCES t1)
since they would become
CREATE TABLE t1
CREATE TABLE t2
ALTER TABLE t1 REFERENCES t2
ALTER TABLE t2 REFERENCES t1
I realize you want to take care of that particular difficulty
with @@foreign_key_checks=0, but I only illustrate a principle.

Doing it this way you avoid, for example, recursion difficulty
caused by
CREATE TABLE t1 (...)
CREATE VIEW v1 AS SELECT s1 FROM t1
CREATE VIEW v2 AS SELECT s1 FROM v1
since they would become
CREATE TABLE t1
CREATE VIEW v1 AS SELECT 1 AS s1
CREATE VIEW v2 AS SELECT 1 AS s1
ALTER VIEW v1 AS SELECT s1 FROM t1
ALTER VIEW v2 AS SELECT s1 FROM t2
Notice that it makes no difference which view I make first.

Dependencies #2 (Rafal's reality)
---------------------------------

This is what we actually decided is the order, without need
for ALTER (dependencies will be discovered first and CREATEs
will happen in a dependency-free order). Ordering is
approximate, since (for example) views may reference routines.
There is also a January 12-14 email thread
"List of database object dependencies"
https://intranet.mysql.com/secure/mailarchive/mail.php?folder=230&mail=505
and a slightly different list in WL#4211.

tablespaces
databases
tables i.e. base tables
stored routines i.e. procedures or functions
views
triggers
events

Rafal gave some reasons for the order in the email exchange
"Re: Handling object dependencies in backup."
https://intranet.mysql.com/secure/mailarchive/mail.php?folder=230&mail=870

Future Objects
--------------

There should be a mention of these objects which might be
in some future MySQL version. They present no challenge
for this task, but the implementor should spend a few seconds
on each one deciding whether it's a distant-future problem.
[44] Catalog                   WL#2073 or WL#942
[45] Character Set ("dynamic") WL#744
[46] Collation ("dynamic")     WL#744
[47] Constraint (CHECK)        WL#929
[48] Domain                    WL#933
[49] Module                    WL#3416
[50] Profile (as in Oracle)    WL#3824
[51] Role                      WL#988
[52] Sequence                  WL#827
[53] Privilege (new ones)      WL#2227
[54] Registry (of plugins)     WL#4102
[55] Repository (of backups)   WL#3329
[56] Synonym                   WL#1048 
[57] Trigger (DDL)             WL#2418
[58] Type (distinct)           WL#933
[59] Type (structured)         WL#3862
[60] Encryption Key            WL#3556

Terminology
-----------

Definition of "object":
The standard-SQL definition is:
"object (as in 'x object'):
Any thing. An X object is a component of, or is otherwise
associated with, some X, and cannot exist independently of
that X. For example, an SQL object is an object that exists
only in the context of SQL; an SQL-schema object is an object
that exists in some SQL-schema."
We only care about "server objects".
We only care about "objects" that are not "session objects".
We only care about "persistent" objects, maybe, that is, I
see that there may be a desire to exclude MEMORY and "temporary".

Definition of "metadata":
I'm happy enough with
http://en.wikipedia.org/wiki/Metadata
but emphasize it's "data about data" i.e. "data about SQL-data"
e.g. "data about data in the database" -- PERSISTENT data.

Definition of "ephemeral":
Data that I earlier referred to as will-o'-the-wisp.
For example there are these information_schema views:
GLOBAL_STATUS
GLOBAL_VARIABLES
PROCESSLIST
PROFILING (community only)
SESSION_STATUS
SESSION_VARIABLES
It was a mistake to put these tables in information_schema,
they should be thrown away or shifted to performance_schema,
and it should be emphasized that performance_schema data is
ephemeral -- won't be backed up, won't be replicated, etc.
Therefore this WL#4116 statement is wrong:
"It has been decided that the information_schema views shall
not be included in the first release and shall be added in a
later release." Some "information_schema views" that aren't
metadata don't belong in information_schema, so they should
never be backed up in any release.

References
----------

Private emails (Lars + Peter)
"Re: Fwd: Backup of all objects" on 2007-11-15.

dev-backup email thread
WL#3713 Online backup: Selective rpl/backup of metadata
https://intranet.mysql.com/secure/mailarchive/mail.php?folder=230&mail=412

dev-backup thread "Backup: metadata items list"
https://intranet.mysql.com/secure/mailarchive/mail.php?folder=230&mail=184

Oracle11g list of schema objects:
http://download.oracle.com/docs/cd/B28359_01/server.111/b28318/schema.htm#CNCPT111