WL#7284: Implement common code for different DD APIs
Affects: Server-8.0
Status: Complete
WL#6380 defines the basic principles and guidelines for implementing the new Data Dictionary. In other words: *how* the new API should look and which principles should be followed. The different API implementations (table API, stored program API, ...) are based on a common [hidden] core API and share common code. This WL specifies that core API. In other words: this WL is about *implementing* the common code following the guidelines set in WL#6380.
NF1: No user visible changes.
Intro
=====

This HLS is organized in the following chapters:

- "Basics", providing technical information about the DD API.

- "Main concepts", outlining the main ideas behind the DD API. This
  chapter should be of the main interest for DD API users. Look for
  'DD API USERS' comments there.

- "Implementation details", which specifies the logic under the hood
  of the DD API. This chapter should be of no interest for DD API
  users.

- "For the record", which contains thoughts / ideas / alternatives
  that were tried and rejected.

Basics
======

Foundation libraries
--------------------

It's proposed to use STL containers and STL strings for quickly
kick-starting the new Data Dictionary. When proper new string tools
are available, it should be pretty straightforward to switch from
STL strings to our custom strings.

STL strings provide a good level of encapsulation, thus it is pretty
hard to abuse them. This is very important for the starting phase.

NOTE:

- STL strings will have to be replaced either by current-MySQL-strings
  or by new-MySQL-strings before the new DD goes live. It's not clear
  whether STL strings can support all the character sets we support in
  the server.

- The main problem with integrating a DD implemented via STL strings
  with the rest of the server seems to be "the lack of ownership".
  I.e. the current MySQL strings are basically (char *), and might be
  modified somewhere down the call-stack. So, a simple cast from an
  STL string to (char *) is dangerous.

- STL containers most likely can still be used in production, as the
  server already uses them in other components.

Source file locations
---------------------

All Data Dictionary files reside under the sql/dd directory. Within
the sql/dd directory there is the following hierarchy:

- sql/dd/            Public interface files (except for the object
                     types) are placed here.
- sql/dd/types/      Public object type interface files are placed
                     here.
- sql/dd/impl/       All the implementation sources are placed under
                     this directory.
- sql/dd/impl/db/    "Database Access Layer" (see below)
                     implementation files.
- sql/dd/impl/types/ Object type implementation files.

The high-level idea behind the hierarchy is that the implementation
files must be visibly separated from the interface files.

Main concepts
=============

The Life Cycle
--------------

The Data Dictionary component must be properly initialized before use,
and properly shut down before the server goes down.

The Data Dictionary life cycle (initialization / deinitialization)
depends on InnoDB, but it has nothing to do with InnoDB initialization
or crash-recovery.

DD initialization
+++++++++++++++++

The DD initialization has the following steps:

- initialize the Data Dictionary internal in-memory structures;

- install the metadata of the Data Dictionary tables in InnoDB (do the
  bootstrap), or load that metadata into memory;

- load various server defaults from the Data Dictionary (such as the
  server collation).

DD shutdown
+++++++++++

During the DD shutdown the following actions are taken:

- destroy in-memory objects.

The Dictionary interface
------------------------

The Dictionary interface is the main entry point for accessing the new
Data Dictionary. Its main responsibility is to start a new DD
transaction. The DD user deals with one of the transaction interfaces
(see below) to access or modify DD objects.

Global DD functions
-------------------

The new DD provides the following global functions in the "dd"
namespace to be used by the rest of the server:

- dd::init()

  This function performs the Data Dictionary initialization. It is
  intended to be called during the initialization of the MySQL server.

- dd::shutdown()

  This function properly closes the use of the Data Dictionary,
  freeing up allocated resources. The function is intended to be
  called during the shutdown of the server.

- dd::get_dictionary()

  This is the main entry point of the Data Dictionary. The function
  returns a pointer to the dd::Dictionary interface. The function must
  be called after dd::init() and before dd::shutdown(). Calling it
  before dd::init() or after dd::shutdown() results in undefined
  behavior (firing an assert in debug builds, returning NULL in
  release ones).
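The following sketch illustrates the intended call pattern. The
function names come from the list above; the exact signatures and
return conventions are assumptions for illustration only:

  // Sketch only: server start-up / shutdown sequence.
  // The dd::init() signature is assumed here ('true' on failure),
  // as is common for bool-returning server init functions.
  if (dd::init())
    return 1;                   // abort server start-up

  dd::Dictionary *dict= dd::get_dictionary();
  // 'dict' is valid only between dd::init() and dd::shutdown().

  ...                           // normal server operation

  dd::shutdown();               // release all DD resources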
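As an illustration, an Object_key implementation might look like the
sketch below. The Object_key, Raw_key and Raw_table names come from
this specification; the method signature and the derived class are
assumptions:

  // Sketch only: one Object_key implementation per key/index kind.
  class Object_key
  {
  public:
    // Build the low-level key (handler API buffer, key parts,
    // key length) for the given DD table.
    virtual Raw_key *create_access_key(Raw_table *table) const = 0;

    virtual ~Object_key()
    { }
  };

  // Hypothetical implementation: lookup by object-id (primary key).
  class Primary_id_key : public Object_key
  {
  public:
    Primary_id_key(Object_id id)
     : m_object_id(id)
    { }

    virtual Raw_key *create_access_key(Raw_table *table) const;

  private:
    Object_id m_object_id;
  };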
Data Dictionary transactions
----------------------------

Every operation on the Data Dictionary must be performed in the
context of a "Data Dictionary transaction". In other words, a DD API
user must have a Data Dictionary transaction context (i.e. a DD
transaction must be explicitly started by the DD API user at some
point) for doing the following operations:

- Read metadata for an object (open a table, ...)
- Modify metadata for an object (alter table, ...)

There are two final transaction interfaces used in the Data
Dictionary:

- Transaction_ro -- read-only DD transaction (processing DML and DDL
  statements);
- Transaction_rw -- read-write DD transaction (processing DDL
  statements).

In other words: read-only DD transactions can not request a DD-object
for modification.

Transaction_ro and Transaction_rw derive from the dd::Transaction
interface, which specifies the operations common to RO- and
RW-transactions.

The transaction interfaces serve as entry points to retrieve global
objects, such as schemas (or catalogs), character sets, collations.
Also, these interfaces provide operations to retrieve objects by
SE-private identifiers.

In other words, the DD API user has to explicitly request the Data
Dictionary to begin either a read-only or a read-write DD transaction.
As a result of that request, the DD API user receives a C++ pointer to
a C++ object implementing the requested interface / behavior.

The Data Dictionary provides convenient auxiliary classes
("transaction holders", employing the RAII technique, sketched below)
to make sure DD users don't forget to end the active DD transaction.

A read-write DD transaction will have its own object cache for the DD
objects modified in that transaction. See WL#8150 for more details.

The long-term vision is to have a single DD transaction spanning
several SQL statements. This way it will be possible to achieve
transactional DDL at some point.
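A minimal sketch of such a transaction holder follows. The
Transaction_ro name comes from this specification; the holder class
name and the way a transaction is begun and ended are assumptions for
illustration:

  // Sketch only: RAII holder ensuring the DD transaction is ended.
  // The begin/end function names are assumed, not part of this WL.
  class Transaction_ro_holder
  {
  public:
    Transaction_ro_holder(THD *thd)
     : m_trx(begin_transaction_ro(thd))  // assumed factory function
    { }

    ~Transaction_ro_holder()
    { end_transaction(m_trx); }          // assumed; always runs

    Transaction_ro *trx() const
    { return m_trx; }

  private:
    Transaction_ro *m_trx;
  };

  // Usage: the DD transaction can not be leaked on an early return.
  void print_table_names(THD *thd)
  {
    Transaction_ro_holder holder(thd);
    ...                                  // use holder.trx()
  } // the transaction is ended here automatically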
Overview of Data Dictionary object classes
------------------------------------------

Every Data Dictionary entity (table, column, index, foreign key, view,
stored program, ...) is represented by three class families:

- so-called "object" classes;
- so-called "object type" classes;
- so-called "object table" classes.

Object classes
++++++++++++++

Object classes are probably the most obvious ones. Every DD-entity has
a corresponding "object class", which represents one instance of that
entity. For example:

- one column is represented by one instance of the dd::Column class;
- one table is represented by one instance of the dd::Table class.

DD API USERS: Object classes are part of the DD API and will probably
be the most used classes while dealing with the Data Dictionary.

Object type classes
+++++++++++++++++++

An object-type class serves as a meta-class for object classes. Object
type classes have the '_type' suffix. Object type classes are
singletons (there is only one instance of each object type class). One
of the responsibilities of an object-type class is to create a new
object-class instance. For example:

- all columns (all dd::Column instances) reference one instance of
  dd::Column_type;
- all tables (all dd::Table instances) reference one instance of
  dd::Table_type.

DD API USERS: Object type classes are part of the DD API.

Object table classes
++++++++++++++++++++

An object-table class is an interface to a particular DD-table (a
physical table on disk in the Data Dictionary schema). Object-table
classes encapsulate all the knowledge about DD-tables, such as column
indexes, index numbers, ways to create keys and so on.

Object-table classes are named after the corresponding DD-tables.
Apart from the other DD-classes, the object-table classes reside in
the dd::tables namespace. That's done intentionally to minimize name
conflicts. For example:

- dd::tables::Columns -- corresponds to dd.columns;
- dd::tables::Tables  -- corresponds to dd.tables.

DD API USERS: Object table classes are part of the DD API.

This design provides the needed level of flexibility to handle
different object types stored in one table (for example: dd::Table and
dd::View objects are both stored in dd.tables).

Kinds of object classes
-----------------------

There are different kinds of DD objects:

- objects which don't have an object-id (dd::Weak_object)
- objects which do have an object-id (dd::Entity_object)
- objects which are cached by the Data Dictionary
  (dd::Dictionary_object)

Weak_object
+++++++++++

Weak_object is the single parent (top-most) interface for all Data
Dictionary object classes.

The name "Weak_object" comes from the database modelling language,
where entities without a primary ID key are called weak entities:
http://en.wikipedia.org/wiki/Weak_entity

  A weak entity can not be uniquely identified by its attributes
  alone; therefore, it must use a foreign key in conjunction with its
  attributes to create a primary key.

The C++ class Weak_object is a simple class, which does not contain
any operation that is significant to the DD API user.

An example of such a class is dd::Index_element (the corresponding DD
table: INDEX_COLUMN_USAGE). It encapsulates the information about a
column in an index.

Entity_object
+++++++++++++

Entity objects are the objects which have an object ID. Entity objects
are represented by the Entity_object interface derived from
Weak_object. Entity_object extends Weak_object by bringing in more
capabilities related to the object ID.

The C++ class Entity_object declares the following attributes:

- Object ID
- Object name

Examples of entity objects:

- Table
- Column
- Index

Dictionary_object
+++++++++++++++++

Dictionary_object denotes objects which can be accessed via the
dd::Dictionary interface. In other words, Dictionary_object is the
base interface for all object types which are accessed via the
dd::Dictionary interface, which means they can be loaded / stored
independently, and they can be referred to from other objects by
Object-ID.

Examples of dictionary objects:

- Charset
- Collation
- Schema
- Table

A dictionary object is an entity object (i.e. if an object can be
retrieved using dd::Dictionary, the object must have an ID). But not
all entity objects can be retrieved independently via dd::Dictionary.
This is elaborated below in more detail.

The C++ class Dictionary_object has the following members:

- an object type attribute, which returns a reference to a
  Dictionary_object_type-based class (see 'Object type classes'
  above);

- an operation to create the key (Object_key interface) identifying
  this particular instance.
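A minimal sketch of this interface hierarchy, using only the
attributes named in this section (all method signatures are
assumptions for illustration):

  // Sketch only: the three kinds of DD object interfaces.
  class Weak_object                 // no object-id
  {
  public:
    virtual ~Weak_object()
    { }
  };

  class Entity_object : public Weak_object  // object-id and name
  {
  public:
    virtual Object_id id() const = 0;

    virtual const std::string &name() const = 0;
    virtual void set_name(const std::string &name) = 0;
  };

  class Dictionary_object : public Entity_object  // cached by the DD
  {
  public:
    // Reference to the singleton meta-class of this object.
    virtual const Dictionary_object_type &type() const = 0;

    // Key identifying this particular instance (e.g. by object-id).
    virtual Object_key *create_primary_key() const = 0;
  };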
Handling object relations
-------------------------

There are two types of references between objects:

- "weak object reference" -- an object is referenced by object-id,
  meaning that the object might not be loaded yet;

- "strong object reference" -- an object is referenced by C-pointer
  (C-reference), meaning that the object is guaranteed to exist (be
  loaded) in memory.

Thus, there are "tightly-coupled" and "loosely-coupled" object
classes. Here are the current relations and their types:

  Tablespace  - Tablespace_file    : tightly-coupled
  Table       - Column             : tightly-coupled
  Table       - Index              : tightly-coupled
  Table       - Foreign_key        : tightly-coupled
  Column      - Set_element        : tightly-coupled
  Column      - Enum_element       : tightly-coupled
  Index       - Index_column       : tightly-coupled
  Foreign_key - Foreign_key_column : tightly-coupled

  Catalog     - Schema             : loosely-coupled
  Catalog     - Tablespace         : loosely-coupled
  Schema      - Table              : loosely-coupled
  Schema      - Stored_routine     : loosely-coupled
  Table       - Tablespace         : loosely-coupled
  Index       - Tablespace         : loosely-coupled

Where:

- a "tightly-coupled" relation means aggregation;
- a "loosely-coupled" relation means association.

Note:

- Foreign_key actually has two table references: the "from" and "to"
  tables. The tightly-coupled relation goes to the "from" reference (a
  table aggregates the foreign keys it has). The "to" table reference
  is named "referenced_table", and that reference is implemented using
  names. In other words, each Foreign_key object has 3 attributes
  specifying the "referenced table" catalog name, schema name and
  table name. That kind of relation is neither tightly-coupled nor
  loosely-coupled, as the DD API doesn't provide ways to deal with it.
  DD API users have to "manually" look up the "referenced_table" by
  the corresponding name attributes.

This section will use the words "Parent" and "Child" to specify
relations in general. For example:

- in Catalog - Schema
  - Parent: Catalog
  - Child: Schema

- in Table - Column
  - Parent: Table
  - Child: Column

Resolving weak references
+++++++++++++++++++++++++

In the persistent DB-layer a reference is represented by the object-id
of the object being referenced. The DD API user should request a
referenced DD-object by ID.

See the "Object_reference" section in the "For the record" chapter
below for more background information.

Mapping of loosely-coupled classes
++++++++++++++++++++++++++++++++++

Background (quick summary):

- Child has a weak reference to the Parent:

    class Child
    {
      ...
      Object_id m_parent_id;
    };

- Child has a getter/setter allowing to get/modify the reference:

    class Child
    {
      ...
      Object_id parent_id() const;
      void set_parent_id(Object_id);
    };

The proposal for handling loosely-coupled objects (a usage sketch is
given after this list):

- The Parent does not have an in-memory collection (cache) of
  references to the children.

- Instead, the Parent has an operation that queries the database and
  retrieves all the children of the given Parent:

    class Child_iterator
    {
      Child *next();
    };

    class Parent
    {
      ...
      Child_iterator *fetch_children(THD *thd, Functor *condition);
    };

  fetch_children() basically opens a cursor on a table and fetches the
  rows which match the given condition.

- As we aim to create a defensive API (against misuse), every
  Parent-class will have an operation to create an object of the
  Child-class:

    class Parent
    {
      ...
      Child *create_child();
    };

  The Parent::create_child() method creates a new instance of Child
  and does the basic initialization. At the least, it sets the
  Child::m_parent reference. Thus, the DD-user does not have a way to
  create a Child-object which does not belong to any Parent (which is
  good).

  There is one limitation: a child can not be created on a
  non-persistent (newly created) parent. In other words, the following
  code will not work:

    Parent *p= xxx->create_parent();
    // 'p' is an in-memory object, it is not stored in the persistent
    // database, p->id() is INVALID_OBJECT_ID.

    Child *c= p->create_child();
    // 'c' will be NULL here. The Parent::create_child() call above
    // fails as 'p' is not persistent yet (p->id() is
    // INVALID_OBJECT_ID).

  The reasoning for this limitation is as follows:

  - Parent -- Child references are uni-directional -- every child has
    a reference to the parent, but the parent does not have references
    to its children.

  - If a child is created for a non-persistent parent, it's not clear
    how to store these objects:

    - If child->store() is called, it will fail since the parent
      reference is not established (parent_id is INVALID_OBJECT_ID).

    - All children should be updated after parent->store(), but that's
      impossible since the parent doesn't know its children.

  This limitation might be lifted in future versions, but so far such
  a feature does not seem to be needed.

- The Parent::drop() operation is "cascade", i.e. it drops all the
  children.

- The Child::drop() operation just removes the child's association
  with the Parent, meaning that the next fetch_children() will not
  return the dropped child.

- Any drop() operation drops the corresponding InnoDB records. That
  happens in the scope of the current DD transaction. The changes will
  be visible to other parties once the current DD transaction is
  committed.

- The Parent-class also provides the needed lookup operations, like
  lookup-by-name:

    class Parent
    {
      ...
      Child *lookup_child(THD *thd, const std::string &child_name);
    };
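Putting these pieces together, typical loosely-coupled usage might
look like the following sketch. The Parent/Child names are the
placeholders used above; passing NULL as a "no filter" condition and
the omitted error handling are assumptions:

  // Sketch only: enumerating and extending a loosely-coupled
  // relation.
  void demo(THD *thd, Parent *parent)
  {
    // Round trip to the DD tables: fetch all existing children.
    Child_iterator *it= parent->fetch_children(thd, NULL);

    while (Child *child= it->next())
    {
      ...                        // inspect the child
    }

    // Create a new child; create_child() sets the parent reference,
    // so an orphan Child can not be constructed.
    Child *child= parent->create_child();
    ...                          // fill in attributes, then store
  }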
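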
Mapping of tightly-coupled classes
++++++++++++++++++++++++++++++++++

In this case the Parent class has a collection of Child-objects.
Something like this:

  class Parent
  {
    ...
    std::vector<Child *> m_children;
  };

That collection is loaded and stored when the Parent-object is loaded
and stored.

The Child class (as usual) has a pointer to the parent:

  class Child
  {
    ...
    Parent *m_parent;
  };

The problem is that in this case we have bidirectional references,
thus every reference change must be updated on both sides. For
example: Tablespace - Tablespace_file. Let's say we're moving a
tablespace file from one tablespace to another. The following
references must be updated:

- Old_tablespace::m_files
- New_tablespace::m_files
- Tablespace_file::m_parent

We need to design an error-proof API that minimizes the chances for
the user to make a mistake (update the reference on one side, and
forget to update it on the other side).

NOTE: as outlined in WL#6380, we aim for iterators. I.e. we will have
iterators instead of index-based lookup/traversal.

See the "Alternatives for mapping tightly-related objects" section in
the "For the record" chapter for more information.

The following use cases look appropriate from the DD-user view point:

a) Iterating through all items:

     Child_iterator *it= parent->children();

     while (true)
     {
       Child *child= it->next();

       if (!child)
         break;

       ...
     }

b) Adding a new item:

     Child *child= parent->create_child();

c) Adding an existing item: not supported.

d) Dropping an existing item:

     Child *c= parent->lookup_child(child_name);
     c->drop();

e) Changing the parent:

     Child *c= parent->lookup_child(child_name);
     c->move(new_parent);

Handling changes in child-object
++++++++++++++++++++++++++++++++

The thing is that we need to handle three types of child-object
changes:

- a new child is added;
- an existing child is dropped;
- an existing child is modified.

The main idea is that Weak_object::store() must be the only operation
that writes changes to the database storage. In other words: add /
drop / update operations will update the in-memory state only, and
until store() for the "main object" is called, these changes will not
hit the persistent database storage.

The main object of a tightly-coupled relationship is the top parent
object. For example: a table for the 'Table - Column - Set_element'
relation -- until Table::store() is called, no changes to the table
columns, or to any column element, are stored.

The main object of a loosely-coupled relationship is the child object.
For example: a new schema table will not be stored in the persistent
database until Table::store() is called.

An alternative could be to propagate changes to the persistent storage
in each add / drop / update operation. It would simplify the
development, but it would also introduce potential problems with
implementing online ALTER TABLE, where the acquisition of the
exclusive lock has to be postponed as long as possible (till the very
end of the operation). This alternative would mean that the exclusive
lock had to be taken before the first add / drop / modify column
operation.

NOTE: it was also discussed whether it is possible to hide the store()
operation from the DD-user, and make the user operate in terms of a
DDL-transaction's commit or rollback. That should be possible, but not
for the 1st increment. When we introduce DDL-transactions, we can
consider keeping track of object changes in the transaction context,
and "applying" those changes (calling the store() method on the
corresponding objects) if needed.

Postponing writes to the persistent storage means that every DD-object
must be able to tell:

- if it is new or not;
- if it has been modified in memory;
- if it has been dropped.

Properties overview
-------------------

Several object types provide associated properties for every object
instance. In other words, an object can have an associated set of
key-value pairs, e.g., each table has "options" and "se-private-data".

The Properties interface and the corresponding implementation are
provided as a common tool to implement those sets of key-value pairs
in different object types.

Handling null-able columns
--------------------------

Many columns in the dd-tables are allowed to contain the NULL value
(null-able columns). This section outlines the relation between the DD
API and null-able columns.

First of all, most of the null-able columns are declared null-able
because they are shown in INFORMATION_SCHEMA tables, which are
specified by the SQL Standard. In other words, a column has to be
null-able just because it's shown in an INFORMATION_SCHEMA table,
where it's specified to be null-able. Thus, we can not get rid of
null-able columns.

Null-able columns are not exceptions; an estimation is that about
30-50% of all columns will be null-able.

The proposed high-level idea is that generally the DD API must not
expose column null-ability to the DD user. Only in rare cases will
null-ability be reflected in API terms. The thing is that in most
cases there are strict rules about when a column must be set to NULL
and when it must have a reasonable value. For example,
dd.tables.partition_expression must be NULL for non-partitioned
tables.

An alternative is to push the responsibility of enforcing those rules
to the DD-user side. That alternative seems to be error prone, since
the code could be duplicated and not well maintained. It's better to
have a single place (the DD API implementation) where those rules are
enforced.

In the rare cases when we do need to expose column null-ability, it's
proposed to stick to the following convention (sketched after this
list):

- Let's say we have an 'xxx' column, which is mapped to the 'xxx'
  attribute.

- The ordinary operations are (getter/setter):

    const T &xxx() const;
    void set_xxx(const T &);

  where 'T' is the column/attribute type.

  It's proposed to use the same style for null-able and non-null-able
  attributes (i.e. use C++ references in both cases).

- In addition to the operations above, it's proposed to have the
  following operations:

    bool has_xxx() const;
    -- returns 'false' if 'xxx' is NULL;
    -- in that case, the result of xxx() is undefined.

    void set_xxx_null();
    -- indicates that 'xxx' must be set to NULL.
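For example, applying this convention to the partition_expression
column mentioned above might look like the sketch below (the class
placement, virtual declarations and the attribute type are
assumptions):

  // Sketch only: null-ability exposed per the xxx/has_xxx convention.
  class Table : public Dictionary_object
  {
  public:
    // Result is undefined if has_partition_expression() is false.
    virtual const std::string &partition_expression() const = 0;
    virtual void set_partition_expression(const std::string &e) = 0;

    // NULL handling: 'false' means the column is NULL.
    virtual bool has_partition_expression() const = 0;
    virtual void set_partition_expression_null() = 0;
  };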
Handling SQL enums
------------------

WL#6379 defines several columns in the DD-tables to be of the SQL ENUM
type. The SQL ENUM type is required there to be shown in the
INFORMATION_SCHEMA tables.

The SQL-ENUM and C-enum types have a few important differences:

- 0 has a special meaning for SQL-ENUM and can not be used to identify
  a regular constant;

- SQL-ENUM can not have gaps in the numeric values. That means there
  is no way to represent a C-enum like the following:

    enum enum_field_types
    {
      MYSQL_TYPE_DECIMAL,        -- The numeric value is 0.
      MYSQL_TYPE_TINY,
      MYSQL_TYPE_SHORT,
      ...
      MYSQL_TYPE_DATETIME2,
      MYSQL_TYPE_TIME2,          -- Here is a "gap" in numeric values
      MYSQL_TYPE_NEWDECIMAL=246, -- between N and 246.
      MYSQL_TYPE_ENUM=247,
      MYSQL_TYPE_SET=248,
      ...
      MYSQL_TYPE_GEOMETRY=255
    };

So, here is the problem discussed in this section: a C-enum can not
always be fully represented by an SQL-ENUM.

There are the following alternatives to solve the problem:

1. Do not use SQL-ENUM; use SQL-INT to represent C-enum values.

   That's not suitable because INFORMATION_SCHEMA needs to show
   SQL-ENUM, and we don't want to add extra overhead with converting
   functions.

2. Use both SQL-INT and SQL-ENUM (two columns) to store a single
   attribute.

   That means duplication and possible inconsistency.

3. Have a mapping between C-enum values and SQL-ENUM values.

   That adds some overhead and additional logic to the DD API.

Another related problem is that in a few cases enum values (both
C-enum and SQL-ENUM) are part of some API:

- C-enum enum_field_types values are exposed by the client-server
  protocol and prepared statements.

- dd.columns.type of SQL-ENUM is exposed via INFORMATION_SCHEMA, and
  what is even worse, several C-enum values must be shown as a single
  SQL-ENUM value. For example, the MYSQL_TYPE_DECIMAL and
  MYSQL_TYPE_NEWDECIMAL C-enum values must both be represented by the
  DECIMAL SQL-ENUM value.

Sometimes that problem occurs for legacy elements, which are going to
be deprecated (MYSQL_TYPE_DECIMAL in the example above). However, it's
also anticipated that the enumeration values will evolve in the future
(new items might be added as new types/algorithms/... are added,
others might be deprecated and removed). So, it's expected that the
situation we have today will also be common in the future.

When a C-enum is not exposed to the users (is not part of some
interface / contract), it should be possible to change the numerical
values of the C-enum. However, given the nature of C-enums it's not a
trivial task, especially getting rid of the items with the 0 enum
value. The problem is that C-enum variables can be used in 'if'
statements in the form of "if (x)" (i.e. just comparing against zero).
Another side of this problem is that those variables can be
initialized within the containing structures by using the
memset()-family. Thus, changing the numerical values of a C-enum is
not a complex, but a tedious task, which should be carefully verified
and tested.

The proposal is to identify which enums can be converged eventually,
but not to rush to converge them at this point in time. It is proposed
to use (2) or (3) depending on the attribute.
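As an illustration of alternative (3), a mapping between
enum_field_types and a hypothetical dd.columns.type SQL-ENUM might
look like the sketch below. The Column_type enum, its values and the
function name are assumptions; only enum_field_types comes from the
server:

  // Sketch only: explicit C-enum <-> SQL-ENUM mapping
  // (alternative 3). Note how two C-enum values map to one SQL-ENUM
  // value, and how SQL-ENUM ordinals start at 1 with no gaps.
  enum Column_type       // numeric values match the SQL-ENUM
  {                      // definition: ENUM('DECIMAL','TINY',...)
    CT_DECIMAL= 1,       // 0 is reserved by SQL-ENUM
    CT_TINY=    2,
    CT_SHORT=   3
  };

  Column_type map_field_type(enum enum_field_types t)
  {
    switch (t)
    {
    case MYSQL_TYPE_DECIMAL:
    case MYSQL_TYPE_NEWDECIMAL:  // two C-enum values, one SQL value
      return CT_DECIMAL;
    case MYSQL_TYPE_TINY:
      return CT_TINY;
    default:
      return CT_SHORT;           // remaining cases elided in sketch
    }
  }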
Implementation details
======================

Object_type interface
---------------------

Every DD-object has a special meta-class that implements the
Object_type interface. The idea behind introducing the object-type
class is that the following responsibilities are associated with the
object type, not with the object instance:

- create an instance of a DD-object of the given object type;
- know which DD tables should be opened to store/restore an object
  instance.

Every DD-object class contains a static constant reference to the
instance of its type. For example:

  class Table
  {
    ...
    static const Object_type &TYPE;
    ...
  };

This reference is initialized in the implementation code.

Transaction interfaces
----------------------

As was mentioned above, there are 3 transaction interfaces:

- Transaction    -- the base interface
- Transaction_ro -- the interface of a read-only DD transaction (used
                    mostly in DML handling)
- Transaction_rw -- the interface of a read-write DD transaction (used
                    for processing DDL statements)

There are the following sets of operations.

Create global objects
+++++++++++++++++++++

  virtual Schema *create_schema() = 0;
  virtual Tablespace *create_tablespace() = 0;

Get a single object
+++++++++++++++++++

Retrieve an individual object. It's proposed to use the "get_" prefix
to designate such operations.

These operations might use the internal dictionary cache. However, the
cache interactions must be transparent to the user. The cache will
only speed things up, but not affect the semantics of the operation.
In other words, the DD-user must get the same results with the cache
turned on and off.

If a "get"-operation fails, a NULL pointer is returned. The operation
might fail mainly for the following reasons:

- the requested object does not exist;

- an internal error happened, which prevented the DD from completing
  the operation. This kind of error is expected to be very rare.
  Internal errors cause the THD::error flag to be set and trigger an
  error message to be written to the error log.

Fetch a single object or a collection of objects
++++++++++++++++++++++++++++++++++++++++++++++++

The "fetch_" prefix designates operations which involve an
unconditional round trip to the persistent database.
"Fetch"-operations can return a single object as well as a collection
of objects.

If a "fetch"-operation returns a single object, it returns a NULL
pointer in case of failure. It is the DD user's responsibility to
handle that.

If a "fetch"-operation returns a collection of objects, it returns a
pointer to an iterator over that collection. The pointer is NULL in
case of failure.

"Fetch" operations basically do the following steps:

1. look up an object-id by some secondary key/index (object-name,
   se-private-id, ...);
2. "get" the object by that object-id.

The 2nd step can involve the object cache.

"Fetch" operations that deal with collections do a full prefetch and
return an iterator over the in-memory collection of loaded objects.
Basically, what happens under the hood is:

- go directly to the persistent database, open the necessary table;
- find the matching records;
- read the 'object-id' from each matched record;
- "get" the object by that object-id (i.e. find it in the object
  cache, or restore it from the persistent database) and put the
  object into the resulting collection;
- close the table;
- return an iterator over the collection.
There are the following issues with the operation semantics:

- If a new object matching the fetch-operation criteria has been
  created in memory, but not yet stored in the database, the fetch
  operation will skip it. For example:

    // Let's say there are 3 objects matching the criteria.
    print_out_children(trx->fetch_children());

    // Now add two new children.
    Child *c4= parent->create_child();
    c4->set(...);
    Child *c5= parent->create_child();
    c5->set(...);

    // As the new children have not been committed to the database,
    // the following call will still print 1, 2, 3.
    print_out_children(trx->fetch_children());

    // Store the new children.
    trx->store_object(c4);
    trx->store_object(c5);

    // Now fetch_children() sees the new children,
    // so the output will be 1, 2, 3, 4, 5.
    print_out_children(trx->fetch_children());

- If an existing object matching the fetch-operation criteria has been
  modified in memory, but not yet stored in the database, the fetch
  operation will return the *modified* object. For example:

    // Let's say there are 3 objects matching the criteria.
    // Let's assume every object has an object-id and a symbol
    // attribute.
    // So the function will print 1a, 2a, 3a.
    print_out_children(trx->fetch_children());

    // Now let's modify the 2nd child.
    c2->set_symbol('b');

    // fetch_children() will return the modified c2.
    // So the function will print 1a, 2b, 3a.
    print_out_children(trx->fetch_children());

These issues are not a problem because of the metadata locking (MDL)
system. The thing is that MDL will prevent modification of children
while the parents are being accessed. The MDL system and the Data
Dictionary are on different levels, but the Data Dictionary relies on
proper MDL locking.

Release object
++++++++++++++

When the user is done with a DD-object, the user must indicate that to
the Dictionary by calling the release-object operation. After calling
release_object(), the object pointer might still be valid, but the
user must not use it.

See WL#8150 for more details.

Store object
++++++++++++

This operation stores a new or existing object into the database and
also puts it into the cache if needed. Under the hood this operation
goes through the object cache in order to maintain its consistency.

When a newly created object is stored, it is assigned a newly
generated object-id after the insert (the value of last-inserted-id
from the auto-increment column). So, the storing sequence of a newly
created DD-object looks like the following (see the sketch below):

1. [Create a new DD object]
2. Insert a new InnoDB record into the corresponding InnoDB table;
3. Get the last-inserted-id and assign it as the object ID of the
   DD-object. That happens in memory, so the DD-object is now in the
   updated state in memory.
4. Put the DD-object into the appropriate cache.

NOTE: this operation will be hidden behind the DD transaction
interface. All object modifications will be stored on a transaction
commit.
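A minimal sketch of that storing sequence, reusing the Raw_table and
Raw_new_record primitives from the "Database access layer" section;
all method and helper names here are assumptions:

  // Sketch only: storing a newly created DD-object.
  bool store_new_object(Raw_table *table, Entity_object *obj)
  {
    // 2. Prepare and insert a new record.
    Raw_new_record *r= table->prepare_new_record(); // name assumed
    store_attributes(obj, r);                       // hypothetical
    if (r->insert())                                // name assumed
      return true;                                  // insert failed

    // 3. Assign the auto-increment value as the object-id
    //    (in memory only).
    set_object_id(obj, r->get_insert_id());         // hypothetical

    // 4. Put the DD-object into the appropriate cache.
    cache_dictionary_object(obj);                   // hypothetical
    return false;
  }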
Drop object
+++++++++++

This operation attempts to drop an existing object in the database,
and releases the object (i.e. it becomes unavailable to the user).

Calling the drop operation for a newly-created object does not make
much sense (newly created objects exist only in memory), but it's a
valid scenario. The user-visible effect in this case is just that the
object is released and becomes unavailable to the user.

It's been discussed whether the drop operation should release the
object being dropped when the database essentially failed to drop it,
and the decision is to release the object unconditionally. The
rationale is as follows:

- It is consistent for the DD-user (the object passed to drop_object()
  is always released);

- The fact that the database failed to drop an object means that the
  object has already been dropped in the database. How that might
  happen is a different question (we rely on MDL). The thing is that
  the object in memory is inconsistent, and it's good to force the
  DD-user to re-retrieve it.

NOTE: this operation will most likely be moved to the
Dictionary_object interface.

Informational operations
++++++++++++++++++++++++

There might be additional operations, such as:

  const Collation &server_collation() const;

Support InnoDB specific use cases
+++++++++++++++++++++++++++++++++

As has been mentioned in WL#6380, we fully rely on the MDL subsystem.
That means that the Dictionary itself does not check whether the
object requested by the user can be accessed or modified. It is
assumed that the user did all the needed interactions with MDL to lock
the objects. There will be assertions to ensure that a proper MDL-lock
has been taken.

NOTE: there are two different MDL-lock types involved in the DD API:

1. MDL-locks on the dd-tables.

   This is just the nature of accessing (opening & locking) tables
   within the MySQL server. The Data Dictionary doesn't need these
   locks itself, but the server infrastructure does.

2. MDL-locks on the objects requested from the dd-tables.

The DD API takes care of (1), but relies on the user to take care of
(2).

InnoDB has special needs -- during recovery, InnoDB needs to retrieve
objects for which MDL-locks are not held. The idea is to provide a
special interface for InnoDB to be able to retrieve unlocked objects.
This interface must not be used by regular DD-users.

The thing is that InnoDB can not take the needed MDL-lock, just
because InnoDB does not know the object name. All that InnoDB knows is
the se-private-id. If it could, we would not provide such a special
interface, but force InnoDB to use the common MDL-interface. But it
can not.

Thus, the DD API must provide an operation to:

1. look up the object-row by se-private-id (or any other index tuple);
2. get the object name from the row;
3. acquire an MDL-lock on the object name;
4. look up the object based on se_private_id (or on the object name)
   again;
5. check if the object name has changed. This step makes sure that
   there were no rename/drop operations between (2) and (3).

A sketch of this lookup pattern is given below.
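The following sketch shows those five steps as code; the function and
helper names are assumptions for illustration:

  // Sketch only: retrieving a table by se-private-id without holding
  // an MDL-lock up front (the InnoDB recovery case).
  const Table *get_table_by_se_private_id(THD *thd, ulonglong se_id)
  {
    // 1-2. Unlocked lookup: read the row, learn the object name.
    std::string name;
    if (lookup_table_name_by_se_private_id(thd, se_id, &name))
      return NULL;                       // hypothetical helper

    // 3. Now that the name is known, lock it the regular way.
    if (acquire_shared_mdl(thd, name.c_str()))
      return NULL;                       // hypothetical helper

    // 4. Repeat the lookup under the MDL-lock.
    const Table *table= fetch_table_by_se_private_id(thd, se_id);

    // 5. Detect a rename/drop between steps (2) and (3).
    if (table == NULL || table->name() != name)
      return NULL;                       // the caller may retry

    return table;
  }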
Implementation notes
++++++++++++++++++++

From the DD implementation view point, the differences between
Transaction_ro and Transaction_rw are as follows. These are
implementation details hidden from DD API users.

- Read-only transactions use InnoDB attachable transactions;
- Read-only transactions commit the InnoDB attachable transaction at
  the end;
- Read-write transactions use the user (outer) InnoDB transaction so
  that all changes are committed at once;
- Read-only transactions share the object cache (use the global object
  cache);
- Read-write transactions use their own object cache to store modified
  objects.

See WL#8150 for more details regarding the transaction implementation.

Vision of the future
++++++++++++++++++++

The vision is that a DD-transaction will have a long scope, including
the execution of several SQL statements. That in particular means
that:

- a DD-transaction can not be seen "just" as a collection of
  THD-attributes to be saved/reset/restored. In fact, the
  THD-attributes must be reset/restored for a short time when the DD
  is dealing with the persistent storage (InnoDB) (reading / writing /
  dropping records). It's important that for the rest of the time, the
  original values of the THD attributes are in effect.

- there will be a need to upgrade a read-only DD transaction to a
  read-write DD transaction: the user transaction might start with a
  plain SELECT followed by an ALTER TABLE. That in turn means that the
  InnoDB attachable transaction must be upgraded to a regular
  read-write transaction. This is an unexplored question now.

Properties
----------

The Properties interface provides:

- convenient access for reading and modifying key-value pairs;

- an interface with the following core functions:

  - size()          : Return the number of key=value pairs
  - exists(key)     : Return true if the key exists
  - remove(key)     : Remove the key=value pair
  - value(key)      : Return the value for the given key
  - get(key, &value): Return (by parameter) the value
  - set(key, value) : Set/replace the value for the given key
  - iterators       : Iterate over all key=value pairs

- conversion functions (X in {int64, uint64, int32, uint32, bool}):

  - to_X(string, &value): Convert from string to X, return by
                          parameter
  - from_X(value)       : Convert from X to string, return the string
  - set_X(key, value)   : Convert from X to string, set the value for
                          the given key
  - get_X(key, &value)  : Get the value for the key, convert to X,
                          return by parameter

- access to the raw string which stores all those key-value pairs in
  an internal format (see below), in order to store that string in the
  DB;

- a way to parse the raw string into a set of key-value pairs in order
  to load properties from the DB.

NOTE: size_t is not among the supported types of the Properties
interface. However, this might be reconsidered later if needed.

The internal format of the raw string is a semicolon separated list of
key=value pairs. The list must conform to the following informal
grammar:
  <key_value_list> ::= <key_value_pair>
                     | <key_value_list> ";" <key_value_pair>
                     | <empty>
  <key_value_pair> ::= <key> "=" <value>
  <key>            ::= <characters>
  <value>          ::= <characters> | <empty>
  <characters>     ::= <character> | <characters> <character>
  <character>      ::= any character except "\", "=", ";"
                     | "\\" | "\=" | "\;"
  <empty>          ::= ""
Thus, keys and values are separated by "=", and key=value pairs are
separated by ";". A key must contain one or more characters, while a
value can be an empty string. The characters "=" and ";" may be part
of a key or value, but then they must be escaped by "\". Hence, "\"
itself must also be escaped.

The key=value list supports UTF-8 strings. If other character sets are
needed, the strings must be encoded.

Internally, within the Properties implementation, the only characters
we look for are the digits '0'..'9', the sign '-' (this is actually
handled in the mysql number conversion function), the semicolon ';',
the backslash '\' and the equality sign '='. All of these are single
byte characters in UTF-8. Single byte characters in UTF-8 start with a
0 bit (0xxxxxxx), while characters that are represented as multi byte
sequences in UTF-8 have bytes starting with at least one 1 bit
(1xxxxxxx, see http://en.wikipedia.org/wiki/UTF-8). Thus, a byte which
is part of a multi byte character will never be equal to a byte which
is by itself a single byte character, i.e., there will never be a byte
which is part of a multi byte character that equals any of the
characters we search for internally in the Properties implementation.
Thus, UTF-8 strings are handled correctly even though the iteration is
byte by byte rather than character by character. This means that
string parsing, unescaping and escaping work, which again means that
both keys and values can be UTF-8 strings.

The Properties interface also defines a set of auxiliary functions to
parse string values into the int32, int64, uint32, uint64 and boolean
types. The functions are static, and can be invoked through the
interface name. The functions are declared as, e.g.:

  bool to_int64(const std::string &string, int64 *value);

Here, the string argument is supposed to be a string containing the
decimal value of the number:

  int64 x;
  bool status= Properties::to_int64("12345", &x);

The boolean conversion function is defined along the same lines. The
string argument is supposed to be "true" or a number != "0" (=> true),
or "false" or "0" (=> false):

  bool maybe;
  bool status= Properties::to_bool("true", &maybe);

There are also conversion functions converting in the other direction,
i.e., from primitive types to string:

  std::string from_int64(int64 value);

These functions are also wrapped into variants of the set and get
functions of the Properties interface, in order to convert a value
between a primitive type and a string and set or get this value for a
given key, e.g.:

  bool set_int64(const std::string &key, int64 value);
  bool get_int64(const std::string &key, int64 *value);

List of SQL enums in DD API
---------------------------

- TABLES

  - type ENUM(BASE TABLE, VIEW, SYSTEM VIEW)

    Corresponding C-enum: none

  - partition_type ENUM(AUTO, RANGE, LIST, RANGE_COLUMNS,
    LIST_COLUMNS, HASH, KEY_51, KEY_55, LINEAR_HASH, LINEAR_KEY_51,
    LINEAR_KEY_55)

  - default_partitioning ENUM(NO, YES, NUMBER)

  - subpartition_type ENUM(HASH, KEY_51, KEY_55, LINEAR_HASH,
    LINEAR_KEY_51, LINEAR_KEY_55)

  - default_subpartitioning ENUM(NO, NAMES, NUMBER)

- VIEWS

  - check_option ENUM(NONE, LOCAL, CASCADED)
  - is_updatable ENUM(NO, YES)
  - algorithm ENUM(UNDEFINED, TEMPTABLE, MERGE)
  - security_type ENUM(INVOKER, DEFINER)

- COLUMNS

  - type ENUM(...)
    Corresponding C-enum: enum_field_types

- INDEXES

  - type ENUM(PRIMARY, UNIQUE, MULTIPLE, FULLTEXT, SPATIAL)
  - algorithm ENUM(BTREE, RTREE, HASH, FULLTEXT)

- INDEX_COLUMN_USAGE

  - order ENUM(ASC, DESC)

- FOREIGN_KEYS

  - match_option ENUM(NONE, PARTIAL, FULL)
  - update_rule ENUM(NO ACTION, RESTRICT, CASCADE, SET NULL,
    SET DEFAULT)
  - delete_rule ENUM(NO ACTION, RESTRICT, CASCADE, SET NULL,
    SET DEFAULT)

- TRIGGERS

  - event_type ENUM(INSERT, UPDATE, DELETE)
  - action_timing ENUM(BEFORE, AFTER)

- ROUTINES

  - type ENUM(FUNCTION, PROCEDURE)
  - result_data_type ENUM(...)
  - sql_data_access ENUM(CONTAINS_SQL, NO_SQL, READS_SQL_DATA,
    MODIFIES_SQL_DATA)
  - security_type ENUM(INVOKER, DEFINER)

- PARAMETERS

  - mode ENUM(IN, OUT, INOUT)
  - data_type ENUM(...)

- EVENTS

  - interval_field ENUM(YEAR, QUARTER, MONTH, DAY, HOUR, MINUTE, WEEK,
    SECOND, MICROSECOND, YEAR_MONTH, DAY_HOUR, DAY_MINUTE, DAY_SECOND,
    HOUR_MINUTE, HOUR_SECOND, MINUTE_SECOND, DAY_MICROSECOND,
    HOUR_MICROSECOND, MINUTE_MICROSECOND, SECOND_MICROSECOND)
  - status ENUM(ENABLED, DISABLED, SLAVESIDE_DISABLED)
  - on_completion ENUM(DROP, PRESERVE)

Single record lookup
--------------------

In this specification, "single record lookup" means that an InnoDB
record/row must be read no more than once to retrieve/restore a DD
object.

The thing is that the Data Dictionary doesn't know the exact type of a
DD-object being restored. For example, both views and tables are
stored in the same table. When a DD user requests a table-or-view by
ID, the Data Dictionary does not know what it is before it reads the
record and analyzes the type field.

There are different keys used to look for the needed DD-object in a
DD-table:

- by object-id;
- by name;
- by SE-private-id;
- by a composite key.

We must have single record lookup in all these cases.

Usage of size_t
---------------

The Data Dictionary API must use the 'size_t' type for all attributes
which essentially correspond to any kind of "length" (such as string
length). Length is usually represented by the 'size_t' type in C/C++
libraries. Thus, if we chose any other data type, we would need a type
conversion/cast every time a length-attribute is used. Those
conversions/casts would be required on the DD user side. Using the
'size_t' type in the DD API moves those conversions/casts to the API
implementation.

That does *not* mean that every unsigned numeric type must be
represented as size_t.

For the record
==============

Alternatives for mapping tightly-related objects
------------------------------------------------

There are several alternatives:

1. Expose internal containers

   The idea is like this:

     class Parent
     {
       ...
       typedef std::vector<Child *> Child_seq;
       Child_seq m_children;

       Child_seq *children()
       { return &m_children; }
     };

   This approach is simple and straightforward, but it is too
   low-level for the DD-user. It is vulnerable to misuse. Also, it
   would be good to have the possibility to change the actual
   container implementation internally without affecting DD-users
   much.

   It's been decided to reject this approach, so it will not be
   discussed further.

2. "Microsoft" approach

   This approach is widely used within object systems designed by
   Microsoft. The idea is that a new entity is introduced, which
   provides all the operations for consistent reference management.

   So, we had: Parent, Child. Now we're introducing Child_collection:
     class Parent
     {
       ...
       Child_collection *children();
     };

   The Child_collection interface may be like the following:

     class Child_collection
     {
       void add(Child *);
       void drop(Child *);
       Child *find(...);
       Child_iterator *iterator();
     };

   So, the use case scenarios are like this:

   a) Iterating through all items:

        Child_iterator *it= parent->children()->iterator();

        while (true)
        {
          Child *child= it->next();

          if (!child)
            break;

          ...
        }

   b) Adding a new item:

        Child *child= parent->children()->add_new();

   c) Adding an existing item (not sure if that's needed at all):

        Child *child= ... (create a new object somehow)
        parent->children()->add(child);

   d) Dropping an existing item:

        parent->children()->drop(child);

      drop() returns an error status if 'child' is not in the
      collection.

   Of course, Child_collection will be a new type for every
   Child-class (i.e. A_collection, B_collection, ...).

   This approach has the following benefits:

   - The logic to manage collections is outsourced to a separate class
     and the Parent is not polluted with various methods.

   - Most likely it should be possible to come up with a
     base-collection class that implements most of the functionality,
     and then inherit individual collection types from it to add
     type-specificity. Actually, it should be possible to write
     something like:

       class Parent
       {
         ...
         typedef Base_collection<Child> Child_collection;
         Child_collection m_children;

         Child_collection *children()
         { return &m_children; }
       };

   - If there is a need for a new operation, there is a class for it,
     which means:

     - no further pollution of Parent with Child-specific methods;
     - it will be more self-explanatory for the DD-user to get the
       list of collection operations.

   The downside is a vague fear of too many classes. Also, it might be
   a bit tricky to implement collection operations with custom
   semantics -- for example, if the add() operation must do some
   non-trivial checks, or if more than one container should be
   consistently maintained (e.g. a vector and a bitmap).

3. "Java" approach

   The idea is that the Parent has all the operations to maintain
   items within the collection:

     class Parent
     {
       ...
       Child_iterator *children();
       void add_child(Child *);
       void drop_child(Child *);
       Child *find_child(...);
     };

   a) Iterating through all items:

        Child_iterator *it= parent->children();

        while (true)
        {
          Child *child= it->next();

          if (!child)
            break;

          ...
        }

   b) Adding a new item:

        Child *child= parent->add_child();

   c) Adding an existing item (not sure if that's needed at all):

        Child *child= ... (create a new object somehow)
        parent->add_child(child);

   d) Dropping an existing item:

        parent->drop_child(child);

      drop_child() returns an error status if 'child' is not in the
      collection.

   The problem here is the pollution of the parent class with
   child-operations. However, implementing custom semantics should not
   be a problem.

Overall, the approaches above are probably designed for the cases when
a fine-grained API is needed. In our situation, we want to provide a
more high-level API to alleviate the burden of setting references from
the DD-user, and (what's more important) to reduce the chance of doing
something wrong.

From the implementation view point, it makes sense to have a
collection class in any case. It might be directly exposed as in (2),
or used to implement the parent operations as in (3).

Catalog support
---------------

The final design of the Data Dictionary depends on the answers to the
following two questions:

- What is the relation between the Data Dictionary and the Catalog? In
  other words: is the Data Dictionary global, or per-catalog?

- What is the relation between Catalog and Tablespace?
Tentative answer: a Catalog "has" 1 or more Tablespaces; a Tablespace
can be contained in only one Catalog. However, there are chances that
this answer will be revisited, as the Catalog WL is not completed.

It's been agreed to postpone decisions on these items because:

- There will be no fully functional Catalog support in 5.7;
- 5.7 will have one default catalog, as it is in 5.1 / 5.5 / 5.6;
- There are many questions about the way Catalogs should be supported
  in the server;
- Catalog support is not in the 5.8 plans.

Thus, in order to finish the Data Dictionary design for 5.7, some
assumptions have to be made. It is proposed to design the Data
Dictionary relying on the current level of Catalog support (the single
default catalog), and outline what should be modified when Catalog
support is added.

An alternative to this is to design the Data Dictionary under the
assumption of a per-catalog Data Dictionary, as that seems to cause
fewer changes in the future. However, it seems that currently a
Catalog interface would only complicate things. Moreover, changes will
be needed in the future anyway, as Catalogs are not properly designed
now, so we can only try to predict what will be in the final design.
Introducing some Catalog interface right now and changing it later
would mean user confusion. So, it's proposed to reject this
alternative.

Thus, to sum up, the proposal is to not introduce a Catalog interface
at this point in time.

NOTE: even though the Catalog interface will not exist in the DD API,
it's still proposed to keep the catalog-related columns in the
DD-tables (as specified by WL#6379) in order to minimize future
differences in the DD-tables.

NOTE: since the dd.schemata table has the catalog_id column, it's
required to maintain its value. One record with the default catalog
information will be inserted into the dd.catalogs table. The
Dictionary implementation will be aware of the default catalog id and
name.

Let's now see how the DD API can evolve in the future. The Dictionary
interface has the following logical groups of operations:

1. Retrieve truly global read-only objects (an individual object by
   name, or collections of objects).

   There are two types of such objects: Charset and Collation. These
   objects can not be changed via the Data Dictionary.

2. Retrieve global objects, such as Tablespace or Schema (an
   individual object by name, or collections of objects).

3. Retrieve non-global object(s) by some criteria. Examples:

   - get a table by SE-private-ID;
   - get all indexes by SE-private-table-ID.

The following modifications will be done for the global Data
Dictionary:

- the Catalog interface will be added;
- Schema will be moved into the scope of Catalog, so there will be no
  Dictionary::get_schema(), but instead Catalog::fetch_schemas();
- operations to deal with Catalogs will be added to the Dictionary
  interface.

The following modifications will be done for the per-catalog Data
Dictionary:

- the Catalog interface will be added;
- (2) and (3) will be moved from Dictionary to Catalog;
- Schema will be moved into the scope of Catalog. So, there will be no
  Schema-related operations in the Dictionary interface; the Catalog
  interface will have them;
- operations to deal with Catalogs will be added to the Dictionary
  interface.

Handling DROP without restoring objects
---------------------------------------

Let's say we're going to drop a table. That means we should drop all
its columns, indexes, foreign keys, partitions and maybe something
else. There are 2 alternatives:

1. Object-oriented approach: restore a dd::Table instance and call
   drop() on it.
   The drawback here is that we're doing extra work when we're
   handling just a DROP TABLE statement: we don't actually need
   dd::Table and all its children to be restored. We need a C++ object
   just to call drop() on it and destroy it afterwards.

2. RDBMS approach: delete the matching record from the main table
   (dd.tables) and do a cascade delete of all matching records in the
   dependent tables using foreign keys.

   I.e. we just need to execute an SQL statement like

     DELETE FROM dd.tables WHERE ...

   and be done. The thing is that there are no CASCADE foreign keys,
   so it is the responsibility of the DD to go through all the needed
   records and delete them. For example, in case of DROP TABLE the
   Data Dictionary must delete the records for all table columns,
   indexes, etc.

The alternatives have been considered, and it's been decided that (2)
is an optimization of 'DROP TABLE' which we should not bother with.

Object_reference
----------------

The following approach for handling weak object references was
considered. We wanted to encapsulate a weak object reference in a C++
class, which had two members:

- Object ID
- C-pointer to the DD-object, in case it had already been loaded into
  the cache.

Depending on the state of the object, a reference might be in the
following conditions:

- NULL C-pointer, invalid object id -- an uninitialized reference.
  That could happen when a new C++ object was instantiated in memory.

- Valid C-pointer, invalid object id. That could happen when a new
  tree of C++ objects had been created and initialized, but not yet
  stored in the DD-tables, so the object-id was not yet known.

- NULL C-pointer, valid object id. That happened when the object was
  just retrieved and the referenced object had not been restored into
  memory.

- Valid C-pointer, valid object id.

The main problem with that approach is maintaining a consistent
C-pointer in the reference class, as the DD-object might get evicted
from the cache. Consider the following scenario:

- A user requested a Table and its Schema. The reference-object was
  initialized: the Table references the Schema by C-pointer.

- The user released both objects, so they went "back into" the cache
  and were marked as unused.

- The cache then decided to evict the Schema object, but keep the
  Table object. The C-pointer in the reference-object became invalid.

In order to solve that, each reference object would have to be aware
of the object management that the cache is doing. We didn't want to go
down that road.