WL#8150: Dictionary object cache
Affects: Server-8.0
Status: Complete
The goal of this worklog is to provide support for the following three
closely related issues:

1. Providing a shared cache for DD objects.
2. Tracking usage of DD objects.
3. Handling modification of DD objects.
Functional overview
===================

Dictionary users (i.e., the MySQL server code) may access dictionary objects
through a cache client. This is the only API available for accessing
dictionary objects. The cache client is associated with the user transaction
context descriptor (the THD), and each connection will thus have its own
cache client.

The cache client maintains a register of the objects that have been accessed.
When getting an object, the cache client's object register is first checked
to see if the object is present. If so, the object is returned. If not, the
object is retrieved from a shared, global cache. The client's object register
and the shared cache both keep pointers to the same object instances.

When modifying an object, we rely on MDL locking to prevent race conditions
between concurrent object users. An object to be modified must first be
retrieved through the cache client, and then be copied. The changes are made
on the copy, which is eventually written to the persistent dd tables, while
the read only copy in the cache is dropped, forcing a cache miss on the next
lookup.

Both the client's object registry and the shared, global cache associate keys
with objects. The following keys are supported as of now. These keys are used
both for cache lookup and for retrieving records from the persistent DD
tables:

- ID based keys, which come in two flavors: those based on the dictionary id,
  and those based on the storage engine private id.
- Name based keys, which are usually schema qualified names. For a table, for
  example, the name key is composed of the schema id and the table name.

Object usage must be tracked to allow objects to be released (and memory
freed, if space is needed by other objects) from the shared cache when
appropriate. Usage is tracked on a per-client basis, i.e., when a cache
client retrieves an object for the first time, it is noted that the object is
being used. The lifetime of a retrieved dictionary object ends when the user
releases the object explicitly, either by releasing a single object or by
releasing all objects retrieved by the client. This will mark the objects
previously accessed by this client as no longer being used by this client.
The objects must be released explicitly; not releasing the objects before the
client is deleted is considered an error.

The cache client and the shared global cache will be totally transparent to
dictionary users, with the exception of the performance impact implied.

The three sections below present requirements for the shared cache, the usage
tracking, and the cache client. It should be emphasized that when referring
to "the user" or "the dictionary user", we mean the server code using the
dictionary, i.e., we do not refer to end users.

1. Requirements for the shared cache
====================================

We propose the following functional requirements for the shared cache:

F-1.1 The cache will be used in a multi threaded environment, and the
      implementation must be thread safe.

F-1.2 The cache shall provide functions for adding objects to the cache,
      retrieving objects, releasing objects (i.e., marking an object as
      being used by one client less), and dropping objects.

F-1.3 To lower mutex contention and ensure key uniqueness, the cache shall be
      templatized to support partitioning based on dictionary object type.

F-1.4 The cache must not hold a mutex while doing I/O, i.e., the object cache
      must be designed to avoid holding a mutex longer than absolutely
      necessary.
F-1.5 The cache shall provide support for tracking object usage, i.e., at any
      time, it shall be possible to determine whether an object in the cache
      is being used or not.

F-1.6 When an object is released and becomes unused, it shall be kept in the
      cache (provided there is enough space), to be readily available for new
      requests for the same object.

F-1.7 The cache shall provide support for cache eviction. The eviction
      strategy, i.e., the process for selecting the unused object to remove
      when a new object is requested and there is no free space in the cache,
      shall remove the least recently used object (see the next section about
      tracking usage).

F-1.8 The eviction strategy shall not have an impact on the external API of
      the cache, i.e., as seen from the users of the cache, the API shall
      remain the same.

F-1.9 It shall be possible to set the cache capacity in terms of the total
      number of objects (both used and unused) to keep. The cache will always
      store the objects being used, so if the usage exceeds the capacity, no
      unused objects will be kept. If the current usage is below the
      capacity, then unused objects will be kept. This setting shall be user
      configurable, and there shall be one configuration variable for each
      cache partition:

      - table_definition_cache:          For the cache storing tables and
                                         views.
      - schema_definition_cache:         For the cache storing schemata.
      - tablespace_definition_cache:     For the cache storing tablespaces.
      - stored_program_definition_cache: For the cache storing stored
                                         programs and events.

      The variables shall have the following minimum, default and maximum
      values:

      - table_definition_cache:          Min: 400  Def: 400  Max: 524288
      - schema_definition_cache:         Min: 256  Def: 256  Max: 524288
      - tablespace_definition_cache:     Min: 256  Def: 256  Max: 524288
      - stored_program_definition_cache: Min: 256  Def: 256  Max: 524288

      There will also be cache partitions for collations and character sets.
      The capacity of these will be hardcoded as 256, and will not be user
      configurable.

F-1.10 It shall be possible to set the cache capacity to 0, meaning that the
       cache will not keep objects that are not currently being used.

F-1.11 The dictionary cache will exist in parallel with the current Table
       Definition Cache (TDC). The TDC caches TABLE_SHARE objects, which will
       not be eliminated yet. This means that the two caches must be kept
       separate, i.e., objects in one cache must not rely on the existence of
       objects in the other cache. As long as the TDC exists, the capacity of
       the new cache is not user configurable, but is set to the value of
       'max_connections', the reasoning being that on average, there should
       be room in the cache for each connection to leave one table element
       unused.

F-1.12 The dictionary cache will exist in parallel with the current Stored
       Program Cache (SPC). The main reason for this is that the current SPC
       is per connection, while the DD object cache is not, and changing the
       SPC to become shared is beyond the scope of this worklog. The concern
       regarding cache synchronization mentioned for the TDC applies here as
       well. When the old cache is removed, the existing configuration
       variable 'stored_program_cache' will be deprecated.

F-1.13 The dictionary cache shall not be accessed directly by dictionary
       users (i.e., the server code outside of the dictionary module). All
       cache usage shall be implicit through the cache client API (see
       below).
F-1.14 The cache shall be internally consistent, meaning that for every
       dictionary object, its access keys in the cache shall match the
       object's fields upon which the keys are based.

F-1.15 The cache shall support lookup based, initially, on three key types:

       1. The dictionary id.
       2. The object name, possibly schema qualified.
       3. The storage engine private id.

       It shall be simple to add support for additional key types.

F-1.16 On a cache miss, the missing object shall not be loaded multiple times
       when requested from multiple connections simultaneously. Loading shall
       happen only once, forcing others to wait for loading to finish. An
       exception to this is when the same object is retrieved simultaneously
       using different key types. In that case, there is no way to check that
       the keys belong to the same object before the object is actually
       loaded. Thus, in this case, the object is loaded more than once.

F-1.17 When loading is completed, all keys of the object must be generated.
       It must be verified that the key used for retrieval matches the same
       key type generated based on the object. The set of keys must either
       all be present in the cache (meaning that the object has been loaded
       concurrently by a different connection), or none must be present in
       the cache. In the first case, the already cached object shall be used,
       and the newly retrieved object shall be deleted. In the second case,
       the newly retrieved object shall be added to the cache. The presence
       of a partial set of the object keys is an error.

2. Tracking object usage
========================

To support cache eviction, there must be a way to track object usage. This
must be an integral part of the cache (see above) and cache client (see
below) infrastructure, and does not need to be visible from the outside,
i.e., to the dictionary user (the server code outside of the dictionary
module).

F-2.1 For each object in the cache, it shall be possible to find the present
      number of concurrent cache clients using the object. We call this the
      reference counter.

F-2.2 When an object is retrieved by a cache client, the reference counter
      shall be incremented.

F-2.3 When an object is released by a cache client, the reference counter
      shall be decremented.

F-2.4 When the reference counter of an object becomes 0, the object is added
      to the list of currently unused objects. If the cache capacity is
      exceeded, the object which has been unused for the longest time (i.e.,
      the least recently used object) shall be removed (see below).

F-2.5 When an object is retrieved and already present in the shared cache,
      and it is not being used by another cache client, it shall be removed
      from the list of currently unused objects.

F-2.6 Upon a cache miss, when the missing object is retrieved from persistent
      storage and added to the shared cache, if the capacity is exceeded, the
      least recently used object shall be evicted from the shared cache,
      i.e., its memory is freed, and its entries in the free list and the
      various key maps are removed.

F-2.7 The dictionary objects must be self contained. There must not be
      pointers between separate top level dictionary objects that rely on a
      different object being present in the cache. If one object needs
      another object, then the other object must be retrieved explicitly, on
      demand.

3. Cache client
===============

The cache client will be used for retrieving and storing objects, and
provides the only interface that dictionary users (i.e., the server code
outside of the dictionary module) will have for accessing dictionary objects.
F-3.1 All cache clients shall share a common cache for objects that are only
      being read, and not being modified.

F-3.2 The cache client shall keep a local record of the objects that have
      been retrieved. If an object has been retrieved previously by the same
      cache client, it is re-retrieved from the local record. Thus, an object
      will be retrieved from the shared cache only once per cache client.

F-3.3 Modifying an object can be done in three ways (a hypothetical usage
      sketch is given after this list). All three use cases below require the
      dictionary user to hold an MDL X-lock on the object.

      i.   The dictionary user may retrieve a read only object, and then ask
           the client to drop it. The cache client will remove and delete the
           object instance in the local object registry and the shared cache,
           and delete the corresponding row(s) in the dd tables.

      ii.  The dictionary user may retrieve a read only object, copy it, do
           the necessary modifications on the copy, and eventually ask the
           client to replace the existing read only object by the newly
           modified object, and to store the object in the appropriate dd
           tables. The object copying is not supported by the cache client;
           this is something the user is responsible for doing.

      iii. The dictionary user may create a new dictionary object instance
           and ask the cache client to store it in the dd tables. Creating
           the object instance is not supported by the cache client; this is
           something the dictionary user is responsible for doing.

F-3.4 When a cache client is deleted, it shall be verified that all objects
      that have been accessed also have been released. Thus, releasing
      accessed objects must be done explicitly, and is the responsibility of
      the cache client users. The motivation for this is twofold:

      - To enable reusing cache client instances, i.e., a cache client will
        not need to be deleted and allocated over again in order to be used
        in a new context.

      - To keep the lifespan of the dictionary objects as short as possible,
        to avoid unnecessarily keeping unused objects in the cache.
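To make the modification workflow of F-3.3 (case ii) more concrete, the
following is a minimal, hypothetical sketch of how a dictionary user might
acquire, copy, replace and release a table object through the cache client.
The names dd::Table, clone(), set_comment() and dd_cache() are illustrative
assumptions only; the client API itself is specified in the design section
below.

  // Hypothetical sketch only; dd::Table, clone(), set_comment() and
  // dd_cache() are assumed names used for illustration.
  void modify_table_comment(THD *thd)
  {
    // The caller is assumed to hold an MDL X-lock on db1.t1 (F-3.3).

    // Acquire a read only object; the reference counter is incremented.
    const dd::Table *table=
      thd->dd_cache()->acquire<dd::Table>("db1", "t1");
    if (table != NULL)
    {
      // Copying is the responsibility of the dictionary user (F-3.3 ii).
      dd::Table *new_table= table->clone();
      new_table->set_comment("rebuilt");

      // Replace the cached read only object by the modified copy and write
      // it to the dd tables. The old object is deleted by the client.
      thd->dd_cache()->replace(table, new_table);
    }

    // Release all objects acquired by this client (F-3.4); the reference
    // counters are decremented, and unused objects go to the free list.
    thd->dd_cache()->release();
  }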
Overall design
==============

The main components in the design, needed to implement the requirements
identified above, are the following:

1. Cache client with local object registry: This is needed for three
   purposes:

   - To get hold of the object if it is re-retrieved by the client.
   - To know which objects to release when the client is deleted.
   - To provide a uniform API for object management, dispatching to the
     shared cache if necessary.

   The cache client is associated with a THD, and each THD instance will
   have a cache client member.

2. Shared read only cache: The shared cache provides a templatized interface
   for different key types and object types. Internally, the cache will be
   partitioned on object type, and for each instance, there will be separate
   maps for the key types. The object instances will be wrapped in a cache
   element instance which supports tracking of object usage.

The remainder of this section provides a subsection describing some basic
assumptions, followed by a subsection for each of the two main components.

1. Assumptions
--------------

1.1 Key types
.............

The design below frequently refers to key types. We assume that the
dictionary types provide type definitions for the following type names:

- id_key_type
- name_key_type
- aux_key_type

The cache implementation relies on these three types being defined, and the
implementation is therefore not completely generic. The implementation also
relies on a comparison operator '<' being defined for each of the key types.
This is needed by the maps storing associations between keys and objects.

1.2 Memory allocation
.....................

For now, we assume that the required dynamic memory is allocated on the heap.
There does not seem to be a suitable mem root in which to allocate the
required memory.

1.3 Object relations
....................

We assume that all top level objects in the cache are self contained in terms
of not relying on the existence of other top level objects in the cache. If
one object needs to get hold of another object, the other object must be
retrieved explicitly, on demand.

2. Cache client
---------------

The cache client provides a unified interface for retrieving dictionary
objects. The client maintains a local object registry keeping track of the
objects that have been retrieved. This registry is used both for retrieving
new objects, when the user is asking for the same object over again, and for
implicitly releasing objects when the cache client is deleted. The registry
is also used to do reverse lookups from dictionary objects to cache element
wrapper instances. We must handle situations where an object is actually
acquired from the shared cache, while the dynamic cast to a subtype fails. We
use the auto release mechanism (see below) to achieve that.

The remainder of this section will first present the public cache client API,
followed by the protected cache client API, then the internal object
registry, and finally the auto releaser.

2.1 Public cache client API
...........................

The public API provides one generic method for releasing all objects used by
the client, and it also provides sets of type specific methods for
retrieving, releasing, dropping, storing and replacing objects.

/**
  Retrieve an object by its object id.

  @param id  Object id to retrieve.
  @return    Dictionary object, if present; otherwise NULL.
*/

template <typename T>
const T *acquire(Object_id id)

/**
  Retrieve an object by its name.

  @param object_name  Name of the object.
  @return             Dictionary object, if present; otherwise NULL.
*/

template <typename T>
const T *acquire(const std::string &object_name)

/**
  Retrieve an object by its schema- and object name.

  @param schema_name  Name of the schema containing the table.
  @param object_name  Name of the object.
  @return             Dictionary object, if present; otherwise NULL.
*/

template <typename T>
const T *acquire(const std::string &schema_name,
                 const std::string &object_name)

/**
  Mark all objects as not being used by this client anymore.

  This function will release all objects used by this client. The function
  verifies that the objects are in use. If the objects become unused, they
  are added to the free list in the shared cache, which is then rectified to
  enforce its capacity constraints. The objects are also removed from the
  client's object registry.

  @return  Number of objects released.
*/

size_t release()

/**
  Mark a single object as not being used by this client anymore.

  This function will release a single object used by this client. The
  function verifies that the object is in use. If the object becomes unused,
  it is added to the free list in the shared cache, which is then rectified
  to enforce its capacity constraints. The object is also removed from the
  client's object registry.
*/

template <typename T>
void release(const T *object)

/**
  Remove and delete an object from the cache and the dd tables.

  This function will remove the object from the local registry as well as the
  shared cache. This means that all keys associated with the object will be
  removed from the maps, and the cache element wrapper will be deleted.
  Afterwards, the object pointed to will also be deleted, and finally, the
  corresponding entry in the appropriate dd table is deleted. The object may
  not be accessed after calling this function.

  @param object  Object to be dropped.
*/

template <typename T>
void drop(const T *object)

/**
  Store a new dictionary object.

  This function will write the object to the dd tables. The object is added
  neither to the cache client's object registry nor to the shared cache.

  @param object  Object to be stored.
*/

template <typename T>
void store(const T *object)

/**
  Replace a dictionary object by another and store it.

  This function will replace one dictionary object by another. The object is
  also stored to the dd tables. The old object is deleted and may not be
  accessed after calling this function. The element is still present in the
  local object registry, and must be released eventually.

  @param old_object  Old object to be replaced.
  @param new_object  New object to replace the old one.
*/

template <typename T>
void replace(const T *old_object, const T *new_object)

/**
  Update a modified dictionary object.

  This function will regenerate the keys of the object and store it to the dd
  tables. The element is still present in the local object registry, and must
  be released eventually.

  @param object  Object to be updated.
*/

template <typename T>
void update(const T *object)

/**
  Retrieve a collation by its name.

  @note Need explicit specialization due to different key signature. Must
        declare the template here (to avoid the compiler generating its own
        instance for this typename parameter) and define it in the cc file
        (to get the same definition in different translation units).

  @param object_name  Name of the object.
  @return             Dictionary object, if present; otherwise NULL.
*/

template <>
const Collation *Cache_client::acquire(const std::string &object_name);

/**
  Retrieve a character set by its name.

  @note Need explicit specialization due to different key signature.
        Must declare the template here (to avoid the compiler generating its
        own instance for this typename parameter) and define it in the cc
        file (to get the same definition in different translation units).

  @param object_name  Name of the object.
  @return             Dictionary object, if present; otherwise NULL.
*/

template <>
const Charset *Cache_client::acquire(const std::string &object_name);

/**
  Retrieve a tablespace by its name.

  @note Need explicit specialization due to different key signature. Must
        declare the template here (to avoid the compiler generating its own
        instance for this typename parameter) and define it in the cc file
        (to get the same definition in different translation units).

  @param object_name  Name of the object.
  @return             Dictionary object, if present; otherwise NULL.
*/

template <>
const Tablespace *Cache_client::acquire(const std::string &object_name);

2.2 Protected cache client API
..............................

/**
  Get a dictionary object.

  The operation retrieves a dictionary object by one of its keys from the
  cache and returns it through the object parameter. If the object is already
  present in the client's local object registry, it is fetched from there.
  Otherwise, it is fetched from the shared cache (possibly involving a cache
  miss), and eventually added to the local object registry.

  If there is no object for the given key, NULL is returned. The shared cache
  owns the returned object, i.e., the caller must not delete it. After using
  the object(s), the user must call release() on the cache client to release
  all objects used by the client.

  The reference counter for the object is incremented if the object is
  retrieved from the shared cache. If the object was present in the local
  registry, the reference counter stays the same. A cache miss is handled
  transparently.

  @note This function must be called with type T being the same as
        T::cache_partition_type. Dynamic casting to the actual subtype must
        be done on an outer level.

  @param key     Key to use for looking up the object.
  @param object  Object pointer, if present. NULL if not present.
*/

template <typename K, typename T>
void acquire(const K &key, const T **object)

/**
  Mark all objects of a certain type as not being used by this client.

  This function is called with the client's own object registry, or with the
  registry of an auto releaser (which will contain a subset of the objects in
  the client's object registry). The function will release all objects of a
  given type in the registry submitted. The objects must be present and in
  use. If the objects become unused, they are added to the free list in the
  shared cache, which is then rectified to enforce its capacity constraints.
  The objects are also removed from the client's object registry.

  @return  Number of objects released.
*/

template <typename T>
size_t release(Object_registry *registry)

2.3 Client local object registry
................................

The client's local object registry is an instance of the class presented
below. Note that the methods operate on elements, i.e., the elements that
wrap the dictionary objects. This is to allow the object registry and the
shared cache to provide access to the same element instances. The mapping
between elements and dictionary objects happens in the cache client's API.

The registry is mainly a collection of maps for each type supported. The
functions dispatch to the appropriate map based on the key and object type
parameter. There is no support for locking or thread synchronization.
The object registry is intended to be used as a thread local record of which
objects have been used, and provides the methods listed below.

/**
  Get the element corresponding to the given key.

  @param       key      Key to look up.
  @param [out] element  Element, if present; otherwise, NULL.
*/

template <typename K, typename T>
void get(const K &key, Cache_element<T> **element);

/**
  Add a new element to the registry.

  @param key      Key of the element to be added.
  @param element  Element to be added.
*/

template <typename K, typename T>
void put(const K &key, Cache_element<T> *element);

/**
  Remove an element from the registry.

  @param element  Element to be removed.
*/

template <typename T>
void remove(Cache_element<T> *element);

2.4 Auto releaser
.................

This class keeps a register of objects that are automatically released when
the instance goes out of scope. When a new instance is created, the
encompassing cache client's current auto releaser is replaced by this one,
keeping a link to the old one. When the auto releaser is deleted, it links
the old releaser back in as the client's current releaser.

Objects that are added to the auto releaser will be released when the
releaser is deleted. Only the cache client is allowed to add objects to the
auto releaser.

The usage pattern is that objects that are retrieved from the shared
dictionary cache are added to the current auto releaser. Objects that are
retrieved from the client's local object register are not added to the auto
releaser. Thus, when the releaser is deleted, it releases all objects that
have been retrieved from the shared cache during the lifetime of the
releaser.

private:
  // Register an object to be auto released.
  template <typename K, typename T>
  void auto_release(const K &key, Cache_element<T> *element)

  // Remove an object from the auto release registry.
  template <typename T>
  void no_release(const T *object)

public:
  // Save the old releaser and install this one.
  Auto_releaser(THD *thd): m_thd(thd), m_prev(thd->dd_cache()->m_releaser)

  // Release all objects registered and restore the previous releaser.
  ~Auto_releaser()

3. Shared cache
---------------

The cache will be made up of several components hidden behind a single cache
API providing the following functions to its user (i.e., the cache client):

- Get an element from the cache.
- Release an element.
- Drop an element.

The dictionary cache is mainly a collection of shared maps for the object
types supported. The functions dispatch to the appropriate map based on the
key and object type parameter. Cache misses are handled by retrieving the
object from the cache miss handler member instance.

The first section below presents the cache API, and the second section breaks
down the internal structure of the cache into smaller components.

3.1 Shared cache API
....................

/**
  Get an element from the cache, given the key.

  The operation retrieves an element by one of its keys from the cache
  (possibly involving a cache miss, which will need the thd to handle the
  miss) and returns it through the parameter. If there is no element for the
  given key, NULL is returned.

  The cache owns the returned element, i.e., the caller must not delete it.
  After using the element, release() must be called for every element
  received via get(). The reference counter for the element is incremented if
  the element is retrieved from the shared cache.

  @param thd      Thread context.
  @param key      Key to use for looking up the object.
  @param element  Element pointer, if present. NULL if not present.
*/

template <typename K, typename T>
void get(THD *thd, const K &key, Cache_element<T> **element);

/**
  Release an element used by a client.

  The element must be present and in use.
  If the element becomes unused, it is added to the free list, which is then
  rectified to enforce its capacity constraints.

  @param element  Element pointer.
*/

template <typename T>
void release(Cache_element<T> *element);

/**
  Delete an element from the cache.

  This function will remove the element from the cache and delete the object
  pointed to. This means that all keys associated with the element will be
  removed from the maps, and the cache element wrapper will be deleted. The
  object may not be accessed after calling this function.

  @param element  Element pointer.
*/

template <typename T>
void drop(Cache_element<T> *element);

/**
  Replace the object and re-create the keys for an element.

  The operation removes the current keys from the internal maps in the cache,
  assigns the new object to the element, generates new keys based on the new
  object, and inserts the new keys into the internal maps in the cache. The
  old object is deleted.

  @param element  Element pointer.
  @param object   New object to be assigned to the element.
*/

template <typename T>
void replace(Cache_element<T> *element, const T *object);

3.2 Internal components of the cache
....................................

The dictionary objects are wrapped within cache elements, which are
associated with the three keys (defined by id_key_type, name_key_type and
aux_key_type) in three corresponding element maps for each cache partition.
Both the shared cache and the client's object registry will utilize the same
basic element maps. The shared cache adds locking, free list management and
cache miss handling, while the object registry adds iteration on top of the
basic element maps. In the following, we elaborate a bit more on the main
components presented above.

/**
  Implementation of a cache element.

  This template class implements a wrapper to support caching of arbitrary
  objects. The wrapper provides support for free list linkage and reference
  counting, but does not make any assumptions regarding the semantics of this
  functionality. The enforcement of such assumptions must be built into the
  layer using the cache element implementation.

  The cache element stores copies of the keys that are used for looking up
  the object in the cache. This is needed to support fast reverse lookup of
  keys, given the object instance, e.g. to enable removing old keys when new
  keys must be created.

  @note The usage of the reference counter is not implemented by means of
        atomic operations. We expect locking at an outer level to take care
        of race conditions.
*/

template <typename T>
class Cache_element;

/**
  Template for management of a doubly linked free list.

  The free list keeps pointers to the first and last element in the list, as
  well as a set of functions to manage the free list. The first element in
  the free list is the least recently used element. When a new element
  becomes unused, it is added to the end of the free list.
*/

template <typename E>
class Free_list;

/**
  Implementation of a map between a key type and an element type.

  The map supports storing associations between instances of the key type K
  and pointers to the element type E. There are no expectations as far as the
  element type E is concerned. Tracking object usage, memory management,
  loading and evicting elements, locking and synchronization, as well as
  registering cache misses are all expected to be handled by users of this
  interface. Basic assumptions regarding correct usage are implemented in
  terms of asserts to verify that, e.g., a key does not already exist in the
  map when it is added.

  @note The element pointer in a key/value pair may not be NULL.
*/

template <typename K, typename E>
class Element_map;

/**
  Implementation of a set of maps for a given object type.

  The class declares a set of maps, each of which maps from a key type to an
  element type. The element type wraps the template object type parameter
  into a wrapper instance. The implementation is intended to be used as a
  base to be extended for usage in a specific context. There is support for
  adding and removing elements in all maps at once, and for retrieving a
  single map. There is no support for tracking object usage, free list
  management, thread synchronization, etc.
*/

template <typename T>
class Multi_map_base;

/**
  Implementation of a local set of maps for a given object type.

  The implementation is an extension of the multi map base, adding support
  for iteration. It is intended to be used in a single threaded context, and
  there is no support for tracking object usage, free list management, thread
  synchronization, etc.
*/

template <typename T>
class Local_multi_map: public Multi_map_base<T>;

/**
  Implementation of a shared set of maps for a given object type.

  The implementation is an extension of the Multi_map_base, and adds support
  for:

  - Locking and thread synchronization.
  - Broadcasting a miss which has been handled.
  - Tracking object usage.
  - Managing the free list.
  - Tracking cache misses.

  @note The multi map provides support for tracking cache misses, but does
        not handle the miss. The miss must be handled by the user of the
        multi map.
*/

template <typename T>
class Shared_multi_map: public Multi_map_base<T>;

/**
  Handling of cache misses.

  This class provides a template function that retrieves an object from
  persistent storage based on the submitted key and object type.
*/

class Miss_handler;

/**
  Shared dictionary cache template containing several maps.

  The dictionary cache is mainly a collection of shared maps for the object
  types supported. The functions dispatch to the appropriate map based on the
  key and object type parameter. Cache misses are handled by retrieving the
  object from the cache miss handler member instance. The miss handler type
  is submitted as a template parameter.
*/

template <typename Miss_handler_type>
class Shared_dictionary_cache;

A simplified, hypothetical sketch of how the element wrapper, the free list
and the reference counting interact on release is given after section 4
below.

4. New configuration variables
------------------------------

Three new configuration variables will be added, as described in the
functional requirements F-1.9, F-1.11 and F-1.12. The variable
'table_definition_cache' will eventually be reused for the new cache being
introduced when the old TDC is abandoned.
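To make the interplay between the components in section 3.2 more concrete,
the following is a minimal, hypothetical sketch of the release path in the
shared multi map (F-2.3, F-2.4, F-1.7, F-1.9). The member and helper names
used below (m_free_list, m_capacity, remove_keys(), total_element_count(),
usage(), etc.) are illustrative assumptions, not the final implementation,
and locking, key map maintenance and miss broadcasting are omitted.

  // Hypothetical sketch only; member names and helper functions are assumed
  // for illustration and are not part of the final implementation.
  template <typename T>
  void Shared_multi_map<T>::release(Cache_element<T> *element)
  {
    // Locking at an outer level is assumed (see the Cache_element note).
    element->release();                      // Reference counter-- (F-2.3).
    if (element->usage() == 0)               // Object became unused (F-2.4).
    {
      // Park the element at the end of the free list; the least recently
      // used element is kept at the front (F-1.7).
      m_free_list.add_last(element);

      // Evict least recently used, unused elements while the total number
      // of elements (used and unused) exceeds the capacity (F-1.9, F-2.6).
      while (total_element_count() > m_capacity && m_free_list.length() > 0)
      {
        Cache_element<T> *lru= m_free_list.get_first();
        m_free_list.remove(lru);             // Unlink from the free list.
        remove_keys(lru);                    // Erase entries from all key maps.
        delete lru->object();                // Free the dictionary object.
        delete lru;                          // Free the element wrapper.
      }
    }
  }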