WL#14079: P_S memory instrumentation for DD and runtime code

Affects: Server-8.0   —   Status: Complete

The purpose of this worklog is to review the current performance schema memory keys owned by the runtime team, propose a uniform naming scheme, update the key names, and add descriptions.

The WL will also implement basic P_S memory keys for the major data dictionary data structures. Suggested keys for the first step:

  • dd::infrastructure
  • dd::objects

Note that the bulk of the object data is allocated with the dd::String_type, which has the associated P_S key dd::String_type. Hence, at least for now, it probably makes sense to have a single P_S key covering all DD objects since the dd::String_type key is used for all character strings anyway.

Contents


Use cases

Below are the main use cases for this functionality.

Cache size on idle system

If the system first runs with the expected load and the user load is then stopped, the counters show the memory used by the data dictionary cache to store the non-evicted data dictionary objects. We cannot distinguish between the memory used by different types of DD objects, but it is probably safe to assume that the bulk of it is occupied by table objects. The cache size can then be tuned if the amount of allocated memory needs to change. The system needs to be idle because, with the current performance schema implementation, the incrementing and decrementing of counters is not always properly serialized, so values read under load can be misleading.

Cache size on system with DML load

If the data dictionary cache has sufficient capacity, it should hold all data dictionary objects needed by the executed queries, with few cache misses. To see if this is the case, we can set the size of the table definition cache to some small value (to force new table shares to be created when tables are opened), and then run the system with the expected DML load. If the data dictionary cache is sufficiently large, there should be only small amounts of additional allocation and freeing of data dictionary objects. For a system with DDL load, this approach does not make sense, since new data dictionary objects are allocated regardless of the cache size.

Memory usage on system with load

If the performance schema memory counters were incremented and decremented in the correct order, it would also be possible to use the data dictionary memory keys to see the maximum memory usage of the data dictionary cache under load. Since the updates are not always properly serialized, the values observed while the load is running can be misleading. However, if the external load is halted, it might be possible to analyze this in retrospect.
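
To illustrate why the order of counter updates matters for a maximum, here is a minimal sketch that replays the same allocation and free events in two different orders. It is not the performance schema implementation; the function name high_water_mark and the Delta type are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Signed byte delta: positive for an allocation, negative for a free.
using Delta = std::int64_t;

// Replay the deltas in the given order and return the highest running total.
std::int64_t high_water_mark(const std::vector<Delta> &deltas) {
  std::int64_t current = 0;
  std::int64_t peak = 0;
  for (Delta d : deltas) {
    current += d;
    peak = std::max(peak, current);
  }
  return peak;
}
```

Replaying {100, 50, -100} gives a peak of 150, while {100, -100, 50} gives 100, although both sequences end at the same total. The final value is therefore reliable once the load has stopped, whereas a peak recorded during the load may not be.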

Enable measuring total amount of allocated shared memory

Counting memory allocations to constrain memory usage can be done by piggybacking on the performance schema framework, i.e., the performance schema counters are not used directly, but different counters are maintained while using part of the performance schema framework. Thus, introducing the data dictionary memory keys may also allow them to be part of a forthcoming mechanism to monitor and constrain allocation of global shared memory. The implementation of such a mechanism is not part of this WL, though.
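
As a rough sketch of what "different counters" could look like, the class below keeps its own byte count and refuses requests beyond a limit. The name Memory_budget and its interface are hypothetical, and the sketch is single-threaded; a real mechanism would need atomic counters and would be invoked from the instrumentation hooks.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical budget with its own counter, separate from any statistics
// that the instrumentation itself maintains.
class Memory_budget {
 public:
  explicit Memory_budget(std::size_t limit) : m_limit(limit) {}

  // Returns false and charges nothing if the request would exceed the limit.
  bool try_charge(std::size_t bytes) {
    if (bytes > m_limit - m_used) return false;
    m_used += bytes;
    return true;
  }

  // Gives back bytes that were charged earlier.
  void release(std::size_t bytes) {
    assert(bytes <= m_used);
    m_used -= bytes;
  }

  std::size_t used() const { return m_used; }

 private:
  std::size_t m_limit;     // Upper bound in bytes.
  std::size_t m_used = 0;  // Invariant: m_used <= m_limit.
};
```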

Functional requirements

F1
Add descriptions for performance schema memory keys owned by the Runtime team as listed in the HLS.
F2
Change key names as listed in the HLS to conform to a uniform naming scheme.
F3
Implement support for the additional data dictionary memory keys as listed in the HLS according to the design in the LLD.


Current runtime P_S memory keys

Below, we list all the P_S memory keys. For the keys owned by the runtime team, we will:

  • Add a description for all keys.
  • Propose a new name for some of the keys.
  • List new keys with the 'existing key name' column left empty (shown as '-').

For the key naming scheme, we suggest grouping related keys by using a key prefix, in the same style as C++ namespaces. E.g., for the data dictionary keys, we use the 'dd::' prefix, for the THD related keys, we use the 'THD::' prefix, etc.

The naming scheme of the P_S keys follows the pattern 'memory/category/instrument_name'. Hence, we could also introduce new categories. However, this is presumably too much of a change for a maintenance release.
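
The two naming conventions can be sketched with a pair of small helpers. Both function names are hypothetical, and the category 'sql' used in the usage note is an assumption, since the worklog does not name the category.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: compose the full instrument name from its parts,
// following the 'memory/category/instrument_name' pattern.
std::string full_instrument_name(const std::string &category,
                                 const std::string &instrument) {
  return "memory/" + category + "/" + instrument;
}

// Hypothetical helper: return the namespace-style prefix of a key name,
// i.e. everything before the first "::", or an empty string if none.
std::string key_prefix(const std::string &instrument) {
  const std::string::size_type pos = instrument.find("::");
  return pos == std::string::npos ? std::string() : instrument.substr(0, pos);
}
```

With these helpers, full_instrument_name("sql", "dd::objects") yields "memory/sql/dd::objects", and key_prefix("THD::transactions::mem_root") yields "THD". Keys without a prefix, such as "help", yield an empty prefix.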

Existing key name | New key name | Description
acl_cache | |
acl_map_cache | |
binlog_cache_mngr | |
binlog_pos | |
binlog_statement_buffer | |
bison_stack | |
Blob_mem_storage::storage | |
db_worker_hash_entry | |
dd::column_statistics | - | Column statistics histograms allocated
dd::default_values | - | Temporary buffer for preparing column default values
dd::import | - | File name handling while importing MyISAM tables
dd::String_type | - | Character strings used by data dictionary objects
- | dd::infrastructure | Infrastructure of the data dictionary structures
- | dd::objects | Memory occupied by the data dictionary objects
debug_sync_control::debug_sync_action | THD::debug_sync_action | Debug sync actions to perform per thread.
Delegate::memroot | |
display_table_locks | - | Debug utility
errmsgs | errmsgs::server | In-memory representation of server error messages
Event_basic::mem_root | - | Event base class with root used for definition etc.
Event_queue_element_for_exec::names | - | Copy of schema- and event name in exec queue element
Event_scheduler::scheduler_param | - | Infrastructure of the priority queue of events
File_query_log::name | |
Filesort_buffer::sort_keys | |
Filesort_info::merge | |
Filesort_info::record_pointers | |
Geometry::ptr_and_wkb_data | |
Gis_read_stream::err_msg | |
global_system_variables | |
Gtid_set::Interval_chunk | |
Gtid_set::to_string | |
Gtid_state::group_commit_sidno_locks | |
Gtid_state::to_string | |
handler::errmsgs | errmsgs::handler | Handler error messages (HA_ERR_...)
handlerton | handlerton::objects | Handlerton objects
hash_index_key_buffer | |
hash_join | |
HASH_ROW_ENTRY | |
help | - | Temporary memroot used to print help texts as part of usage description
histograms | |
host_cache::hostname | - | Hostname keys in the host_cache map
JOIN_CACHE | |
JSON | |
load_env_plugins | |
Locked_tables_list::m_locked_tables_root | - | Memroot for list of locked tables
log_error_loaded_services | log_error::loaded_services | Memory allocated for duplicate log events
log_error_stack | log_error::stack | Log events for the error log
Log_event | |
LOG_name | LOG::file_name | File name of slow log and general log
LOG_POS_COORD | |
MDL_context_backup_manager | MDL_context::backup_manager | MDL for prepared XA trans with disconnected client
MDL_context::acquire_locks | - | Buffer for sorting lock requests
MPVIO_EXT::auth_info | |
Mutex_cond_array::Mutex_cond | |
my_bitmap_map | |
my_str_malloc | |
MYSQL_BIN_LOG::basename | |
MYSQL_BIN_LOG::index | |
MYSQL_BIN_LOG::recover | |
MYSQL_LOCK | - | Table locks per session
MYSQL_LOG::name | |
mysql_plugin | |
mysql_plugin_dl | |
MYSQL_RELAY_LOG::basename | |
MYSQL_RELAY_LOG::index | |
NAMED_ILINK::name | - | Names in the MyISAM key cache
NET::buff | - | Buffer in the client protocol communications layer
NET::compress_packet | - | Buffer used when compressing a packet
opt_bin_logname | |
Owned_gtids::sidno_to_hash | |
Owned_gtids::to_string | |
Partition_admin | Partition::admin | Buffer for printing messages into the client protocol
Partition_share | Partition::share | Partition name and auto increment mutex
partition_sort_buffer | Partition::sort_buffer | Record buffer for a partition
partition_syntax_buffer | Partition::syntax_buffer | Buffer used for formatting the partition expression
plugin_bookmark | |
plugin_init_tmp | |
plugin_int_mem_root | |
plugin_mem_root | |
plugin_ref | |
Prepared_statement_map | Prepared_statement::infrastructure | Map infrastructure for prepared statements per session
Prepared_statement::main_mem_root | - | Mem root for each prepared statement for items etc.
PROFILE | |
prune_partitions::exec | Partition::prune_exec | Mem root used temporarily while pruning partitions
Queue::queue_item | |
QUICK_GROUP_MIN_MAX_SELECT::alloc | |
QUICK_INDEX_MERGE_SELECT::alloc | |
QUICK_RANGE_SELECT::alloc | |
QUICK_RANGE_SELECT::mrr_buf_desc | |
Quick_ranges | |
QUICK_ROR_INTERSECT_SELECT::alloc | |
QUICK_ROR_UNION_SELECT::alloc | |
READ_INFO | |
READ_RECORD_cache | |
Recovered_xa_transactions | XA::recovered_transactions | List infrastructure for recovered XA transactions
Relay_log_info::mts_coor | |
root | |
Row_data_memory::memory | |
rpl_filter memory | |
Rpl_info_file::buffer | |
Rpl_info_table | |
rpl_slave::check_temp_dir | |
servers | - | Note: Duplicate of the key below, will be deleted
servers_cache | - | Cache infrastructure and mem root for servers cache
Shared_memory_name | - | Communication through shared memory (windows)
show_slave_status_io_gtid_set | |
Sid_map::Node | |
SLAVE_INFO | |
Slave_job_group::group_relay_log_name | |
sp_head::call_mem_root | - | Mem root for objects with same life time as stored program call
sp_head::execute_mem_root | - | Mem root per instruction
sp_head::main_mem_root | - | Mem root for parsing and representation of stored programs
sql_acl_mem | |
sql_acl_memex | |
ST_SCHEMA_TABLE | - | Structure describing an information schema table implemented by a plugin
String::value | |
Sys_var_charptr::value | |
TABLE | - | Memory used by TABLE objects and their mem root
table_mapping::m_mem_root | |
TABLE_RULE_ENT | |
TABLE_SHARE::mem_root | - | Cache infrastructure and individual table shares
TABLE::sort_io_cache | |
TC_LOG_MMAP::pages | - | In-memory transaction coordinator log
test_quick_select | |
thd_timer | - | Thread timer object
THD::db | - | Name of currently used schema
THD::debug_sync_control | - | Structure to control debug sync per thread
THD::handler_tables_hash | - | Hash map of tables used by HANDLER statements
thd::main_mem_root | THD::main_mem_root | Main mem root used for e.g. the query arena
THD::Session_sysvar_resource_manager | |
THD::Session_tracker | |
THD::sp_cache | - | Per session cache for stored programs
THD::transactions::mem_root | - | Transaction context information per session
THD::variables | - | Per session copy of global dynamic variables
tz_storage | - | Shared time zone data
udf_mem | - | Shared structure of UDFs
Unique::merge_buffer | |
Unique::sort_buffer | |
user_conn | - | Objects describing user connections
User_level_lock | - | Per session storage of user level locks
user_var_entry | |
user_var_entry::value | |
write_set_extraction | |
XID | XA::transaction_contexts | Shared cache of XA transaction contexts

Runtime P_S memory keys grouped by relation

Keys related to the same subsystem or data structure are grouped below. We also split the groups into shared resources and per-session resources. Note that shared resources are usually also allocated on request from a session. However, a shared resource may be used by other sessions as well, and is usually not freed when the session exits. Per-session resources, on the other hand, are only available to the session that allocated them, and are freed when the session ends.

Shared resources

  • Data dictionary (shared):
    • dd::column_statistics
    • dd::default_values
    • dd::import
    • dd::String_type
    • dd::objects
    • dd::infrastructure
  • Table definition cache (shared):
    • TABLE_SHARE::mem_root
  • Event scheduler (shared):
    • Event_basic::mem_root
    • Event_queue_element_for_exec::names
    • Event_scheduler::scheduler_param
  • Transaction coordinator (shared):
    • TC_LOG_MMAP::pages
  • XA (shared):
    • Recovered_xa_transactions
    • XID
  • Host name cache (shared):
    • host_cache::hostname
  • Error messages, error- and query logging (shared):
    • errmsgs
    • handler::errmsgs
    • log_error_loaded_services
    • log_error_stack
    • LOG_name
  • MDL (shared):
    • MDL_context_backup_manager
    • MDL_context::acquire_locks
  • Information schema (shared):
    • ST_SCHEMA_TABLE
  • Time zone storage (shared):
    • tz_storage
  • User defined functions (shared):
    • udf_mem
  • User connections (shared):
    • user_conn
  • Help (shared):
    • help
  • Handlerton (shared):
    • handlerton
  • Servers cache for Federated (shared):
    • servers
    • servers_cache
  • MyISAM (shared):
    • NAMED_ILINK::name

Per session resources

  • Per THD memory (session):
    • thd_timer
    • THD::db
    • THD::debug_sync_control
    • THD::handler_tables_hash
    • thd::main_mem_root
    • THD::sp_cache
    • THD::transactions::mem_root
    • THD::variables
    • MYSQL_LOCK
    • Locked_tables_list::m_locked_tables_root
  • Table cache (session):
    • TABLE
  • SP structures (session):
    • sp_head::call_mem_root
    • sp_head::execute_mem_root
    • sp_head::main_mem_root
  • Prepared statements (session):
    • Prepared_statement_map
    • Prepared_statement::main_mem_root
  • User level locks (session):
    • User_level_lock
  • Partitioning (session):
    • Partition_admin
    • Partition_share
    • partition_sort_buffer
    • partition_syntax_buffer
    • prune_partitions::exec
  • Protocol / communication (session):
    • NET::buff
    • NET::compress_packet
    • Shared_memory_name
  • Debug (session):
    • display_table_locks


Micro benchmarking

We will implement benchmarking to verify:

  • The memory allocation overhead imposed by the P_S instrumentation.
  • The performance overhead imposed by the P_S instrumentation.

Unit tests will be implemented to test these two issues. For the performance related test, we will use the existing micro benchmarking framework for the unit tests.

Small allocations

Adding P_S memory instrumentation implicitly adds some overhead in terms of allocating additional memory to store P_S meta data. For small objects, the overhead can be substantial. However, for most data dictionary objects, the size can be assumed to be sufficiently large that it makes sense to instrument all object allocations.
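
The relative cost for small objects can be estimated with a one-line formula. The helper below is a hypothetical sketch; the header size is passed in as a parameter because this worklog does not state it, and the 32 bytes used in the usage note is an assumed value for illustration only.

```cpp
#include <cassert>
#include <cstddef>

// Fraction of an instrumented allocation that is taken up by the
// instrumentation header rather than by the payload.
double header_overhead_fraction(std::size_t payload_bytes,
                                std::size_t header_bytes) {
  assert(payload_bytes + header_bytes > 0);
  return static_cast<double>(header_bytes) /
         static_cast<double>(payload_bytes + header_bytes);
}
```

Assuming a 32 byte header, a 32 byte payload spends half of the allocation on meta data, while a 4096 byte payload spends less than one percent.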

We will introduce monitoring and control of per-session memory, where allocations for certain P_S memory keys are counted. The counting itself is a separate mechanism, so the P_S instrumentation is added for all memory keys in the same way as before. Hence, even if some allocations are not counted by the resource constraint framework, the overhead imposed by the P_S instrumentation is still present.

Possibly increase max number of keys

  • We might have to increase performance_schema_max_memory_classes to accommodate the new classes. The current max is 450. There are 422 classes defined in the server, InnoDB and elsewhere.
  • The 75 memory instruments defined by the Performance Schema are not counted against max_memory_classes.
  • The status variable performance_schema_memory_classes_lost will be greater than 0 if max_memory_classes is insufficient, and at least one P_S mtr test will fail.
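
The headroom can be checked with simple arithmetic. The sketch below uses the numbers quoted above and assumes that the two suggested data dictionary keys are the only additions; the function name is hypothetical.

```cpp
#include <cassert>

// Number of memory classes that can still be registered once the new
// classes are added; a negative result means classes would be lost.
constexpr int remaining_memory_classes(int max_classes, int defined_classes,
                                       int new_classes) {
  return max_classes - (defined_classes + new_classes);
}

// 450 allowed, 422 defined, 2 new keys: 26 slots remain.
static_assert(remaining_memory_classes(450, 422, 2) == 26,
              "two new keys fit within the current maximum");
```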

Change the key type for the dd::String_type

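For future use by a resource control mechanism, we may need to see which threads have allocated the memory consumed by the data dictionary objects. The PSI_FLAG_ONLY_GLOBAL_STAT flag is therefore removed from the dd::String_type key, so that its statistics are no longer restricted to the global level.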

---------------------------- sql/psi_memory_key.cc ----------------------------
@@ -311,7 +311,7 @@ static PSI_memory_info all_server_memory[] = {
      PSI_DOCUMENT_ME},
     {&key_memory_TABLE, "TABLE", PSI_FLAG_ONLY_GLOBAL_STAT, 0, PSI_DOCUMENT_ME},
     {&key_memory_LOG_name, "LOG_name", 0, 0, PSI_DOCUMENT_ME},
-    {&key_memory_DD_String_type, "dd::String_type", PSI_FLAG_ONLY_GLOBAL_STAT,
+    {&key_memory_DD_String_type, "dd::String_type", 0,
      0, PSI_DOCUMENT_ME},
     {&key_memory_ST_SCHEMA_TABLE, "ST_SCHEMA_TABLE", 0, 0, PSI_DOCUMENT_ME},
     {&key_memory_PROFILE, "PROFILE", 0, 0, PSI_DOCUMENT_ME},

Define the new performance schema memory keys

---------------------------- sql/psi_memory_key.cc ----------------------------
@@ -33,9 +33,11 @@
   MAINTAINER: Please keep this list in order, to limit merge collisions.
 */
 
+PSI_memory_key key_memory_DD_cache_infrastructure;
 PSI_memory_key key_memory_DD_column_statistics;
 PSI_memory_key key_memory_DD_default_values;
 PSI_memory_key key_memory_DD_import;
+PSI_memory_key key_memory_DD_objects;
 PSI_memory_key key_memory_DD_String_type;
 PSI_memory_key key_memory_Event_queue_element_for_exec_names;
 PSI_memory_key key_memory_Event_scheduler_scheduler_param;
@@ -300,11 +302,14 @@ static PSI_memory_info all_server_memory[] = {
     {&key_memory_JOIN_CACHE, "JOIN_CACHE", 0, 0, PSI_DOCUMENT_ME},
     {&key_memory_TABLE_sort_io_cache, "TABLE::sort_io_cache", 0, 0,
      PSI_DOCUMENT_ME},
+    {&key_memory_DD_cache_infrastructure, "dd::cache_infrastructure", 0, 0,
+     PSI_DOCUMENT_ME},
     {&key_memory_DD_column_statistics, "dd::column_statistics", 0, 0,
      PSI_DOCUMENT_ME},
     {&key_memory_DD_default_values, "dd::default_values", 0, 0,
      PSI_DOCUMENT_ME},
     {&key_memory_DD_import, "dd::import", 0, 0, PSI_DOCUMENT_ME},
+    {&key_memory_DD_objects, "dd::objects", 0, 0, PSI_DOCUMENT_ME},
     {&key_memory_Unique_sort_buffer, "Unique::sort_buffer", 0, 0,
      PSI_DOCUMENT_ME},
     {&key_memory_Unique_merge_buffer, "Unique::merge_buffer", 0, 0,

----------------------------- sql/psi_memory_key.h -----------------------------
@@ -58,9 +58,11 @@ extern PSI_memory_key key_memory_string_service_iterator;
 /*
   These are defined in psi_memory_key.cc
  */
+extern PSI_memory_key key_memory_DD_cache_infrastructure;
 extern PSI_memory_key key_memory_DD_column_statistics;
 extern PSI_memory_key key_memory_DD_default_values;
 extern PSI_memory_key key_memory_DD_import;
+extern PSI_memory_key key_memory_DD_objects;
 extern PSI_memory_key key_memory_DD_String_type;
 extern PSI_memory_key key_memory_Event_queue_element_for_exec_names;
 extern PSI_memory_key key_memory_Event_scheduler_scheduler_param;

Use the appropriate key for various list infrastructures

-------------------------- sql/dd/cache/element_map.h --------------------------

@@ -29,6 +29,7 @@
 
 #include "my_dbug.h"
 #include "sql/malloc_allocator.h"  // Malloc_allocator.
+#include "sql/psi_memory_key.h"    // key_memory_DD_cache_infrastructure
 
 namespace dd {
 namespace cache {
@@ -84,9 +85,10 @@ class Element_map {
 
  public:
   Element_map()
-      : m_map(std::less<K>(),
-              Malloc_allocator<std::pair<const K, E *>>(PSI_INSTRUMENT_ME)),
-        m_missed(std::less<K>(), Malloc_allocator<K>(PSI_INSTRUMENT_ME)) {
+      : m_map(std::less<K>(), Malloc_allocator<std::pair<const K, E *>>(
+                                  key_memory_DD_cache_infrastructure)),
+        m_missed(std::less<K>(),
+                 Malloc_allocator<K>(key_memory_DD_cache_infrastructure)) {
   } /* purecov: tested */
 
   /**

------------------------ sql/dd/impl/cache/free_list.h ------------------------
@@ -27,6 +27,7 @@
 
 #include "my_dbug.h"
 #include "sql/malloc_allocator.h"  // Malloc_allocator.
+#include "sql/psi_memory_key.h"    // key_memory_DD_cache_infrastructure
 
 namespace dd {
 namespace cache {
@@ -54,7 +55,8 @@ class Free_list {
   List_type m_list;  // The actual list.
 
  public:
-  Free_list() : m_list(Malloc_allocator<E *>(PSI_INSTRUMENT_ME)) {}
+  Free_list()
+      : m_list(Malloc_allocator<E *>(key_memory_DD_cache_infrastructure)) {}
 
   // Return the actual free list length.
   size_t length() const { return m_list.size(); }

--------------------- sql/dd/impl/cache/shared_multi_map.h ---------------------
@@ -52,6 +52,7 @@
 #include "sql/dd/types/tablespace.h"
 #include "sql/malloc_allocator.h"  // Malloc_allocator.
 #include "sql/mysqld.h"            // max_connections
+#include "sql/psi_memory_key.h"    // key_memory_DD_cache_infrastructure
 #include "thr_mutex.h"
 
 namespace dd {
@@ -137,9 +138,10 @@ class Shared_multi_map : public Multi_map_base<T> {
    public:
     // Lock the multi map on instantiation.
     explicit Autolocker(Shared_multi_map<T> *map)
-        : m_objects_to_delete(Malloc_allocator<const T *>(PSI_INSTRUMENT_ME)),
-          m_elements_to_delete(
-              Malloc_allocator<const Cache_element<T> *>(PSI_INSTRUMENT_ME)),
+        : m_objects_to_delete(
+              Malloc_allocator<const T *>(key_memory_DD_cache_infrastructure)),
+          m_elements_to_delete(Malloc_allocator<const Cache_element<T> *>(
+              key_memory_DD_cache_infrastructure)),
           m_map(map) {
       mysql_mutex_lock(&m_map->m_lock);
     }
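
The idea of passing a memory key to a container's allocator can be sketched without the server headers. The code below is not Malloc_allocator; Keyed_allocator, the Key enum and bytes_for() are hypothetical stand-ins that only show how an allocator can carry a key and charge allocations to it, including after the container rebinds it to an internal node type.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <new>
#include <utility>
#include <vector>

// Hypothetical stand-ins for instrumentation keys and their byte counters.
enum Key { KEY_INFRASTRUCTURE = 0, KEY_OTHER = 1, KEY_COUNT = 2 };
inline std::size_t &bytes_for(Key key) {
  static std::size_t counters[KEY_COUNT] = {0, 0};
  return counters[key];
}

// Minimal allocator that carries a key and charges every allocation to it.
template <class T>
class Keyed_allocator {
 public:
  using value_type = T;

  explicit Keyed_allocator(Key key) : m_key(key) {}

  // Containers rebind the allocator to their node type; keep the same key.
  template <class U>
  Keyed_allocator(const Keyed_allocator<U> &other) : m_key(other.key()) {}

  T *allocate(std::size_t n) {
    void *p = ::operator new(n * sizeof(T));
    bytes_for(m_key) += n * sizeof(T);
    return static_cast<T *>(p);
  }

  void deallocate(T *p, std::size_t n) noexcept {
    bytes_for(m_key) -= n * sizeof(T);
    ::operator delete(p);
  }

  Key key() const { return m_key; }

 private:
  Key m_key;
};

// Allocators with the same key can free each other's memory.
template <class T, class U>
bool operator==(const Keyed_allocator<T> &a, const Keyed_allocator<U> &b) {
  return a.key() == b.key();
}
template <class T, class U>
bool operator!=(const Keyed_allocator<T> &a, const Keyed_allocator<U> &b) {
  return !(a == b);
}
```

A container is then constructed with an allocator instance for the desired key, in the same way as the constructors in the diffs above pass key_memory_DD_cache_infrastructure. When the container is destroyed, the counter for that key returns to its previous value.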

Redefine new for the common DD object superclass

--------------------- sql/dd/impl/types/weak_object_impl.h ---------------------
@@ -1,4 +1,4 @@
-/* Copyright (c) 2014, 2017, Oracle and/or its affiliates. All rights reserved.
+/* Copyright (c) 2014, 2019, Oracle and/or its affiliates. All rights reserved.
 
    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License, version 2.0,
@@ -23,8 +23,11 @@
 #ifndef DD__WEAK_OBJECT_IMPL_INCLUDED
 #define DD__WEAK_OBJECT_IMPL_INCLUDED
 
-#include "sql/dd/object_id.h"          // Object_id
-#include "sql/dd/types/weak_object.h"  // dd::Weak_object
+#include "my_sys.h"                     // MY_WME
+#include "mysql/service_mysql_alloc.h"  // my_malloc
+#include "sql/dd/object_id.h"           // Object_id
+#include "sql/dd/types/weak_object.h"   // dd::Weak_object
+#include "sql/psi_memory_key.h"         // key_memory_DD_objects
 
 namespace dd {
 
@@ -46,6 +49,24 @@ class Weak_object_impl : virtual public Weak_object {
 
   virtual ~Weak_object_impl() {}
 
+  void *operator new(size_t size, const std::nothrow_t &) noexcept {
+    /*
+      Call my_malloc() with the MY_WME flag to make sure that it will
+      write an error message if the memory could not be allocated.
+    */
+    return my_malloc(key_memory_DD_objects, size, MYF(MY_WME));
+  }
+
+  void *operator new(size_t size) noexcept {
+    /*
+      Call my_malloc() with the MY_WME flag to make sure that it will
+      write an error message if the memory could not be allocated.
+    */
+    return my_malloc(key_memory_DD_objects, size, MYF(MY_WME));
+  }
+
+  void operator delete(void *ptr) noexcept { my_free(ptr); }
+
  public:
   virtual const Object_table &object_table() const = 0;
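
The effect of redefining the allocation functions in a common superclass can be sketched in isolation. The code below does not use my_malloc() or the P_S keys; Counted_base, Table_like and object_bytes() are hypothetical names, and unlike the diff above, this operator new throws on failure instead of returning a null pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical byte counter standing in for a memory key's statistics.
inline std::size_t &object_bytes() {
  static std::size_t bytes = 0;
  return bytes;
}

// Base class whose allocation functions are inherited by all subclasses.
class Counted_base {
 public:
  virtual ~Counted_base() = default;

  static void *operator new(std::size_t size) {
    void *p = std::malloc(size);
    if (p == nullptr) throw std::bad_alloc();
    object_bytes() += size;
    return p;
  }

  // Sized form: because the destructor is virtual, size is that of the
  // most derived object even when deleting through a base class pointer.
  static void operator delete(void *p, std::size_t size) noexcept {
    if (p == nullptr) return;
    object_bytes() -= size;
    std::free(p);
  }
};

// Example subclass; it needs no allocation code of its own.
class Table_like : public Counted_base {
 public:
  char payload[64];
};
```

Every object derived from the base class is accounted for by the same counter without any change to the subclasses, which is the property the worklog relies on when instrumenting all DD objects through Weak_object_impl.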