WL#7794: PERFORMANCE SCHEMA, SCALABLE MEMORY ALLOCATION
Affects: Server-5.7
Status: Complete
Currently the performance schema:
- allocates all the memory it needs up front
- requires the user to configure how much memory to use
- consumes all the memory allocated, but does not use all of it when the server load is low.

This task is to relax the memory constraints:
- to increase ease of use, with less configuration
- to decrease the memory footprint, scaling the memory consumption with the server's actual load.
F-1 Memory instruments
New instruments, named with the prefix "memory/performance_schema/", are defined and visible in table performance_schema.setup_instruments.
These instruments expose how much memory is allocated for internal buffers in the performance schema.

F-2 Global memory aggregates
Table performance_schema.memory_summary_global_by_event_name displays global statistics for the instruments named "memory/performance_schema/%".
Note that the other memory aggregates, namely:
- table memory_summary_by_account_by_event_name
- table memory_summary_by_host_by_event_name
- table memory_summary_by_thread_by_event_name
- table memory_summary_by_user_by_event_name
are not affected.

NF-3 Memory life cycle
Before this change, the design constraints on the performance schema were:
- allocate all the memory needed at startup
- never allocate memory during server operation
- never free memory during server operation
- free all the memory used at shutdown

These constraints are now relaxed, so that the performance schema:
- may allocate memory at server startup
- may allocate additional memory during server operation
- never frees memory during server operation
- frees all the memory used at shutdown

F-4 "Autoscale" configuration parameters
The following configuration parameters:
- performance_schema_accounts_size
- performance_schema_hosts_size
- performance_schema_max_cond_instances
- performance_schema_max_file_instances
- performance_schema_max_index_stat
- performance_schema_max_metadata_locks
- performance_schema_max_mutex_instances
- performance_schema_max_prepared_statements_instances
- performance_schema_max_program_instances
- performance_schema_max_rwlock_instances
- performance_schema_max_socket_instances
- performance_schema_max_table_handles
- performance_schema_max_table_instances
- performance_schema_max_table_lock_stat
- performance_schema_max_thread_instances
- performance_schema_users_size
are now parameters that support automatic scaling. The requirements for autoscale parameters are as follows.

F-4-A Setting autoscale parameters to zero
When an autoscale parameter is set to 0, the corresponding internal buffer is empty, and no memory is allocated.
Note: no change from the previous behavior.

F-4-B Setting autoscale parameters to a positive value
When an autoscale parameter is set to a positive value N, the corresponding internal buffer is initially empty, and no memory is initially allocated.
As the performance schema collects data, memory is allocated in the corresponding buffer, until the buffer size reaches N. Once the buffer size is N, no more memory is allocated. Data collected by the performance schema for this buffer is lost, and the lost counters corresponding to this buffer are incremented.
Note: compared to the previous implementation,
- allocation is not N up-front, but up to N during server operation (improvement)
- the total buffer size is capped and lost counters indicate overflow (no change)

F-4-C Setting autoscale parameters to minus one
When an autoscale parameter is set to -1, the corresponding internal buffer is initially empty, and no memory is initially allocated.
As the performance schema collects data, memory is allocated in the corresponding buffer. The buffer size is unbounded, and may grow with the load.
Note: compared to the previous implementation,
- allocation is not up-front, but incremental during server operation (improvement)
- there is no "autotuned" computed value for the maximum size of a buffer; the buffer grows depending on the load actually seen, not on an estimated load (improvement)
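As an illustration of the F-4 requirements: the autoscale parameters are set at server startup (in my.cnf or on the command line, for example --performance-schema-max-mutex-instances=-1), and the resulting behavior can be observed from a SQL client. A minimal sketch, using only existing SHOW syntax; the parameter shown is just one example among the list above:

  -- Current value of one autoscale parameter; -1 means automatically scaled
  SHOW GLOBAL VARIABLES LIKE 'performance_schema_max_mutex_instances';

  -- Lost counters; a non-zero value indicates that a buffer reached its
  -- configured cap and collected data was discarded (the F-4-B case)
  SHOW GLOBAL STATUS LIKE 'Performance_schema_%_lost';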
Interface changes : additional memory instrumentation
=====================================================

The performance schema allocates memory internally, for various buffers.
Each buffer is instrumented with a dedicated instrument name, so that memory consumption can be traced to individual buffers.

Given that the server is already instrumented for memory statistics, statistics about the performance schema internal memory consumption are reported in the existing tables, as for any other instrument (see the example queries at the end of this section).

Internal buffers are global to the server, so only a global statistic is exposed: there are no per thread / account / user / host memory statistics reported, as this is not applicable.

Behavior changes : scalable memory allocation
=============================================

The performance schema allocates memory incrementally, based on the server load, instead of allocating all the memory needed during server startup.
Memory is never freed during server operation, but can be recycled.

Allocation behavior
===================

This section is implementation dependent, and subject to change.
It is documented to help understand the behavior, and for the user manual.

The server allocates memory in increments, for each internal buffer.
The size of each increment is fixed, and depends on the buffer used.
The maximum number of increments is capped (this allows a simpler and more efficient implementation); the cap is a very high limit that should never be reached in practice in production.

Currently, the increments are as follows. The format of each line is:
Internal buffer name / increment size / max number of increments / max size

Mutex instances     / 1024 / 1024 / 1048576
Rwlock instances    / 1024 / 1024 / 1048576
Cond instances      /  256 /  256 /   65536
File instances      / 1024 / 1024 / 1048576
Socket instances    /  256 /  256 /   65536
Metadata locks      / 1024 / 1024 / 1048576
Setup actors        / 1024 / 1024 / 1048576
Table handles       / 1024 / 1024 / 1048576
Table instances     / 1024 / 1024 / 1048576
Table indexes       / 1024 / 1024 / 1048576
Table locks         / 1024 / 1024 / 1048576
Stored programs     / 1024 / 1024 / 1048576
Prepared statements / 1024 / 1024 / 1048576
Accounts            /  256 /  256 /   65536
Hosts               /  256 /  256 /   65536
Threads             /  256 /  256 /   65536
Users               /  256 /  256 /   65536

In other words, mutex instances are allocated in chunks of 1,024 at a time, up to 1 million (1,048,576 precisely) instances, or less if a specific limit is given by configuration parameters.
Likewise, memory to instrument threads is allocated in chunks of 256, up to a maximum limit of 65,536, or less if a specific limit is given by configuration parameters.

Benefits
========

Configuration of most sizing parameters is no longer required.
A server running under a very low load will no longer consume an arbitrarily high amount of memory, as it did before.
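For reference, the instrumentation described above can be inspected with plain SELECT statements against the existing tables. A short sketch; the individual instrument names under the "memory/performance_schema/" prefix are not enumerated in this worklog, hence the wildcard:

  -- List the performance schema internal memory instruments (F-1)
  SELECT NAME
    FROM performance_schema.setup_instruments
   WHERE NAME LIKE 'memory/performance_schema/%';

  -- Memory currently allocated for each internal buffer (F-2)
  SELECT EVENT_NAME, CURRENT_NUMBER_OF_BYTES_USED
    FROM performance_schema.memory_summary_global_by_event_name
   WHERE EVENT_NAME LIKE 'memory/performance_schema/%'
   ORDER BY CURRENT_NUMBER_OF_BYTES_USED DESC;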