WL#3249: PERFORMANCE SCHEMA, Instrument memory usage

Affects: Server-5.7   —   Status: Complete   —   Priority: Very High

The performance schema instruments memory usage,
and aggregates memory usage statistics detailed by:
- type of memory used (various caches, internal buffers, etc)
- thread/account/user/host indirectly performing the memory operation.

The following aspects are instrumented:
- memory sizes used
- counts of operations
- low / high water marks.

Memory sizes are important to understand / tune the memory consumption of a 
server.

Counts of operations are important to understand / tune the overall pressure the
server is putting on the memory allocator, which has an impact of performances.
Allocating 1 byte 1 million times is not the same as allocating 1 Million bytes,
and only keeping track of both counts and sizes can expose the difference.

Low and high water marks are critical to detect spikes in workload, the overall
stability of a workload, and possible leaks.

Target audience

* Production engineers

Production engineers are the primary audience for this instrumentation.

Production engineers are expected to use the memory instrumentation for:
- configuring a server,
- monitoring a server,
- capacity planning

* Server developers

Server developers are the secondary audience for this instrumentation.

Implementers of the server, for either the core server or plugins, can leverage
this instrumentation to understand the memory behavior of some code. This is
useful to evaluate different code or designs, to detect early potential
performance or consumption problems.

* Application developers

In most cases, application developers have little control on memory allocation
done inside the server itself.
Either the code is not configurable, in which case server developers are
responsible for how the code allocates memory,
or the code behavior is configurable, in which case the DBA has control on the
server with the my.cnf file.

In some cases however, in particular when some sizing parameters can be
specified by session, application developers have some control on the server
memory usage, and will benefit from this instrumentation.


User Documentation
==================

http://dev.mysql.com/doc/refman/5.7/en/memory-summary-tables.html
SERVER VARIABLE performance_schema_max_memory_classes
=====================================================

A new configuration variable, "performance_schema_max_memory_classes",
is used to size internal performance schema buffers.
This variable is read only, and global.

Note: this is a "class" sizing, no autotuning is needed.

The default value for performance_schema_max_memory_classes depends
on how many instruments are used in the server, and it can change other time.
The default value is currently 250.

SERVER STATUS performance_schema_memory_classes_lost
====================================================

A new server status variable, "performance_schema_memory_classes_lost",
is used to indicate if memory instruments could not be loaded.

This variable is global.

TABLE performance_schema.setup_instruments
==========================================

A new instrument type is supported, a memory instrument.
Memory instruments are named "memory/<component>/<instrument_name>".

Memory instrumentation is enabled by default.
This is a tentative decision, which will be revisited once the feature is
implemented, and once performance overhead can be measured in benchmarks.

TABLE performance_schema.memory_summary_by_thread_by_event_name
===============================================================

CREATE TABLE `memory_summary_by_thread_by_event_name` (
  `THREAD_ID` bigint(20) unsigned NOT NULL,
  `EVENT_NAME` varchar(128) NOT NULL,
  `COUNT_ALLOC` bigint(20) unsigned NOT NULL,
  `COUNT_FREE` bigint(20) unsigned NOT NULL,
  `SUM_NUMBER_OF_BYTES_ALLOC` bigint(20) unsigned NOT NULL,
  `SUM_NUMBER_OF_BYTES_FREE` bigint(20) unsigned NOT NULL,
  `LOW_COUNT_USED` bigint(20) NOT NULL,
  `CURRENT_COUNT_USED` bigint(20) NOT NULL,
  `HIGH_COUNT_USED` bigint(20) NOT NULL,
  `LOW_NUMBER_OF_BYTES_USED` bigint(20) NOT NULL,
  `CURRENT_NUMBER_OF_BYTES_USED` bigint(20) NOT NULL,
  `HIGH_NUMBER_OF_BYTES_USED` bigint(20) NOT NULL
) ENGINE=PERFORMANCE_SCHEMA DEFAULT CHARSET=utf8

Operations
----------

SELECT is supported.
TRUNCATE is supported, and is used to reset all summaries.

Normal privilege checks do apply: a user must be granted the proper table
privileges to perform an operation.

Semantic
--------

This table aggregates all the memory operations by thread,
as generated by memory instruments.

The columns COUNT_ALLOC, COUNT_FREE aggregates the number of calls to
malloc-like and free-like functions.

The columns SUM_NUMBER_OF_BYTES_ALLOC, SUM_NUMBER_OF_BYTES_FREE
aggregates the size of the memory allocated and freed.

The column CURRENT_COUNT_USED is the current number of allocated blocks that
have not been freed yet. This is a convenience column, which is by definition
equal to COUNT_ALLOC - COUNT_FREE.

The column CURRENT_NUMBER_OF_BYTES_USED is the current memory size of allocated
blocks that have not been freed yet. This is a convenience column, which is by
definition equal to SUM_NUMBER_OF_BYTES_ALLOC - SUM_NUMBER_OF_BYTES_FREE.

The columns LOW_COUNT_USED, HIGH_COUNT_USED are the low/high water mark
corresponding to column CURRENT_COUNT_USED.

The columns LOW_NUMBER_OF_BYTES_USED, HIGH_NUMBER_OF_BYTES_USED are the low/high
water mark corresponding to column CURRENT_NUMBER_OF_BYTES_USED.

Effect of truncate
------------------

In general, truncate reset the baseline for statistics, but does not change the
server state (i.e., "truncate" on a "memory" table does not free memory).

COUNT_ALLOC, COUNT_FREE are reset to a new baseline, by reducing each counter by
the same value.

Likewise, SUM_NUMBER_OF_BYTES_ALLOC, SUM_NUMBER_OF_BYTES_FREE are reset to a new
baseline.

LOW_COUNT_USED, HIGH_COUNT_USED are reset to CURRENT_COUNT_USED.

LOW_NUMBER_OF_BYTES_USED, HIGH_NUMBER_OF_BYTES_USED are reset to
CURRENT_NUMBER_OF_BYTES_USED.

TABLE performance_schema.memory_summary_by_account_by_event_name
================================================================

Similar to memory_summary_by_thread_by_event_name,
but grouped by USER, HOST.

TABLE performance_schema.memory_summary_by_user_by_event_name
=============================================================

Similar to memory_summary_by_thread_by_event_name,
but grouped by USER.

TABLE performance_schema.memory_summary_by_host_by_event_name
=============================================================

Similar to memory_summary_by_thread_by_event_name,
but grouped by HOST.

TABLE performance_schema.memory_summary_global_by_event_name
============================================================

Similar to memory_summary_by_thread_by_event_name,
but without grouping (global summary).

Instrumentation behavior
========================

The memory instrumentation can be enabled or disabled dynamically, by updating
the column ENABLED in table setup_instruments.

The column TIMED in table setup_instrument is ignored for memory instruments, as
memory operations are not timed.

When a thread in the server executes a memory allocation that has been
instrumented, the following rules apply:
- a) if the thread is not instrumented, the memory allocated is not instrumented.
- b) if the instrument is not ENABLED, the memory allocated is not instrumented
- otherwise (both the thread and the instrument are enabled), the memory block
allocated is instrumented.

When a memory block is not instrumented:
- no statistics are collected during memory allocation
- no statistics are collected during memory de-allocation

When a thread in the server executes a memory free operation:
- free on a non instrumented memory block is never instrumented
- free on an instrumented memory block is always instrumented,
  and statistics about the free operation are collected.

For the per thread statistics, the following rules apply.

When a memory block is instrumented, upon allocation of a block of size N:
- COUNT_ALLOC is incremented by 1
- CURRENT_COUNT_USED is incremented by 1
- HIGH_COUNT_USED is increased if CURRENT_COUNT_USED is a new maximum
- SUM_NUMBER_OF_BYTES_ALLOC is increased by N,
- CURRENT_NUMBER_OF_BYTES_USED is increased by N,
- HIGH_NUMBER_OF_BYTES_USED is increased if CURRENT_NUMBER_OF_BYTES_USED is a
new maximum

When a memory block is instrumented, upon de-allocation:
- COUNT_FREE is incremented by 1
- CURRENT_COUNT_USED is decremented by 1
- LOW_COUNT_USED is decreased if CURRENT_COUNT_USED is a new minimum
- SUM_NUMBER_OF_BYTES_FREE is increased by N,
- CURRENT_NUMBER_OF_BYTES_USED is decreased by N,
- LOW_NUMBER_OF_BYTES_USED is decreased if CURRENT_NUMBER_OF_BYTES_USED is a
new minimum

For higher lever aggregates (by account, by user, by host, global),
the same rules apply expect for low and high water marks.

- LOW_COUNT_USED / LOW_NUMBER_OF_BYTES_USED are lower estimates
- HIGH_COUNT_USED / HIGH_NUMBER_OF_BYTES_USED are higher estimates

"Lower estimates" means that the value reported by the performance schema is
guaranteed to be lower or equal to the lowest count/size of memory effectively
used at runtime.

"Higher estimates" means that the value reported by the performance schema is
guaranteed to be greater or equal to the highest count/size of memory
effectively used at runtime.

How estimates are computed is implementation dependent, and subject to change.

Example of estimates currently implemented:

Thread 1 uses memory in the 1M - 2M range during execution,
as reported with LOW_NUMBER_OF_BYTES_USED, HIGH_NUMBER_OF_BYTES_USED
in table memory_summary_by_thread_by_event_name

Thread 2 uses memory in the 10M - 12M range during execution, likewise

When these two threads belong to the same user account, the per account summary
estimates that this account used memory in the 11M - 14M range.

That is, the LOW_NUMBER_OF_BYTES_USED for the higher level aggregate is the sum
of each LOW_NUMBER_OF_BYTES_USED (assuming worst case), and likewise, the
HIGH_NUMBER_OF_BYTES_USED for the higher lever aggregate is the sum of each
HIGH_NUMBER_OF_BYTES_USED (assuming worst case).

11M is a lower estimate, that can only occur if both thread 1 and thread 2 hit
the low usage mark at the same time.

14M is a higher estimate, that can only occur if both threads hit the high usage
mark at the same time.

The real memory usage for this account could have been in the [11.5M - 13.5M] range

Note that for capacity planning, reporting the worst case is actually the
desired behavior, as it shows what can potentially happen when sessions are un
correlated, which is typically the case.

Scope of instrumentation
========================

All memory allocation of the server, including:
- the sql layer
- the mysys layer
- storage engines
- various plugins
is to be instrumented.

REQUIREMENTS
============

Install
-------

Func-Req (1): A fresh MySQL installation of CURRENT-VERSION must create the
following tables:
- 1.1, TABLE performance_schema.memory_summary_by_thread_by_event_name
- 1.2, TABLE performance_schema.memory_summary_by_account_by_event_name
- 1.3, TABLE performance_schema.memory_summary_by_user_by_event_name
- 1.4, TABLE performance_schema.memory_summary_by_host_by_event_name
- 1.5, TABLE performance_schema.memory_summary_global_by_event_name

Upgrade
-------

Func-Req (2): An upgrade from PREVIOUS-VERSION to CURRENT-VERSION must create
the following tables:
- 2.1, TABLE performance_schema.memory_summary_by_thread_by_event_name
- 2.2, TABLE performance_schema.memory_summary_by_account_by_event_name
- 2.3, TABLE performance_schema.memory_summary_by_user_by_event_name
- 2.4, TABLE performance_schema.memory_summary_by_host_by_event_name
- 2.5, TABLE performance_schema.memory_summary_global_by_event_name

Server options and variables
----------------------------

Func-Req (3.1): The server variable performance-schema-max-memory-classes is
defined.

Func-Req (3.2): The server status variable
performance_schema_memory_classes_lost is defined.