MySQL 8.0: Excluding the Buffer Pool from a Core File

The latest release of MySQL 8.0 introduces a new dynamic system variable @@innodb_buffer_pool_in_core_file which lets you omit the Buffer Pool’s memory content when generating a core file.

This change is an adaptation of a patch contributed by the Facebook team. We would like to thank and acknowledge this important and timely contribution by Facebook.

A Core File

A core file or core dump is a file that records the memory image of a running process and its process status (register values etc.). Its primary use is post-mortem debugging of a program that crashed while it ran outside a debugger.
— sourceware.org

To enable core file creation in case MySQL crashes, you have to specify the --core-file command-line option when running mysqld. This changes the value of the read-only system variable @@core_file from its default of OFF to ON.

For example, suppose you happen to be using Linux, and you’ve run

./bin/mysqld --core-file --datadir=/var/mysql/data

and it crashed due to some bug (which you can simulate using kill -s SIGABRT $pid, where $pid is the id of your mysqld process). You can then inspect the state just before the program crashed using:

gdb ./bin/mysqld /var/mysql/data/core.$pid

The exact filename and location of the core file depend on your system configuration. Our example assumes that cat /proc/sys/kernel/core_pattern outputs core or core.%p, that cat /proc/sys/kernel/core_uses_pid outputs 1, and that /var/mysql/data is your data directory. The mysqld process uses the data directory as its current working directory, which is why the core file is created there by default.
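
To make the naming rule concrete, here is a minimal Python sketch of how the two kernel settings combine into a file name. It is our own illustration based on the behavior described in the Linux core(5) manual page, not code from MySQL; the function name expected_core_name is hypothetical, and only the %p and %% specifiers are handled.

```python
def expected_core_name(core_pattern, core_uses_pid, pid):
    """Return the core file name for the given settings, or None if the
    dump is piped to a helper program instead of being written by name.

    Sketch only: handles %p (PID) and %% (literal percent); any other
    specifier is left untouched.
    """
    pattern = core_pattern.strip()
    if pattern.startswith("|"):
        return None  # the kernel pipes the dump to a program

    parts = []
    saw_pid = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%" and i + 1 < len(pattern):
            spec = pattern[i + 1]
            if spec == "p":
                parts.append(str(pid))
                saw_pid = True
            elif spec == "%":
                parts.append("%")
            else:
                parts.append("%" + spec)  # not interpreted in this sketch
            i += 2
        else:
            parts.append(ch)
            i += 1

    name = "".join(parts)
    # core_uses_pid appends ".PID" only when the pattern has no %p.
    if core_uses_pid and not saw_pid:
        name += "." + str(pid)
    return name


# Both configurations assumed in the example above give the same name.
print(expected_core_name("core", 1, 1234))
print(expected_core_name("core.%p", 1, 1234))
```

Both calls yield core.1234, which is why the gdb command above can use core.$pid under either setting.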

The Buffer Pool

The InnoDB Buffer Pool is a storage area for caching data and indexes in memory. Together with the InnoDB Redo Log and the data pages persisted on disk, it forms a low-level abstraction of I/O: data can be thought of as divided into pages, each identified by its tablespace id and page id, and InnoDB can load, modify, and store such pages atomically. The more complicated structures of the various primary and secondary indexes are built on top of this abstraction; they use these low-level pages to store, for example, the nodes of trees.

To perform any work on a page, the page must be brought from disk into memory, and the place in memory where such pages are kept is called the Buffer Pool. Subsequent uses of the same page can be served from the Buffer Pool as long as the page has not been removed from it (a.k.a. evicted), which may happen if there is not enough space in memory to hold all the pages being accessed. In this regard the Buffer Pool serves as a cache for pages on disk. Also, to avoid writing a page to disk each time it is modified, a modified page is only marked as dirty in memory and the write to disk is deferred until it is really necessary; only the information needed to recreate the state of the page after a crash is written to the append-only write-ahead log (called the Redo Log). This rough description shows that the larger the Buffer Pool, the rarer the situations in which costly disk I/O has to be performed. Thus, the Buffer Pool is often configured to consume a considerable fraction of available RAM.
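
The caching idea can be sketched in a few lines of Python. This is a deliberately simplified toy and not how InnoDB is implemented: the class name ToyBufferPool, the plain least-recently-used eviction, and the dictionary standing in for the data files are all assumptions made for illustration.

```python
from collections import OrderedDict


class ToyBufferPool:
    """Toy page cache: LRU eviction, dirty flags, and an append-only log."""

    def __init__(self, capacity, disk):
        self.capacity = capacity    # maximum number of cached pages
        self.disk = disk            # dict standing in for data files
        self.pages = OrderedDict()  # page key -> [content, dirty], oldest first
        self.redo_log = []          # append-only (page key, new content) records
        self.disk_reads = 0
        self.disk_writes = 0

    def _load(self, key):
        if key in self.pages:
            self.pages.move_to_end(key)  # cache hit: now most recently used
            return self.pages[key]
        if len(self.pages) >= self.capacity:
            self._evict()
        self.disk_reads += 1             # cache miss: costly disk read
        frame = [self.disk[key], False]
        self.pages[key] = frame
        return frame

    def _evict(self):
        key, (content, dirty) = self.pages.popitem(last=False)
        if dirty:                        # deferred write happens only now
            self.disk[key] = content
            self.disk_writes += 1

    def read(self, key):
        return self._load(key)[0]

    def write(self, key, content):
        frame = self._load(key)
        self.redo_log.append((key, content))  # log first, page write later
        frame[0], frame[1] = content, True    # mark the page dirty in memory


# Pages are keyed by (tablespace id, page id), as in the description above.
disk = {(0, 1): b"a", (0, 2): b"b", (0, 3): b"c"}
pool = ToyBufferPool(capacity=2, disk=disk)
pool.read((0, 1))
pool.read((0, 1))            # second read is served from memory
pool.write((0, 1), b"A")     # only the log and the in-memory page change
pool.read((0, 2))
pool.read((0, 3))            # pool is full: the dirty page (0, 1) is flushed
```

With a capacity of two pages, the third distinct page forces an eviction; a larger capacity would avoid both the eviction and the disk write, which is the trade-off that pushes people to configure large Buffer Pools.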

Since the Buffer Pool resides in main memory, and the memory of a process is dumped to a core file, it follows that a huge Buffer Pool results in a huge core file. This can be problematic for several reasons:

a big file consumes space on disk, which can create a cascade of problems if there is not enough space
a big file takes longer to write
a big file is more difficult to move around, in particular when one needs to send it to somebody else for analysis

Also, the Buffer Pool contains pages of the database, which raises security concerns when it gets dumped to a file.

There are, however, cases where investigating the crash would benefit from having access to the exact content of pages at the moment of the crash.

So, there are good reasons to exclude the Buffer Pool from a core file, but there are also scenarios where you would prefer to have the data.

Advising the operating system about our intention

On Linux 3.4+ a programmer can use a non-POSIX extension to the madvise() interface: calling madvise(ptr,size,MADV_DONTDUMP) lets the operating system know that the size bytes of memory pointed to by ptr should not be dumped to a core file.
In the patch contributed by Facebook, madvise() was used on all large buffers allocated by MySQL to make core files smaller.
We have ported this patch to MySQL 8.0, narrowing it down to the Buffer Pool pages only.
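
If you would like to try the system call yourself, Python 3.8+ exposes it through the standard mmap module. The sketch below is only an illustration of the call and not the MySQL implementation (which is written in C++); the helper name allocate_excluded_from_core is hypothetical, and the fallback to False on platforms without MADV_DONTDUMP is our own choice.

```python
import mmap


def allocate_excluded_from_core(size):
    """Map `size` bytes of anonymous memory and advise the kernel to leave
    the mapping out of core dumps.

    Returns (buffer, advised). `advised` is False when this Python build or
    operating system lacks MADV_DONTDUMP, or when the call itself fails.
    """
    buf = mmap.mmap(-1, size)  # anonymous mapping, not backed by a file
    advice = getattr(mmap, "MADV_DONTDUMP", None)
    if advice is None or not hasattr(buf, "madvise"):
        return buf, False
    try:
        buf.madvise(advice)    # applies to the whole mapping
    except OSError:
        return buf, False
    return buf, True


buf, advised = allocate_excluded_from_core(1 << 20)
buf[:5] = b"hello"             # the memory stays fully usable either way
print("excluded from core dumps:", advised)
```

The advice changes only what is written to a core file; reads and writes to the memory behave exactly as before.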

The innodb_buffer_pool_in_core_file variable

Striving for backward compatibility, we’ve introduced a new system variable @@innodb_buffer_pool_in_core_file, which is set to ON by default in order to mimic the old behavior. This new variable only affects behavior if @@core_file is ON, as otherwise no core file is generated at all.

The behavior changes only when this variable is set to OFF (for example by passing --skip-innodb-buffer-pool-in-core-file on the command line). If all of the following conditions hold:

@@innodb_buffer_pool_in_core_file is OFF, and
@@core_file is ON, and
the operating system supports madvise(ptr,size,MADV_DONTDUMP)

then the OS will be advised to exclude the Buffer Pool pages from a core file.

When something goes wrong, we err on the side of caution. If the user does not want the Buffer Pool data to be included in a core file, but the operating system does not fully support that intention, we make sure that no core file is generated at all. We believe this is a better option than writing the core file anyway, which might expose sensitive data or fill up the disk. Thus, if @@innodb_buffer_pool_in_core_file is disabled but an madvise() failure occurs, or marking Buffer Pool pages as MADV_DONTDUMP is unsupported by the operating system, an error is written to the server’s error log and the @@core_file variable is disabled to prevent a core file from being written.

This may sound a bit complicated, so here is a table covering all cases:

@@core_file	@@innodb_buffer_pool_in_core_file	OS supports MADV_DONTDUMP	effect
`OFF` (default)	*	*	no core file will be generated at all
`ON`	`ON` (default)	*	a core file will be generated including the Buffer Pool
`ON`	`OFF`	yes	a core file will be generated without the Buffer Pool
`ON`	`OFF`	no	no core file will be generated at all, `@@core_file` will be set to `OFF`, and a warning will be emitted to server logs
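
The same table can be read as a small decision function. The Python sketch below is just a restatement of the four rows for readers who prefer code; the function name core_dump_outcome and the returned strings are ours and do not correspond to anything in the server.

```python
def core_dump_outcome(core_file, buffer_pool_in_core_file, os_supports_dontdump):
    """Map the three conditions from the table to the resulting behavior."""
    if not core_file:
        # Row 1: the other two settings do not matter.
        return "no core file"
    if buffer_pool_in_core_file:
        # Row 2: default, old behavior.
        return "core file including the Buffer Pool"
    if os_supports_dontdump:
        # Row 3: the Buffer Pool pages are marked MADV_DONTDUMP.
        return "core file without the Buffer Pool"
    # Row 4: cannot honor the request, so refuse to dump at all.
    return "no core file, @@core_file set to OFF, message in server log"


for row in [(False, True, True), (True, True, False),
            (True, False, True), (True, False, False)]:
    print(row, "->", core_dump_outcome(*row))
```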

In an effort to make run-time configuration as smooth and simple as possible, this new @@innodb_buffer_pool_in_core_file system variable is dynamic, so you can change its value whenever you like, for example using this command:

SET GLOBAL innodb_buffer_pool_in_core_file = OFF;

(Keep in mind that, as explained above, on systems which do not support MADV_DONTDUMP the above command will set @@core_file to OFF, and since @@core_file is read-only there is no way to set it back to ON without restarting the server. You can use ./mtr --mem innodb.mysqld_core_dump_without_buffer_pool to check whether your system supports this feature.)

To restore the old behavior use:

SET GLOBAL innodb_buffer_pool_in_core_file = ON;

You can check the current value using:

SELECT @@global.innodb_buffer_pool_in_core_file;

You might also want to check if @@core_file is enabled:

SELECT @@global.core_file;

Results

How much you gain by excluding the Buffer Pool obviously depends on how large your Buffer Pool was in the first place, but to give you a rough idea, here’s a table with example results for --innodb-buffer-pool-size=1G:

@@innodb_page_size	Core file size with @@innodb_buffer_pool_in_core_file=ON	Core file size with @@innodb_buffer_pool_in_core_file=OFF
4KB	2.1GB	0.9GB
64KB	1.7GB	0.7GB

As you can see, the InnoDB page size itself affects the size of the core file: the smaller the page, the larger the number of pages, and thus the more metadata is kept for these pages. The difference between ON and OFF is not exactly 1GB because the core file size varies somewhat between runs and with the exact moment at which the server crashes, and obviously you can’t easily crash the same process more than once.
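
For a feel of the numbers, the short Python sketch below works out the nominal page counts for a 1G Buffer Pool and the savings implied by the table. It uses only the figures quoted above; the page counts are simple division and ignore any rounding the server may apply when sizing the pool, which is an assumption on our part.

```python
GIB = 1 << 30  # --innodb-buffer-pool-size=1G


def page_count(pool_bytes, page_bytes):
    """Nominal number of pages that fit in the pool."""
    return pool_bytes // page_bytes


# Core file sizes in GB from the table: page size in bytes -> (ON, OFF).
results_gb = {4 * 1024: (2.1, 0.9), 64 * 1024: (1.7, 0.7)}

for page_bytes, (size_on, size_off) in results_gb.items():
    saved = round(size_on - size_off, 1)
    print(f"{page_bytes // 1024}KB pages: "
          f"{page_count(GIB, page_bytes)} pages, about {saved}GB saved")
```

A 4KB page size means sixteen times as many pages as a 64KB page size for the same pool, which is consistent with the larger core files in the first row.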

Thank you for using MySQL!