The latest release of MySQL 8.0 introduces a new dynamic system variable, @@innodb_buffer_pool_in_core_file, which lets you omit the Buffer Pool’s memory content when generating a core file.
This change is an adaptation of a patch contributed by the Facebook team. We would like to thank and acknowledge this important and timely contribution by Facebook.
A Core File
A core file or core dump is a file that records the memory image of a running process and its process status (register values etc.). Its primary use is post-mortem debugging of a program that crashed while it ran outside a debugger.
— sourceware.org
To enable core file creation in case MySQL crashes, you have to specify the --core-file command-line option when running mysqld, which changes the value of the read-only @@core_file system variable from its default of OFF to ON.
For example, suppose you happen to be using Linux, and you’ve run

    ./bin/mysqld --core-file --datadir=/var/mysql/data

and it crashed due to some bug (which you can simulate using kill -s SIGABRT $pid, where $pid is the id of your mysqld process). Then you can inspect the state just before the program crashed using:

    gdb ./bin/mysqld /var/mysql/data/core.$pid

The exact filename and location of the core file depend on your particular system configuration. Our example assumes that cat /proc/sys/kernel/core_pattern outputs core or core.%p, that cat /proc/sys/kernel/core_uses_pid outputs 1, and that /var/mysql/data is your data directory (which the mysqld process uses as its current working directory, which is why the core file is created there by default).
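To make these naming assumptions concrete, here is a minimal Python sketch of how Linux derives the core file name in the simple cases this example relies on. The function name expected_core_path is hypothetical, and only the %p and %% specifiers are handled; real core_pattern values may use other specifiers or pipe the dump to a helper program, which this sketch does not cover.

```python
import os


def expected_core_path(core_pattern, core_uses_pid, pid, cwd):
    """Guess where Linux writes the core file for a simple core_pattern.

    Hypothetical helper for illustration: handles only the %p and %%
    specifiers and rejects patterns piped to a helper program.
    """
    if core_pattern.startswith("|"):
        raise ValueError("core_pattern pipes the dump to a program")

    # Hide %% behind a placeholder so that "%%p" is not read as "%p".
    name = core_pattern.replace("%%", "\0")
    has_pid = "%p" in name
    name = name.replace("%p", str(pid)).replace("\0", "%")

    # When core_uses_pid is non-zero and the pattern has no %p,
    # the kernel appends ".<pid>" to the file name.
    if core_uses_pid and not has_pid:
        name += "." + str(pid)

    # A relative pattern is resolved against the crashing process's
    # current working directory (the data directory for mysqld).
    return os.path.join(cwd, name)
```

Under the assumptions stated in the text (core_pattern of core or core.%p, core_uses_pid of 1), both patterns resolve to core.$pid inside the data directory, which is the path passed to gdb.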
The Buffer Pool
The InnoDB Buffer Pool is a storage area for caching data and indexes in memory. Together with the InnoDB Redo Log and the data pages persisted on disk, it forms a low-level abstraction of I/O: data can be thought of as divided into pages, where each page is identified by its tablespace id and page id, and InnoDB can load, modify, and store such pages in an atomic way. The more complicated structures of the various primary and secondary indexes are built only on top of this abstraction; they use these low-level pages to store, for example, the nodes of trees.
To perform any work on a page, the page needs to be brought from disk into memory, and the place in memory where we keep such pages is called the Buffer Pool. Subsequent uses of the same page can be served from the Buffer Pool as long as the page has not been removed from it (a.k.a. evicted), which may happen if there is not enough space in memory to hold all the pages that are accessed. In this regard the Buffer Pool serves as a cache for pages on disk. Also, to avoid writing a page to disk each time it is modified, a page is only marked as dirty in memory, and the write to disk is deferred until it is really necessary; only the information needed to recreate the state of the page after a crash is written to the append-only write-ahead log (called the Redo Log). One can see from this rough description that the larger the Buffer Pool, the rarer the situations in which we have to perform costly disk I/O operations. Thus, the Buffer Pool is often configured to consume a considerable fraction of available RAM.
Since the Buffer Pool resides in main memory, and the memory of a process is dumped to a core file, it follows that a huge Buffer Pool results in a huge core file. This can be problematic for several reasons:
- a big file consumes space on disk, which can create a cascade of problems if there is not enough space
- a big file takes longer to write
- a big file is more difficult to move around, in particular when one needs to send it to somebody else for analysis
Also, the Buffer Pool contains pages of the database, which poses some security considerations when it gets dumped to a file.
There are, however, cases where investigating the crash would benefit from having access to the exact content of pages at the moment of the crash.
So, there are good reasons to exclude the Buffer Pool from a core file, but there are also scenarios where you would prefer to have the data.
Advising the operating system about our intention
On Linux 3.4+ a programmer can use a non-POSIX extension to the madvise() interface by calling madvise(ptr,size,MADV_DONTDUMP) to let the operating system know that size bytes of memory pointed to by ptr should not be dumped to a core file.
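As a rough illustration of this call, the following Python sketch maps an anonymous buffer and advises the kernel to leave it out of core dumps. It uses Python's mmap wrapper around madvise() (available since Python 3.8) rather than the C++ code in InnoDB, and the names allocate_excluded_from_core and BUFFER_SIZE are hypothetical. On platforms without MADV_DONTDUMP it simply reports that the advice could not be applied.

```python
import mmap

# Size of the illustrative buffer; a real Buffer Pool is far larger.
BUFFER_SIZE = 16 * mmap.PAGESIZE


def allocate_excluded_from_core(size):
    """Map anonymous memory and ask the kernel to omit it from core dumps.

    Returns (buffer, excluded), where excluded is False if the platform
    or kernel does not support MADV_DONTDUMP.
    """
    buf = mmap.mmap(-1, size)  # anonymous, page-aligned mapping
    # MADV_DONTDUMP is Linux-specific, so the constant may be missing.
    advice = getattr(mmap, "MADV_DONTDUMP", None)
    if advice is None or not hasattr(buf, "madvise"):
        return buf, False
    try:
        buf.madvise(advice)  # with no range given, covers the whole mapping
    except OSError:
        return buf, False
    return buf, True


buf, excluded = allocate_excluded_from_core(BUFFER_SIZE)
buf[:5] = b"hello"  # the memory stays fully usable either way
print("excluded from core dumps:", excluded)
```

The advice only changes what the kernel writes when the process dumps core; reads and writes to the buffer behave exactly as before.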
In the patch contributed by Facebook, madvise() was used on all large buffers allocated by MySQL to make core files smaller.
We have ported this patch to MySQL 8.0, narrowing it down to the Buffer Pool pages only.
The innodb_buffer_pool_in_core_file variable
Striving for backward compatibility, we’ve introduced a new system variable, @@innodb_buffer_pool_in_core_file, which is set to ON by default in order to mimic the old behavior. Also, this new variable only affects behavior if @@core_file is ON, as otherwise no core file will be generated at all.
Only when this variable is set to OFF (for example by passing --skip-innodb-buffer-pool-in-core-file on the command line) do we change the behavior. If all of the following conditions hold:
- @@innodb_buffer_pool_in_core_file is OFF, and
- @@core_file is ON, and
- the operating system supports madvise(ptr,size,MADV_DONTDUMP),

then the OS will be advised to exclude the Buffer Pool pages from a core file.
When something goes wrong, we’ve decided to err on the safe side. If the user didn’t want the Buffer Pool data to be included in a core file, but the operating system does not fully support that intention, we make sure that the core file will not be generated at all. We believe that this is a better option than writing the core file anyway, which might expose sensitive data or overflow the disk. Thus, if @@innodb_buffer_pool_in_core_file is disabled but an madvise() failure occurs, or marking Buffer Pool pages as MADV_DONTDUMP is unsupported by the operating system, an error is written to the server’s error log and the @@core_file variable is disabled to prevent the core file from being written.
This may sound a bit complicated, so here is a table covering all cases:
@@core_file | @@innodb_buffer_pool_in_core_file | OS supports MADV_DONTDUMP | effect |
---|---|---|---|
OFF (default) | * | * | no core file will be generated at all |
ON | ON (default) | * | a core file will be generated including the Buffer Pool |
ON | OFF | yes | a core file will be generated without the Buffer Pool |
ON | OFF | no | no core file will be generated at all, @@core_file will be set to OFF, and a warning will be emitted to server logs |
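
The same four cases can be written as a small decision function. This is only a sketch that mirrors the table; the function name core_dump_outcome and its return shape are hypothetical and do not correspond to any server code.

```python
def core_dump_outcome(core_file, buffer_pool_in_core_file, os_supports_dontdump):
    """Return (core_file_after, description) for one row of the table.

    Hypothetical helper mirroring the table, not server code.
    core_file_after is the resulting value of @@core_file.
    """
    if not core_file:
        return False, "no core file"
    if buffer_pool_in_core_file:
        return True, "core file including the Buffer Pool"
    if os_supports_dontdump:
        return True, "core file without the Buffer Pool"
    # Fail safe: the server turns @@core_file off rather than risk
    # dumping the Buffer Pool against the user's wishes.
    return False, "no core file; @@core_file is set to OFF"
```

For example, core_dump_outcome(True, False, False) describes the last row: the user asked for a core file without the Buffer Pool, the OS cannot honor that, so @@core_file ends up OFF.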
In an effort to make run-time configuration as smooth and simple as possible, this new @@innodb_buffer_pool_in_core_file system variable is dynamic, so you can change its value whenever you like, for example using this command:

    SET GLOBAL innodb_buffer_pool_in_core_file = OFF;

(Keep in mind that, as explained above, on systems which do not support MADV_DONTDUMP the above command will set @@core_file to OFF, and since @@core_file is read-only, there is no way to set it back to ON without restarting the server. You can use ./mtr --mem innodb.mysqld_core_dump_without_buffer_pool to check if your system supports this feature.)
To restore the old behavior use:

    SET GLOBAL innodb_buffer_pool_in_core_file = ON;

You can check the current value using:

    SELECT @@global.innodb_buffer_pool_in_core_file;

You might also want to check if @@core_file is enabled:

    SELECT @@global.core_file;

Results
How much you gain by setting this variable to OFF obviously depends on how large your Buffer Pool was in the first place, but to give you a rough idea, here’s a table with example core file sizes for --innodb-buffer-pool-size=1G:
@@innodb_page_size | @@innodb_buffer_pool_in_core_file = ON | @@innodb_buffer_pool_in_core_file = OFF |
---|---|---|
4KB | 2.1GB | 0.9GB |
64KB | 1.7GB | 0.7GB |
As you can see, the InnoDB page size itself impacts the size of the core file, because the smaller the page, the larger the number of pages, and thus the more metadata for these pages. The difference between ON and OFF is not exactly 1GB, as the core file size varies between runs and with the exact moment of crashing the server, and obviously you can’t easily crash the same process more than once.
Thank you for using MySQL!