WL#5535: Separate tablespace(s) for the InnoDB undo log
Status: Complete
Currently, the InnoDB transaction undo log lives in the system tablespace. The code already has some provision for placing it in different tablespaces.

Splitting the undo log across multiple tablespaces could lift some performance bottlenecks (mutex contention). This has been requested in BUG#25697.

Moving the undo log out of the system tablespace would also benefit users of solid-state storage (SSD). As noted in BUG#56283, it could be useful to move the user tablespaces to SSD and keep the system tablespace on HDD. The reason to keep the system tablespace on HDD is that the sequential bulk writes to the doublewrite buffer would quickly wear out SSDs, which tolerate only a limited number of rewrites. If we disable the insert buffer and move the undo logs out of the system tablespace, the system tablespace becomes practically read-only except for DDL operations and the doublewrite buffer.

Moving the undo log to separate tablespace(s) does not require a file format change, but it does break backward compatibility, for two reasons. First, older versions of InnoDB cannot access UNDO logs that reside in their own tablespace, because they do not know where to find them. Second, InnoDB opens all system tablespaces before doing recovery; the UNDO tablespaces qualify as system tablespaces, and older versions do not know about these new system tablespaces.

* Have one tablespace per undo log (insert, update, update_key_or_delete).

Advantages:
* More flexibility in using posix_fadvise to avoid file system cache pollution: http://dom.as/2010/11/18/logs-memory-pressure/
* Users can control how many rollback segments they want to create and whether to use separate tablespaces or not

Disadvantages:
* Breaks backward compatibility
Allow users to move the additional UNDO logs to separate tablespaces. For this we introduce three new parameters:

1. innodb_undo_directory (string)
   Path to the UNDO tablespace directory. The default is ".", which creates the UNDO log in the current working directory of the server process. The value can be an absolute path.
2. innodb_undo_tablespaces (ulint)
   Number of UNDO tablespaces to use; allows a 1-N mapping between UNDO logs and tablespaces. The default is 0.
3. innodb_undo_logs (ulint)
   The default is 1. If the value is greater than the current number of UNDO logs, the additional UNDO log segments are created. If innodb_undo_tablespaces > 0 they are created outside ibdata1 (the system tablespace), otherwise in ibdata1.

Rename innodb_rollback_segments to innodb_undo_logs.

Refuse to start up if innodb_undo_tablespaces doesn't match what InnoDB can access.

With the above settings users have more flexibility than with the current design, which creates all 128 UNDO logs at once.

Users can't drop UNDO tablespaces or the UNDO segments created within them.
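The interplay of the three parameters can be sketched roughly as follows. This is an illustration in Python, not InnoDB code. The names `UndoConfig` and `plan_new_undo_logs` are hypothetical, and two details are assumptions made only for the example: that UNDO tablespaces use space ids 1..N, and that new UNDO logs are spread over them round-robin. The text above only states that the mapping is 1-N and that new UNDO logs go outside ibdata1 when innodb_undo_tablespaces > 0.

```python
from dataclasses import dataclass
from typing import List

SYSTEM_SPACE_ID = 0  # ibdata1 (the system tablespace)


@dataclass(frozen=True)
class UndoConfig:
    """Hypothetical container for the three parameters, with their stated defaults."""
    innodb_undo_directory: str = "."
    innodb_undo_tablespaces: int = 0
    innodb_undo_logs: int = 1


def plan_new_undo_logs(cfg: UndoConfig, existing_logs: int) -> List[int]:
    """Return the target space id for each UNDO log that still has to be created."""
    if cfg.innodb_undo_logs <= existing_logs:
        return []  # nothing to create; existing UNDO logs are never dropped

    to_create = cfg.innodb_undo_logs - existing_logs

    if cfg.innodb_undo_tablespaces == 0:
        # No separate tablespaces configured: everything stays in ibdata1.
        return [SYSTEM_SPACE_ID] * to_create

    # ASSUMPTION: space ids 1..N and round-robin placement by UNDO log index.
    return [
        1 + (existing_logs + i) % cfg.innodb_undo_tablespaces
        for i in range(to_create)
    ]
```

For instance, with one existing UNDO log, `plan_new_undo_logs(UndoConfig(innodb_undo_logs=4), 1)` places all three new logs in space 0, while setting `innodb_undo_tablespaces=2` places them in the separate tablespaces instead.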
At startup, before we open any rollback segment (a.k.a. UNDO log) for recovery or MVCC, we first check how many rollback segments are currently configured and where they are located. If they are already in a tablespace separate from the system tablespace (space id 0), we try to open these tablespaces using the innodb_undo_directory path and "undoN" as the file name, where N == space id. Failure to open the currently in-use tablespaces is a fatal error, because recovery cannot proceed unless we have access to all the configured UNDO tablespaces.

If the above operation is successful, we try to open up to innodb_undo_tablespaces tablespaces. If we can't open the total number of tablespaces that the user is requesting, we print an error message with a suggestion about the correctly configured and discovered tablespaces.

The UNDO tablespace ids are expected to be contiguous, monotonically increasing, and less than any user-created tablespace id. This is an invariant and is checked by various parts of the code.

Note: None of this requires a file format change, because the on-disk data structures already have the necessary provision to support separate UNDO log tablespaces. The shortcomings are in the code of previous versions of InnoDB, which:

* Needs to know the path to the UNDO tablespaces
* Needs to open the tablespaces
* Has hard-coded assertions about the rseg space id being 0 (system tablespace), mainly in purge
* Assumes the system space id will always be 0, which has ramifications in the fil0fil.c LRU code

We extend the notion of system tablespace to include the UNDO tablespaces too.

Patch: rb561
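The startup sequence described above can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the InnoDB implementation: all function names are hypothetical, "opening" a tablespace is reduced to an existence check (injectable for testing), and the first UNDO space id is assumed to be 1 when none is in use yet.

```python
import os
from typing import Callable, Iterable, List, Optional

SYSTEM_SPACE_ID = 0  # the system tablespace


def undo_tablespace_path(undo_dir: str, space_id: int) -> str:
    """File name is "undoN", with N == space id, inside the undo directory."""
    return os.path.join(undo_dir, "undo%d" % space_id)


def check_undo_space_ids(space_ids: List[int],
                         min_user_space_id: Optional[int] = None) -> None:
    """Invariant: ids are contiguous, increasing, and below every user space id."""
    for prev, cur in zip(space_ids, space_ids[1:]):
        if cur != prev + 1:
            raise ValueError("undo space ids not contiguous: %r" % (space_ids,))
    if (space_ids and min_user_space_id is not None
            and space_ids[-1] >= min_user_space_id):
        raise ValueError("undo space id %d is not below user space id %d"
                         % (space_ids[-1], min_user_space_id))


def open_undo_tablespaces(undo_dir: str,
                          in_use_space_ids: Iterable[int],
                          requested: int,
                          exists: Callable[[str], bool] = os.path.exists,
                          min_user_space_id: Optional[int] = None) -> List[int]:
    """Return the space ids of all UNDO tablespaces that could be opened."""
    # Rollback segments in space 0 live in the system tablespace itself.
    in_use = sorted(set(in_use_space_ids) - {SYSTEM_SPACE_ID})
    check_undo_space_ids(in_use, min_user_space_id)

    # Step 1: every in-use tablespace must open, or recovery cannot proceed.
    for space_id in in_use:
        path = undo_tablespace_path(undo_dir, space_id)
        if not exists(path):
            raise RuntimeError("fatal: cannot open in-use undo tablespace " + path)

    # Step 2: try to open further tablespaces, up to the requested total.
    opened = list(in_use)
    # ASSUMPTION: the first undo space id is 1 when none is in use yet.
    next_id = opened[-1] + 1 if opened else SYSTEM_SPACE_ID + 1
    while len(opened) < requested and exists(undo_tablespace_path(undo_dir, next_id)):
        opened.append(next_id)
        next_id += 1

    if len(opened) < requested:
        raise RuntimeError("requested %d undo tablespaces, but only %d found: %r"
                           % (requested, len(opened), opened))
    check_undo_space_ids(opened, min_user_space_id)
    return opened
```

Passing `exists` as a parameter keeps the sketch testable without touching the file system; a real implementation would open the files and validate their contents rather than only checking that they exist.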
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.