WL#5580: changes to LRU flushing
Status: Complete
This work is performance related. The idea is to offload the flushing activity that happens in the LRU list from user threads to a background thread, i.e. the page_cleaner. Also in scope is a simpler, and possibly better, heuristic for LRU flushing.
New Config Options:
===================
innodb_lru_scan_depth (default 1024): dynamic, min: 100, max: ~0
innodb_flush_neighbors (default TRUE): dynamic

New LRU algorithm:
===================
Basically I have ripped out all the old constants and code related to LRU flushing. The new scheme works like this:
* LRU flushing happens only in the page_cleaner thread (previously it happened only in user threads)
* LRU flushing includes cleaning the tail of the LRU list AND putting blocks on the free list
* When a user thread can't find a block in the free list or a clean block in the tail of the LRU list, it triggers a new type of flush called BUF_FLUSH_SINGLE_PAGE, in which it tries to flush a single page from the LRU list instead of triggering a batch.

Finding a Victim for Replacement:
=================================
/******************************************************************//**
Returns a free block from the buf_pool. The block is taken off the
free list. If free list is empty, blocks are moved from the end of the
LRU list to the free list.
This function is called from a user thread when it needs a clean
block to read in a page. Note that we only ever get a block from
the free list. Even when we flush a page or find a page in LRU scan
we put it to free list to be used.
* iteration 0:
  * get a block from free list, success: done
  * if there is an LRU flush batch in progress:
    * wait for batch to end: retry free list
  * if buf_pool->try_LRU_scan is set
    * scan LRU up to srv_LRU_scan_depth to find a clean block
    * the above will put the block on free list
    * success: retry the free list
  * flush one dirty page from tail of LRU to disk
    * the above will put the block on free list
    * success: retry the free list
* iteration 1:
  * same as iteration 0 except:
    * scan whole LRU list
    * scan LRU list even if buf_pool->try_LRU_scan is not set
* iteration > 1:
  * same as iteration 1 but sleep 100ms

LRU flush batch:
================
Every second the page_cleaner thread will call page_cleaner_flush_LRU_tail() which will:
* Unconditionally scan innodb_lru_scan_depth - len(free_list) blocks of the LRU list
* If a block is replaceable it is put on the free list
* If a block is flushable it is flushed (in this case it is not put on the free list; that is left for the next scan)
* We do not try to hold off an LRU flush until the tail becomes peppered with dirty pages. We'll just do the flush even if only one page within the scan depth is dirty.

Subtleties:
===========
* I have to make eviction a part of LRU flushing (instead of just flushing the dirty pages) because whenever we flush we have to release the buf_pool and block mutexes, which means that we have to restart the scan of the LRU list for the next iteration. That makes scanning the tail of the LRU list (when there are lots of dirty pages there) an O(n*n) operation, which makes the buf_pool mutex very hot.
* In buf_LRU_free_block:
/* There can be multiple threads doing an LRU scan to free a block.
The page_cleaner thread can be doing an LRU batch whereas user threads
can potentially be doing multiple single page flushes. As we release
buf_pool->mutex below we need to make sure that no one else considers
this block as a victim. This block is already out of page_hash and we
are about to remove it from the LRU list and put it on free list.
To avoid this situation we set the buf_fix_count and io_fix fields here. */

Instrumentation:
================
Many new counters have been added. (Some old counters introduced in the page_cleaner patch now also report cumulative values instead of instantaneous values):

mysql> select name, comment from innodb_metrics where name like '%buffer_lru%' or name like 'buffer_flush%';
+---------------------------------------+----------------------------------------------------------------+
| name                                  | comment                                                        |
+---------------------------------------+----------------------------------------------------------------+
| buffer_flush_adaptive_flushes         | Occurrences of adaptive flush                                  |
| buffer_flush_adaptive_pages           | Number of pages flushed as part of adaptive flushing           |
| buffer_flush_async_flushes            | Occurrences of async flush                                     |
| buffer_flush_async_pages              | Number of pages flushed as part of async flushing              |
| buffer_flush_sync_flushes             | Occurrences of sync flush                                      |
| buffer_flush_sync_pages               | Number of pages flushed as part of sync flushing               |
| buffer_flush_max_dirty_flushes        | Occurrences of max dirty page flush                            |
| buffer_flush_max_dirty_pages          | Number of pages flushed as part of max dirty flushing          |
| buffer_flush_background_flushes       | Occurrences of background flush                                |
| buffer_flush_background_pages         | Number of pages flushed as part of background flushing         |
| buffer_flush_io_capacity_pct          | Percent of Server I/O capacity during flushing                 |
| buffer_flush_neighbor_calls           | Number of times neighbor flushing is invoked                   |
| buffer_flush_neighbor_count           | Total neighbors flushed                                        |
| buffer_flush_batch_count              | Number of flush batches                                        |
| buffer_flush_batch_scanned            | Total pages scanned as part of flush batch                     |
| buffer_flush_batch_pages              | Number of pages flushed as part of flush batch                 |
| buffer_lru_batch_count                | Number of LRU batches                                          |
| buffer_lru_batch_scanned              | Total pages scanned as part of LRU batch                       |
| buffer_lru_batch_pages                | Number of pages flushed as part of LRU batch                   |
| buffer_lru_single_flush_count         | Number of single page LRU flushes                              |
| buffer_lru_single_flush_scanned       | Total pages scanned as part of single page LRU flushes         |
| buffer_lru_single_flush_pages         | Number of pages flushed as part of single page LRU flushes     |
| buffer_lru_single_flush_failure_count | Number of times attempt to flush a single page from LRU failed |
| buffer_lru_get_free_search            | Number of searches performed for a clean page                  |
| buffer_lru_search                     | Number of searches performed on LRU for clean page             |
| buffer_lru_search_scanned             | Total pages scanned as part of LRU searches                    |
| buffer_lru_unzip_lru_search           | Number of searches performed on unzip_LRU for clean page       |
| buffer_lru_unzip_lru_search_scanned   | Total pages scanned as part of unzip_LRU searches              |
+---------------------------------------+----------------------------------------------------------------+

There is an undocumented hidden config parameter, innodb_doublewrite_batch_size, which is visible only with UNIV_PERF_DEBUG or UNIV_DEBUG. The value determines how much of the doublewrite buffer is to be used for batch flushing. The default is 120 and allowable values are 1 - 127. It is a static variable.
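
Illustration (not InnoDB code):
The free-block search described under "Finding a Victim for Replacement" can be walked through with a toy model. The Python sketch below is only an illustration under stated assumptions: Page, ToyBufferPool, scan_for_clean, flush_single_page and the max_iterations cutoff are hypothetical names invented here, the wait for an in-progress LRU batch is left out, and the single-page flush is assumed to pick the dirty page nearest the tail. The real function keeps looping rather than giving up.

```python
import time
from collections import deque
from dataclasses import dataclass, field


@dataclass(eq=False)
class Page:
    """Hypothetical stand-in for a buffer pool block."""
    page_id: int
    dirty: bool = False


@dataclass
class ToyBufferPool:
    """Toy model only; the LRU deque runs from head (left) to tail (right)."""
    lru: deque = field(default_factory=deque)
    free: deque = field(default_factory=deque)
    scan_depth: int = 1024       # plays the role of innodb_lru_scan_depth
    try_lru_scan: bool = True    # plays the role of buf_pool->try_LRU_scan
    flushed: list = field(default_factory=list)  # ids of pages "written"

    def scan_for_clean(self, limit):
        """Move the first clean page within `limit` tail pages to the free list."""
        for page in list(reversed(self.lru))[:limit]:
            if not page.dirty:
                self.lru.remove(page)
                self.free.append(page)
                return True
        return False

    def flush_single_page(self):
        """Assumed single-page flush: write the dirty page nearest the tail,
        then put its block on the free list."""
        for page in reversed(list(self.lru)):
            if page.dirty:
                self.flushed.append(page.page_id)
                page.dirty = False
                self.lru.remove(page)
                self.free.append(page)
                return True
        return False

    def get_free_block(self, max_iterations=3, sleep=time.sleep):
        """Return a block taken from the free list, or None once the
        hypothetical iteration cutoff is reached."""
        iteration = 0
        while iteration < max_iterations:
            if self.free:                  # blocks are only handed out from here
                return self.free.popleft()
            if iteration > 1:
                sleep(0.1)                 # iteration > 1: sleep 100 ms
            scan_all = iteration > 0       # iteration >= 1: scan the whole LRU
            limit = len(self.lru) if scan_all else self.scan_depth
            if (self.try_lru_scan or scan_all) and self.scan_for_clean(limit):
                continue                   # success: retry the free list
            if self.flush_single_page():
                continue                   # success: retry the free list
            iteration += 1
        return None
```

For example, with scan_depth=2 and an LRU list whose two tail pages are both dirty, the first call finds no clean block within the scan depth, falls through to the single-page flush, and returns the tail block via the free list. The model mirrors the rule that a block is only ever handed out from the free list, even when it was just found by a scan or a flush.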