WL#7868: InnoDB: Improvements around flushing for proper performance

Status: Complete   —   Priority: Medium

The flushing activity behavior is important to control throughput of DML
oriented workload.

The oldest modification LSN age must be controlled in the transaction log file
total capacity, not to overwrite the last checkpoint log LSN. The age is
increased by new transaction logging the modification. And is decreased by
flushing the page which has the oldest modification. So, in long term, flushing
the oldest pages is needed to progress new transactions.

1: Implementing improvements to the adaptive flushing algorithm

1-a: should consider the distribution about the oldest LSN of the pages in the
end of the flush_list

Along with the each oldest LSN for the pages, the number of pages to be flushed
for the same LSN age progress are different. If there are many similar oldest
LSN aged pages in the end of the flush_list, then all pages should be flushed to
progress the LSN age.

The factor which considers "How many pages should be flushed to progress the LSN
age at the time" should added to the adaptive flushing algorithm.

1-b: should consider flushing balance for each buffer pool instances

The adaptive flushing algorithm also should consider dirty page balance between
buffer pool instances. Dirty pages in the buffer pool instance which has the
most dirty pages should be flushed in prior


2: Setting a thread priority for the page_cleaner (if the platform allows and
authorized to set)

To keep the proper flushing along with the throughput of the workload,
page_cleaner threads should have opportunity to be executed properly. If there
are many active user threads and their thread priority are all the same, the
page_cleaner has less opportunity to run and might not flush enough dirty pages.

Even if it depends on environment and 'best effort', page_cleaner priority
should be greater than the others.


3: Proper flush waiting when the max modified LSN age is around
max_modified_age_sync

In the previous version, if the max modified LSN age reaches
max_modified_age_sync the flushing could overreacts (long flushing batch and
all user threads should wait for the batch end).

Each user thread should wait only for the target LSN age, not for the batch
end. They should progress along with the flushing pages and reduced the last
modified LSN age. By that throughput around max_modified_age_sync should be much
more smooth and proper, and much less performance risk to reach to
max_modified_age_sync.


3': should block the checkpoint LSN overwritten

By implementing the above item "3:", the user threads have more opportunity to
progress than before, and also might have more possibility to overwrite the
checkpoint.

The each log write should be conscious about the last checkpoint LSN not to
overwrite its transaction log.
Functional requirements:
F-1: Proper flushing throughput along with setting innodb_io_capacity_max.
     (User should not set too large innodb_io_capacity_max.
      [written blocks/s] can reach to system max
      and can be limited by innodb_io_capacity_max with the number (blocks/s))

F-2: User can detect too large setting of innodb_io_capacity_max by the server
error log.
     (like "page_cleaner: 1000ms intended loop took XXXXms. The settings might
not be optimal. (flushed=YYYY and evicted=ZZZZ, during the time.)"
      At the case the optimal innodb_io_capacity_max might be 'ZZZZ * (1000/XXXX)')

F-3: User can get activity detail of the page_cleaner threads by
INFORMATION_SCHEMA.INNODB_METRICS table


Non-Functional requirements:
NF-1: optimize (stabilize and same/higher average) long-term (means might reach
to max_modified_age_sync flushing)
RW-workload

Changes to the interface specification:
I-1: No new files
I-2: New syntax
  I-2-1: new items for INFORMATION_SCHEMA.INNODB_METRICS table
  buffer_flush_n_to_flush_by_age: "Number of pages target by LSN Age for flushing."
  buffer_flush_adaptive_avg_time_slot:    "Avg time (ms) spent for adaptive
flushing recently per slot."
  buffer_LRU_batch_flush_avg_time_slot:   "Avg time (ms) spent for LRU batch
flushing recently per slot.",
  buffer_flush_adaptive_avg_time_thread:  "Avg time (ms) spent for adaptive
flushing recently per thread.",
  buffer_LRU_batch_flush_avg_time_thread: "Avg time (ms) spent for LRU batch
flushing recently per thread.",
  buffer_flush_adaptive_avg_time_est:     "Estimated time (ms) spent for
adaptive flushing recently.",
  buffer_LRU_batch_flush_avg_time_est:    "Estimated time (ms) spent for LRU
batch flushing recently.",
  buffer_flush_avg_time:  "Avg time (ms) spent for flushing recently.",
  buffer_flush_adaptive_avg_pass: "Numner of adaptive flushes passed during the
recent Avg period.",
  buffer_LRU_batch_flush_avg_pass:"Number of LRU batch flushes passed during the
recent Avg period.",
  buffer_flush_avg_pass:  "Number of flushes passed during the recent Avg period.",
  buffer_LRU_get_free_loops:      "Total loops in LRU get free.",
  buffer_LRU_get_free_waits:      "Total sleep waits in LRU get free."

I-3: No new commands
I-4: No new tools
I-5: No impact on existing functionality