/* Innobase relational database engine; Copyright (C) 2001 Innobase Oy

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License 2
   as published by the Free Software Foundation in June 1991.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License 2
   along with this program (in file COPYING); if not, write to the Free
   Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
/******************************************************
The database buffer buf_pool

(c) 1995 Innobase Oy

Created 11/5/1995 Heikki Tuuri
*******************************************************/

#include "buf0buf.h"

#ifdef UNIV_NONINL
#include "buf0buf.ic"
#endif

#include "mem0mem.h"
#include "btr0btr.h"
#include "fil0fil.h"
#include "lock0lock.h"
#include "btr0sea.h"
#include "ibuf0ibuf.h"
#include "dict0dict.h"
#include "log0recv.h"
#include "log0log.h"
#include "trx0undo.h"
#include "srv0srv.h"

/*
IMPLEMENTATION OF THE BUFFER POOL
=================================

Performance improvement:
------------------------
Thread scheduling in NT may be so slow that the OS wait mechanism should
not be used even in waiting for disk reads to complete.
Rather, we should put waiting query threads to the queue of
waiting jobs, and let the OS thread do something useful while the i/o
is processed. In this way we could remove most OS thread switches in
an i/o-intensive benchmark like TPC-C.

A possibility is to put a user space thread library between the database
and NT. User space thread libraries might be very fast.

SQL Server 7.0 can be configured to use 'fibers' which are lightweight
threads in NT. These should be studied.

Buffer frames and blocks
------------------------
Following the terminology of Gray and Reuter, we call the memory
blocks where file pages are loaded buffer frames. For each buffer
frame there is a control block, or shortly, a block, in the buffer
control array. The control info which does not need to be stored
in the file along with the file page, resides in the control block.

Buffer pool struct
------------------
The buffer buf_pool contains a single mutex which protects all the
control data structures of the buf_pool. The content of a buffer frame is
protected by a separate read-write lock in its control block, though.
These locks can be locked and unlocked without owning the buf_pool mutex.
The OS events in the buf_pool struct can be waited for without owning the
buf_pool mutex.

The buf_pool mutex is a hot-spot in main memory, causing a lot of
memory bus traffic on multiprocessor systems when processors
alternately access the mutex. On our Pentium, the mutex is accessed
maybe every 10 microseconds. We gave up the solution to have mutexes
for each control block, for instance, because it seemed to be
complicated.

A solution to reduce mutex contention of the buf_pool mutex is to
create a separate mutex for the page hash table. On Pentium,
accessing the hash table takes 2 microseconds, about half
of the total buf_pool mutex hold time.

Control blocks
--------------

The control block contains, for instance, the bufferfix count
which is incremented when a thread wants a file page to be fixed
in a buffer frame. The bufferfix operation does not lock the
contents of the frame, however. For this purpose, the control
block contains a read-write lock.

The buffer frames have to be aligned so that the start memory
address of a frame is divisible by the universal page size, which
is a power of two.

We intend to make the buffer buf_pool size on-line reconfigurable,
that is, the buf_pool size can be changed without closing the database.
Then the database administrator may adjust it to be bigger
at night, for example. The control block array must
contain enough control blocks for the maximum buffer buf_pool size
which is used in the particular database.
If the buf_pool size is cut, we exploit the virtual memory mechanism of
the OS, and just refrain from using frames at high addresses. Then the OS
can swap them to disk.

The control blocks containing file pages are put to a hash table
according to the file address of the page.
We could speed up the access to an individual page by using
"pointer swizzling": we could replace the page references on
non-leaf index pages by direct pointers to the page, if it exists
in the buf_pool. We could make a separate hash table where we could
chain all the page references in non-leaf pages residing in the buf_pool,
using the page reference as the hash key,
and at the time of reading of a page update the pointers accordingly.
Drawbacks of this solution are added complexity and,
possibly, extra space required on non-leaf pages for memory pointers.
A simpler solution is just to speed up the hash table mechanism
in the database, using tables whose size is a power of 2.

Lists of blocks
---------------

There are several lists of control blocks. The free list contains
blocks which are currently not used.

The LRU-list contains all the blocks holding a file page
except those for which the bufferfix count is non-zero.
The pages are in the LRU list roughly in the order of the last
access to the page, so that the oldest pages are at the end of the
list. We also keep a pointer to near the end of the LRU list,
which we can use when we want to artificially age a page in the
buf_pool. This is used if we know that some page is not needed
again for some time: we insert the block right after the pointer,
causing it to be replaced sooner than would normally be the case.
Currently this aging mechanism is used for the read-ahead mechanism
of pages, and it can also be used when there is a scan of a full
table which cannot fit in the memory. Putting the pages near the
end of the LRU list, we make sure that most of the buf_pool stays in the
main memory, undisturbed.

The chain of modified blocks contains the blocks
holding file pages that have been modified in the memory
but not written to disk yet. The block with the oldest modification
which has not yet been written to disk is at the end of the chain.

Loading a file page
-------------------

First, a victim block for replacement has to be found in the
buf_pool. It is taken from the free list or searched for from the
end of the LRU-list. An exclusive lock is reserved for the frame,
the io_fix field is set in the block fixing the block in buf_pool,
and the io-operation for loading the page is queued.
The io-handler thread
releases the X-lock on the frame and resets the io_fix field
when the io operation completes.

A thread may request the above operation using the buf_page_get
function. It may then continue to request a lock on the frame.
The lock is granted when the io-handler releases the X-lock.

Read-ahead
----------

The read-ahead mechanism is intended to be intelligent and
isolated from the semantically higher levels of the database
index management. From the higher level we only need the
information whether a file page has a natural successor or
predecessor page. On the leaf level of a B-tree index,
these are the next and previous pages in the natural
order of the pages.

Let us first explain the read-ahead mechanism when the leaves
of a B-tree are scanned in an ascending or descending order.
When a page that was read is referenced in the buf_pool for the
first time, the buffer manager checks if it is at the border of a so-called
linear read-ahead area. The tablespace is divided into these
areas of size 64 blocks, for example. So if the page is at the
border of such an area, the read-ahead mechanism checks if
all the other blocks in the area have been accessed in an
ascending or descending order. If this is the case, the system
looks at the natural successor or predecessor of the page,
checks if that is at the border of another area, and in this case
issues read-requests for all the pages in that area. Maybe
we could relax the condition that all the pages in the area
have to be accessed: if data is deleted from a table, there may
appear holes of unused pages in the area.

A different read-ahead mechanism is used when there appears
to be a random access pattern to a file.
If a new page is referenced in the buf_pool, and several pages
of its random access area (for instance, 32 consecutive pages
in a tablespace) have recently been referenced, we may predict
that the whole area may be needed in the near future, and issue
the read requests for the whole area.

AWE implementation
------------------

By a 'block' we mean the buffer header of type buf_block_t. By a 'page'
we mean the physical 16 kB memory area allocated from RAM for that block.
By a 'frame' we mean a 16 kB area in the virtual address space of the
process, in the frame_mem of buf_pool.

We can map pages to the frames of the buffer pool.

1) A buffer block allocated to use as a non-data page, e.g., to the lock
table, is always mapped to a frame.
2) A bufferfixed or io-fixed data page is always mapped to a frame.
3) When we need to map a block to frame, we look from the list
awe_LRU_free_mapped and try to unmap its last block, but note that
bufferfixed or io-fixed pages cannot be unmapped.
4) For every frame in the buffer pool there is always a block whose page is
mapped to it. When we create the buffer pool, we map the first elements
in the free list to the frames.
5) When we have AWE enabled, we disable adaptive hash indexes.
*/

buf_pool_t*	buf_pool = NULL; /* The buffer buf_pool of the database */

#ifdef UNIV_DEBUG
ulint		buf_dbg_counter	= 0; /* This is used to insert validation
					operations in execution in the
					debug version */
ibool		buf_debug_prints = FALSE; /* If this is set TRUE,
					the program prints info whenever
					read-ahead or flush occurs */
#endif /* UNIV_DEBUG */
/************************************************************************
Calculates a page checksum which is stored to the page when it is written
to a file.
Note that we must be careful to calculate the same value on
32-bit and 64-bit architectures. */

ulint
buf_calc_page_new_checksum(
/*=======================*/
			/* out: checksum */
	byte*	page)	/* in: buffer page */
{
	ulint	checksum;

	/* Since the field FIL_PAGE_FILE_FLUSH_LSN, and in versions <= 4.1.x
	..._ARCH_LOG_NO, are written outside the buffer pool to the first
	pages of data files, we have to skip them in the page checksum
	calculation.
	We must also skip the field FIL_PAGE_SPACE_OR_CHKSUM where the
	checksum is stored, and also the last 8 bytes of page because
	there we store the old formula checksum. */

	checksum = ut_fold_binary(page + FIL_PAGE_OFFSET,
				FIL_PAGE_FILE_FLUSH_LSN - FIL_PAGE_OFFSET)
		+ ut_fold_binary(page + FIL_PAGE_DATA,
				UNIV_PAGE_SIZE - FIL_PAGE_DATA
				- FIL_PAGE_END_LSN_OLD_CHKSUM);
	checksum = checksum & 0xFFFFFFFFUL;

	return(checksum);
}

/************************************************************************
In versions < 4.0.14 and < 4.1.1 there was a bug that the checksum only
looked at the first few bytes of the page. This calculates that old
checksum.
NOTE: we must first store the new formula checksum to
FIL_PAGE_SPACE_OR_CHKSUM before calculating and storing this old checksum
because this takes that field as an input! */

ulint
buf_calc_page_old_checksum(
/*=======================*/
			/* out: checksum */
	byte*	page)	/* in: buffer page */
{
	ulint	checksum;

	checksum = ut_fold_binary(page, FIL_PAGE_FILE_FLUSH_LSN);

	checksum = checksum & 0xFFFFFFFFUL;

	return(checksum);
}

/************************************************************************
Checks if a page is corrupt.
*/

ibool
buf_page_is_corrupted(
/*==================*/
				/* out: TRUE if corrupted */
	byte*	read_buf)	/* in: a database page */
{
	ulint	checksum;
	ulint	old_checksum;
	ulint	checksum_field;
	ulint	old_checksum_field;
#ifndef UNIV_HOTBACKUP
	dulint	current_lsn;
#endif
	if (mach_read_from_4(read_buf + FIL_PAGE_LSN + 4)
	    != mach_read_from_4(read_buf + UNIV_PAGE_SIZE
				- FIL_PAGE_END_LSN_OLD_CHKSUM + 4)) {

		/* Stored log sequence numbers at the start and the end
		of page do not match */

		return(TRUE);
	}

#ifndef UNIV_HOTBACKUP
	if (recv_lsn_checks_on && log_peek_lsn(&current_lsn)) {
		if (ut_dulint_cmp(current_lsn,
				mach_read_from_8(read_buf + FIL_PAGE_LSN))
		    < 0) {
			ut_print_timestamp(stderr);

			fprintf(stderr,
" InnoDB: Error: page %lu log sequence number %lu %lu\n"
"InnoDB: is in the future! Current system log sequence number %lu %lu.\n"
"InnoDB: Your database may be corrupt or you may have copied the InnoDB\n"
"InnoDB: tablespace but not the InnoDB log files. See\n"
"http://dev.mysql.com/doc/mysql/en/backing-up.html for more information.\n",
			(ulong) mach_read_from_4(read_buf + FIL_PAGE_OFFSET),
			(ulong) ut_dulint_get_high(
				mach_read_from_8(read_buf + FIL_PAGE_LSN)),
			(ulong) ut_dulint_get_low(
				mach_read_from_8(read_buf + FIL_PAGE_LSN)),
			(ulong) ut_dulint_get_high(current_lsn),
			(ulong) ut_dulint_get_low(current_lsn));
		}
	}
#endif

	/* If we use checksums validation, make additional check before
	returning TRUE to ensure that the checksum is not equal to
	BUF_NO_CHECKSUM_MAGIC which might be stored by InnoDB with checksums
	disabled.
	Otherwise, skip checksum calculation and return FALSE */

	if (srv_use_checksums) {
		old_checksum = buf_calc_page_old_checksum(read_buf);

		old_checksum_field = mach_read_from_4(read_buf + UNIV_PAGE_SIZE
					- FIL_PAGE_END_LSN_OLD_CHKSUM);

		/* There are 2 valid formulas for old_checksum_field:

		1. Very old versions of InnoDB only stored 8 byte lsn to the
		start and the end of the page.

		2. Newer InnoDB versions store the old formula checksum
		there. */

		if (old_checksum_field != mach_read_from_4(read_buf
							+ FIL_PAGE_LSN)
		    && old_checksum_field != old_checksum
		    && old_checksum_field != BUF_NO_CHECKSUM_MAGIC) {

			return(TRUE);
		}

		checksum = buf_calc_page_new_checksum(read_buf);
		checksum_field = mach_read_from_4(read_buf +
						FIL_PAGE_SPACE_OR_CHKSUM);

		/* InnoDB versions < 4.0.14 and < 4.1.1 stored the space id
		(always equal to 0), to FIL_PAGE_SPACE_OR_CHKSUM */

		if (checksum_field != 0 && checksum_field != checksum
		    && checksum_field != BUF_NO_CHECKSUM_MAGIC) {

			return(TRUE);
		}
	}

	return(FALSE);
}

/************************************************************************
Prints a page to stderr. */

void
buf_page_print(
/*===========*/
	byte*	read_buf)	/* in: a database page */
{
	dict_index_t*	index;
	ulint		checksum;
	ulint		old_checksum;

	ut_print_timestamp(stderr);
	/* Cast to ulong so that the argument matches the %lu conversion
	also on platforms where ulint is not unsigned long */
	fprintf(stderr, " InnoDB: Page dump in ascii and hex (%lu bytes):\n",
		(ulong) UNIV_PAGE_SIZE);
	ut_print_buf(stderr, read_buf, UNIV_PAGE_SIZE);
	fputs("InnoDB: End of page dump\n", stderr);

	checksum = srv_use_checksums ?
		buf_calc_page_new_checksum(read_buf) : BUF_NO_CHECKSUM_MAGIC;
	old_checksum = srv_use_checksums ?
		buf_calc_page_old_checksum(read_buf) : BUF_NO_CHECKSUM_MAGIC;

	ut_print_timestamp(stderr);
	fprintf(stderr,
" InnoDB: Page checksum %lu, prior-to-4.0.14-form checksum %lu\n"
"InnoDB: stored checksum %lu, prior-to-4.0.14-form stored checksum %lu\n",
		(ulong) checksum, (ulong) old_checksum,
		(ulong) mach_read_from_4(read_buf + FIL_PAGE_SPACE_OR_CHKSUM),
		(ulong) mach_read_from_4(read_buf + UNIV_PAGE_SIZE
					- FIL_PAGE_END_LSN_OLD_CHKSUM));
	fprintf(stderr,
"InnoDB: Page lsn %lu %lu, low 4 bytes of lsn at page end %lu\n"
"InnoDB: Page number (if stored to page already) %lu,\n"
"InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) %lu\n",
		(ulong) mach_read_from_4(read_buf + FIL_PAGE_LSN),
		(ulong) mach_read_from_4(read_buf + FIL_PAGE_LSN + 4),
		(ulong) mach_read_from_4(read_buf + UNIV_PAGE_SIZE
					- FIL_PAGE_END_LSN_OLD_CHKSUM + 4),
		(ulong) mach_read_from_4(read_buf + FIL_PAGE_OFFSET),
		(ulong) mach_read_from_4(read_buf + FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID));

	if (mach_read_from_2(read_buf + TRX_UNDO_PAGE_HDR + TRX_UNDO_PAGE_TYPE)
	    == TRX_UNDO_INSERT) {
		fprintf(stderr,
			"InnoDB: Page may be an insert undo log page\n");
	} else if (mach_read_from_2(read_buf + TRX_UNDO_PAGE_HDR
				+ TRX_UNDO_PAGE_TYPE)
		   == TRX_UNDO_UPDATE) {
		fprintf(stderr,
			"InnoDB: Page may be an update undo log page\n");
	}

	switch (fil_page_get_type(read_buf)) {
	case FIL_PAGE_INDEX:
		fprintf(stderr,
"InnoDB: Page may be an index page where index id is %lu %lu\n",
			(ulong) ut_dulint_get_high(btr_page_get_index_id(read_buf)),
			(ulong) ut_dulint_get_low(btr_page_get_index_id(read_buf)));

		/* If the code is in ibbackup, dict_sys may be uninitialized,
		i.e., NULL */

		if (dict_sys != NULL) {

			index = dict_index_find_on_id_low(
					btr_page_get_index_id(read_buf));
			if (index) {
				fputs("InnoDB: (", stderr);
				dict_index_name_print(stderr, NULL, index);
				fputs(")\n", stderr);
			}
		}
		break;
	case FIL_PAGE_INODE:
		fputs("InnoDB: Page may be an 'inode' page\n", stderr);
		break;
	case FIL_PAGE_IBUF_FREE_LIST:
		fputs("InnoDB: Page may be an insert buffer free list page\n",
			stderr);
		break;
	case FIL_PAGE_TYPE_ALLOCATED:
		fputs("InnoDB: Page may be a freshly allocated page\n",
			stderr);
		break;
	case FIL_PAGE_IBUF_BITMAP:
		fputs("InnoDB: Page may be an insert buffer bitmap page\n",
			stderr);
		break;
	case FIL_PAGE_TYPE_SYS:
		fputs("InnoDB: Page may be a system page\n",
			stderr);
		break;
	case FIL_PAGE_TYPE_TRX_SYS:
		fputs("InnoDB: Page may be a transaction system page\n",
			stderr);
		break;
	case FIL_PAGE_TYPE_FSP_HDR:
		fputs("InnoDB: Page may be a file space header page\n",
			stderr);
		break;
	case FIL_PAGE_TYPE_XDES:
		fputs("InnoDB: Page may be an extent descriptor page\n",
			stderr);
		break;
	case FIL_PAGE_TYPE_BLOB:
		fputs("InnoDB: Page may be a BLOB page\n",
			stderr);
		break;
	}
}

/************************************************************************
Initializes a buffer control block when the buf_pool is created.
*/
static
void
buf_block_init(
/*===========*/
	buf_block_t*	block,	/* in: pointer to control block */
	byte*		frame)	/* in: pointer to buffer frame, or NULL if in
				the case of AWE there is no frame */
{
	block->magic_n = 0;

	block->state = BUF_BLOCK_NOT_USED;

	block->frame = frame;

	block->awe_info = NULL;

	block->buf_fix_count = 0;
	block->io_fix = 0;

	block->modify_clock = ut_dulint_zero;

	block->file_page_was_freed = FALSE;

	block->check_index_page_at_flush = FALSE;
	block->index = NULL;

	block->in_free_list = FALSE;
	block->in_LRU_list = FALSE;

	block->n_pointers = 0;

	rw_lock_create(&block->lock, SYNC_LEVEL_VARYING);
	ut_ad(rw_lock_validate(&(block->lock)));

#ifdef UNIV_SYNC_DEBUG
	rw_lock_create(&block->debug_latch, SYNC_NO_ORDER_CHECK);
#endif /* UNIV_SYNC_DEBUG */
}

/************************************************************************
Creates the buffer pool.
*/

buf_pool_t*
buf_pool_init(
/*==========*/
				/* out, own: buf_pool object, NULL if not
				enough memory or error */
	ulint	max_size,	/* in: maximum size of the buf_pool in
				blocks */
	ulint	curr_size,	/* in: current size to use, must be <=
				max_size, currently must be equal to
				max_size */
	ulint	n_frames)	/* in: number of frames; if AWE is used,
				this is the size of the address space window
				where physical memory pages are mapped; if
				AWE is not used then this must be the same
				as max_size */
{
	byte*		frame;
	ulint		i;
	buf_block_t*	block;

	ut_a(max_size == curr_size);
	ut_a(srv_use_awe || n_frames == max_size);

	if (n_frames > curr_size) {
		fprintf(stderr,
"InnoDB: AWE: Error: you must specify in my.cnf .._awe_mem_mb larger\n"
"InnoDB: than .._buffer_pool_size. Now the former is %lu pages,\n"
"InnoDB: the latter %lu pages.\n", (ulong) curr_size, (ulong) n_frames);

		return(NULL);
	}

	buf_pool = mem_alloc(sizeof(buf_pool_t));

	/* 1.
	Initialize general fields
	---------------------------- */
	mutex_create(&buf_pool->mutex, SYNC_BUF_POOL);

	mutex_enter(&(buf_pool->mutex));

	if (srv_use_awe) {
		/*----------------------------------------*/
		/* Allocate the virtual address space window, i.e., the
		buffer pool frames */

		buf_pool->frame_mem = os_awe_allocate_virtual_mem_window(
					UNIV_PAGE_SIZE * (n_frames + 1));

		/* Allocate the physical memory for AWE and the AWE info array
		for buf_pool */

		if ((curr_size % ((1024 * 1024) / UNIV_PAGE_SIZE)) != 0) {

			fprintf(stderr,
"InnoDB: AWE: Error: physical memory must be allocated in full megabytes.\n"
"InnoDB: Trying to allocate %lu database pages.\n",
				(ulong) curr_size);

			return(NULL);
		}

		if (!os_awe_allocate_physical_mem(&(buf_pool->awe_info),
			curr_size / ((1024 * 1024) / UNIV_PAGE_SIZE))) {

			return(NULL);
		}
		/*----------------------------------------*/
	} else {
		buf_pool->frame_mem = os_mem_alloc_large(
					UNIV_PAGE_SIZE * (n_frames + 1),
					TRUE, FALSE);
	}

	if (buf_pool->frame_mem == NULL) {

		return(NULL);
	}

	buf_pool->blocks = ut_malloc(sizeof(buf_block_t) * max_size);

	if (buf_pool->blocks == NULL) {

		return(NULL);
	}

	buf_pool->max_size = max_size;
	buf_pool->curr_size = curr_size;

	buf_pool->n_frames = n_frames;

	/* Align pointer to the first frame */

	frame = ut_align(buf_pool->frame_mem, UNIV_PAGE_SIZE);

	buf_pool->frame_zero = frame;
	buf_pool->high_end = frame + UNIV_PAGE_SIZE * n_frames;

	if (srv_use_awe) {
		/*----------------------------------------*/
		/* Map an initial part of the allocated physical memory to
		the window */

		os_awe_map_physical_mem_to_window(buf_pool->frame_zero,
				n_frames *
				(UNIV_PAGE_SIZE / OS_AWE_X86_PAGE_SIZE),
				buf_pool->awe_info);
		/*----------------------------------------*/
	}

	buf_pool->blocks_of_frames = ut_malloc(sizeof(void*) * n_frames);

	if (buf_pool->blocks_of_frames == NULL) {

		return(NULL);
	}

	/* Init block structs and assign frames for them; in the case of
	AWE there are fewer frames than blocks. Then we assign the frames
	to the first blocks (we already mapped the memory above). We also
	init the awe_info for every block. */

	for (i = 0; i < max_size; i++) {

		block = buf_pool_get_nth_block(buf_pool, i);

		if (i < n_frames) {
			frame = buf_pool->frame_zero + i * UNIV_PAGE_SIZE;
			*(buf_pool->blocks_of_frames + i) = block;
		} else {
			frame = NULL;
		}

		buf_block_init(block, frame);

		if (srv_use_awe) {
			/*----------------------------------------*/
			block->awe_info = buf_pool->awe_info
				+ i * (UNIV_PAGE_SIZE / OS_AWE_X86_PAGE_SIZE);
			/*----------------------------------------*/
		}
	}

	buf_pool->page_hash = hash_create(2 * max_size);

	buf_pool->n_pend_reads = 0;

	buf_pool->last_printout_time = time(NULL);

	buf_pool->n_pages_read = 0;
	buf_pool->n_pages_written = 0;
	buf_pool->n_pages_created = 0;
	buf_pool->n_pages_awe_remapped = 0;

	buf_pool->n_page_gets = 0;
	buf_pool->n_page_gets_old = 0;
	buf_pool->n_pages_read_old = 0;
	buf_pool->n_pages_written_old = 0;
	buf_pool->n_pages_created_old = 0;
	buf_pool->n_pages_awe_remapped_old = 0;

	/* 2.
	Initialize flushing fields
	---------------------------- */
	UT_LIST_INIT(buf_pool->flush_list);

	for (i = BUF_FLUSH_LRU; i <= BUF_FLUSH_LIST; i++) {
		buf_pool->n_flush[i] = 0;
		buf_pool->init_flush[i] = FALSE;
		buf_pool->no_flush[i] = os_event_create(NULL);
	}

	buf_pool->LRU_flush_ended = 0;

	buf_pool->ulint_clock = 1;
	buf_pool->freed_page_clock = 0;

	/* 3. Initialize LRU fields
	---------------------------- */
	UT_LIST_INIT(buf_pool->LRU);

	buf_pool->LRU_old = NULL;

	UT_LIST_INIT(buf_pool->awe_LRU_free_mapped);

	/* Add control blocks to the free list */
	UT_LIST_INIT(buf_pool->free);

	for (i = 0; i < curr_size; i++) {

		block = buf_pool_get_nth_block(buf_pool, i);

		if (block->frame) {
			/* Wipe contents of frame to eliminate a Purify
			warning */

#ifdef HAVE_purify
			memset(block->frame, '\0', UNIV_PAGE_SIZE);
#endif
			if (srv_use_awe) {
				/* Add to the list of blocks mapped to
				frames */

				UT_LIST_ADD_LAST(awe_LRU_free_mapped,
					buf_pool->awe_LRU_free_mapped, block);
			}
		}

		UT_LIST_ADD_LAST(free, buf_pool->free, block);
		block->in_free_list = TRUE;
	}

	mutex_exit(&(buf_pool->mutex));

	if (srv_use_adaptive_hash_indexes) {
		btr_search_sys_create(
			curr_size * UNIV_PAGE_SIZE / sizeof(void*) / 64);
	} else {
		/* Create only a small dummy system */
		btr_search_sys_create(1000);
	}

	return(buf_pool);
}

/************************************************************************
Maps the page of block to a frame, if not mapped yet. Unmaps some page
from the end of the awe_LRU_free_mapped.
*/

void
buf_awe_map_page_to_frame(
/*======================*/
	buf_block_t*	block,		/* in: block whose page should be
					mapped to a frame */
	ibool		add_to_mapped_list) /* in: TRUE if, in the case
					we need to map the page, we should
					also add the block to the
					awe_LRU_free_mapped list */
{
	buf_block_t*	bck;

#ifdef UNIV_SYNC_DEBUG
	ut_ad(mutex_own(&(buf_pool->mutex)));
#endif /* UNIV_SYNC_DEBUG */
	ut_ad(block);

	if (block->frame) {

		return;
	}

	/* Scan awe_LRU_free_mapped from the end and try to find a block
	which is not bufferfixed or io-fixed */

	bck = UT_LIST_GET_LAST(buf_pool->awe_LRU_free_mapped);

	while (bck) {
		if (bck->state == BUF_BLOCK_FILE_PAGE
		    && (bck->buf_fix_count != 0 || bck->io_fix != 0)) {

			/* We have to skip this */
			bck = UT_LIST_GET_PREV(awe_LRU_free_mapped, bck);
		} else {
			/* We can map block to the frame of bck */

			os_awe_map_physical_mem_to_window(
				bck->frame,
				UNIV_PAGE_SIZE / OS_AWE_X86_PAGE_SIZE,
				block->awe_info);

			block->frame = bck->frame;

			*(buf_pool->blocks_of_frames
				+ (((ulint)(block->frame
						- buf_pool->frame_zero))
				   >> UNIV_PAGE_SIZE_SHIFT))
				= block;

			bck->frame = NULL;
			UT_LIST_REMOVE(awe_LRU_free_mapped,
					buf_pool->awe_LRU_free_mapped,
					bck);

			if (add_to_mapped_list) {
				UT_LIST_ADD_FIRST(awe_LRU_free_mapped,
					buf_pool->awe_LRU_free_mapped,
					block);
			}

			buf_pool->n_pages_awe_remapped++;

			return;
		}
	}

	fprintf(stderr,
"InnoDB: AWE: Fatal error: cannot find a page to unmap\n"
"InnoDB: awe_LRU_free_mapped list length %lu\n",
		(ulong) UT_LIST_GET_LEN(buf_pool->awe_LRU_free_mapped));

	ut_a(0);
}

/************************************************************************
Allocates a buffer block. */
UNIV_INLINE
buf_block_t*
buf_block_alloc(void)
/*=================*/
				/* out, own: the allocated block; also if AWE
				is used it is guaranteed that the page is
				mapped to a frame */
{
	buf_block_t*	block;

	block = buf_LRU_get_free_block();

	return(block);
}

/************************************************************************
Moves the block to the start of the LRU list if there is a danger
that the block would drift out of the buffer pool. */
UNIV_INLINE
void
buf_block_make_young(
/*=================*/
	buf_block_t*	block)	/* in: block to make younger */
{
	if (buf_pool->freed_page_clock >= block->freed_page_clock
				+ 1 + (buf_pool->curr_size / 1024)) {

		/* There has been freeing activity in the LRU list:
		best to move to the head of the LRU list */

		buf_LRU_make_block_young(block);
	}
}

/************************************************************************
Moves a page to the start of the buffer pool LRU list. This high-level
function can be used to prevent an important page from slipping out of
the buffer pool. */

void
buf_page_make_young(
/*================*/
	buf_frame_t*	frame)	/* in: buffer frame of a file page */
{
	buf_block_t*	block;

	mutex_enter(&(buf_pool->mutex));

	block = buf_block_align(frame);

	ut_a(block->state == BUF_BLOCK_FILE_PAGE);

	buf_LRU_make_block_young(block);

	mutex_exit(&(buf_pool->mutex));
}

/************************************************************************
Frees a buffer block which does not contain a file page.
*/
UNIV_INLINE
void
buf_block_free(
/*===========*/
	buf_block_t*	block)	/* in, own: block to be freed */
{
	ut_a(block->state != BUF_BLOCK_FILE_PAGE);

	mutex_enter(&(buf_pool->mutex));

	buf_LRU_block_free_non_file_page(block);

	mutex_exit(&(buf_pool->mutex));
}

/*************************************************************************
Allocates a buffer frame. */

buf_frame_t*
buf_frame_alloc(void)
/*=================*/
				/* out: buffer frame */
{
	return(buf_block_alloc()->frame);
}

/*************************************************************************
Frees a buffer frame which does not contain a file page. */

void
buf_frame_free(
/*===========*/
	buf_frame_t*	frame)	/* in: buffer frame */
{
	buf_block_free(buf_block_align(frame));
}

/************************************************************************
Returns the buffer control block if the page can be found in the buffer
pool. NOTE that it is possible that the page is not yet read
from disk, though. This is a very low-level function: use with care! */

buf_block_t*
buf_page_peek_block(
/*================*/
			/* out: control block if found from page hash table,
			otherwise NULL; NOTE that the page is not necessarily
			yet read from disk! */
	ulint	space,	/* in: space id */
	ulint	offset)	/* in: page number */
{
	buf_block_t*	block;

	mutex_enter_fast(&(buf_pool->mutex));

	block = buf_page_hash_get(space, offset);

	mutex_exit(&(buf_pool->mutex));

	return(block);
}

/************************************************************************
Resets the check_index_page_at_flush field of a page if found in the buffer
pool.
*/

void
buf_reset_check_index_page_at_flush(
/*================================*/
	ulint	space,	/* in: space id */
	ulint	offset)	/* in: page number */
{
	buf_block_t*	block;

	mutex_enter_fast(&(buf_pool->mutex));

	block = buf_page_hash_get(space, offset);

	if (block) {
		block->check_index_page_at_flush = FALSE;
	}

	mutex_exit(&(buf_pool->mutex));
}

/************************************************************************
Returns the current state of is_hashed of a page. FALSE if the page is
not in the pool. NOTE that this operation does not fix the page in the
pool if it is found there. */

ibool
buf_page_peek_if_search_hashed(
/*===========================*/
			/* out: TRUE if page hash index is built in search
			system */
	ulint	space,	/* in: space id */
	ulint	offset)	/* in: page number */
{
	buf_block_t*	block;
	ibool		is_hashed;

	mutex_enter_fast(&(buf_pool->mutex));

	block = buf_page_hash_get(space, offset);

	if (!block) {
		is_hashed = FALSE;
	} else {
		is_hashed = block->is_hashed;
	}

	mutex_exit(&(buf_pool->mutex));

	return(is_hashed);
}

/************************************************************************
Returns TRUE if the page can be found in the buffer pool hash table. NOTE
that it is possible that the page is not yet read from disk, though. */

ibool
buf_page_peek(
/*==========*/
			/* out: TRUE if found from page hash table,
			NOTE that the page is not necessarily yet read
			from disk!
*/
	ulint	space,	/* in: space id */
	ulint	offset)	/* in: page number */
{
	if (buf_page_peek_block(space, offset)) {

		return(TRUE);
	}

	return(FALSE);
}

/************************************************************************
Sets file_page_was_freed TRUE if the page is found in the buffer pool.
This function should be called when we free a file page and want the
debug version to check that it is not accessed any more unless
reallocated. */

buf_block_t*
buf_page_set_file_page_was_freed(
/*=============================*/
			/* out: control block if found from page hash table,
			otherwise NULL */
	ulint	space,	/* in: space id */
	ulint	offset)	/* in: page number */
{
	buf_block_t*	block;

	mutex_enter_fast(&(buf_pool->mutex));

	block = buf_page_hash_get(space, offset);

	if (block) {
		block->file_page_was_freed = TRUE;
	}

	mutex_exit(&(buf_pool->mutex));

	return(block);
}

/************************************************************************
Sets file_page_was_freed FALSE if the page is found in the buffer pool.
This function should be called when a previously freed file page is
reallocated, so that the debug version no longer treats accesses to it
as errors.
*/ 01071 01072 buf_block_t* 01073 buf_page_reset_file_page_was_freed( 01074 /*===============================*/ 01075 /* out: control block if found from page hash table, 01076 otherwise NULL */ 01077 ulint space, /* in: space id */ 01078 ulint offset) /* in: page number */ 01079 { 01080 buf_block_t* block; 01081 01082 mutex_enter_fast(&(buf_pool->mutex)); 01083 01084 block = buf_page_hash_get(space, offset); 01085 01086 if (block) { 01087 block->file_page_was_freed = FALSE; 01088 } 01089 01090 mutex_exit(&(buf_pool->mutex)); 01091 01092 return(block); 01093 } 01094 01095 /************************************************************************ 01096 This is the general function used to get access to a database page. */ 01097 01098 buf_frame_t* 01099 buf_page_get_gen( 01100 /*=============*/ 01101 /* out: pointer to the frame or NULL */ 01102 ulint space, /* in: space id */ 01103 ulint offset, /* in: page number */ 01104 ulint rw_latch,/* in: RW_S_LATCH, RW_X_LATCH, RW_NO_LATCH */ 01105 buf_frame_t* guess, /* in: guessed frame or NULL */ 01106 ulint mode, /* in: BUF_GET, BUF_GET_IF_IN_POOL, 01107 BUF_GET_NO_LATCH, BUF_GET_NOWAIT */ 01108 const char* file, /* in: file name */ 01109 ulint line, /* in: line where called */ 01110 mtr_t* mtr) /* in: mini-transaction */ 01111 { 01112 buf_block_t* block; 01113 ibool accessed; 01114 ulint fix_type; 01115 ibool success; 01116 ibool must_read; 01117 01118 ut_ad(mtr); 01119 ut_ad((rw_latch == RW_S_LATCH) 01120 || (rw_latch == RW_X_LATCH) 01121 || (rw_latch == RW_NO_LATCH)); 01122 ut_ad((mode != BUF_GET_NO_LATCH) || (rw_latch == RW_NO_LATCH)); 01123 ut_ad((mode == BUF_GET) || (mode == BUF_GET_IF_IN_POOL) 01124 || (mode == BUF_GET_NO_LATCH) || (mode == BUF_GET_NOWAIT)); 01125 #ifndef UNIV_LOG_DEBUG 01126 ut_ad(!ibuf_inside() || ibuf_page(space, offset)); 01127 #endif 01128 buf_pool->n_page_gets++; 01129 loop: 01130 mutex_enter_fast(&(buf_pool->mutex)); 01131 01132 block = NULL; 01133 01134 if (guess) { 01135 block = 
buf_block_align(guess); 01136 01137 if ((offset != block->offset) || (space != block->space) 01138 || (block->state != BUF_BLOCK_FILE_PAGE)) { 01139 01140 block = NULL; 01141 } 01142 } 01143 01144 if (block == NULL) { 01145 block = buf_page_hash_get(space, offset); 01146 } 01147 01148 if (block == NULL) { 01149 /* Page not in buf_pool: needs to be read from file */ 01150 01151 mutex_exit(&(buf_pool->mutex)); 01152 01153 if (mode == BUF_GET_IF_IN_POOL) { 01154 01155 return(NULL); 01156 } 01157 01158 buf_read_page(space, offset); 01159 01160 #ifdef UNIV_DEBUG 01161 buf_dbg_counter++; 01162 01163 if (buf_dbg_counter % 37 == 0) { 01164 ut_ad(buf_validate()); 01165 } 01166 #endif 01167 goto loop; 01168 } 01169 01170 ut_a(block->state == BUF_BLOCK_FILE_PAGE); 01171 01172 must_read = FALSE; 01173 01174 if (block->io_fix == BUF_IO_READ) { 01175 01176 must_read = TRUE; 01177 01178 if (mode == BUF_GET_IF_IN_POOL) { 01179 01180 /* The page is only being read to buffer */ 01181 mutex_exit(&(buf_pool->mutex)); 01182 01183 return(NULL); 01184 } 01185 } 01186 01187 /* If AWE is enabled and the page is not mapped to a frame, then 01188 map it */ 01189 01190 if (block->frame == NULL) { 01191 ut_a(srv_use_awe); 01192 01193 /* We set second parameter TRUE because the block is in the 01194 LRU list and we must put it to awe_LRU_free_mapped list once 01195 mapped to a frame */ 01196 01197 buf_awe_map_page_to_frame(block, TRUE); 01198 } 01199 01200 #ifdef UNIV_SYNC_DEBUG 01201 buf_block_buf_fix_inc_debug(block, file, line); 01202 #else 01203 buf_block_buf_fix_inc(block); 01204 #endif 01205 buf_block_make_young(block); 01206 01207 /* Check if this is the first access to the page */ 01208 01209 accessed = block->accessed; 01210 01211 block->accessed = TRUE; 01212 01213 #ifdef UNIV_DEBUG_FILE_ACCESSES 01214 ut_a(block->file_page_was_freed == FALSE); 01215 #endif 01216 mutex_exit(&(buf_pool->mutex)); 01217 01218 #ifdef UNIV_DEBUG 01219 buf_dbg_counter++; 01220 01221 if (buf_dbg_counter % 
5771 == 0) { 01222 ut_ad(buf_validate()); 01223 } 01224 #endif 01225 ut_ad(block->buf_fix_count > 0); 01226 ut_ad(block->state == BUF_BLOCK_FILE_PAGE); 01227 01228 if (mode == BUF_GET_NOWAIT) { 01229 if (rw_latch == RW_S_LATCH) { 01230 success = rw_lock_s_lock_func_nowait(&(block->lock), 01231 file, line); 01232 fix_type = MTR_MEMO_PAGE_S_FIX; 01233 } else { 01234 ut_ad(rw_latch == RW_X_LATCH); 01235 success = rw_lock_x_lock_func_nowait(&(block->lock), 01236 file, line); 01237 fix_type = MTR_MEMO_PAGE_X_FIX; 01238 } 01239 01240 if (!success) { 01241 mutex_enter(&(buf_pool->mutex)); 01242 01243 block->buf_fix_count--; 01244 #ifdef UNIV_SYNC_DEBUG 01245 rw_lock_s_unlock(&(block->debug_latch)); 01246 #endif 01247 mutex_exit(&(buf_pool->mutex)); 01248 01249 return(NULL); 01250 } 01251 } else if (rw_latch == RW_NO_LATCH) { 01252 01253 if (must_read) { 01254 /* Let us wait until the read operation 01255 completes */ 01256 01257 for (;;) { 01258 mutex_enter(&(buf_pool->mutex)); 01259 01260 if (block->io_fix == BUF_IO_READ) { 01261 01262 mutex_exit(&(buf_pool->mutex)); 01263 01264 /* Sleep 20 milliseconds */ 01265 01266 os_thread_sleep(20000); 01267 } else { 01268 01269 mutex_exit(&(buf_pool->mutex)); 01270 01271 break; 01272 } 01273 } 01274 } 01275 01276 fix_type = MTR_MEMO_BUF_FIX; 01277 } else if (rw_latch == RW_S_LATCH) { 01278 01279 rw_lock_s_lock_func(&(block->lock), 0, file, line); 01280 01281 fix_type = MTR_MEMO_PAGE_S_FIX; 01282 } else { 01283 rw_lock_x_lock_func(&(block->lock), 0, file, line); 01284 01285 fix_type = MTR_MEMO_PAGE_X_FIX; 01286 } 01287 01288 mtr_memo_push(mtr, block, fix_type); 01289 01290 if (!accessed) { 01291 /* In the case of a first access, try to apply linear 01292 read-ahead */ 01293 01294 buf_read_ahead_linear(space, offset); 01295 } 01296 01297 #ifdef UNIV_IBUF_DEBUG 01298 ut_a(ibuf_count_get(block->space, block->offset) == 0); 01299 #endif 01300 return(block->frame); 01301 } 01302 01303 
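/* Illustrative sketch, not part of InnoDB: the lookup order at the top of
buf_page_get_gen() above. The caller's guess is trusted only if it still
describes the requested page; otherwise the lookup falls back to the page
hash. All names here (pg_block_t, pg_pool_t, pg_lookup, pg_get) are
hypothetical. The sketch assumes a linear search in place of
buf_page_hash_get(), takes no mutex or latch, and behaves like
BUF_GET_IF_IN_POOL on a miss: it returns NULL instead of reading the page
and retrying. */

```c
#include <assert.h>
#include <stddef.h>

#define N_BLOCKS 4

typedef struct {
    int      in_use;     /* stands in for state == BUF_BLOCK_FILE_PAGE */
    unsigned space;
    unsigned page_no;
    unsigned fix_count;  /* number of callers currently holding the block */
} pg_block_t;

typedef struct {
    pg_block_t    blocks[N_BLOCKS];
    unsigned long n_hash_lookups;  /* how often the slow path was taken */
} pg_pool_t;

/* Slow path: stand-in for the page hash lookup. */
pg_block_t* pg_lookup(pg_pool_t* pool, unsigned space, unsigned page_no)
{
    size_t i;

    pool->n_hash_lookups++;

    for (i = 0; i < N_BLOCKS; i++) {
        pg_block_t* b = &pool->blocks[i];

        if (b->in_use && b->space == space && b->page_no == page_no) {
            return b;
        }
    }
    return NULL;
}

/* Returns the block with its fix count incremented, or NULL if the page
is not resident. guess may be NULL or stale. */
pg_block_t* pg_get(pg_pool_t* pool, unsigned space, unsigned page_no,
                   pg_block_t* guess)
{
    pg_block_t* block = NULL;

    /* Fast path: accept the guess only if it still holds this page */
    if (guess != NULL && guess->in_use
        && guess->space == space && guess->page_no == page_no) {
        block = guess;
    }

    if (block == NULL) {
        block = pg_lookup(pool, space, page_no);
    }

    if (block == NULL) {
        return NULL;                /* page not in the pool */
    }

    block->fix_count++;             /* pin it so it cannot be evicted */
    assert(block->fix_count > 0);

    return block;
}
```

/* Validating the guess costs three comparisons and lets a correct guess
skip the hash lookup entirely, while a stale guess only adds that small
check before the normal lookup. */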
/************************************************************************ 01304 This is the general function used to get optimistic access to a database 01305 page. */ 01306 01307 ibool 01308 buf_page_optimistic_get_func( 01309 /*=========================*/ 01310 /* out: TRUE if success */ 01311 ulint rw_latch,/* in: RW_S_LATCH, RW_X_LATCH */ 01312 buf_block_t* block, /* in: guessed buffer block */ 01313 buf_frame_t* guess, /* in: guessed frame; note that AWE may move 01314 frames */ 01315 dulint modify_clock,/* in: modify clock value if mode is 01316 ..._GUESS_ON_CLOCK */ 01317 const char* file, /* in: file name */ 01318 ulint line, /* in: line where called */ 01319 mtr_t* mtr) /* in: mini-transaction */ 01320 { 01321 ibool accessed; 01322 ibool success; 01323 ulint fix_type; 01324 01325 ut_ad(mtr && block); 01326 ut_ad((rw_latch == RW_S_LATCH) || (rw_latch == RW_X_LATCH)); 01327 01328 mutex_enter(&(buf_pool->mutex)); 01329 01330 /* If AWE is used, block may have a different frame now, e.g., NULL */ 01331 01332 if (UNIV_UNLIKELY(block->state != BUF_BLOCK_FILE_PAGE) 01333 || UNIV_UNLIKELY(block->frame != guess)) { 01334 exit_func: 01335 mutex_exit(&(buf_pool->mutex)); 01336 01337 return(FALSE); 01338 } 01339 01340 #ifdef UNIV_SYNC_DEBUG 01341 buf_block_buf_fix_inc_debug(block, file, line); 01342 #else 01343 buf_block_buf_fix_inc(block); 01344 #endif 01345 buf_block_make_young(block); 01346 01347 /* Check if this is the first access to the page */ 01348 01349 accessed = block->accessed; 01350 01351 block->accessed = TRUE; 01352 01353 mutex_exit(&(buf_pool->mutex)); 01354 01355 ut_ad(!ibuf_inside() || ibuf_page(block->space, block->offset)); 01356 01357 if (rw_latch == RW_S_LATCH) { 01358 success = rw_lock_s_lock_func_nowait(&(block->lock), 01359 file, line); 01360 fix_type = MTR_MEMO_PAGE_S_FIX; 01361 } else { 01362 success = rw_lock_x_lock_func_nowait(&(block->lock), 01363 file, line); 01364 fix_type = MTR_MEMO_PAGE_X_FIX; 01365 } 01366 01367 if 
(UNIV_UNLIKELY(!success)) { 01368 mutex_enter(&(buf_pool->mutex)); 01369 01370 block->buf_fix_count--; 01371 #ifdef UNIV_SYNC_DEBUG 01372 rw_lock_s_unlock(&(block->debug_latch)); 01373 #endif 01374 goto exit_func; 01375 } 01376 01377 if (UNIV_UNLIKELY(!UT_DULINT_EQ(modify_clock, block->modify_clock))) { 01378 #ifdef UNIV_SYNC_DEBUG 01379 buf_page_dbg_add_level(block->frame, SYNC_NO_ORDER_CHECK); 01380 #endif /* UNIV_SYNC_DEBUG */ 01381 if (rw_latch == RW_S_LATCH) { 01382 rw_lock_s_unlock(&(block->lock)); 01383 } else { 01384 rw_lock_x_unlock(&(block->lock)); 01385 } 01386 01387 mutex_enter(&(buf_pool->mutex)); 01388 01389 block->buf_fix_count--; 01390 #ifdef UNIV_SYNC_DEBUG 01391 rw_lock_s_unlock(&(block->debug_latch)); 01392 #endif 01393 goto exit_func; 01394 } 01395 01396 mtr_memo_push(mtr, block, fix_type); 01397 01398 #ifdef UNIV_DEBUG 01399 buf_dbg_counter++; 01400 01401 if (buf_dbg_counter % 5771 == 0) { 01402 ut_ad(buf_validate()); 01403 } 01404 #endif 01405 ut_ad(block->buf_fix_count > 0); 01406 ut_ad(block->state == BUF_BLOCK_FILE_PAGE); 01407 01408 #ifdef UNIV_DEBUG_FILE_ACCESSES 01409 ut_a(block->file_page_was_freed == FALSE); 01410 #endif 01411 if (UNIV_UNLIKELY(!accessed)) { 01412 /* In the case of a first access, try to apply linear 01413 read-ahead */ 01414 01415 buf_read_ahead_linear(buf_frame_get_space_id(guess), 01416 buf_frame_get_page_no(guess)); 01417 } 01418 01419 #ifdef UNIV_IBUF_DEBUG 01420 ut_a(ibuf_count_get(block->space, block->offset) == 0); 01421 #endif 01422 buf_pool->n_page_gets++; 01423 01424 return(TRUE); 01425 } 01426 01427 /************************************************************************ 01428 This is used to get access to a known database page, when no waiting can be 01429 done. For example, if a search in an adaptive hash index leads us to this 01430 frame. 
*/ 01431 01432 ibool 01433 buf_page_get_known_nowait( 01434 /*======================*/ 01435 /* out: TRUE if success */ 01436 ulint rw_latch,/* in: RW_S_LATCH, RW_X_LATCH */ 01437 buf_frame_t* guess, /* in: the known page frame */ 01438 ulint mode, /* in: BUF_MAKE_YOUNG or BUF_KEEP_OLD */ 01439 const char* file, /* in: file name */ 01440 ulint line, /* in: line where called */ 01441 mtr_t* mtr) /* in: mini-transaction */ 01442 { 01443 buf_block_t* block; 01444 ibool success; 01445 ulint fix_type; 01446 01447 ut_ad(mtr); 01448 ut_ad((rw_latch == RW_S_LATCH) || (rw_latch == RW_X_LATCH)); 01449 01450 mutex_enter(&(buf_pool->mutex)); 01451 01452 block = buf_block_align(guess); 01453 01454 if (block->state == BUF_BLOCK_REMOVE_HASH) { 01455 /* Another thread is just freeing the block from the LRU list 01456 of the buffer pool: do not try to access this page; this 01457 attempt to access the page can only come through the hash 01458 index because when the buffer block state is ..._REMOVE_HASH, 01459 we have already removed it from the page address hash table 01460 of the buffer pool. 
*/ 01461 01462 mutex_exit(&(buf_pool->mutex)); 01463 01464 return(FALSE); 01465 } 01466 01467 ut_a(block->state == BUF_BLOCK_FILE_PAGE); 01468 01469 #ifdef UNIV_SYNC_DEBUG 01470 buf_block_buf_fix_inc_debug(block, file, line); 01471 #else 01472 buf_block_buf_fix_inc(block); 01473 #endif 01474 if (mode == BUF_MAKE_YOUNG) { 01475 buf_block_make_young(block); 01476 } 01477 01478 mutex_exit(&(buf_pool->mutex)); 01479 01480 ut_ad(!ibuf_inside() || (mode == BUF_KEEP_OLD)); 01481 01482 if (rw_latch == RW_S_LATCH) { 01483 success = rw_lock_s_lock_func_nowait(&(block->lock), 01484 file, line); 01485 fix_type = MTR_MEMO_PAGE_S_FIX; 01486 } else { 01487 success = rw_lock_x_lock_func_nowait(&(block->lock), 01488 file, line); 01489 fix_type = MTR_MEMO_PAGE_X_FIX; 01490 } 01491 01492 if (!success) { 01493 mutex_enter(&(buf_pool->mutex)); 01494 01495 block->buf_fix_count--; 01496 #ifdef UNIV_SYNC_DEBUG 01497 rw_lock_s_unlock(&(block->debug_latch)); 01498 #endif 01499 mutex_exit(&(buf_pool->mutex)); 01500 01501 return(FALSE); 01502 } 01503 01504 mtr_memo_push(mtr, block, fix_type); 01505 01506 #ifdef UNIV_DEBUG 01507 buf_dbg_counter++; 01508 01509 if (buf_dbg_counter % 5771 == 0) { 01510 ut_ad(buf_validate()); 01511 } 01512 #endif 01513 ut_ad(block->buf_fix_count > 0); 01514 ut_ad(block->state == BUF_BLOCK_FILE_PAGE); 01515 #ifdef UNIV_DEBUG_FILE_ACCESSES 01516 ut_a(block->file_page_was_freed == FALSE); 01517 #endif 01518 01519 #ifdef UNIV_IBUF_DEBUG 01520 ut_a((mode == BUF_KEEP_OLD) 01521 || (ibuf_count_get(block->space, block->offset) == 0)); 01522 #endif 01523 buf_pool->n_page_gets++; 01524 01525 return(TRUE); 01526 } 01527 01528 /************************************************************************ 01529 Inits a page to the buffer buf_pool, for use in ibbackup --restore. 
*/ 01530 01531 void 01532 buf_page_init_for_backup_restore( 01533 /*=============================*/ 01534 ulint space, /* in: space id */ 01535 ulint offset, /* in: offset of the page within space 01536 in units of a page */ 01537 buf_block_t* block) /* in: block to init */ 01538 { 01539 /* Set the state of the block */ 01540 block->magic_n = BUF_BLOCK_MAGIC_N; 01541 01542 block->state = BUF_BLOCK_FILE_PAGE; 01543 block->space = space; 01544 block->offset = offset; 01545 01546 block->lock_hash_val = 0; 01547 block->lock_mutex = NULL; 01548 01549 block->freed_page_clock = 0; 01550 01551 block->newest_modification = ut_dulint_zero; 01552 block->oldest_modification = ut_dulint_zero; 01553 01554 block->accessed = FALSE; 01555 block->buf_fix_count = 0; 01556 block->io_fix = 0; 01557 01558 block->n_hash_helps = 0; 01559 block->is_hashed = FALSE; 01560 block->n_fields = 1; 01561 block->n_bytes = 0; 01562 block->side = BTR_SEARCH_LEFT_SIDE; 01563 01564 block->file_page_was_freed = FALSE; 01565 } 01566 01567 /************************************************************************ 01568 Inits a page to the buffer buf_pool. 
*/ 01569 static 01570 void 01571 buf_page_init( 01572 /*==========*/ 01573 ulint space, /* in: space id */ 01574 ulint offset, /* in: offset of the page within space 01575 in units of a page */ 01576 buf_block_t* block) /* in: block to init */ 01577 { 01578 #ifdef UNIV_SYNC_DEBUG 01579 ut_ad(mutex_own(&(buf_pool->mutex))); 01580 #endif /* UNIV_SYNC_DEBUG */ 01581 ut_a(block->state != BUF_BLOCK_FILE_PAGE); 01582 01583 /* Set the state of the block */ 01584 block->magic_n = BUF_BLOCK_MAGIC_N; 01585 01586 block->state = BUF_BLOCK_FILE_PAGE; 01587 block->space = space; 01588 block->offset = offset; 01589 01590 block->check_index_page_at_flush = FALSE; 01591 block->index = NULL; 01592 01593 block->lock_hash_val = lock_rec_hash(space, offset); 01594 block->lock_mutex = NULL; 01595 01596 /* Insert into the hash table of file pages */ 01597 01598 if (buf_page_hash_get(space, offset)) { 01599 fprintf(stderr, 01600 "InnoDB: Error: page %lu %lu already found from the hash table\n", 01601 (ulong) space, 01602 (ulong) offset); 01603 #ifdef UNIV_DEBUG 01604 buf_print(); 01605 buf_LRU_print(); 01606 buf_validate(); 01607 buf_LRU_validate(); 01608 #endif /* UNIV_DEBUG */ 01609 ut_a(0); 01610 } 01611 01612 HASH_INSERT(buf_block_t, hash, buf_pool->page_hash, 01613 buf_page_address_fold(space, offset), block); 01614 01615 block->freed_page_clock = 0; 01616 01617 block->newest_modification = ut_dulint_zero; 01618 block->oldest_modification = ut_dulint_zero; 01619 01620 block->accessed = FALSE; 01621 block->buf_fix_count = 0; 01622 block->io_fix = 0; 01623 01624 block->n_hash_helps = 0; 01625 block->is_hashed = FALSE; 01626 block->n_fields = 1; 01627 block->n_bytes = 0; 01628 block->side = BTR_SEARCH_LEFT_SIDE; 01629 01630 block->file_page_was_freed = FALSE; 01631 } 01632 01633 /************************************************************************ 01634 Function which inits a page for read to the buffer buf_pool. 
If the page is 01635 (1) already in buf_pool, or 01636 (2) if we specify to read only ibuf pages and the page is not an ibuf page, or 01637 (3) if the space is deleted or being deleted, 01638 then this function does nothing. 01639 Sets the io_fix flag to BUF_IO_READ and sets a non-recursive exclusive lock 01640 on the buffer frame. The io-handler must take care that the flag is cleared 01641 and the lock released later. This is one of the functions which perform the 01642 state transition NOT_USED => FILE_PAGE to a block (the other is 01643 buf_page_create). */ 01644 01645 buf_block_t* 01646 buf_page_init_for_read( 01647 /*===================*/ 01648 /* out: pointer to the block or NULL */ 01649 ulint* err, /* out: DB_SUCCESS or DB_TABLESPACE_DELETED */ 01650 ulint mode, /* in: BUF_READ_IBUF_PAGES_ONLY, ... */ 01651 ulint space, /* in: space id */ 01652 ib_longlong tablespace_version,/* in: prevents reading from a wrong 01653 version of the tablespace in case we have done 01654 DISCARD + IMPORT */ 01655 ulint offset) /* in: page number */ 01656 { 01657 buf_block_t* block; 01658 mtr_t mtr; 01659 01660 ut_ad(buf_pool); 01661 01662 *err = DB_SUCCESS; 01663 01664 if (mode == BUF_READ_IBUF_PAGES_ONLY) { 01665 /* It is a read-ahead within an ibuf routine */ 01666 01667 ut_ad(!ibuf_bitmap_page(offset)); 01668 ut_ad(ibuf_inside()); 01669 01670 mtr_start(&mtr); 01671 01672 if (!ibuf_page_low(space, offset, &mtr)) { 01673 01674 mtr_commit(&mtr); 01675 01676 return(NULL); 01677 } 01678 } else { 01679 ut_ad(mode == BUF_READ_ANY_PAGE); 01680 } 01681 01682 block = buf_block_alloc(); 01683 01684 ut_a(block); 01685 01686 mutex_enter(&(buf_pool->mutex)); 01687 01688 if (fil_tablespace_deleted_or_being_deleted_in_mem(space, 01689 tablespace_version)) { 01690 *err = DB_TABLESPACE_DELETED; 01691 } 01692 01693 if (*err == DB_TABLESPACE_DELETED 01694 || NULL != buf_page_hash_get(space, offset)) { 01695 01696 /* The page belongs to a space which has been deleted or is 01697 being 
deleted, or the page is already in buf_pool, return */ 01698 01699 mutex_exit(&(buf_pool->mutex)); 01700 buf_block_free(block); 01701 01702 if (mode == BUF_READ_IBUF_PAGES_ONLY) { 01703 01704 mtr_commit(&mtr); 01705 } 01706 01707 return(NULL); 01708 } 01709 01710 ut_ad(block); 01711 01712 buf_page_init(space, offset, block); 01713 01714 /* The block must be put to the LRU list, to the old blocks */ 01715 01716 buf_LRU_add_block(block, TRUE); /* TRUE == to old blocks */ 01717 01718 block->io_fix = BUF_IO_READ; 01719 buf_pool->n_pend_reads++; 01720 01721 /* We set a pass-type x-lock on the frame because then the same 01722 thread which called for the read operation (and is running now at 01723 this point of code) can wait for the read to complete by waiting 01724 for the x-lock on the frame; if the x-lock were recursive, the 01725 same thread would illegally get the x-lock before the page read 01726 is completed. The x-lock is cleared by the io-handler thread. */ 01727 01728 rw_lock_x_lock_gen(&(block->lock), BUF_IO_READ); 01729 01730 mutex_exit(&(buf_pool->mutex)); 01731 01732 if (mode == BUF_READ_IBUF_PAGES_ONLY) { 01733 01734 mtr_commit(&mtr); 01735 } 01736 01737 return(block); 01738 } 01739 01740 /************************************************************************ 01741 Initializes a page to the buffer buf_pool. The page is usually not read 01742 from a file even if it cannot be found in the buffer buf_pool. This is one 01743 of the functions which perform to a block a state transition NOT_USED => 01744 FILE_PAGE (the other is buf_page_init_for_read above). 
*/ 01745 01746 buf_frame_t* 01747 buf_page_create( 01748 /*============*/ 01749 /* out: pointer to the frame, page bufferfixed */ 01750 ulint space, /* in: space id */ 01751 ulint offset, /* in: offset of the page within space in units of 01752 a page */ 01753 mtr_t* mtr) /* in: mini-transaction handle */ 01754 { 01755 buf_frame_t* frame; 01756 buf_block_t* block; 01757 buf_block_t* free_block = NULL; 01758 01759 ut_ad(mtr); 01760 01761 free_block = buf_LRU_get_free_block(); 01762 01763 mutex_enter(&(buf_pool->mutex)); 01764 01765 block = buf_page_hash_get(space, offset); 01766 01767 if (block != NULL) { 01768 #ifdef UNIV_IBUF_DEBUG 01769 ut_a(ibuf_count_get(block->space, block->offset) == 0); 01770 #endif 01771 block->file_page_was_freed = FALSE; 01772 01773 /* Page can be found in buf_pool */ 01774 mutex_exit(&(buf_pool->mutex)); 01775 01776 buf_block_free(free_block); 01777 01778 frame = buf_page_get_with_no_latch(space, offset, mtr); 01779 01780 return(frame); 01781 } 01782 01783 /* If we get here, the page was not in buf_pool: init it there */ 01784 01785 #ifdef UNIV_DEBUG 01786 if (buf_debug_prints) { 01787 fprintf(stderr, "Creating space %lu page %lu to buffer\n", 01788 (ulong) space, (ulong) offset); 01789 } 01790 #endif /* UNIV_DEBUG */ 01791 01792 block = free_block; 01793 01794 buf_page_init(space, offset, block); 01795 01796 /* The block must be put to the LRU list */ 01797 buf_LRU_add_block(block, FALSE); 01798 01799 #ifdef UNIV_SYNC_DEBUG 01800 buf_block_buf_fix_inc_debug(block, __FILE__, __LINE__); 01801 #else 01802 buf_block_buf_fix_inc(block); 01803 #endif 01804 mtr_memo_push(mtr, block, MTR_MEMO_BUF_FIX); 01805 01806 block->accessed = TRUE; 01807 01808 buf_pool->n_pages_created++; 01809 01810 mutex_exit(&(buf_pool->mutex)); 01811 01812 /* Delete possible entries for the page from the insert buffer: 01813 such can exist if the page belonged to an index which was dropped */ 01814 01815 ibuf_merge_or_delete_for_page(NULL, space, offset, TRUE); 01816 
01817 /* Flush pages from the end of the LRU list if necessary */ 01818 buf_flush_free_margin(); 01819 01820 frame = block->frame; 01821 01822 memset(frame + FIL_PAGE_PREV, 0xff, 4); 01823 memset(frame + FIL_PAGE_NEXT, 0xff, 4); 01824 mach_write_to_2(frame + FIL_PAGE_TYPE, FIL_PAGE_TYPE_ALLOCATED); 01825 01826 /* Reset to zero the file flush lsn field in the page; if the first 01827 page of an ibdata file is 'created' in this function into the buffer 01828 pool then we lose the original contents of the file flush lsn stamp. 01829 Then InnoDB could in a crash recovery print a big, false, corruption 01830 warning if the stamp contains an lsn bigger than the ib_logfile lsn. */ 01831 01832 memset(frame + FIL_PAGE_FILE_FLUSH_LSN, 0, 8); 01833 01834 #ifdef UNIV_DEBUG 01835 buf_dbg_counter++; 01836 01837 if (buf_dbg_counter % 357 == 0) { 01838 ut_ad(buf_validate()); 01839 } 01840 #endif 01841 #ifdef UNIV_IBUF_DEBUG 01842 ut_a(ibuf_count_get(block->space, block->offset) == 0); 01843 #endif 01844 return(frame); 01845 } 01846 01847 /************************************************************************ 01848 Completes an asynchronous read or write request of a file page to or from 01849 the buffer pool. */ 01850 01851 void 01852 buf_page_io_complete( 01853 /*=================*/ 01854 buf_block_t* block) /* in: pointer to the block in question */ 01855 { 01856 ulint io_type; 01857 01858 ut_ad(block); 01859 01860 ut_a(block->state == BUF_BLOCK_FILE_PAGE); 01861 01862 io_type = block->io_fix; 01863 01864 if (io_type == BUF_IO_READ) { 01865 /* If this page is not uninitialized and not in the 01866 doublewrite buffer, then the page number and space id 01867 should be the same as in block. 
*/ 01868 ulint read_page_no = mach_read_from_4((block->frame) 01869 + FIL_PAGE_OFFSET); 01870 ulint read_space_id = mach_read_from_4((block->frame) 01871 + FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID); 01872 01873 if (!block->space && trx_doublewrite_page_inside( 01874 block->offset)) { 01875 01876 ut_print_timestamp(stderr); 01877 fprintf(stderr, 01878 " InnoDB: Error: reading page %lu\n" 01879 "InnoDB: which is in the doublewrite buffer!\n", 01880 (ulong) block->offset); 01881 } else if (!read_space_id && !read_page_no) { 01882 /* This is likely an uninitialized page. */ 01883 } else if ((block->space && block->space != read_space_id) 01884 || block->offset != read_page_no) { 01885 /* We did not compare space_id to read_space_id 01886 if block->space == 0, because the field on the 01887 page may contain garbage in MySQL < 4.1.1, 01888 which only supported block->space == 0. */ 01889 01890 ut_print_timestamp(stderr); 01891 fprintf(stderr, 01892 " InnoDB: Error: space id and page n:o stored in the page\n" 01893 "InnoDB: read in are %lu:%lu, should be %lu:%lu!\n", 01894 (ulong) read_space_id, (ulong) read_page_no, 01895 (ulong) block->space, (ulong) block->offset); 01896 } 01897 /* From version 3.23.38 up we store the page checksum 01898 to the 4 first bytes of the page end lsn field */ 01899 01900 if (buf_page_is_corrupted(block->frame)) { 01901 fprintf(stderr, 01902 "InnoDB: Database page corruption on disk or a failed\n" 01903 "InnoDB: file read of page %lu.\n", (ulong) block->offset); 01904 01905 fputs( 01906 "InnoDB: You may have to recover from a backup.\n", stderr); 01907 01908 buf_page_print(block->frame); 01909 01910 fprintf(stderr, 01911 "InnoDB: Database page corruption on disk or a failed\n" 01912 "InnoDB: file read of page %lu.\n", (ulong) block->offset); 01913 fputs( 01914 "InnoDB: You may have to recover from a backup.\n", stderr); 01915 fputs( 01916 "InnoDB: It is also possible that your operating\n" 01917 "InnoDB: system has corrupted its own file cache\n" 
01918 "InnoDB: and rebooting your computer removes the\n" 01919 "InnoDB: error.\n" 01920 "InnoDB: If the corrupt page is an index page\n" 01921 "InnoDB: you can also try to fix the corruption\n" 01922 "InnoDB: by dumping, dropping, and reimporting\n" 01923 "InnoDB: the corrupt table. You can use CHECK\n" 01924 "InnoDB: TABLE to scan your table for corruption.\n" 01925 "InnoDB: See also " 01926 "http://dev.mysql.com/doc/mysql/en/Forcing_recovery.html\n" 01927 "InnoDB: about forcing recovery.\n", stderr); 01928 01929 if (srv_force_recovery < SRV_FORCE_IGNORE_CORRUPT) { 01930 fputs( 01931 "InnoDB: Ending processing because of a corrupt database page.\n", 01932 stderr); 01933 exit(1); 01934 } 01935 } 01936 01937 if (recv_recovery_is_on()) { 01938 recv_recover_page(FALSE, TRUE, block->frame, 01939 block->space, block->offset); 01940 } 01941 01942 if (!recv_no_ibuf_operations) { 01943 ibuf_merge_or_delete_for_page(block->frame, 01944 block->space, block->offset, TRUE); 01945 } 01946 } 01947 01948 #ifdef UNIV_IBUF_DEBUG 01949 ut_a(ibuf_count_get(block->space, block->offset) == 0); 01950 #endif 01951 mutex_enter(&(buf_pool->mutex)); 01952 01953 /* Because this thread which does the unlocking is not the same that 01954 did the locking, we use a pass value != 0 in unlock, which simply 01955 removes the newest lock debug record, without checking the thread 01956 id. */ 01957 01958 block->io_fix = 0; 01959 01960 if (io_type == BUF_IO_READ) { 01961 /* NOTE that the call to ibuf may have moved the ownership of 01962 the x-latch to this OS thread: do not let this confuse you in 01963 debugging! 
*/ 01964 01965 ut_ad(buf_pool->n_pend_reads > 0); 01966 buf_pool->n_pend_reads--; 01967 buf_pool->n_pages_read++; 01968 01969 rw_lock_x_unlock_gen(&(block->lock), BUF_IO_READ); 01970 01971 #ifdef UNIV_DEBUG 01972 if (buf_debug_prints) { 01973 fputs("Has read ", stderr); 01974 } 01975 #endif /* UNIV_DEBUG */ 01976 } else { 01977 ut_ad(io_type == BUF_IO_WRITE); 01978 01979 /* Write means a flush operation: call the completion 01980 routine in the flush system */ 01981 01982 buf_flush_write_complete(block); 01983 01984 rw_lock_s_unlock_gen(&(block->lock), BUF_IO_WRITE); 01985 01986 buf_pool->n_pages_written++; 01987 01988 #ifdef UNIV_DEBUG 01989 if (buf_debug_prints) { 01990 fputs("Has written ", stderr); 01991 } 01992 #endif /* UNIV_DEBUG */ 01993 } 01994 01995 mutex_exit(&(buf_pool->mutex)); 01996 01997 #ifdef UNIV_DEBUG 01998 if (buf_debug_prints) { 01999 fprintf(stderr, "page space %lu page no %lu\n", 02000 (ulong) block->space, (ulong) block->offset); 02001 } 02002 #endif /* UNIV_DEBUG */ 02003 } 02004 02005 /************************************************************************* 02006 Invalidates the file pages in the buffer pool when an archive recovery is 02007 completed. All the file pages buffered must be in a replaceable state when 02008 this function is called: not latched and not modified. */ 02009 02010 void 02011 buf_pool_invalidate(void) 02012 /*=====================*/ 02013 { 02014 ibool freed; 02015 02016 ut_ad(buf_all_freed()); 02017 02018 freed = TRUE; 02019 02020 while (freed) { 02021 freed = buf_LRU_search_and_free_block(100); 02022 } 02023 02024 mutex_enter(&(buf_pool->mutex)); 02025 02026 ut_ad(UT_LIST_GET_LEN(buf_pool->LRU) == 0); 02027 02028 mutex_exit(&(buf_pool->mutex)); 02029 } 02030 02031 #ifdef UNIV_DEBUG 02032 /************************************************************************* 02033 Validates the buffer buf_pool data structure. 
*/

ibool
buf_validate(void)
/*==============*/
{
	buf_block_t*	block;
	ulint		i;
	ulint		n_single_flush	= 0;
	ulint		n_lru_flush	= 0;
	ulint		n_list_flush	= 0;
	ulint		n_lru		= 0;
	ulint		n_flush		= 0;
	ulint		n_free		= 0;
	ulint		n_page		= 0;

	ut_ad(buf_pool);

	mutex_enter(&(buf_pool->mutex));

	for (i = 0; i < buf_pool->curr_size; i++) {

		block = buf_pool_get_nth_block(buf_pool, i);

		if (block->state == BUF_BLOCK_FILE_PAGE) {

			ut_a(buf_page_hash_get(block->space,
						block->offset) == block);
			n_page++;

#ifdef UNIV_IBUF_DEBUG
			ut_a((block->io_fix == BUF_IO_READ)
			     || ibuf_count_get(block->space, block->offset)
								== 0);
#endif
			if (block->io_fix == BUF_IO_WRITE) {

				if (block->flush_type == BUF_FLUSH_LRU) {
					n_lru_flush++;
					ut_a(rw_lock_is_locked(&(block->lock),
							RW_LOCK_SHARED));
				} else if (block->flush_type ==
						BUF_FLUSH_LIST) {
					n_list_flush++;
				} else if (block->flush_type ==
						BUF_FLUSH_SINGLE_PAGE) {
					n_single_flush++;
				} else {
					ut_error;
				}

			} else if (block->io_fix == BUF_IO_READ) {

				ut_a(rw_lock_is_locked(&(block->lock),
							RW_LOCK_EX));
			}

			n_lru++;

			if (ut_dulint_cmp(block->oldest_modification,
						ut_dulint_zero) > 0) {
				n_flush++;
			}

		} else if (block->state == BUF_BLOCK_NOT_USED) {
			n_free++;
		}
	}

	if (n_lru + n_free > buf_pool->curr_size) {
		fprintf(stderr, "n LRU %lu, n free %lu\n",
			(ulong) n_lru, (ulong) n_free);
		ut_error;
	}

	ut_a(UT_LIST_GET_LEN(buf_pool->LRU) == n_lru);
	if (UT_LIST_GET_LEN(buf_pool->free) != n_free) {
		fprintf(stderr, "Free list len %lu, free blocks %lu\n",
			(ulong) UT_LIST_GET_LEN(buf_pool->free),
			(ulong) n_free);
		ut_error;
	}
	ut_a(UT_LIST_GET_LEN(buf_pool->flush_list) == n_flush);

	ut_a(buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE] == n_single_flush);
	ut_a(buf_pool->n_flush[BUF_FLUSH_LIST] == n_list_flush);
	ut_a(buf_pool->n_flush[BUF_FLUSH_LRU] == n_lru_flush);

	mutex_exit(&(buf_pool->mutex));

	ut_a(buf_LRU_validate());
	ut_a(buf_flush_validate());

	return(TRUE);
}

/*************************************************************************
Prints info of the buffer buf_pool data structure. */

void
buf_print(void)
/*===========*/
{
	dulint*		index_ids;
	ulint*		counts;
	ulint		size;
	ulint		i;
	ulint		j;
	dulint		id;
	ulint		n_found;
	buf_frame_t*	frame;
	dict_index_t*	index;

	ut_ad(buf_pool);

	size = buf_pool->curr_size;

	index_ids = mem_alloc(sizeof(dulint) * size);
	counts = mem_alloc(sizeof(ulint) * size);

	mutex_enter(&(buf_pool->mutex));

	/* Every %lu argument is cast to ulong, because ulint is not
	unsigned long on all platforms */

	fprintf(stderr,
		"buf_pool size %lu\n"
		"database pages %lu\n"
		"free pages %lu\n"
		"modified database pages %lu\n"
		"n pending reads %lu\n"
		"n pending flush LRU %lu list %lu single page %lu\n"
		"pages read %lu, created %lu, written %lu\n",
		(ulong) size,
		(ulong) UT_LIST_GET_LEN(buf_pool->LRU),
		(ulong) UT_LIST_GET_LEN(buf_pool->free),
		(ulong) UT_LIST_GET_LEN(buf_pool->flush_list),
		(ulong) buf_pool->n_pend_reads,
		(ulong) buf_pool->n_flush[BUF_FLUSH_LRU],
		(ulong) buf_pool->n_flush[BUF_FLUSH_LIST],
		(ulong) buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE],
		(ulong) buf_pool->n_pages_read,
		(ulong) buf_pool->n_pages_created,
		(ulong) buf_pool->n_pages_written);

	/* Count the number of blocks belonging to each index in the buffer */

	n_found = 0;

	for (i = 0; i < size; i++) {
		frame = buf_pool_get_nth_block(buf_pool, i)->frame;

		if (fil_page_get_type(frame) == FIL_PAGE_INDEX) {

			id = btr_page_get_index_id(frame);

			/* Look for the id in the index_ids array */
			j = 0;

			while (j < n_found) {

				if (ut_dulint_cmp(index_ids[j], id) == 0) {
					(counts[j])++;

					break;
				}
				j++;
			}

			if (j == n_found) {
				n_found++;
				index_ids[j] = id;
				counts[j] = 1;
			}
		}
	}

	mutex_exit(&(buf_pool->mutex));

	for (i = 0; i < n_found; i++) {
		index = dict_index_get_if_in_cache(index_ids[i]);

		fprintf(stderr,
			"Block count for index %lu in buffer is about %lu",
			(ulong) ut_dulint_get_low(index_ids[i]),
			(ulong) counts[i]);

		if (index) {
			putc(' ', stderr);
			dict_index_name_print(stderr, NULL, index);
		}

		putc('\n', stderr);
	}

	mem_free(index_ids);
	mem_free(counts);

	ut_a(buf_validate());
}
#endif /* UNIV_DEBUG */

/*************************************************************************
Returns the number of latched pages in the buffer pool. */

ulint
buf_get_latched_pages_number(void)
{
	buf_block_t*	block;
	ulint		i;
	ulint		fixed_pages_number = 0;

	mutex_enter(&(buf_pool->mutex));

	for (i = 0; i < buf_pool->curr_size; i++) {

		block = buf_pool_get_nth_block(buf_pool, i);

		if (((block->buf_fix_count != 0) || (block->io_fix != 0))
		    && block->magic_n == BUF_BLOCK_MAGIC_N) {
			fixed_pages_number++;
		}
	}

	mutex_exit(&(buf_pool->mutex));

	return(fixed_pages_number);
}

/*************************************************************************
Returns the number of pending buf pool ios.
*/

ulint
buf_get_n_pending_ios(void)
/*=======================*/
{
	return(buf_pool->n_pend_reads
		+ buf_pool->n_flush[BUF_FLUSH_LRU]
		+ buf_pool->n_flush[BUF_FLUSH_LIST]
		+ buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE]);
}

/*************************************************************************
Returns the ratio in percents of modified pages in the buffer pool /
database pages in the buffer pool. */

ulint
buf_get_modified_ratio_pct(void)
/*============================*/
{
	ulint	ratio;

	mutex_enter(&(buf_pool->mutex));

	ratio = (100 * UT_LIST_GET_LEN(buf_pool->flush_list))
		/ (1 + UT_LIST_GET_LEN(buf_pool->LRU)
			+ UT_LIST_GET_LEN(buf_pool->free));

	/* 1 + is there to avoid division by zero */

	mutex_exit(&(buf_pool->mutex));

	return(ratio);
}

/*************************************************************************
Prints info of the buffer i/o.
*/

void
buf_print_io(
/*=========*/
	FILE*	file)	/* in/out: buffer where to print */
{
	time_t	current_time;
	double	time_elapsed;
	ulint	size;

	ut_ad(buf_pool);
	size = buf_pool->curr_size;

	mutex_enter(&(buf_pool->mutex));

	if (srv_use_awe) {
		fprintf(stderr,
			"AWE: Buffer pool memory frames %lu\n",
			(ulong) buf_pool->n_frames);

		fprintf(stderr,
			"AWE: Database pages and free buffers mapped in frames %lu\n",
			(ulong) UT_LIST_GET_LEN(buf_pool->awe_LRU_free_mapped));
	}

	/* The pending write sums are parenthesized so that the (ulong)
	cast applies to the whole sum, not only to its first operand */

	fprintf(file,
		"Buffer pool size %lu\n"
		"Free buffers %lu\n"
		"Database pages %lu\n"
		"Modified db pages %lu\n"
		"Pending reads %lu\n"
		"Pending writes: LRU %lu, flush list %lu, single page %lu\n",
		(ulong) size,
		(ulong) UT_LIST_GET_LEN(buf_pool->free),
		(ulong) UT_LIST_GET_LEN(buf_pool->LRU),
		(ulong) UT_LIST_GET_LEN(buf_pool->flush_list),
		(ulong) buf_pool->n_pend_reads,
		(ulong) (buf_pool->n_flush[BUF_FLUSH_LRU]
			+ buf_pool->init_flush[BUF_FLUSH_LRU]),
		(ulong) (buf_pool->n_flush[BUF_FLUSH_LIST]
			+ buf_pool->init_flush[BUF_FLUSH_LIST]),
		(ulong) buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE]);

	current_time = time(NULL);
	time_elapsed = 0.001 + difftime(current_time,
						buf_pool->last_printout_time);
	buf_pool->last_printout_time = current_time;

	fprintf(file,
		"Pages read %lu, created %lu, written %lu\n"
		"%.2f reads/s, %.2f creates/s, %.2f writes/s\n",
		(ulong) buf_pool->n_pages_read,
		(ulong) buf_pool->n_pages_created,
		(ulong) buf_pool->n_pages_written,
		(buf_pool->n_pages_read - buf_pool->n_pages_read_old)
		/ time_elapsed,
		(buf_pool->n_pages_created - buf_pool->n_pages_created_old)
		/ time_elapsed,
		(buf_pool->n_pages_written - buf_pool->n_pages_written_old)
		/ time_elapsed);

	if (srv_use_awe) {
		fprintf(file, "AWE: %.2f page remaps/s\n",
			(buf_pool->n_pages_awe_remapped
				- buf_pool->n_pages_awe_remapped_old)
			/ time_elapsed);
	}

	if (buf_pool->n_page_gets > buf_pool->n_page_gets_old) {
		fprintf(file, "Buffer pool hit rate %lu / 1000\n",
			(ulong) (1000 -
			((1000 * (buf_pool->n_pages_read
					- buf_pool->n_pages_read_old))
			/ (buf_pool->n_page_gets
					- buf_pool->n_page_gets_old))));
	} else {
		fputs("No buffer pool page gets since the last printout\n",
			file);
	}

	buf_pool->n_page_gets_old = buf_pool->n_page_gets;
	buf_pool->n_pages_read_old = buf_pool->n_pages_read;
	buf_pool->n_pages_created_old = buf_pool->n_pages_created;
	buf_pool->n_pages_written_old = buf_pool->n_pages_written;
	buf_pool->n_pages_awe_remapped_old = buf_pool->n_pages_awe_remapped;

	mutex_exit(&(buf_pool->mutex));
}

/**************************************************************************
Refreshes the statistics used to print per-second averages. */

void
buf_refresh_io_stats(void)
/*======================*/
{
	buf_pool->last_printout_time = time(NULL);
	buf_pool->n_page_gets_old = buf_pool->n_page_gets;
	buf_pool->n_pages_read_old = buf_pool->n_pages_read;
	buf_pool->n_pages_created_old = buf_pool->n_pages_created;
	buf_pool->n_pages_written_old = buf_pool->n_pages_written;
	buf_pool->n_pages_awe_remapped_old = buf_pool->n_pages_awe_remapped;
}

/*************************************************************************
Checks that all file pages in the buffer are in a replaceable state.
*/

ibool
buf_all_freed(void)
/*===============*/
{
	buf_block_t*	block;
	ulint		i;

	ut_ad(buf_pool);

	mutex_enter(&(buf_pool->mutex));

	for (i = 0; i < buf_pool->curr_size; i++) {

		block = buf_pool_get_nth_block(buf_pool, i);

		if (block->state == BUF_BLOCK_FILE_PAGE) {

			if (!buf_flush_ready_for_replace(block)) {

				fprintf(stderr,
					"Page %lu %lu still fixed or dirty\n",
					(ulong) block->space,
					(ulong) block->offset);
				ut_error;
			}
		}
	}

	mutex_exit(&(buf_pool->mutex));

	return(TRUE);
}

/*************************************************************************
Checks that there currently are no pending i/o-operations for the buffer
pool. */

ibool
buf_pool_check_no_pending_io(void)
/*==============================*/
				/* out: TRUE if there is no pending i/o */
{
	ibool	ret;

	mutex_enter(&(buf_pool->mutex));

	if (buf_pool->n_pend_reads + buf_pool->n_flush[BUF_FLUSH_LRU]
			+ buf_pool->n_flush[BUF_FLUSH_LIST]
			+ buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE]) {
		ret = FALSE;
	} else {
		ret = TRUE;
	}

	mutex_exit(&(buf_pool->mutex));

	return(ret);
}

/*************************************************************************
Gets the current length of the free list of buffer blocks. */

ulint
buf_get_free_list_len(void)
/*=======================*/
{
	ulint	len;

	mutex_enter(&(buf_pool->mutex));

	len = UT_LIST_GET_LEN(buf_pool->free);

	mutex_exit(&(buf_pool->mutex));

	return(len);
}
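
The hit rate printed by buf_print_io() and the percentage returned by buf_get_modified_ratio_pct() are plain integer arithmetic on counters. The following is a minimal standalone sketch of that arithmetic and is not part of buf0buf.c. The names sketch_ulint, sketch_hit_rate_per_1000 and sketch_modified_ratio_pct are hypothetical, and unsigned long is only an assumed stand-in for ulint; the real functions read the counters from buf_pool while holding its mutex.

```c
/* Assumed stand-in for InnoDB's ulint; not the real typedef. */
typedef unsigned long sketch_ulint;

/* Hypothetical helper mirroring the hit rate arithmetic in buf_print_io():
1000 minus the number of disk reads per 1000 page gets since the previous
sample. Stores the result in *rate and returns 1, or returns 0 when there
were no new page gets, in which case the caller has nothing to report. */
int
sketch_hit_rate_per_1000(
	sketch_ulint	n_page_gets,
	sketch_ulint	n_page_gets_old,
	sketch_ulint	n_pages_read,
	sketch_ulint	n_pages_read_old,
	sketch_ulint*	rate)
{
	if (n_page_gets <= n_page_gets_old) {
		return(0);
	}

	/* Integer division truncates, as it does in the original
	expression. */
	*rate = 1000 - ((1000 * (n_pages_read - n_pages_read_old))
			/ (n_page_gets - n_page_gets_old));
	return(1);
}

/* Hypothetical helper mirroring buf_get_modified_ratio_pct(): percentage
of modified pages among all pages on the LRU and free lists. The 1 + in
the divisor avoids division by zero when both lists are empty. */
sketch_ulint
sketch_modified_ratio_pct(
	sketch_ulint	flush_list_len,
	sketch_ulint	lru_len,
	sketch_ulint	free_len)
{
	return((100 * flush_list_len) / (1 + lru_len + free_len));
}
```

With 1000 new page gets and 50 new reads, the sketch yields a hit rate of 950 / 1000. Like the original expression, it assumes that the number of new reads does not exceed the number of new page gets; otherwise the unsigned subtraction wraps around.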

