WL#3763: Supernodes in Falcon Indexes
Affects: Server-6.0 — Status: Complete — Priority: Medium
Each index page in a Falcon table contains an empty portion set aside to contain supernodes. The idea behind this feature is to improve performance when searching the nodes in an index page. Since Falcon indexes use prefix compression, to suppress the storage of repeated bytes leading bytes in successive key values, a Falcon index page cannot be searched by a binary search mechanism. Each node, if it has prefix bytes compressed/missing, depends on the previous node in order for it to be fully understood. However, if several of the nodes on a page which have no prefix compression are linked to directly from a supernode mapping at the top of the page, the supernode map can be searched using a binary search. This will speed up page searches where the node you are looking for happens to be near the end of the page. The supernode section of a Falcon index page is an array of 16 2-byte offsets. Some index pages have nearly all nodes completely different, such that there is no trouble finding 16 nodes distributed through the page with no prefix compression. Other pages contain very few fully unique key values. Still other pages have less than 16 nodes. All these variables must be taken into account when designing an algorithm which chooses nodes to be referenced in the supernode array. The supernode array must be adjusted as nodes are added or deleted from the page. This adjustment should not be too computationally heavy so that the write performance does not suffer. That said, in Falcon, it is the gopher thread, running in the background, that builds and adjusts index pages. Client threads work directly to the record cache and the deferred indexes. An intelligent algorithm must be designed to update the supernodes without causing client threads to wait too much. TBD: A sync object may need to coordinate access to the supernode portion of each Index Page. This needs to be decided since other read and write access to index pages is not coordinated by sync objects. Readers traverse the index page in the same direction as writers, and are prepared to read the next page in case the page splits in front of them. It may be that the reading of supernodes cannot be done while a supernode is being updated. Or, it may be that concurrent writers and readers can access the supernode array if it is read left to right instead of a binary search. This part of the design is unclear to me. TBD: In cases where there are a lot of near or partial duplicates on a page, some of the nodes must be chosen to be expanded so that they can become an entry in the supernode array. This expansion may cause the page to be split. It needs to be decided under what conditions a page split is worth populating a supernode array. TBD: The nodes chosen to be represented in a supernode array should be as evenly distributed as possible. Since every node in a Falcon index is variable length, the best distribution of supernode pointers is by the node sequence. But it may be simpler to use percentage offsets within the page. This choice should be discussed. Use of the supernode array in reading an index page shuld be an optional part of finding a node on a page so that the engine will work with existing database files and so that the implementation has the flexibilty to just skip the supernode array if the extra effort is not worth it. This may be the case when expansion of nodes will cause an unwanted page split.
array of 16 supernode offsets allow for 17 supernode segments.This makes pagesize/17 is the optimal size for one segment. Supernodes are created when nodes are added to page and deleted when corresponding node is deleted. On insert, there is a check whether creating a supernode at the insertion point makes sense or not. The distance between left and right neighbor supernodes is taken into account, as well as compression loss (supernode is not created if the prefix compression rate is too good). Supernode is created one of following conditions is true: a) Insertion in the middle of the page and distance between left and right neighbour is larger then optimal_segment_size * 1.5 and the nearest neighbor is not within optimal_segment_size*0.5 b) Appending to the end of page and distance from the end of page to last supernode is greater that optimal_segment_size c) Insertion at the start of page and first existing supernode or end of the page, if no supernode exist is greater than optimal_page_size. Then second node is made a supernode. This is necessary to prevent "anti-supernode" pattern,when inserting in descending sequence( here insertion point will be is always start of the page) In contrary to the original design, there is no rebalancing code (optimizing existing page).Rebalancing turned out to be very expensive on ordinary index update operations. If ever needed, this functionality would only be good for OPTIMIZE TABLE. Actually, also recreating index would do pretty a good job optimizing supernodes. Every search in the page (findNodeInLeaf, findNodeInBranch, findInsertionPoint) now starts findSupernode(). This function returns the "supernode lower bound" for given value, i.e supernode with maximal value that is smaller or equal to value. From that point, search is done as before - expand current node, compare to value, stop if node is greater or equal to value. findSupernode() utilizes binary search. Even if the number of elements is 16 at most, binary search still saves couple of memcmp operations and improves findSupernode performance by ~30%.
Copyright (c) 2000, 2015, Oracle Corporation and/or its affiliates. All rights reserved.