WL#3763: Supernodes in Falcon Indexes

Affects: Server-6.0   —   Status: Complete

   Each index page in a Falcon table contains an empty portion set aside to
contain supernodes.  The idea behind this feature is to improve performance when
searching the nodes in an index page.  Since Falcon indexes use prefix
compression, suppressing the storage of repeated leading bytes in successive key
values, a Falcon index page cannot be searched with a binary search mechanism.
Each node, if it has prefix bytes compressed away, depends on the previous node
in order to be fully reconstructed.
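
   For illustration, here is a minimal sketch of prefix-compressed node storage
and expansion.  The node layout and names are hypothetical, not Falcon's actual
on-page format.

    #include <string>
    #include <vector>

    // Hypothetical prefix-compressed node: the count of leading bytes shared
    // with the previous key, plus only the bytes that differ.
    struct CompressedNode {
        int prefixLen;          // bytes reused from the previous key
        std::string suffix;     // bytes actually stored in this node
    };

    // Expanding a node requires the fully expanded previous key, which is why
    // nodes can only be read left to right and the page cannot be binary
    // searched directly.
    std::string expandKey(const std::string &previousKey, const CompressedNode &node)
    {
        return previousKey.substr(0, node.prefixLen) + node.suffix;
    }

    int main()
    {
        // Keys "apple", "application", "apply" stored on one page:
        std::vector<CompressedNode> page = { {0, "apple"}, {4, "ication"}, {4, "y"} };
        std::string key;
        for (const CompressedNode &n : page)
            key = expandKey(key, n);        // sequential expansion
        return 0;                           // key == "apply"
    }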

   However, if several of the nodes on a page that have no prefix compression
are linked to directly from a supernode map at the top of the page, the
supernode map can be searched using a binary search.  This will speed up page
searches where the node being sought happens to be near the end of the page.

   The supernode section of a Falcon index page is an array of 16 2-byte
offsets.  Some index pages have nearly all nodes completely different, such that
there is no trouble finding 16 nodes distributed through the page with no prefix
compression.  Other pages contain very few fully unique key values.  Still other
pages have fewer than 16 nodes.  All these variables must be taken into account
when designing an algorithm which chooses nodes to be referenced in the
supernode array.  
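
   As a rough sketch of the layout this implies (the struct name, field names,
and placement are assumptions for illustration, not the actual IndexPage
definition):

    #include <cstdint>

    // Illustrative only: a fixed array of 16 two-byte offsets reserved in each
    // index page, each pointing at a node stored without prefix compression.
    // An unused slot could simply hold zero.
    static const int SUPERNODES_PER_PAGE = 16;

    struct IndexPageSketch {
        uint16_t supernodes[SUPERNODES_PER_PAGE];  // byte offsets of uncompressed nodes
        // ... remaining page header fields omitted ...
        uint8_t  nodeSpace[1];                     // prefix-compressed nodes follow here
    };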

   The supernode array must be adjusted as nodes are added or deleted from the
page.  This adjustment should not be too computationally heavy so that the write
performance does not suffer.  That said, in Falcon, it is the gopher thread,
running in the background, that builds and adjusts index pages.  Client threads
work directly with the record cache and the deferred indexes.  An intelligent
algorithm must be designed to update the supernodes without causing client
threads to wait too much.

   TBD: A sync object may need to coordinate access to the supernode portion of
each Index Page.  This needs to be decided since other read and write access to
index pages is not coordinated by sync objects.  Readers traverse the index page
in the same direction as writers, and are prepared to read the next page in case
the page splits in front of them.  It may be that the reading of supernodes
cannot be done while a supernode is being updated.  Or, it may be that
concurrent writers and readers can access the supernode array if it is read left
to right instead of with a binary search.  This part of the design is unclear to me.

   TBD: In cases where there are a lot of near or partial duplicates on a page,
some of the nodes must be chosen to be expanded so that they can become an entry
in the supernode array.  This expansion may cause the page to be split.  It
needs to be decided under what conditions populating the supernode array is
worth a page split.

   TBD:  The nodes chosen to be represented in a supernode array should be as
evenly distributed as possible.  Since every node in a Falcon index is variable
length, the best distribution of supernode pointers is by node sequence number.
But it may be simpler to use percentage offsets within the page.  This choice
should be discussed.

   Use of the supernode array in reading an index page should be an optional
part of finding a node on a page so that the engine will work with existing
database files and so that the implementation has the flexibility to simply skip
the supernode array if the extra effort is not worth it.  This may be the case
when expansion of nodes would cause an unwanted page split.

The array of 16 supernode offsets allows for 17 supernode segments, which makes
pagesize/17 the optimal size for one segment.  Supernodes are created when nodes
are added to a page and deleted when the corresponding node is deleted.  On
insert, there is a check whether creating a supernode at the insertion point
makes sense or not.  The distance between the left and right neighbor supernodes
is taken into account, as well as the compression loss (a supernode is not
created if the prefix compression rate is too good).

A supernode is created if one of the following conditions is true (a sketch of
the check follows the list):

a) Insertion in the middle of the page, the distance between the left and right
neighbor supernodes is larger than optimal_segment_size * 1.5, and the nearest
neighbor is not within optimal_segment_size * 0.5.

b) Appending to the end of the page, and the distance from the end of the page
to the last supernode is greater than optimal_segment_size.

c) Insertion at the start of the page, and the distance to the first existing
supernode (or to the end of the page, if no supernode exists) is greater than
optimal_segment_size.  In this case the second node is made a supernode.  This
is necessary to prevent an "anti-supernode" pattern when inserting in a
descending sequence (here the insertion point is always the start of the page).
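
The following is a hedged sketch of this insert-time check under the conditions
above.  All names and the signature are hypothetical, the handling of missing
neighbors is an assumption, and the real code also weighs the prefix-compression
loss mentioned earlier.

    static const int SUPERNODES_PER_PAGE = 16;

    // 16 offsets delimit 17 segments, so this is the target size of one segment.
    int optimalSegmentSize(int pageSize)
    {
        return pageSize / (SUPERNODES_PER_PAGE + 1);
    }

    bool shouldCreateSupernode(int insertOffset,   // where the new node will go
                               int pageEnd,        // offset just past the last node
                               int leftSupernode,  // nearest supernode to the left, -1 if none
                               int rightSupernode, // nearest supernode to the right, -1 if none
                               int optimalSegment)
    {
        if (insertOffset >= pageEnd) {
            // (b) appending: far enough from the last supernode (or page start)
            int last = leftSupernode >= 0 ? leftSupernode : 0;
            return insertOffset - last > optimalSegment;
        }

        if (insertOffset == 0) {
            // (c) descending inserts always land at the start of the page; create
            // a supernode (the second node on the page) if the first existing
            // supernode, or the page end when there is none, is far enough away
            int first = rightSupernode >= 0 ? rightSupernode : pageEnd;
            return first > optimalSegment;
        }

        // (a) middle of the page: the surrounding segment is oversized and the
        // insertion point is not too close to either neighboring supernode
        int left = leftSupernode >= 0 ? leftSupernode : 0;
        int right = rightSupernode >= 0 ? rightSupernode : pageEnd;
        bool segmentTooBig = (right - left) > optimalSegment * 3 / 2;
        bool nearestTooNear = (insertOffset - left) < optimalSegment / 2
                           || (right - insertOffset) < optimalSegment / 2;
        return segmentTooBig && !nearestTooNear;
    }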

Contrary to the original design, there is no rebalancing code (optimizing an
existing page).  Rebalancing turned out to be very expensive for ordinary index
update operations.  If ever needed, this functionality would only be useful for
OPTIMIZE TABLE.  Recreating the index would also do a reasonably good job of
optimizing supernodes.

Every search in the page (findNodeInLeaf, findNodeInBranch, findInsertionPoint)
now starts with findSupernode().  This function returns the "supernode lower
bound" for a given value, i.e. the supernode with the maximal value that is
smaller than or equal to the search value.  From that point, the search proceeds
as before: expand the current node, compare it to the value, and stop if the
node is greater than or equal to the value.

findSupernode() uses binary search.  Even though the number of elements is at
most 16, binary search still saves a couple of memcmp operations and improves
findSupernode() performance by about 30%.
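
The following self-contained model shows both findSupernode() and the scan that
follows it.  Everything here is a simplified sketch: only the findSupernode()
name comes from the implementation above, and the page representation, key
comparison, and helper names are assumptions made for illustration.

    #include <string>
    #include <vector>

    // Simplified page: prefix-compressed nodes plus the indexes of at most 16
    // nodes that are stored without prefix compression (the supernodes).
    struct SketchNode {
        int prefixLen;              // bytes shared with the previous key (0 for supernodes)
        std::string suffix;
    };

    struct SketchPage {
        std::vector<SketchNode> nodes;
        std::vector<int> supernodes;    // node indexes, ascending, size <= 16
    };

    // Binary search of the supernode array: returns the index of the last node
    // referenced by a supernode whose key is <= searchKey, or 0 if none is.
    int findSupernode(const SketchPage &page, const std::string &searchKey)
    {
        int result = 0;
        int low = 0, high = (int) page.supernodes.size() - 1;
        while (low <= high) {
            int mid = (low + high) / 2;
            int nodeIndex = page.supernodes[mid];
            // Supernodes carry no prefix compression, so their keys compare directly.
            if (page.nodes[nodeIndex].suffix <= searchKey) {
                result = nodeIndex;
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return result;
    }

    // From the supernode lower bound, expand nodes sequentially as before and
    // stop at the first key >= searchKey.  Returns the node index.
    int findNode(const SketchPage &page, const std::string &searchKey)
    {
        std::string key;
        for (int i = findSupernode(page, searchKey); i < (int) page.nodes.size(); i++) {
            // Safe to restart expansion here: node 0 and all supernodes have prefixLen 0.
            key = key.substr(0, page.nodes[i].prefixLen) + page.nodes[i].suffix;
            if (key >= searchKey)
                return i;
        }
        return (int) page.nodes.size();     // searchKey is beyond the last node
    }

    int main()
    {
        SketchPage page;
        page.nodes = { {0, "alpha"}, {3, "ine"}, {0, "beta"}, {2, "verly"}, {0, "delta"} };
        page.supernodes = { 0, 2, 4 };      // "alpha", "beta", "delta" are supernodes
        return findNode(page, "beverly") == 3 ? 0 : 1;
    }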