WL#4210: new storage layer for QC page allocation + value adding monitor

Affects: Connector/.NET-5.2   —   Status: In-Progress

The Query Cache memory allocation routines are not well suited to modern 
multicore architectures or to large, fragmented caches. Newer algorithms are 
available that do the job faster and with more efficient code. The new design 
will scale up to large systems with a lot of memory available (> 1GB) and will 
also support small embedded systems where memory is precious.

This is one of two worklogs outlining the work that needs to be done.
This worklog outlines how the storage layer and page allocator should be designed 
and implemented.

As a bonus, the new algorithms are controlled by parameters which can be 
configured manually or handled automatically as a value-added service through 
the proxy, the monitor, or a pluggable server module.

What are the top benefits of the new design?
* Better scalability.
* More efficient memory usage.
* Faster online maintenance operations.
* Non-uniform memory access (NUMA) gives faster operation.
* Concurrent readers of the same result set prevent duplicate execution.
* Implemented as pluggable modules which can be used in differentiation.



== Storage layer components ==

The query cache is composed of three modules: the server interface (API), the 
storage layer (uses the page allocator), and the lookup service (uses hash 
containers).

This worklog addresses the storage layer.


NOTE: the look up service is described in a separate WL

== New SELECT or inserting into the cache ==

1. The lookup service has determined that the requested statement should be 
cached. As a result we have the opened tables, the parsed statement data, and 
the current environment. We register the statement for caching with a result 
set writer and execute it normally to get the result set.
2. The query cache object is allocated from the page allocator and initialized 
with the statement text string, the current database, and significant 
environment variables. The query cache object is marked as 'running' (or 
partial) and made available to the lookup service.
3. The result set is copied as packages into the storage layer. The storage 
layer allocates memory from the page allocator.
4. The result writer returns rows which are put into the current allocated page 
until it overflows and a new page is requested. All pages are linked and 
attached to the query cache object.
5. Consolidation phase: The last page, which is probably only _partially_ full, 
is merged with other partially full pages of a similar size (2^n partitioning). 
The reason for this is to save memory. The minimum page size that can be 
consolidated is tunable, which allows us to scale for both embedded and large 
systems.
6. When the result set is complete the query cache object is marked as 
'complete'.
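
The steps above can be sketched roughly as follows. This is only an 
illustration: the names QueryCacheEntry, Page, kPageSize and size_class are 
hypothetical and not part of the worklog, the 4096-byte page size is an assumed 
value, and a std::vector stands in for the linked page list. make_unique<Page> 
stands in for a call to the page allocator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

constexpr std::size_t kPageSize = 4096;  // assumed fixed page size

enum class EntryState { Running, Complete };

struct Page {
  std::size_t used = 0;  // bytes filled so far
  char data[kPageSize];
};

// Hypothetical query cache object (step 2).
struct QueryCacheEntry {
  std::string statement;  // statement text
  std::string database;   // current database
  EntryState state = EntryState::Running;
  std::vector<std::unique_ptr<Page>> pages;  // pages in result order

  // Steps 3 and 4: copy one result package into the current page and
  // request a new page whenever the current one overflows.
  void append(const char* buf, std::size_t len) {
    assert(state == EntryState::Running);
    while (len > 0) {
      if (pages.empty() || pages.back()->used == kPageSize)
        pages.push_back(std::make_unique<Page>());  // stand-in for get_page
      Page& p = *pages.back();
      const std::size_t n = std::min(len, kPageSize - p.used);
      std::memcpy(p.data + p.used, buf, n);
      p.used += n;
      buf += n;
      len -= n;
    }
  }

  // Step 6: the result set is complete.
  void finish() { state = EntryState::Complete; }
};

// Step 5: 2^n size class of a partially full page. min_class is the tunable
// minimum and is assumed to be a power of two.
std::size_t size_class(std::size_t used, std::size_t min_class) {
  std::size_t c = min_class;
  while (c < used) c <<= 1;
  return c;
}
```

With these assumptions, a 5000-byte result fills one page completely and leaves 
904 bytes in a second page, which falls into the 1024-byte size class when the 
minimum class is 256.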


== Cached SELECT or retrieving from the cache ==

1. The lookup service has determined that the requested statement is in the 
cache. As a result a pointer to the first allocated page is returned. The first 
page is also the query cache object itself, which can therefore be retrieved at 
this time.
2. The results can be read from the cache and returned to the client 
immediately, or the reader waits if the result is not yet complete.


== Partially cached SELECT or waiting for the result set to finish ==

1. The lookup service has determined that the requested statement is in the 
cache but it holds only a partial result. The current thread waits on a 
condition variable of the query cache object currently collecting the unfinished 
result set.
2. The thread belonging to the running (partial) query cache object signals all 
waiting threads whenever a result set package (rows) is delivered. This package 
can then be immediately read by all waiting client threads and this process can 
be repeated until the entire result set has been processed.
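
A minimal sketch of this wait/signal protocol, assuming a standard mutex and 
condition variable per query cache object. The names PartialResult, deliver, 
wait_for_package and read_all are hypothetical, and result packages are 
simplified to strings.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a running (partial) query cache object.
struct PartialResult {
  std::mutex mtx;
  std::condition_variable cv;
  std::vector<std::string> packages;  // result packages delivered so far
  bool complete = false;

  // Writer side: publish one package and wake all waiting readers.
  void deliver(std::string package) {
    {
      std::lock_guard<std::mutex> lock(mtx);
      assert(!complete);  // nothing may be delivered after completion
      packages.push_back(std::move(package));
    }
    cv.notify_all();
  }

  // Writer side: mark the result set as complete.
  void finish() {
    {
      std::lock_guard<std::mutex> lock(mtx);
      complete = true;
    }
    cv.notify_all();
  }

  // Reader side: block until package number `next` exists or the result is
  // complete. Returns false when there are no more packages to read.
  bool wait_for_package(std::size_t next, std::string& out) {
    std::unique_lock<std::mutex> lock(mtx);
    cv.wait(lock, [&] { return next < packages.size() || complete; });
    if (next >= packages.size()) return false;
    out = packages[next];
    return true;
  }
};

// Reader loop: consume packages as they arrive until the result is complete.
std::size_t read_all(PartialResult& r, std::vector<std::string>& sink) {
  std::string package;
  std::size_t next = 0;
  while (r.wait_for_package(next, package)) {
    sink.push_back(package);
    ++next;
  }
  return next;
}
```

Each reader keeps its own position, so several client threads can follow the 
same running result set without re-executing the statement.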

== Invalidating ==

API: Invalidate( TABLE_LIST )

1. For each element in the TABLE_LIST: Find the corresponding QC-table object 
through the look up service (example: table hash)
2. Increase the instantiated number (example: table modification time) of the 
QC-table object found by the look up service. This number will be used to 
determine a cache hit.
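
One possible reading of these two steps, sketched under assumptions: the number 
is modelled as a per-table counter, and a cached result counts as a hit only if 
every table it depends on still has the number seen at caching time. QcTable, 
TableHash, CachedResult and the comparison rule are hypothetical and not taken 
from the worklog.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical QC-table object holding the number bumped on invalidation.
struct QcTable {
  std::atomic<std::uint64_t> version{0};
};

// Stand-in for the lookup service's table hash.
using TableHash = std::unordered_map<std::string, QcTable>;

// Invalidate(TABLE_LIST): bump the number of every table that is known.
void invalidate(TableHash& tables, const std::vector<std::string>& table_list) {
  for (const auto& name : table_list) {
    auto it = tables.find(name);
    if (it != tables.end())
      it->second.version.fetch_add(1, std::memory_order_release);
  }
}

// A cached result remembers the number of each table it was built from.
struct CachedResult {
  std::vector<std::pair<const QcTable*, std::uint64_t>> deps;

  void depend_on(const QcTable& t) {
    deps.emplace_back(&t, t.version.load(std::memory_order_acquire));
  }

  // Assumed hit rule: no dependency has been invalidated since caching.
  bool is_hit() const {
    for (const auto& dep : deps) {
      const std::uint64_t now = dep.first->version.load(std::memory_order_acquire);
      assert(now >= dep.second);  // the number only ever grows
      if (now != dep.second) return false;
    }
    return true;
  }
};
```

In this sketch invalidation touches only the affected QC-table objects and never 
walks the cached results themselves.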
  

TODO> explain how this system enables us to use fine-grained locking and remove 
the top-level mutex which is used in the current design.


== Page allocator ==

The page allocator is a page cache.
The API is simple and aims to maintain a group of free pages of fixed size:
1. free_page: Return a page to the allocator.
2. get_page: Hand out a free page for use as needed.

The page allocator structure is an array of pointers. Each pointer points to a 
free page of fixed size. There are five supporting parameters: low watermark, 
high watermark, total page count, a head, and a tail. This topology is also 
known as a ring buffer.

The high and low watermarks are used to determine whether pages should be 
returned to the OS kernel or new pages should be allocated.

The head and tail should be updated with an atomic "add and return" operation 
to optimize access speed in a concurrent environment.
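
The ring buffer and watermark behaviour could look roughly like the sketch 
below. It makes several assumptions that are not in the worklog: the class name 
PageAllocator, the constructor parameters, std::malloc/std::free as the 
interface to the OS, and the exact refill and release rules. For simplicity the 
head and tail are plain indices guarded by a mutex; the atomic "add and return" 
variant is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <new>
#include <vector>

class PageAllocator {
 public:
  PageAllocator(std::size_t page_size, std::size_t capacity,
                std::size_t low, std::size_t high)
      : page_size_(page_size), ring_(capacity, nullptr), low_(low), high_(high) {
    assert(low < high && high <= capacity);
  }
  ~PageAllocator() {
    while (count_ > 0) std::free(pop());
  }
  PageAllocator(const PageAllocator&) = delete;
  PageAllocator& operator=(const PageAllocator&) = delete;

  // Hand out a free page. If the free count is at or below the low
  // watermark, new pages are first requested from the OS.
  void* get_page() {
    std::lock_guard<std::mutex> lock(mtx_);
    while (count_ <= low_) {
      void* p = std::malloc(page_size_);
      if (p == nullptr) throw std::bad_alloc();
      push(p);
    }
    return pop();
  }

  // Return a page. If the free count has reached the high watermark, the
  // page goes back to the OS instead of into the ring.
  void free_page(void* page) {
    std::lock_guard<std::mutex> lock(mtx_);
    if (count_ >= high_) std::free(page);
    else push(page);
  }

  std::size_t free_count() const {
    std::lock_guard<std::mutex> lock(mtx_);
    return count_;
  }

 private:
  void push(void* p) {  // store at the tail
    ring_[tail_] = p;
    tail_ = (tail_ + 1) % ring_.size();
    ++count_;
  }
  void* pop() {  // take from the head
    void* p = ring_[head_];
    head_ = (head_ + 1) % ring_.size();
    --count_;
    return p;
  }

  std::size_t page_size_;
  std::vector<void*> ring_;  // array of pointers to free pages
  std::size_t low_, high_;   // watermarks, in number of free pages
  std::size_t head_ = 0, tail_ = 0, count_ = 0;
  mutable std::mutex mtx_;
};
```

In this sketch the number of free pages held by the allocator stays between the 
two watermarks once the first page has been handed out.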