WL#6650: InnoDB_Memcached: support multiple get and range search

Affects: Server-8.0   —   Status: Complete   —   Priority: Medium

There have been requests on further enhance the InnoDB Memcached functionalities
to cope with various scenario. In this worklog, we would implement 2 prominent 
ones.

1) We will add support of "multiple get" operation to further improve the read 
performance. This means user can fetch multiple key value pairs in a single
memcached query. This is what HandlerSocket do when do their benchmarking. Our
1million QPS read only performance tests showed the bottleneck is now mostly
shifting to client (for issuing/processing so many requests). With this
"multiple get" option, it can alleviate frequent communication traffic between
client and server.


2) Add support for "range queries". This request is by Yoshinori of Facebook.
And this could further widen the usage scenario of InnoDB Memcached. With Range
queries, user can specify a particular range, and fetch all the qualified values
in this range. This provides something that far more than original memcached can
support, and shows the flexibility of InnoDB APIs and InnoDB Memcached
Following are functional requirement for range get:

1. Be able to get correct value corresponding to each key in multiple get 
command

2. If there is no corresponding value for a key, such key/value is not returned
in the result. But it does not affect other values to be reported

3. All the keys in the multiple get must belongs to the same table. We do not
support switch table in a multiple get scenario.

4.It does not support multiple range get. Only one single range is allowed for 
get

5. The perl test client needs to be extended to support testing multiple get

Following are functional requirement for range get:

1. Support 4 compare symbol (<, >, <= and >=), each needs to be preceded with a
marker of "@" symbol:

get @>D
VALUE H 0 7
Hamburg
END


2. One of less than (<) or less equal than(<=) symbol can be followed by a
larger than (>) or large equal than (>=) symbol or vice versa to form an
enclosed range

get @>B@D" is ignored.

7. If a compare operation is followed by table switch symbol "@@", the "@@" will
not be consider as sign for table switching, instead it will be consider as part
of key string.

8. For multi-get, there is no limitation in terms of number of keys enforced by
InnoDB memcached. InnoDB memcached does enforce a limit on result memory of
128MB to store the result and key value pair.
NF#1: For Performance Test, the multi-get should behave faster than (at least as
fast as) that of single get.
Both the "multi-value get" and "range query" made easier by our current
memcached design, which manages the result buffer ourselves. In this way, we
could keep "multi-results" across searches without worrying result being
overwritten.

And in both case, we will avoid setting up search tuple/trx for each key value,
and use the initial setup for all the searches. In this way, it will enhancement
the performance over single search.

1) For multi-value get, the "get" command will specify the values it try to get
in one single "get" call:

We use an example to demo the use of multi-get and range search. For example, we
have following table, column c1 is mapped to key, and c2 is the value:

+-----+--------------+------+------+------+
| c1  | c2           | c3   | c4   | c5   |
+-----+--------------+------+------+------+
| AA  | HELLO, HELLO |    8 |    0 |    0 |
| bb  | 123          |    0 |    1 |    0 |
| mm  | www          |    0 |    1 |    0 |
| xxx | 3333         |    0 |    1 |    0 |
+-----+--------------+------+------+------+

For multiple value search, if we want to get value for key that equals "aa",
"bb" and "xxx":

get aa bb xxx  
VALUE aa 8 12
HELLO, HELLO
VALUE bb 0 3
123
VALUE xxx 0 4
3333
END

The result will be return in sequence.

It is possible that you could switch table with the first key, but you are not
allow to switch table in subsequent key:

For example, if you issue :

get @@desct12.Koeln Oldenburg
VALUE @@desct12.Koeln 0 7
O|Olden
VALUE Oldenburg 0 11
O|Oldenburg
END

This will switch to table described by "desct12"

2) For range search, we will support single "<", ">", "<=", ">=" operators,
signified by an "@" symbol. For example, if we have following table, 

mysql> select * from t1;
+----+-----------+------+------+------+
| c1 | c2        | c3   | c4   | c5   |
+----+-----------+------+------+------+
| B  | Berlin    |    0 |    0 |    0 |
| C  | Cottbus   |    0 |    0 |    0 |
| D  | Darmstadt |    0 |    0 |    0 |
| H  | Hamburg   |    0 |    0 |    0 |
+----+-----------+------+------+------+
4 rows in set (0.00 sec)

if we want to search all values large than "C":

get @>C
VALUE D 0 9
Darmstadt
VALUE H 0 7
Hamburg
END

Notice all key value large than "C" are returned.

if we want to get values larger than "C" but smaller than H, you can do 
following:

get @>C@C
VALUE D 0 9
Darmstadt
END

There are maximum 2 compare symbols that will be parsed, one with less or less
equal, one with larger or larger equal, the others are treated are part of key.

For example, if you issue following command:

get @C@>D
VALUE D 0 9
Darmstadt
END

The third "@>D" is treated as part of key, so this clause will look for value
smaller than "H" but larger than "C@>D", in which case "D" fits.


3) Affected Module

The overall InnoDB memcached architecture and be categorized into following 
modules:

[memcached client] <---network---> [memcached daemon]<->[innodb memcached]
<-->[Innodb Storage]
                                   <----------- memcached plugin -------->

As expected, the changed code is mainly in the "memcached plugin" part, and in
the interfacing function of [memcached daemon]and [innodb memcached]. More
specifically, it is in "process_get_command()" of Memcached Daemon and
"innodb_get()" of InnoDB Memcached code. The "process_get_command()" will be
responsible to loop through each key in multiple get query, and ask "innodb_get"
to get all the corresponding value, and then append to the result set. This is
the same as for the range query.

Result are returned in key value pair sequence, if multiple key/value qualifies:

get @>=B   
VALUE BB 0 14
KASI VISWANATH
VALUE CC 0 17
PRAJWAL VISWANATH
VALUE DD 0 16
NIKHILA VISWANAT
END
1) 2 new bool flags are added in individual connection structure to indicating
if this is a range search or multiple get:

struct innodb_conn_data_struct {
        ....
	bool		range;		/*!< range search */
	bool		multi_get;	/*!< multiple get */

}

When these 2 bits are set, the connection data are held and the result buffer
are kept until all results are fetched.

2) For range search, the range key are parsed in innodb_get(). Other than the
original equal search, following additional modes are added:

/** @enum ib_srch_mode_t InnoDB cursor search modes for ib_cursor_moveto().
Note: Values must match those found in page0cur.h */
typedef enum {
        IB_CUR_G = 1,                   /*!< If search key is not found then
                                        position the cursor on the row that
                                        is greater than the search key */

        IB_CUR_GE = 2,                  /*!< If the search key not found then
                                        position the cursor on the row that
                                        is greater than or equal to the search
                                        key */

        IB_CUR_L = 3,                   /*!< If search key is not found then
                                        position the cursor on the row that
                                        is less than the search key */

        IB_CUR_LE = 4                   /*!< If search key is not found then
                                        position the cursor on the row that
                                        is less than or equal to the search
                                        key */
} ib_srch_mode_t;


3) Memcached function process_get_command() is modified a bit to support range
and multiple get feature. So it will continue loop to fetch each result,
packaging it, and send to connection (add_iov).


4) InnoDB API function ib_read_tuple() is modified to add column value check
(for scan), so when it do a range scan, it knows when to stop (when the search
criteria no long matches).

5) Similarly in ib_read_tuple(), it also makes sure the result buffer is enough
to hold all results. If not, it will automatically reallocates to extend the 
buffer.

6) InnoDB APIs function ib_cb_moveto() is modified to add a "direction"
parameter. So when it can scan forward to get next rows for qualification.