MySQL :: MySQL Cluster Multi-Range Read using NDB API

As described in “Batching – improving MySQL Cluster performance when using the NDB API“, reducing the number of times the application node has to access the data nodes can greatly improve performance and reduce latency. That article focussed on setting up multiple operatations (as part of a single transaction) and then executing them as a single batch sent by the NDB API library to the data nodes.

The purpose of this entry is to show how a single NDB API operation can access multiple rows from a table with a single index lookup. It goes on to explain the signifficance of this both now and in the future (much faster joins using SQL for MySQL Cluster tables).

There are several operation types to cover table scans and index lookups (refer to the “MySQL Cluster API Developer Guide” for detals). For this example, I use an NdbIndexScanOperation.

The code sample assumes that the following table structure and data has been set up for table “COUNTRY” in database “TEST_DB_1” using the NDB storage engine (Note that the full example application sets this up automatically):

+--------------+------------------+------+-----+---------+-------+
| Field        | Type             | Null | Key | Default | Extra |
+--------------+------------------+------+-----+---------+-------+
| SUB_ID       | int(10) unsigned | NO   | PRI | NULL    |       |
| COUNTRY_CODE | int(10) unsigned | NO   |     | NULL    |       |
+--------------+------------------+------+-----+---------+-------+

+--------+--------------+
| SUB_ID | COUNTRY_CODE |
+--------+--------------+
|     13 |            1 |
|      2 |            1 |
|      4 |           61 |
|      7 |           46 |
|      9 |           44 |
|     10 |           33 |
|     12 |           44 |
|      5 |           33 |
|     14 |           61 |
|      1 |           44 |
|      8 |            1 |
+--------+--------------+

+--------------+------------------+------+-----+---------+-------+

+--------------+------------------+------+-----+---------+-------+

+--------------+------------------+------+-----+---------+-------+

+--------+--------------+

| SUB_ID | COUNTRY_CODE |

+--------+--------------+

| 13 | 1 |

| 2 | 1 |

| 4 | 61 |

| 7 | 46 |

| 9 | 44 |

| 10 | 33 |

| 12 | 44 |

| 5 | 33 |

| 14 | 61 |

| 1 | 44 |

| 8 | 1 |

+--------+--------------+

The following code causes the NDB API library to send a single request from the application to the data nodes to read the rows where the primary key “SUB_ID” falls into the ranges (2<= SUB_ID <4); (5 < SUB_ID <=9) or (SUB_ID == 13). Note that this is just a fragment of the code and the error checking has been removed for clarity (refer to full example application to see the rest of the code, including the error handling).

NdbIndexScanOperation *psop;

/* RecAttrs for NdbRecAttr Api */
NdbRecAttr *recAttrAttr1;
NdbRecAttr *recAttrAttr2;

psop=myTransaction->getNdbIndexScanOperation(myPIndex);

Uint32 scanFlags=
  NdbScanOperation::SF_OrderBy |
  NdbScanOperation::SF_MultiRange |
  NdbScanOperation::SF_ReadRangeNo;

psop->readTuples(NdbOperation::LM_Read,
                 scanFlags,
                 (Uint32) 0,          // batch
                 (Uint32) 0)          // parallel

/* Add a bound
* Tuples where SUB_ID >=2 and < 4
*/
Uint32 low=2;
Uint32 high=4;
Uint32 match=13;

psop->setBound("SUB_ID", NdbIndexScanOperation::BoundLE, (char*)&low);
psop->setBound("SUB_ID", NdbIndexScanOperation::BoundGT, (char*)&high);
psop->end_of_bound(0);

/* Second bound
* Tuples where SUB_ID > 5 and <=9
*/
low=5;
high=9;
psop->setBound("SUB_ID", NdbIndexScanOperation::BoundLT, (char*)&low);
psop->setBound("SUB_ID", NdbIndexScanOperation::BoundGE, (char*)&high);
psop->end_of_bound(1);

/* Third bound
* Tuples where SUB_ID == 13
*/
psop->setBound("SUB_ID", NdbIndexScanOperation::BoundEQ, (char*)&match);
psop->end_of_bound(2);

/* Read all columns */
recAttrAttr1=psop->getValue("SUB_ID");
recAttrAttr2=psop->getValue("COUNTRY_CODE");

myTransaction->execute( NdbTransaction::Commit);

while (psop->nextResult(true) == 0)
{
  printf(" %8d    %8d   Range no : %2dn",
  recAttrAttr1->u_32_value(),
  recAttrAttr2->u_32_value(),
  psop->get_range_no());
}
psop->close();

NdbIndexScanOperation *psop;

/* RecAttrs for NdbRecAttr Api */

NdbRecAttr *recAttrAttr1;

NdbRecAttr *recAttrAttr2;

psop=myTransaction->getNdbIndexScanOperation(myPIndex);

Uint32 scanFlags=

NdbScanOperation::SF_OrderBy |

NdbScanOperation::SF_MultiRange |

NdbScanOperation::SF_ReadRangeNo;

psop->readTuples(NdbOperation::LM_Read,

scanFlags,

(Uint32) 0, // batch

(Uint32) 0) // parallel

/* Add a bound

* Tuples where SUB_ID >=2 and < 4

Uint32 low=2;

Uint32 high=4;

Uint32 match=13;

psop->setBound("SUB_ID", NdbIndexScanOperation::BoundLE, (char*)&low);

psop->setBound("SUB_ID", NdbIndexScanOperation::BoundGT, (char*)&high);

psop->end_of_bound(0);

/* Second bound

* Tuples where SUB_ID > 5 and <=9

low=5;

high=9;

psop->setBound("SUB_ID", NdbIndexScanOperation::BoundLT, (char*)&low);

psop->setBound("SUB_ID", NdbIndexScanOperation::BoundGE, (char*)&high);

psop->end_of_bound(1);

/* Third bound

* Tuples where SUB_ID == 13

psop->setBound("SUB_ID", NdbIndexScanOperation::BoundEQ, (char*)&match);

psop->end_of_bound(2);

/* Read all columns */

recAttrAttr1=psop->getValue("SUB_ID");

recAttrAttr2=psop->getValue("COUNTRY_CODE");

myTransaction->execute( NdbTransaction::Commit);

while (psop->nextResult(true) == 0)

{

printf(" %8d %8d Range no : %2dn",

recAttrAttr1->u_32_value(),

recAttrAttr2->u_32_value(),

psop->get_range_no());

}

psop->close();

When run, this code produces the following output:

SUB_ID    COUNTRY_CODE
2           1   Range no :  0
7          46   Range no :  1
8           1   Range no :  1
9          44   Range no :  1
13          1   Range no :  2

SUB_ID COUNTRY_CODE

2 1 Range no : 0

7 46 Range no : 1

8 1 Range no : 1

9 44 Range no : 1

13 1 Range no : 2

Why is this signifficant?

This can be very useful for applications using the NDB API; imagine an application that wanted to find the birthdays for all of my friends. Assume that I have 2 tables of interest:

mysql> describe friends; describe birthday;
+--------+-------------+------+-----+---------+-------+
| Field  | Type        | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| name   | varchar(30) | NO   | PRI | NULL    |       |
| friend | varchar(30) | NO   | PRI |         |       |
+--------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| name  | varchar(30) | NO   | PRI | NULL    |       |
| day   | int(11)     | YES  |     | NULL    |       |
| month | int(11)     | YES  |     | NULL    |       |
| year  | int(11)     | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

mysql> describe friends; describe birthday;

+--------+-------------+------+-----+---------+-------+

+--------+-------------+------+-----+---------+-------+

| friend | varchar(30) | NO | PRI | | |

+--------+-------------+------+-----+---------+-------+

2 rows in set (0.00 sec)

+-------+-------------+------+-----+---------+-------+

+-------+-------------+------+-----+---------+-------+

| day | int(11) | YES | | NULL | |

| month | int(11) | YES | | NULL | |

| year | int(11) | YES | | NULL | |

+-------+-------------+------+-----+---------+-------+

4 rows in set (0.00 sec)

Using the NDB API, I can create 1 NdbIndexScanOperation operation to read all tuples from “friends” where the “name” field matches “Andrew” and then use the results to perform a second NdbIndexScanOperation to read the rows that match each of my friends’ names in the “birthday” table. In other words, performing a join using only 2 trips from the application to the data nodes.

Of course, it’s simple to get the same results using SQL…SELECT birthday.name, birthday.day, birthday.month FROM friends, birthday WHERE friend.name='Andrew' AND friends.friend=birthday.name;However, if the tables are very large and I have a lot of friends then performing this join using SQL can be expensive as it requires a separate trip to the data nodes to fetch each birthday. In the future, Batched Key Access (BKA) will optimise these joins by performing one of these NDB API Multi-Range Reads (MRR) to fetch all of the birthdays in one go! Until then, using the NDB API directly can deliver signifficantly faster results.