WL#9127: Define new handler API for sampling
Affects: Server-8.0
Status: Complete
This worklog specifies a new handler API for sampling rows from a given table.
The main use case for the new API is building histograms. Creating a histogram
for a large table (terabyte size) can be very costly, while building the
histogram over a sample of the data is much more efficient and still gives a
reasonably good histogram.

Another possible future use case is the SQL standard feature T613, Sampling.
This feature allows a TABLESAMPLE clause to be specified after a table
reference, and is implemented by several major DBMSs (SQL Server, PostgreSQL,
Oracle, DB2, etc.):

SELECT * FROM t1 TABLESAMPLE SYSTEM (5);

The above example would return approximately 5 percent of the data in the
table 't1'.

User Documentation
==================

No user-visible changes. No user documentation needed.

Functional requirements
=======================

F-1: The sampling percentage MUST be within the range [0.0, 100.0]
     (inclusive).

F-2: Given a sampling percentage S, the probability of a row being included
     in the result set MUST be S/100. This also means that given a sampling
     percentage of 0, no rows must be returned.

F-3: If given the same sampling seed, subsequent calls to the sampling API
     SHOULD return the same rows, given that the underlying table contents
     have not changed. That is, the sampling SHOULD be deterministic.

F-4: The sampling API SHOULD have a default implementation which can be used
     if the storage engine has not provided an implementation.

F-5: The sampling method MUST be sampling without replacement. That is, the
     same row must not be returned multiple times.

F-6: If given a sampling percentage of 100, all of the rows SHOULD be
     returned.

F-7: The sampling SHOULD be evenly distributed over the entire table. It is
     not sufficient to read, for example, only the first 10% of the table,
     since that could leave out important and significant data. A simple
     example is a table containing worldwide orders where 50% of the orders
     are from the USA; if all of the USA rows are located last in the table,
     they could easily be missed if only the first 10% of the table is read.
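To make these requirements concrete, here is a minimal, self-contained sketch
(not part of the worklog; it uses std::mt19937 as a stand-in for the server's
rand_struct/my_rnd(), and the function name sample_rows is invented) of a row
filter that satisfies F-2, F-3 and F-5:

#include <cstdint>
#include <random>
#include <vector>

// Decide for each of 'num_rows' rows whether it belongs to the sample. Every
// row is visited exactly once, so no row can be returned twice (F-5), each
// row is accepted with probability sampling_percentage / 100 (F-2), and a
// fixed seed gives the same result on every call (F-3).
std::vector<std::uint64_t> sample_rows(std::uint64_t num_rows,
                                       double sampling_percentage,
                                       std::uint32_t sampling_seed) {
  std::mt19937 gen(sampling_seed);
  std::uniform_real_distribution<double> dist(0.0, 1.0);  // values in [0.0, 1.0)
  std::vector<std::uint64_t> sampled;
  for (std::uint64_t row = 0; row < num_rows; ++row) {
    if (dist(gen) < sampling_percentage / 100.0) sampled.push_back(row);
  }
  return sampled;
}

With num_rows = 1 000 000, sampling_percentage = 10.0 and a fixed seed,
roughly 100 000 row numbers come back, and two calls with the same seed return
the identical set; a percentage of 0 returns nothing and 100 returns every
row (F-6).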
High-Level Specification
========================

The SQL standard defines two methods of sampling: SYSTEM and BERNOULLI. For
this worklog, we will only provide support for SYSTEM. The difference between
the two is basically that SYSTEM is page-level sampling while BERNOULLI is
row-level sampling. See the SQL standard for further details.

We introduce one new strongly typed enum and six new functions:

enum class enum_sampling_method { SYSTEM };

class handler {
 public:
  int ha_sample_init(double sampling_percentage, int sampling_seed,
                     enum_sampling_method sampling_method);
  int ha_sample_next(uchar *buf);
  int ha_sample_end();

 private:
  virtual int sample_init();
  virtual int sample_next(uchar *buf);
  virtual int sample_end();
};

This follows the same pattern as the other handler API functions for reading
from tables. The normal workflow will be something like the following code:

// Specify through the table's "read_set" which columns we want to read.
bitmap_set_all(table->read_set);
bitmap_clear_all(table->write_set);

// Initialize sampling, where approx. 10% of the rows will be read.
table->file->ha_sample_init(10.0, 0, enum_sampling_method::SYSTEM);

// ha_sample_next() returns 0 on success, so keep reading until it returns
// HA_ERR_END_OF_FILE (or an error).
while (table->file->ha_sample_next(table->record[0]) == 0) {
  // Do stuff with the data.
}

// End the sampling/close the handler.
table->file->ha_sample_end();

The three new functions prefixed with ha_* are the functions that will be used
by the server to read sampled data. The three new functions without the prefix
are the virtual functions that the storage engines need to override in order
to support sampling. We will provide a default implementation for the three
virtual functions, which is described in more detail later.
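As an illustration of what a storage-engine-specific override could look like
(this is not part of the worklog; the class name ha_page_engine and the
Page_iterator helper are invented), a page-based engine might implement SYSTEM
sampling roughly like this, reusing the m_sampling_percentage and
m_sampling_rand members described under "Default implementation" below:

// Hypothetical engine handler: whole pages are either kept or skipped, and
// every row of a kept page is returned.
class ha_page_engine : public handler {
 private:
  Page_iterator m_page;        // invented iterator over the table's pages
  uint m_row_in_page = 0;      // next row to return from the current page
  bool m_have_page = false;    // true once a page has been picked

  int sample_init() override {
    m_page.seek_before_first();
    m_have_page = false;
    m_row_in_page = 0;
    return 0;
  }

  int sample_next(uchar *buf) override {
    for (;;) {
      if (m_have_page && m_row_in_page < m_page.row_count()) {
        // Return the next row from the page that was picked for the sample.
        m_page.read_row_into(buf, m_row_in_page++);
        return 0;
      }
      // Advance to the next page, keeping each page with probability
      // m_sampling_percentage / 100 (F-2). my_rnd() returns a value in
      // [0.0, 1.0), and reusing m_sampling_rand makes the choice
      // deterministic for a given seed (F-3).
      do {
        if (!m_page.next()) return HA_ERR_END_OF_FILE;
      } while (my_rnd(&m_sampling_rand) > m_sampling_percentage / 100.0);
      m_have_page = true;
      m_row_in_page = 0;
    }
  }

  int sample_end() override {
    m_page.close();
    return 0;
  }
};

Selecting whole pages and returning every row of a selected page is what
distinguishes SYSTEM sampling from the row-by-row filtering done by the
default implementation below.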
Functions specification
-----------------------

-- int handler::ha_sample_init(double sampling_percentage, int sampling_seed,
                               enum_sampling_method sampling_method);

Initialize sampling, where approximately "sampling_percentage" percent of the
rows in the table will be read. The parameter "sampling_percentage" must be
greater than or equal to 0.0, and less than or equal to 100.0. The function
returns 0 on success, and any other return value indicates an error.

-- int handler::ha_sample_next(uchar *buffer);

Read the next row via sampling. The row is read into the buffer given by the
parameter "buffer". Returns 0 on success and HA_ERR_END_OF_FILE if no more
rows are available. Any other return value indicates an error.

-- int handler::ha_sample_end();

Ends the sampling. Returns 0 on success, and any other return value indicates
an error.

-- virtual int handler::sample_init();
-- virtual int handler::sample_next(uchar *buf);
-- virtual int handler::sample_end();

These are the virtual functions a storage engine overrides to provide its own
sampling implementation. They mirror the corresponding ha_* functions above
and use the same return value conventions.

Default implementation
----------------------

The three virtual functions will have a default implementation that will be
used if the storage engine has not provided an implementation. The following
describes the default implementation (basically a full table scan with
skip-read).

When calling "sample_init()", the following step happens:

- Call and return the result from "ha_rnd_init(false)".

When calling "sample_next()", the following steps happen:

- Fetch the next row from the table scan by calling "ha_rnd_next".
- Draw a random number from the rand structure and check whether it is less
  than or equal to the sampling percentage divided by 100. Note that my_rnd()
  returns values in the range [0.0, 1.0), so the probability of including a
  row is "sampling percentage" / 100.
- If the check succeeds, return the row just fetched from the table scan. If
  not, fetch the next row from the table scan and repeat the check.
- If an error is detected at any point, the error is returned from
  "sample_next()" immediately.

When calling "sample_end()", we simply call and return the result from
"ha_rnd_end()" to properly end/close the table scan.

All in all, the code will look something like the following:

// *** CODE BEGIN *** \\

// Extend the existing enum with "SAMPLING".
enum {NONE=0, INDEX, RND, SAMPLING} inited;

// Two member variables that can be used by engine-specific implementations
// if desired.
double m_sampling_percentage;
rand_struct m_sampling_rand;

int handler::ha_sample_init(double sampling_percentage, int sampling_seed,
                            enum_sampling_method sampling_method)
{
  DBUG_ENTER("handler::ha_sample_init");
  DBUG_ASSERT(sampling_percentage >= 0.0);
  DBUG_ASSERT(sampling_percentage <= 100.0);
  DBUG_ASSERT(inited == NONE);

  // Initialise the random structure with the given seed.
  randominit(&m_sampling_rand, sampling_seed, sampling_seed);
  m_sampling_percentage= sampling_percentage;

  int result;
  inited= (result= sample_init()) ? NONE : SAMPLING;
  DBUG_RETURN(result);
}

int handler::ha_sample_end()
{
  DBUG_ENTER("handler::ha_sample_end");
  DBUG_ASSERT(inited == SAMPLING);
  inited= NONE;
  DBUG_RETURN(sample_end());
}

int handler::ha_sample_next(uchar *buf)
{
  DBUG_ENTER("handler::ha_sample_next");
  DBUG_ASSERT(inited == SAMPLING);

  // Generated columns (if any) must be updated for each row that is read.
  m_update_generated_read_fields= table->has_gcol();

  int result;
  MYSQL_TABLE_IO_WAIT(PSI_TABLE_FETCH_ROW, MAX_KEY, result,
    { result= sample_next(buf); })

  DBUG_RETURN(result);
}

int handler::sample_init()
{
  return ha_rnd_init(false);
}

int handler::sample_end()
{
  // Temporarily set inited to RND, since we are calling ha_rnd_end().
  inited= RND;
  int result= ha_rnd_end();
  DBUG_ASSERT(inited == NONE);
  return result;
}

int handler::sample_next(uchar *buf)
{
  // Temporarily set inited to RND, since we are calling ha_rnd_next().
  inited= RND;

  int res= ha_rnd_next(buf);

  // my_rnd() returns a random number n in the range 0.0 <= n < 1.0, so a row
  // is kept with probability m_sampling_percentage / 100.
  while (!res && my_rnd(&m_sampling_rand) > (m_sampling_percentage / 100.0))
    res= ha_rnd_next(buf);

  inited= SAMPLING;
  return res;
}

// *** CODE END *** \\

Other remarks
-------------

- The storage engine is free to return an approximately correct number of
  rows. Returning 11 000 rows from a 1 000 000 row table given a sampling
  percentage of 10% is OK.
- The columns we want to read will be marked through the table's "read_set".
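As a sketch of the last point (the Field pointer 'field' is assumed to
reference the single column a histogram is built over; error handling is
omitted), the caller could mark just that column before starting the
sampling:

// Read only the histogram column through the sampling API.
bitmap_clear_all(table->write_set);
bitmap_clear_all(table->read_set);
bitmap_set_bit(table->read_set, field->field_index);

table->file->ha_sample_init(10.0, 0, enum_sampling_method::SYSTEM);
while (table->file->ha_sample_next(table->record[0]) == 0)
{
  // Only the marked column is guaranteed to be filled in table->record[0].
}
table->file->ha_sample_end();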