WL#1524: Bitmap indexes
Affects: Server-7.1
—
Status: Un-Assigned
A bitmap index is a set of bit vectors, which are useful for a certain type of search on columns of a table. A bit vector is just a series of bits, e.g. 0101010100001111. Although I prefer the term "bit vector", others say "bitmap". An example ---------- Suppose there is a table T, with a column C. There are 5 rows in T. Column C can have any one of these values: 'mary', 'joe', NULL. The contents of T are: Row # column C ----- -------- 0 mary 1 mary 2 joe 3 joe 4 mary We create a bitmap index: CREATE BITMAP INDEX ON T (C); Now there are 3 bit vectors (there must be one bit vector for each possible value of the column, and there are three possible values). Each vector has 5 bits (there must be one bit for each row in the table, and there are 5 rows in the table). 'mary' bit vector 'joe' bit vector NULL bit vector ----------------- ---------------- --------------- 1 0 0 /* bit 0 */ 1 0 0 /* bit 1 */ 0 1 0 /* bit 2 */ 0 1 0 /* bit 3 */ 1 0 0 /* bit 4 */ The condition "C = 'mary'" is TRUE for rows 0, 1, and 4. Therefore in the 'mary' bit vector bits 0, 1, and 4 are TRUE (1) but bits 2, 3 are FALSE (0). Meanwhile, the condition "C = 'joe'" is TRUE for rows 2, 3 so in the 'joe' bit vector bits 2 and 3 are on. For any given bit position, there is a TRUE (1) bit in one and only one of the bit vectors. Now we search using the bitmap index. SELECT COUNT(*) FROM T WHERE C IS NULL; ... Do this by counting the 1 bits in the NULL bit vector. The good things that we can note from this example are that the index is small (a total of 15 bits for 5 records), and the search is easy for a certain kind of problem. Statement Format ---------------- CREATE BITMAP INDEX index_name ON table_name (column_name); If you don't say BITMAP, then MySQL makes a btree index. (BITMAP will never be the default index type.) Error if: index_name already exists, for any index type Error if: row is not fixed-length The syntax "CREATE BITMAP INDEX ..." is not standard, but does appear in Oracle. We also could add "BITMAP" to the syntax for the KEY clause: { KEY | INDEX } [ ident ] [ [ USING | TYPE ] [ BTREE | RTREE | HASH | BITMAP ] ] Notice that we do not have a clause for stating what the possible values are. For some MySQL data types (SET, ENUM) the range is limited anyway. For other data types, we should insist on the presence of a CHECK clause for the column definition, with "IN (...)" to list all possibilities except for NULL. That is not what Oracle does -- Oracle can make new bit vectors dynamically -- but things are much simpler if we know in advance what values are possible. Thus, for our mary/joe/NULL example, the creation statements would look like: CREATE TABLE T (C char(5), CHECK (C IN ('mary','joe'))); CREATE BITMAP INDEX I ON T (C); Internals note: the effect is that we will create three files file named T.MB0, T.MB1, T.MB2. It is not appropriate to store bit vectors in a .MYI file. It is possible to have a single file with large pages and with a part of each page reserved for each bit vector, but let's try it out with separate files first, it's easier that way. Use in queries -------------- When should the optimizer choose to use a bitmap index? That is, under what conditions is a bitmap advantageous? We will not know the answer until we have done some benchmarks with a real database. The likely cases are: - If there is a btree index for any column in the search, use the btree index and ignore the bitmap index - If the condition contains anything other than "column = literal" or "column IS [NOT] NULL", use a full-table scan and ignore the bitmap index. Bitmap indexes might also prove useful for EXISTS and COUNT. Sometimes the user will have to use a hint to "force" bitmap index use. The two most important operations involve converting SQL conditional expressions to bit vector positions, and back. Example: for "WHERE C='joe'" in a WHERE clause, (a) find out that 'joe' is the second item in the CHECK for the table and therefore the bit vector is T.MB1 (b) looking at byte #0 from T.MB1, see that the value is 0x30, and conclude (by a table lookup) that this means row #2 is TRUE, and row #3 is TRUE. If the value 0x30 were in byte #X, we would have to do the calculation as row #(2+(8*X)) is TRUE, and row #(3+(8*X)) is TRUE. Use in data-change statements ----------------------------- Notoriously, bitmap index maintenance is slow. Every time an indexed column value changes, we have to change every bit vector. An INSERT may cause an increase in the size of every bit vector. For a DELETE, just set the bit value to FALSE (0) for the appropriate bit position in every bit vector. Possible improvements ---------------------- Handling variable-length rows is harder, because the index key points directly to a position within the .MYI file -- there is no "rowid". Therefore, you need ANOTHER index, which answers the question: given that row number is 555, what is the position in the file? Use of such an index might slow access down, so the number of places where a bitmap index is useful is reduced when row length is variable. Handling columns without CHECK clauses is harder, because there has to be a list somewhere showing what value is associated with what bit vector. This list is dynamic; whenever a new value is inserted that is not in the list, we add to the list and add a new bit-vector file. Instead of multiple files, we could have one file with multiple pages. Say that a page has N bytes, and there are 3 possible values. Then the first N/3 bytes of the page are for value 1, the next N/3 bytes of the page are for value 2, and the final N/3 bytes of the page are for value 3. Indexing a condition, rather than a value, would be good for both btree and bitmap indexes. From my notes ------------- There have been suggestions from the user list about bitmaps. See: http://lists.mysql.com/mysql/125271 On June 20 2000, Monty said on the MySQL mailing list: "This [bitmap indexing] would be easy to do for fixed length rows; For dynamic length rows this would in many case take up even more space than the current packed index method!" SQL Performance Tuning mentions that Informix, Oracle, and Sybase support bitmap indexes. So does IBM, but DB2 bitmap indexes are not similar to the others. References ---------- Bug##30347 Support Bitmap Indexes (feature request)
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.