# WL#1524: Bitmap indexes

Affects: Server-7.1   —   Status: Un-Assigned

```A bitmap index is a set of bit vectors, which are useful
for a certain type of search on columns of a table.

A bit vector is just a series of bits, e.g. 0101010100001111.
Although I prefer the term "bit vector", others say "bitmap".

An example
----------

Suppose there is a table T, with a column C. There
are 5 rows in T. Column C can have any one of these values:
'mary', 'joe', NULL. The contents of T are:

Row #  column C
-----  --------
0      mary
1      mary
2      joe
3      joe
4      mary

We create a bitmap index:

CREATE BITMAP INDEX ON T (C);

Now there are 3 bit vectors (there must be one bit vector
for each possible value of the column, and there are three
possible values). Each vector has 5 bits (there must be one
bit for each row in the table, and there are 5 rows in the
table).

'mary' bit vector   'joe' bit vector   NULL bit vector
-----------------   ----------------   ---------------
1                   0                  0               /* bit 0 */
1                   0                  0               /* bit 1 */
0                   1                  0               /* bit 2 */
0                   1                  0               /* bit 3 */
1                   0                  0               /* bit 4 */

The condition "C = 'mary'" is TRUE for rows 0, 1, and 4.
Therefore in the 'mary' bit vector bits 0, 1, and 4 are
TRUE (1) but bits 2, 3 are FALSE (0). Meanwhile, the
condition "C = 'joe'" is TRUE for rows 2, 3 so in the
'joe' bit vector bits 2 and 3 are on. For any given bit
position, there is a TRUE (1) bit in one and only one
of the bit vectors.

Now we search using the bitmap index.

SELECT COUNT(*) FROM T WHERE C IS NULL;
... Do this by counting the 1 bits in the NULL bit vector.

The good things that we can note from this example are
that the index is small (a total of 15 bits for 5 records),
and the search is easy for a certain kind of problem.

Statement Format
----------------

CREATE BITMAP INDEX index_name ON table_name (column_name);

If you don't say BITMAP, then MySQL makes a btree index.
(BITMAP will never be the default index type.)

Error if: index_name already exists, for any index type

Error if: row is not fixed-length

The syntax "CREATE BITMAP INDEX ..." is not standard, but does
appear in Oracle.

We also could add "BITMAP" to the syntax for the KEY clause:
{ KEY | INDEX } [ ident ]
[ [ USING | TYPE ] [ BTREE | RTREE | HASH | BITMAP ] ]

Notice that we do not have a clause for stating what the possible
values are. For some MySQL data types (SET, ENUM) the
range is limited anyway. For other data types, we should insist
on the presence of a CHECK clause for the column definition,
with "IN (...)" to list all possibilities except for NULL.
That is not what Oracle does -- Oracle can make new bit vectors
dynamically -- but things are much simpler if we know in advance
what values are possible. Thus, for our mary/joe/NULL example,
the creation statements would look like:

CREATE TABLE T (C char(5), CHECK (C IN ('mary','joe')));
CREATE BITMAP INDEX I ON T (C);

Internals note: the effect is that we will create three files
file named T.MB0, T.MB1, T.MB2. It is not appropriate to store
bit vectors in a .MYI file. It is possible to have a single
file with large pages and with a part of each page reserved
for each bit vector, but let's try it out with separate files
first, it's easier that way.

Use in queries
--------------

When should the optimizer choose to use a bitmap index? That
is, under what conditions is a bitmap advantageous? We will
not know the answer until we have done some benchmarks with a
real database. The likely cases are:

- If there is a btree index for any column in the search,
use the btree index and ignore the bitmap index

- If the condition contains anything other than "column = literal"
or "column IS [NOT] NULL", use a full-table scan and ignore the
bitmap index.

Bitmap indexes might also prove useful for EXISTS and COUNT.
Sometimes the user will have to use a hint to "force" bitmap
index use.

The two most important operations involve converting SQL
conditional expressions to bit vector positions, and back.

Example: for "WHERE C='joe'" in a WHERE clause,
(a) find out that 'joe' is the second item in the CHECK for
the table and therefore the bit vector is T.MB1
(b) looking at byte #0 from T.MB1, see that the value is
0x30, and conclude (by a table lookup) that this means
row #2 is TRUE, and row #3 is TRUE. If the value 0x30
were in byte #X, we would have to do the calculation as
row #(2+(8*X)) is TRUE, and row #(3+(8*X)) is TRUE.

Use in data-change statements
-----------------------------

Notoriously, bitmap index maintenance is slow. Every time an
indexed column value changes, we have to change every bit
vector. An INSERT may cause an increase in the size of every
bit vector. For a DELETE, just set the bit value to FALSE (0)
for the appropriate bit position in every bit vector.

Possible improvements
----------------------

Handling variable-length rows is harder, because the index
key points directly to a position within the .MYI file --
there is no "rowid". Therefore, you need ANOTHER index,
which answers the question: given that row number is 555,
what is the position in the file? Use of such an index
might slow access down, so the number of places where a
bitmap index is useful is reduced when row length is variable.

Handling columns without CHECK clauses is harder, because
there has to be a list somewhere showing what value is
associated with what bit vector. This list is dynamic;
whenever a new value is inserted that is not in the list,

Instead of multiple files, we could have one file with
multiple pages. Say that a page has N bytes, and there
are 3 possible values. Then the first N/3 bytes of the
page are for value 1, the next N/3 bytes of the page are
for value 2, and the final N/3 bytes of the page are for
value 3.

Indexing a condition, rather than a value, would be good
for both btree and bitmap indexes.

From my notes
-------------

There have been suggestions from the user list about bitmaps. See:
http://lists.mysql.com/mysql/125271

On June 20 2000, Monty said on the MySQL mailing list:
"This [bitmap indexing] would be easy to do for fixed length
rows; For dynamic length rows this would in many case take up
even more space than the current packed index method!"

SQL Performance Tuning mentions that Informix, Oracle, and Sybase
support bitmap indexes. So does IBM, but DB2 bitmap indexes are
not similar to the others.

References
----------

Bug##30347 Support Bitmap Indexes (feature request)```