Basic information about the join buffer cache:
The size of each join buffer is determined by the value of the
join_buffer_size system variable.
This buffer is used only when the join is of type
ALL or index (in other
words, when no possible keys can be used).
A join buffer is never allocated for the first non-const
table, even if it would be of type ALL or
index.
The buffer is allocated when we need to do a full join between two tables, and freed after the query is done.
Accepted row combinations of tables before the
ALL/index are stored in
the cache and are used to compare against each read row in the
ALL table.
We only store the used columns in the join buffer, not the whole rows.
Assume you have the following join:
Table name Type t1 range t2 ref t3 ALL
The join is then done as follows:
- While rows in t1 matching range
- Read through all rows in t2 according to reference key
- Store used fields from t1, t2 in cache
- If cache is full
- Read through all rows in t3
- Compare t3 row against all t1, t2 combinations in cache
- If row satisfies join condition, send it to client
- Empty cache
- Read through all rows in t3
- Compare t3 row against all stored t1, t2 combinations in cache
- If row satisfies join condition, send it to client
The preceding description means that the number of times table
t3 is scanned is determined as follows:
S = size-of-stored-row(t1,t2) C = accepted-row-combinations(t1,t2) scans = (S * C)/join_buffer_size + 1
Some conclusions:
The larger the value of join_buffer_size,
the fewer the scans of t3. If
join_buffer_size is already large enough to
hold all previous row combinations, there is no speed to be
gained by making it larger.
If there are several tables of join type
ALL or index, then we
allocate one buffer of size
join_buffer_size for each of them and use
the same algorithm described above to handle it. (In other
words, we store the same row combination several times into
different buffers.)
