MySQL Internals Manual  /  Important Algorithms and Structures  /  How MySQL Uses the Join Buffer Cache

10.5 How MySQL Uses the Join Buffer Cache

Basic information about the join buffer cache:

  • The size of each join buffer is determined by the value of the join_buffer_size system variable.

  • This buffer is used only when the join is of type ALL or index (in other words, when no possible keys can be used).

  • A join buffer is never allocated for the first non-const table, even if it would be of type ALL or index.

  • The buffer is allocated when we need to do a full join between two tables, and freed after the query is done.

  • Accepted row combinations of tables before the ALL/index are stored in the cache and are used to compare against each read row in the ALL table.

  • We only store the used columns in the join buffer, not the whole rows.

Assume you have the following join:

Table name      Type
t1              range
t2              ref
t3              ALL

The join is then done as follows:

- While rows in t1 matching range
 - Read through all rows in t2 according to reference key
  - Store used fields from t1, t2 in cache
  - If cache is full
    - Read through all rows in t3
      - Compare t3 row against all t1, t2 combinations in cache
        - If row satisfies join condition, send it to client
    - Empty cache

- Read through all rows in t3
 - Compare t3 row against all stored t1, t2 combinations in cache
   - If row satisfies join condition, send it to client

The preceding description means that the number of times table t3 is scanned is determined as follows:

S = size-of-stored-row(t1,t2)
C = accepted-row-combinations(t1,t2)
scans = (S * C)/join_buffer_size + 1

Some conclusions:

  • The larger the value of join_buffer_size, the fewer the scans of t3. If join_buffer_size is already large enough to hold all previous row combinations, there is no speed to be gained by making it larger.

  • If there are several tables of join type ALL or index, then we allocate one buffer of size join_buffer_size for each of them and use the same algorithm described above to handle it. (In other words, we store the same row combination several times into different buffers.)


User Comments
Sign Up Login You must be logged in to post a comment.