One of problems of current optimizer is joining the tables where lookup one has very different data distribution. Consider: select * from a,b where a.c1=const and b.c=a.c2; In this case optimizer can't use constant based estimation for table "b" but only to use cardinality value. If table b will have very uneven value distribution, for example 1010 rows, 11 distinct values. 1 met in 1000 rows and others each in one. The MySQL optimizer in such case will do index in lookup tables for all values, which is not good. The same can happen other way around - full table scan can be selected for value which really has just a few rows matching. The fix: MySQL shall be able to detect what for this case the variance of distribution is large and so it shall select table scan or range scan for each outer loop constant. There are two solutions how to select it - real data based. Where we do the range check for each value to decide if we're to use index or full table scan. Is generally slow. Usable for some cases however (ie small outer table) More general approach is to use some sort of extended statistics, which covers these "exceptional" values. It can be done as simple excpetion list + cardinality as well as histogram. The histogram is usually build for some fixed, size a way to have same amount of rows for each histogram part. We shall also store amount of NULL values to be able to handle NULL searches as well. The histogram building example: (There are variations possible) NULL NULL 1 1 2 2 2 2 2 3 4 5 5 5 Total 12 values + 2 NULLs Let us build histogram with 4 parts. Each part matched to 4 rows: 1 1 2 ------- 2 2 2 ------- 2 3 4 ------- 5 5 5 Which can be stored as: NULL=2 TOTAL=14 1-2|2-2|2-3|5-5 For large row number we can assume number hi(n)=low(n+1) and store just min=1|2|2|3|5 which means: 1-2|2-2|2-3|3-5| Having such histogram we can check How many rows matches 2? 2 is met in 3/4 ranges -> 9 rows how many rows matches 1? 1 is met in 1/4 ranger -> 3 rows How many rows have value 8? It is not in the histogram -> 0 rows (note we shall be careful as MySQL in some cases assumes 0 is accurate) how many rows are in range 1-4 4/4 -> 12 rows We possible can have histogram size adjustable on index level in long run. Histograms allows you to cover "exceptions" as well. As having histogram with 10 values would have dedicated positions for exceptions having 10+% values, basically having the same info as exception list.