WL#3732: Information schema optimization

Affects: Server-5.5   —   Status: Complete

The current implementation of I_S retrieves the contents of several of
its tables directly from FRM, MYD, and MYI database files and the file
system itself. These tables are:
COLUMNS, KEY_COLUMN_USAGE, PARTITIONS, REFERENTIAL_CONSTRAINTS,
STATISTICS, TABLES, TABLE_CONSTRAINTS, TRIGGERS and VIEWS.

In order to retrieve the contents of any of the tables, the function
sql_show.cc:get_all_tables() needs to perform many my_dir() and
open_table() calls that require costly disc access. As a result the
current implementation results in orders of magnitude slower query
times over these I_S tables compared to accessing regular tables.

The goal of this task is to improve query times over the tables
above by reducing the accesses to FRM, MYD, and MYI files and the
file system to the possible minimum.

This is needed to help solve BUG#19588.
The information schema tables are internally represented as
temporary tables that are filled at query execution time.
Those I_S tables whose data is retrieved from
FRM, MYD, and MYI files are filled by the function get_all_tables().
The function get_all_tables() is called from 
'get_schema_tables_result()' which can be called in two places:
JOIN::exec() or create_sort_index().

Here is simplified schema of this function:

int get_all_tables(THD *thd, TABLE_LIST *tables, COND *cond)
{
  
  ...
  get_list_of_database_names();
  while ((db_name= it_db_names++))
  {
    ...
    get_list_of_table_names_in_the_database();
    while ((table_name= it_table_names++))
    {
      ....
      open_table()
      store_data_into_I_S_table()
    }
  }
}


Functions get_list_of_database_names() & 
get_list_of_table_names_in_the_database()
use find_files() function which uses my_dir() function. Function 'find_files()'
retrieves list of database name from mysql system directory or
list of table_names from database directory depending on parameters.


There are several problems which make described above I_S tables slow:

1. The process of obtaining of db names & table names list.
   my_dir() function is used for this process.
   The function scans mysql system directory or database
   directory and creates list of database names or file names 
   It's slow because it uses disk access operations.

2. Function open_table() is slow too. This function 
   read and parses  FRM & MYD & MYI files.
   This is a very slow.

   
We need to do following improvements to speed up slow I_S tables:

1. Try to avoid the use of the my_dir() function.
   We can obtain some data(db name & table name)
   from 'where' condition(COND *cond).   
   If the query supplies constant(s) for the
   database and/or table name(s),
   we directly try to access them, 
   and if no such database(s) and/or table(s) exist,
   open_table()|open_frm() returns an error,
   which we map to FALSE for the corresponding predicate.
   
   Example1:
   select table_name from information_schema.tables
   where table_schema='test';
   
   Existing situation:
   For example we have 5 databases & 20 tables in each.
   We have to perform
   1(list of db names)+5(list of table names for each db) calls of my_dir();
   
   But we can get db_name 'test' from where condition and
   we have to perform only one  my_dir() call
   (to get table names from 'test' db);
   
   Example2:
   select table_name from information_schema.tables
   where table_schema='test' and table_name='t1';
   
   We can get table_schema='test' and table_name='t1' and
   perform further operations only for 'test/t1.frm' table. 
   Count of my_dir() call is 0 in this case.
   See 'modified schema of get_all_tables() function' below.
   
   
2. Try to avoid the use of open_table() function

   We don't need to execute open_table() at all  for some queries
   Example:
   select table_name from information_schema.tables;
   
   The values of 'table_name' are already known after my_dir() function call.
   So we can skip open_table() call.
   See 'modified schema of get_all_tables() function' below.
   
3. In some cases it's enough to open FRM file only.
   
   Example:
   select table_name, table_type from information_schema.tables
   where table_schema='test';
   
   we get 'table_name' during get_list_of_table_names_in_the_database() call
   and we can get 'table_type' from FRM file.
   So we don't need to touch MYD and MYI files.
   See 'modified schema of get_all_tables() function' below.

4. We don't need to open views for I_S tables:
   KEY_COLUMN_USAGE, PARTITIONS, REFERENTIAL_CONSTRAINTS,
   STATISTICS, TABLES, TABLE_CONSTRAINTS, TRIGGERS.
   See 'modified schema of get_all_tables() function' below.

5. We don't need to open tables for I_S table VIEW.
   See 'modified schema of get_all_tables() function' below.

6. Use LEX_STRING instead of char *db_name, *table_name to avoid strlen().


Here is modified schema of get_all_tables() function:

get_all_tables()
{
  /* Calculate lookup values form 'where' cond */
  get_lookup_field_values(thd, cond, tables, &lookup_field_vals);

  /*
    Check which I_S table fields are used in 'select'
    and set table_open_method depending on these fields.
  */
  get_table_open_method(TABLE_LIST *tables,
                        ST_SCHEMA_TABLE *schema_table,
                        enum enum_schema_tables schema_table_idx);

  /*
    Create db names list. if we have database lookup value
    then we just add it to list and skip my_dir() function call.
  */
  make_db_list(lookup_field_vals)  // see schema of this function below
  while ((db_name= it_db_names++))
  {
    ...
    /*
      Create table names list. if we have table lookup value
      then we just add it to list and skip find_files() function call.
    */
    make_table_list(lookup_field_vals, db_name); // see schema below
    while ((table_name= it_table_names++))
    {
      ....
      /*
        Open table depending on 'table_open_method'
        SKIP_OPEN_TABLE  - do not need to open table
	OPEN_FRM_ONLY    - open FRM file only
	OPEN_FULL_TABLE  - open FRM, MYD, MYI
      
      */
      if (SKIP_OPEN_TABLE)
        store_data_into_I_S_table()
      else if (OPEN_FRM_ONLY)
      {
        open_frm_only();
      }
      else
      {
        if (I_S table is one of
	    KEY_COLUMN_USAGE, PARTITIONS, REFERENTIAL_CONSTRAINTS,
            STATISTICS, TABLES, TABLE_CONSTRAINTS, TRIGGERS)
          open_table_only();
	else if (I_S table is VIEW)
	  open_view_only();
	else
	  open_table();
      }
      store_data_into_I_S_table()
    }
  }
}


The schema of make_[db]|[table]_list() function:

make_[db]|[table]_list()
{
  if (lookup value)
  {
    add lookup value into list;
    return
  }
  else
  {
    find_files();
  }
}

The schema of get_lookup_field_values() function:

get_lookup_field_values(...)
{
  if (SHOW command)
    get wild str values for database|table names from LEX
  else // 'select from I_S table'
    calculate lookup values from 'WHERE' condition
}
table.h

1. add new constants(open_method)

#define SKIP_OPEN_TABLE 0
#define OPEN_FRM_ONLY   1
#define OPEN_FULL_TABLE 2

2. Add new field 'open_method' into struct st_field_info;
   uint open_method;
   
3. Add new field into ST_SCHEMA_TABLE struct
   uint i_s_requested_object;  /* the object we need to open(TABLE | VIEW) */.
   
4. Add new field to TABLE_LIST struct.
   uint i_s_requested_object;
   This field is set from ST_SCHEMA_TABLE.i_s_requested_object
   for processed table before opening.

5. Add new field i_s_table_state to TABLE_LIST struct.
   uint i_s_table_state[3];
   this field is used for 'explain select' from I_S tablese

sql_show.cc


1. Add initialization of 'open_method' to 'st_field_info' structs.
   Example:
   
ST_FIELD_INFO tables_fields_info[]=
{
  {"TABLE_CATALOG", FN_REFLEN, MYSQL_TYPE_STRING, 0, 1, 0, SKIP_OPEN_TABLE},
  {"TABLE_SCHEMA",NAME_LEN, MYSQL_TYPE_STRING, 0, 0, 0, SKIP_OPEN_TABLE},
  {"TABLE_NAME", NAME_LEN, MYSQL_TYPE_STRING, 0, 0, "Name", SKIP_OPEN_TABLE},
  {"TABLE_TYPE", NAME_LEN, MYSQL_TYPE_STRING, 0, 0, 0, OPEN_FRM_ONLY},
  {"ENGINE", NAME_LEN, MYSQL_TYPE_STRING, 0, 1, "Engine", OPEN_FRM_ONLY},
  {"VERSION", 21 , MYSQL_TYPE_LONG, 0, 1, "Version", OPEN_FRM_ONLY},
  {"ROW_FORMAT", 10, MYSQL_TYPE_STRING, 0, 1, "Row_format", OPEN_FULL_TABLE},
  {"TABLE_ROWS", 21 , MYSQL_TYPE_LONG, 0, 1, "Rows", OPEN_FULL_TABLE},
  {"AVG_ROW_LENGTH", 21 , MYSQL_TYPE_LONG, 0, 1, "Avg_row_length",
   OPEN_FULL_TABLE},
  {"DATA_LENGTH", 21 , MYSQL_TYPE_LONG, 0, 1, "Data_length", OPEN_FULL_TABLE},
  {"MAX_DATA_LENGTH", 21 , MYSQL_TYPE_LONG, 0, 1, "Max_data_length",
   OPEN_FULL_TABLE},
  {"INDEX_LENGTH", 21 , MYSQL_TYPE_LONG, 0, 1, "Index_length", OPEN_FULL_TABLE},
  {"DATA_FREE", 21 , MYSQL_TYPE_LONG, 0, 1, "Data_free", OPEN_FULL_TABLE},
  {"AUTO_INCREMENT", 21 , MYSQL_TYPE_LONG, 0, 1, "Auto_increment",
   OPEN_FULL_TABLE},
  {"CREATE_TIME", 0, MYSQL_TYPE_TIMESTAMP, 0, 1, "Create_time", 
OPEN_FULL_TABLE},
  {"UPDATE_TIME", 0, MYSQL_TYPE_TIMESTAMP, 0, 1, "Update_time", 
OPEN_FULL_TABLE},
  {"CHECK_TIME", 0, MYSQL_TYPE_TIMESTAMP, 0, 1, "Check_time", OPEN_FULL_TABLE},
  {"TABLE_COLLATION", 64, MYSQL_TYPE_STRING, 0, 1, "Collation", OPEN_FRM_ONLY},
  {"CHECKSUM", 21 , MYSQL_TYPE_LONG, 0, 1, "Checksum", OPEN_FULL_TABLE},
  {"CREATE_OPTIONS", 255, MYSQL_TYPE_STRING, 0, 1, "Create_options",
   OPEN_FRM_ONLY},
  {"TABLE_COMMENT", 80, MYSQL_TYPE_STRING, 0, 0, "Comment", OPEN_FULL_TABLE},
  {0, 0, MYSQL_TYPE_STRING, 0, 0, 0, SKIP_OPEN_TABLE}
};				     

2. Add initialization of 'i_s_requested_object' to 'ST_SCHEMA_TABLE' structs.
   Example:

ST_SCHEMA_TABLE schema_tables[]=
{
 ....
  {"VIEWS", view_fields_info, create_schema_table, 
   get_all_tables, 0, get_schema_views_record, 1, 2, 0, OPEN_VIEW_ONLY},
 ....
}

3. Modify LOOKUP_FIELD_VALUES struct

  /*
    values which are extracted from where condition or
    values from LEX struct for some 'SHOW' commands
  */

  typedef struct st_lookup_field_values
  {
    LEX_STRING db_value, table_value;
    bool wild_db_value, wild_table_value;
  } LOOKUP_FIELD_VALUES;


4. New function which calculates database name and table name values 
   from 'where' condition if it's possible
   void get_lookup_field_values(THD *thd, COND *cond, TABLE_LIST *table,
                                LOOKUP_FIELD_VALUES *lookup_field_vals);

5. New function which set table open method
   setup_table_open_method(TABLE_LIST *tables,
                           ST_SCHEMA_TABLE *schema_table,
                           enum enum_schema_tables schema_table_idx)
6. New function
   int make_db_list(THD *thd, List *files,
                    LOOKUP_FIELD_VALUES *lookup_field_vals,
                    bool *with_i_schema, bool is_wild_value)
   see 'schema of make_[db]|[table]_list()'
   
7. New function
   int make_table_name_list(THD *thd, List *files, LEX *lex,
                            LOOKUP_FIELD_VALUES *lookup_field_vals,
                            bool with_i_schema, LEX_STRING *db_name)
   see 'schema of make_[db]|[table]_list()'
   
8. Modify 'get_all_tables' function according to new schema.

sql_base.cc

1. Modify functions which are used during open table process
   according to table opening method and requested_object.

sql_select.cc

1. Add support for I_S tables into select_describe() function