WL#7593: New data dictionary: don't hold LOCK_open while reading table definition.

Affects: Server-5.7   —   Status: Complete

With the new data dictionary, TABLE_SHARE objects are constructed from table
definitions provided by the data-dictionary subsystem. This subsystem may
need to read information about tables from data-dictionary tables, which
means it might in turn need to construct TABLE_SHARE objects for them.
(Reading definitions of data-dictionary tables from data-dictionary tables
is not a problem, since we have hard-coded definitions of these tables
available.) Because TABLE_SHARE objects are currently created under the
protection of the LOCK_open mutex, such recursion is impossible today. To
allow it, the code must be changed so that LOCK_open is not held during
most of the TABLE_SHARE construction process. Since we will still need a
way to prevent access to a TABLE_SHARE that is not yet fully constructed,
a new flag marking such TABLE_SHARE objects is required.

User Documentation

No user-visible effects. No user documentation required.
FR1) There should be almost no user-visible changes in behavior. After this task
is implemented it becomes possible to read more than one table definition from
the DD (or from .FRM files) concurrently.

High-Level Description

Basically, we need to change the get_table_share() implementation to
release the LOCK_open mutex for the duration of the open_table_def() call.

There are some issues though:

- We need to ensure that concurrent get_table_share() calls won't
  try to create another TABLE_SHARE for the same table and won't
  see an incomplete TABLE_SHARE object. So the share being loaded
  needs to be inserted into the share hash and kept there during
  loading, and a new member needs to be added to TABLE_SHARE to
  detect incomplete shares. This member, e.g. called
  m_open_in_progress, can be protected by LOCK_open. A new condition
  variable (possibly global) needs to be added so that concurrent
  threads trying to access the share can wait until loading is
  completed.
- To avoid incomplete shares causing problems for the code in
  tdc_remove_table() and TABLE_SHARE::wait_for_old_version(),
  we need to ensure that ref_count is non-zero during the
  call to open_table_def().
- We also need to ensure that when the DD subsystem opens DD
  tables it won't wait for table flushes and/or MDL, as otherwise
  there is a risk of introducing deadlocks which won't be properly
  detected.
- Releasing LOCK_open inside get_table_share() can break
  invariants of the code calling this function. We should analyze
  possible issues and take corrective actions if necessary.
  Possibly the acquiring/releasing of LOCK_open should be moved
  into get_table_share().
Low-Level Design

The issues listed in the HLD are addressed in the following way:

1. Control concurrent get_table_share() calls for the same share
1.1 Add bool TABLE_SHARE::m_open_in_progress

1.2 Add mysql_cond_t COND_open, init and destroy in same places as LOCK_open

1.3 Rewrite get_table_share:

1.3.1 When looking the share up in the hash table, loop and check
      m_open_in_progress, waiting on the condition variable while a
      concurrent load is in progress:
      while ((share= (TABLE_SHARE*) my_hash_search(...)))
      {
        if (!share->m_open_in_progress)
          goto found;
        mysql_cond_wait(&COND_open, &LOCK_open);
      }

1.3.2 Set TABLE_SHARE::m_open_in_progress= true after alloc_table_share()

1.3.3 Temporarily release LOCK_open while calling open_table_def:
      mysql_mutex_unlock(&LOCK_open);
      open_table_err= open_table_def(...);

1.3.4 Re-acquire LOCK_open after reading the table definition, clear the
      flag and signal the condition variable:
      mysql_mutex_lock(&LOCK_open);
      share->m_open_in_progress= false;
      mysql_cond_broadcast(&COND_open);

2. Avoid incomplete shares causing problems
2.1 Increment share->ref_count while holding LOCK_open before calling
open_table_def
2.2 Decrement share->ref_count after re-locking LOCK_open if open_table_def
returned an error, to allow the share to be destroyed when deleting it from
the TDC

3. Handle DD subsystem opening DD tables
This is not relevant as long as this code is only pushed to trunk, but it will
become relevant when the code is merged with the new data dictionary.

4. Analyze current invariants regarding mutex usage
Scenarios involving flush operations running concurrently with opening tables
have been considered, and test cases have been written to verify the behavior.
No issues have been identified.