WL#4443: Remove scalability problem with many locks and external locks on partitioned tables with many partitions

Affects: Server-Prototype Only   —   Status: Complete

Since tables are locked before any preparation and optimizing is done, it is not
possible to prune partition locks in the optimizing phase.

This leads to bad performance and unnecessary work (all partitions are locked
for each and every query, if not explicit pruning, WL#5217, is used).

Therefore if we delay the locking until after prune_partitions is called we can
skip locking of pruned/non used partitions.

This can be done by also call prune_partitions in JOIN::prepare() to prune on
constants (non constants and subqueries cannot be locked pruned).

We must also enforce that nothing in the prepare phase rely on locked tables.


One side effect of this worklog will be shorter table locks too, since the
prepare phase will be done before
locking all tables. This may increase the overall performance.

This will fix BUG#37252.

Since locking is the major problem for many partitions, this worklog will not
try to delay any open calls (since open tables are cached). That will be
wl#3513, which may need refactoring of the table cache.
There should be no user visible change by this worklog.

There is no syntax changes.

To make it possible to prune away locks of unused partitions, locking have to be
placed after prune_partitions is done. Currently prune_partitions is called in
JOIN::optimize after tables are both opened and locked. This worklog will change
this so lock_tables() will be called after prune_partitions() are executed.

We should not use open_and_lock_tables(), but instead use
open_normal_and_derived_tables(), do
preparations and pre-locking optimizations, call prune_partitions, and then call
lock_tables() and continue with post-locking optimizations and execution.


Affected DML statements:

SELECT (including UNION and JOIN) should only lock the same partitions that
needs to read.

single table UPDATE will only be pruned for locking if no partitioning field is
given for update. (Not part of this worklog: One could optimize this if the
partitioning fields to be updated is constants only, then one could add those
matching partitions to be locked too).

multi table UPDATE will only prune locks for tables that does not update any
partitioning fields. (for enhancements see single table UPDATE). 

REPLACE/INSERT ... VALUES should only lock the partitions having rows to be
inserted/replaced. This means that we need to iterate over every row to be
inserted/replaced and evaluate which partition it belongs to and mark that
partition for use/lock.
NOTE: If a AUTO_INCREMENT will be generated and it is a part of the
partitioning fields then all partitions must be locked (Not yet part of this
worklog: Optimization may be possible for range partitioning).

INSERT ... VALUES ... ON DUPLICATE KEY will only be pruned for use/lock if no
partition field is given for update (for optimization, see single-table UPDATE)

INSERT SELECT, for the SELECT clause see SELECT above, for the INSERT table all
partitions needs to be locked. (One could optimize this if the SELECT will
return a known/constant set of values for the partitioning key, but that is not
a part of this worklog).

LOAD will not be pruned, since it is mainly used for inserting a large batch of
rows. It would need the input file to be read and parsed and each row to
evaluate its partitioning function to determine which partitions to use prior
locking. In case of large batches the gain (pruning of locks) seems smaller than
the cost (reading, parsing and calculate the partition id). (One could optimize
this if all partitioning fields are static, and only non partitioning fields are
read from the file. Not part of this worklog. Note: special care with
REPLACE, see INSERT).


There might be other positive performance side effects, since locks on
table/partition level is held for a shorter period of time, which may affect
concurrent queries.


If there are BEFORE INSERT/UPDATE triggers on a table which uses any
partitioning column, it cannot be pruned for
INSERT/UPDATE, since the trigger can alter its values:
If there are BEFORE INSERT trigger, then INSERT/REPLACE cannot be pruned, since
the BEFORE INSERT trigger may change the partitioning columns before it is
inserted, forcing it to a different partition.
If there are BEFORE UPDATE trigger, then INSERT ON DUPLICATE KEY UPDATE cannot
be pruned and UPDATE cannot prune locks.
The main files that needs changes is:

runtime area:
* sql_base.cc - Create open_query_tables() and lock_query_tables() which stores
its state in Query_tables_list. move the explicit lock pruning added in WL#5217
from open_tables() to open_table().
* sql_parse.cc - remove open_and_lock_tables() where partitioning lock pruning
should be enables and use open_query_tables() instead and making use of the
modified handle_select.

optimizer area:
* sql_select.cc - Split the body of mysql_select() into two parts,
mysql_prepare_select() and mysql_execute_select() where locking can be done in
between.
* sql_union.cc - Add open and lock handling with
open_query_tables/lock_query_tables to mysql_union()..
* sql_update.cc - same as select but for updates.
* sql_insert.cc - same as select but for inserts.
* opt_range.cc - prune locks by clearing bits in part_info->lock_partitions bitmap.
* item* - fix various problems with the requirement that prepare and fix_fields
only should access table metadata, as the tables are only opened, not locked.
Currently it is possible that during the prepare phase/fix_fields call,
subqueries is executed etc. which needs the tables to be locked.



This worklog will not affect Database Administration statements.
It will affect the following DDL's:
* CREATE VIEW will no longer lock any tables, since it is not needed. Side
effect of using open_query_tables.
* EXCHANGE PARTITION does now prune locks (i.e. only the exchanged table and the
exchanged partition is locked.
* ALTER TABLE t TRUNCATE PARTITION does now prune locks.

NOTEs:
* INSERT DELAYED is not supported by partitioned tables (so it will not need to
be enhanced).
* LOCK TABLES cannot prune partition locks.
* CALL stored_procedure(arg_expr) supports lock pruning, but evaluating the
arg_expr does not.
* DO does NOT support partitioning lock pruning.
* SET does NOT support partitioning lock pruning. NO - it is a DB Admin
statement.

To investigate:
* effects on replication (statement and row)