In the MySQL Client/Server Protocol, a PREPARE/EXECUTE cycle takes two round-trips, as the COM_STMT_EXECUTE requires data from the COM_STMT_PREPARE-ok packet.
The X Protocol is designed in a way that the EXECUTE stage doesn't depend on the response of the PREPARE stage. That allows the client to send PREPARE and EXECUTE one after the other without waiting for the server's response.
See the Implementation Notes for how to implement pipelining efficiently.
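As a minimal sketch, using the pseudocode notation of the examples later in this section (the message names here are illustrative, not the exact protobuf messages), a client can write a prepare and its execute to the wire back to back:

Crud.PrepareFind(stmt_id=1, Collection(name="col1"), filter={ id=? })
Sql.PreparedStmtExecute(stmt_id=1, values=[ 42 ])   # sent without waiting for the prepare's Ok

The server answers both messages in order; the client reads the two responses afterwards.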
When inserting a large set of data (data import), make a trade-off among:
memory usage on client and server side
network round-trips
error reporting
For this example it is assumed that 1 million rows, each 1,024 bytes in size, have to be transferred to the server.
Optimizing for Network Round-Trips
(Assuming the MySQL Client/Server Protocol in this case.) Network round-trips can be minimized by creating one huge SQL statement of up to 1 GByte, generating and sending it in chunks of 16 MByte (the protocol's maximum frame size), and letting the server execute it.
INSERT INTO tbl VALUES
( 1, "foo", "bar" ),
( 2, "baz", "fuz" ),
...;
In this case:
the client can generate the SQL statement in chunks of 16 MByte and write them to the network
(memory usage) the server has to buffer the full 1 GByte before it can start parsing and executing the statement
(execution delay) execution only starts once the complete statement has been received
(error reporting) if an error occurs (parse error, duplicate-key error, ...), the whole 1 GByte statement fails, without any good way to know where in that big message the error happened
The Execution Time for inserting all rows in one batch is:
1 * RTT +
(num_rows * Row Size / Network Bandwidth) +
num_rows * Row Parse Time +
num_rows * Row Execution Time
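Purely for illustration, assume RTT = 1 ms, Network Bandwidth = 1 GBit/s (125 MByte/s), Row Parse Time = 1 µs and Row Execution Time = 10 µs (these values are assumptions, not measurements). For the 1 million rows of 1,024 bytes each, the batch insert then takes roughly:

1 * 1 ms
+ (1,000,000 * 1,024 bytes / 125 MByte/s) ≈ 8.2 s
+ 1,000,000 * 1 µs = 1 s
+ 1,000,000 * 10 µs = 10 s
≈ 19.2 s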
Optimizing for Memory Usage and Error-Reporting
The other extreme is using single-row INSERT statements:
INSERT INTO tbl VALUES
( 1, "foo", "bar" );
INSERT INTO tbl VALUES
( 2, "baz", "fuz" );
...
the client can generate statements as it receives data and stream them to the server
(execution delay) the server starts executing statements as soon as it receives the first row
(memory usage) the server only has to buffer a single row
(error reporting) if inserting one row fails, the client knows about it when it happens
as each statement results in its own round-trip, the network latency is paid for each row instead of once
each statement has to be parsed and executed in the server
Using Prepared Statements solves the last bullet point: the statement is parsed once, and each execution only carries the row data.
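At the SQL level the approach looks like the following sketch (the binary protocol uses COM_STMT_PREPARE/COM_STMT_EXECUTE instead of the SQL-level PREPARE/EXECUTE statements):

PREPARE stmt FROM "INSERT INTO tbl VALUES ( ?, ?, ? )";
SET @a = 1, @b = "foo", @c = "bar";
EXECUTE stmt USING @a, @b, @c;
SET @a = 2, @b = "baz", @c = "fuz";
EXECUTE stmt USING @a, @b, @c;
...
DEALLOCATE PREPARE stmt;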
The Execution Time for inserting all rows using prepared statements and the MySQL Client/Server Protocol is:
num_rows * RTT +
(num_rows * Row Size / Network Bandwidth) +
1 * Row Parse Time +
num_rows * Row Execution Time
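With the same illustrative numbers as above, the per-row round-trips now dominate:

1,000,000 * 1 ms = 1,000 s
+ (1,000,000 * 1,024 bytes / 125 MByte/s) ≈ 8.2 s
+ 1 * 1 µs
+ 1,000,000 * 10 µs = 10 s
≈ 1,018 s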
Optimizing for Execution Time and Error-Reporting
In the X Protocol, a pipeline can be used to stream messages to the server while the server is executing the previous message.
The Execution Time for inserting all rows using prepared statements and using pipelining is (assuming that the network is not saturated):
1 * RTT +
(1 * Row Size / Network Bandwidth) +
1 * Row Parse Time +
num_rows * Row Execution Time
one network latency to get the initial prepare/execute across the wire
one row's worth of network bandwidth to get the initial prepare/execute across the wire; all further commands arrive at the server before the executor needs them, thanks to pipelining
one row parse time to parse the prepare
num_rows row execution time stays as before
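With the same illustrative numbers once more, the row execution time is all that remains significant:

1 * 1 ms
+ (1 * 1,024 bytes / 125 MByte/s) ≈ 8 µs
+ 1 * 1 µs
+ 1,000,000 * 10 µs = 10 s
≈ 10 s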
In case error reporting isn't a major topic, one can combine multi-row INSERT statements with pipelining and reduce the per-row network overhead, as sketched below. This is important in case the network is saturated.
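A sketch of that combination (Sql.StmtExecute stands in for a plain statement execution here, and the batch size of 1,000 rows is an arbitrary example):

Sql.StmtExecute(stmt="INSERT INTO tbl VALUES (1, ...), ..., (1000, ...)")
Sql.StmtExecute(stmt="INSERT INTO tbl VALUES (1001, ...), ..., (2000, ...)")
# ... all statements are written to the wire without waiting for the previous Ok

Each round-trip and each parse now covers 1,000 rows instead of one, while an error still narrows down to a 1,000-row batch rather than the whole import.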
Inserting multiple documents into a collection col1 is a two-step process:
prepare the insert
pipeline the execute messages
By utilizing pipelining, the execute messages can be batched without waiting for the corresponding executeOk message to be returned, as in the following sketch.
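A sketch in the same notation as the delete example below (Crud.PrepareInsert is assumed here by analogy with Crud.PrepareFind and Crud.PrepareDelete):

Crud.PrepareInsert(stmt_id=1, Collection(name="col1"))
Sql.PreparedStmtExecute(stmt_id=1, values=[ doc1 ])
Sql.PreparedStmtExecute(stmt_id=1, values=[ doc2 ])
Sql.PreparedStmtExecute(stmt_id=1, values=[ doc3 ])
Sql.PreparedStmtClose(stmt_id=1)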
Deleting documents from collection col2 based on data from collection col1 works in a similar way. Instead of fetching all rows from col1 first to construct a big delete message, it can also be run as a nested loop:
# prepare the delete on col2 and the find on col1
Crud.PrepareDelete(stmt_id=2, Collection(name="col2"), filter={ id=? })
Crud.PrepareFind(stmt_id=1, Collection(name="col1"), filter={ ... })
# execute the find and attach a cursor to its result set
Sql.PreparedStmtExecute(stmt_id=1, cursor_id=2)
# fetch rows from the cursor and pipeline one delete per fetched row
while ((rows = Sql.CursorFetch(cursor_id=2))):
  Sql.PreparedStmtExecute(stmt_id=2, values=[ rows.col2_id ])
Sql.PreparedStmtClose(stmt_id=2)
Sql.PreparedStmtClose(stmt_id=1)