WL#1272: User timeouts on queries

Affects: Server-7.1   —   Status: Un-Assigned

This comes from a customer.
(To see the customer name, check progress report for 2009-02-09.)

They would like to be able to set how long of a query a given user can use.
That is, you trust some users to run queries that will go for hours,
and for other users you want the queries to die if they go on longer
then 20 seconds. 
Contents
--------

Never mind all the other timeouts
What does "time" mean?
A timeout on what?
Starting when?
Where can the user set the timeout?
What values can be in statement_timeout?
What happens when timeout happens?
How does the server detect timeout?
Should MySQL use the optimizer trick that other DBMSs use?
Replication
Workaround
References

Never mind all the other timeouts
---------------------------------

This is a server task. The distinction between server timeouts
and "connector timeouts" is: the server does the timing. So
forget about ODBC's SQLSetConnectAttr, etc.

The title says "timeouts on queries".
Therefore it has nothing to do with wait_timeout, interactive_timeout,
net_read_timeout, innodb_lock_wait_timeout, delayed_insert_timeout,
NDBCluster TransactionDeadlockDetectionTimeout or TransactionInactiveTimeout,
slave_net_timeout, table_lock_wait_timeout, or storage engines that
have timeouts as a means of deadlock resolution.

What does "time" mean?
----------------------

Another DBMS has a limit by "CPU time" (CPU_PER_CALL and CPU_PER_SESSION).
I've read that "Elapsed time of the query tells you nothing about
the actual CPU time"
(http://www.dbasupport.com/forums/archive/index.php/t-23463.html).
To Peter Gulutzan that means that the DBMS is checking per-CPU user-mode
time and not kernel-mode time. (There's explicit mention of this distinction
in discussion of "CPU statistics".) CPU time is usually (always?)
available with an operating system call, but I don't see why
the other DBMS distinguishes. (They also have a maximum IDLE_TIME but
I'd say that's more comparable to MySQL's wait_timeout.)
A DBMS which uses processes can get meaningful information
from getrusage().

Recommendation: "time" will mean "elapsed time".

A timeout on what?
------------------

Exactly what is it that times out?

Perhaps: a transaction. But transaction time can include a lot
of idle time, in which case there's no point in limiting it.

Perhaps: a stored procedure. Per-Erik Martin said several times
that there should be a time limit at the procedure level -- if
every statement in the procedure is short, but the procedure is
in a loop, then you need timeouts at the procedure level. There
was some discussion of this in Zelenogorsk; one idea was to
make timeout an optional "characteristic" clause.
The timeout would only apply for the top level of a stored
procedure, so a procedure that's invoked from another routine
doesn't count.

Perhaps: a compound statement within a stored procedure, that
is, a BEGIN/END block. Peter suggested once that BEGIN ATOMIC
should be the signal that a timeout applies for the whole
block, rather than its individual statements. That statement
is applicable for long_query_timeout too, the variable that
controls whether writing happens to the slow log.

Perhaps: a statement. The title of this worklog task says
"queries", but many people use "query" as a synonym for
"statement". Probably this is what most people will think of.

Perhaps: a SELECT statement. Dmitri Lenev would like to limit
timeouts to SELECT only, see his reasoning in WL#2814.
But there are many statements that don't write (e.g. FETCH
and SHOW). And there are some some storage engines
which can abort a statement smoothly even if it's updating.

Perhaps: a part of a statement. This would allow for timeouts
on subqueries and on statements within triggers or functions
-- "statements" in a sense, but not independent statements
like those issued on a command line.

Recommendation: timeout will be possible for any top-level
statements; atomic compound statements count as statements;
there is only one item to check: statement timeout.

Starting when?
--------------

For long_query_time, "the time to acquire the initial table
locks is not counted as execution time", says the manual.
If someone thinks this is a good idea, then statement
timeout should start at that time too.

Recommendation: start timing "immediately" after receiving
the statement text.

Where can the user set the timeout?
-----------------------------------

1. Dmitri Lenev's suggestion is to put the specification in
the statement, for example (see WL#2814):
SELECT SQL_TIMEOUT = 30 * FROM foo.bar;
(Presumably it could be either a variable or a literal after '='.)
This is perfectly okay as long as one rejects the recommendations
in the last section. But if other statements can time out, there
has to be a TIMEOUT clause in the syntax for all the other
statements too, including BEGIN.

2. Another way to do it is with session and global variables, e.g.:
SET @@STATEMENT_TIMEOUT = 30;
This has the advantage that a user with SUPER privileges could
control the timeout for all statements at arbitrary times.

This task's high-level description says the customer wants timeouts
for "a given user", so alternatives 1. and 2. should be unacceptable.

3. The way another DBMS does it is with CREATE PROFILE or ALTER PROFILE.
These statements can have a setting for controlling maximum CPU
time for each "profile", and a "profile" is an object that can
be associated with a user. Thus each user or group of users can
have a different number for the statement timeout. MySQL could
have profiles too, but it doesn't plan to, one could accomplish
the same effect by allowing a ROLE to have a statement timeout.

4. Yet another way to do it is with CREATE USER or ALTER USER.
This has some of the advantages of the "profile" setting, and
MySQL already has per-user maxima, like "max_updates" in
mysql.user. Possible syntax is CREATE USER ... STATEMENT_TIMEOUT=30.
(MaxDB also has CREATE USER ... TIMEOUT but that's for idle
time not running time.) Or GRANT STATEMENT_TIMEOUT=30 ....
But it's easiest to say: there's a new column in mysql.user
named statement_timeout, and if you can edit it, you can change
the timeout values. There is no guarantee that changing the
value will affect statements that are already running. The
timeout checking will occur for all statements in all sessions
that this user is running; exact check on host name and user name is
necessary; the check is for the running user and not for
a user in the "DEFINER=" clause of a procedure.

The name statement_timeout happens to be the name that
PostgreSQL uses. There is no need to make it a reserved word.

Recommendation: add a column in mysql.user.

What values can be in statement_timeout?
----------------------------------------

Regardless of the way that you set it, you'll end up with some
sort of numeric "variable" that has the timeout.

Maybe it's an integer. That's probably what most people expect.

Maybe it's decimal or float. This is useful if the unit is
"seconds" but there's a desire for fractions of seconds.
Fractions are good because:
- Another DBMS can set timeout for 1/100s
- PostgreSQL can set timeout for 1/1000s
- This would be a step toward WL#1338.
  According to WL#1338 "Change timers in MySQL to have better resolution",
  We have another customer (not the same as the one mentioned in the HLD
  but again you can find the customer name in the progress report for
  2009-02-09) which has asked for sub-second connect timeout (apparently).
Such precision is also possible by saying the statement_timeout
variable units are not seconds but fractions of seconds.

There must be some way to represent "infinity" (no timeout).
That's possible by saying 0 = infinity (see WL#1003), but
that's contradictory: zero should mean zero. So let infinity
be the maximum possible value for the data type, or let
infinity be -1, in which case the variable is signed.

Recommendation: the variable is a 64-bit signed integer,
the units are cycles, default is -1, and -1 means infinity.

What happens when timeout happens?
----------------------------------

Perhaps the statement should not terminate but go into a low
priority, so that it sleeps if there are other jobs running.
But nobody has suggested that seriously.

So timeout will cause termination, statement rollback if
possible (or rollback to an implicit savepoint if the
termination is of a procedure), and an error message.
The SQLSTATE value could be '61000' or 'S1T00'.
For inspiration, look at another DBMS's error messages.
(SQLSTATE is '61000' i.e. shared server and detached process errors.)

This error should be subject to handling. That is, if the
statement times out while in a stored procedure, and there
is a handler, then the handler catches the error. Perhaps
that seems obvious, but errors are not handled for a
statement which is KILLed.

Recommendation: there's an error, it's subject to handling.

How does the server detect timeout?
-----------------------------------

The best way of finding out "what time is it" is a subject
of argument, let's just say that gettimeofday() can be slow
but it's acceptable if it only occurs a few thousand times
per second. See the timing charts for WL#2373.

There are two main ways to detect, which I'll call
inline checkers and separate-thread alarmists.

With an inline checker, a piece of code appears in many
places for the running thread, asking: has the time elapsed?
If so, return error. Advantages of an inline checker are:
- it can be very precise
- checking code is outside critical+unstoppable sections
- code exists anyway to check for 'kill'
Disadvantages are:
- code must appear in many places
- MySQL can't force all storage engine makers to add code
- a bug in the server code, with a tight loop, will be missed.

With a separate-thread alarmist, the timing is done not
within the code that's processing the statement, but in
a thread for background maintenance, possibly a thread
dedicated to timeout checking. This thread can stop the
running process. Advantages of a separate-thread alarmist are: 
- only one piece of small non-repetitive code goes in
- loops in the server can't be missed
- Dmitri Lenev has already written some code
Disadvantages are:
- the thread that's running the process won't yield to the
  monitoring thread except (usually, vaguely) every 1/100
  second, so it's impossible to use a timeout value with
  a precision smaller than 1/100 second
- Holyfoot (Alexey Botchkov) once said:
  "Actually embedded server doesn't use alarm_thread at all.
  I don't know about compilations when one knowingly specifies
  DONT_USE_THR_ALARM,
  but there's the line
  #define DONT_USE_THR_ALARM 1
  in include/config-netware.h.
  So probably it's not used on netware"
  And Currently MySQL's connection times aren't working for Darwin:
  http://dev.mysql.com/doc/refman/5.1/en/mac-os-x-10-x.html
  or for FreeBSD with LinuxThreads:
  http://dev.mysql.com/doc/refman/5.1/en/freebsd.html
- stopping another thread can disrupt sections of code that
  the coders assumed were unstoppable, with unsafe consequence
- see BUG#54135 "setQueryTimeout unsafe across VIP".

Recommendation: use inline checking if the statement timeout
value has fractional seconds, but have a separate-thread alarmist
too, so that if the inline checking fails (doesn't stop for a
full second after the timeout expiry), it can still be stopped.

Should MySQL use the optimizer trick that other DBMSs use?
----------------------------------------------------------

Here's a clever thing:

In another DBMS, you can set a variable named MAX_ESTIMATED_EXEC_TIME.
Suppose it's 5 minutes. Then the statement starts, and the
optimizer estimates that the statement will take 15 minutes.
In that case: don't even bother to start.

Ditto with SQL Server 2005:
SET QUERY_GOVERNOR_COST_LIMIT = n;
means "if the statement might take longer than this, don't
even start".

So if MySQL optimizer can decide firmly that a
statement will take longer than the statement_timeout limit,
abort immediately with the usual timeout error. The error
message will say "timeout exceeded" when in truth it's only
a guess that it would be exceeded. The statement won't be
logged in the slow query log even though it's deemed "slow".

Recommendation: ask the optimizer folks what's reliable.

Replication
-----------

By the way, a slave does not bother to check for timeout.

Workaround
----------

If nobody wants to do this worklog task ...

Anybody can create an event that checks SHOW PROCESSLIST and
sends KILL to threads that are taking too long. MySQL could
provide an example.

References
----------

WL#1003 Allow wait_timeout to be unlimitted by setting to 0
WL#2814 Query timeouts (SQL_TIMEOUT option) (2814)
WL#3824 Resource Limits

Email "Re: Newly Added (by DmitriLenev): Query timeouts (SQL_TIMEOUT option) (2814)"
[mysql intranet]/secure/mailarchive/mail.php?folder=5&mail=42031

Email "[Fwd: implementing query timeouts]" (7 messages in thread)
[mysql intranet]/secure/mailarchive/mail.php?folder=5&mail=15473

Email "0 = infinity"
[mysql intranet]/secure/mailarchive/mail.php?folder=4&mail=22243

Oracle CREATE PROFILE:
http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/statements_6010.htm

PostgreSQL Client Connection Defaults (including statement_timeout)
http://www.postgresql.org/docs/8.2/interactive/runtime-config-client.html

SQL Server's SET LOCK TIMEOUT
http://msdn2.microsoft.com/en-us/library/ms189470.aspx

SQL Server's SET QUERY_GOVERNOR_COST_LIMIT
http://msdn2.microsoft.com/en-us/library/ms176100.aspx

BUG#3654 ODBC timeouts doesn't work with MyODBC
BUG#9906 Connector/J: QueryTimeout is ignored in Statement
BUG#28481 No Query Timeouts (which also asks for timeouts per procedure)
BUG#33414 Backup: DDL hangs indefinitely if ongoing backup