WL#1272: User timeouts on queries
Affects: Server-7.1 — Status: Un-Assigned — Priority: Medium
This comes from a customer. (To see the customer name, check progress report for 2009-02-09.) They would like to be able to set how long of a query a given user can use. That is, you trust some users to run queries that will go for hours, and for other users you want the queries to die if they go on longer then 20 seconds.
Contents -------- Never mind all the other timeouts What does "time" mean? A timeout on what? Starting when? Where can the user set the timeout? What values can be in statement_timeout? What happens when timeout happens? How does the server detect timeout? Should MySQL use the optimizer trick that other DBMSs use? Replication Workaround References Never mind all the other timeouts --------------------------------- This is a server task. The distinction between server timeouts and "connector timeouts" is: the server does the timing. So forget about ODBC's SQLSetConnectAttr, etc. The title says "timeouts on queries". Therefore it has nothing to do with wait_timeout, interactive_timeout, net_read_timeout, innodb_lock_wait_timeout, delayed_insert_timeout, NDBCluster TransactionDeadlockDetectionTimeout or TransactionInactiveTimeout, slave_net_timeout, table_lock_wait_timeout, or storage engines that have timeouts as a means of deadlock resolution. What does "time" mean? ---------------------- Another DBMS has a limit by "CPU time" (CPU_PER_CALL and CPU_PER_SESSION). I've read that "Elapsed time of the query tells you nothing about the actual CPU time" (http://www.dbasupport.com/forums/archive/index.php/t-23463.html). To Peter Gulutzan that means that the DBMS is checking per-CPU user-mode time and not kernel-mode time. (There's explicit mention of this distinction in discussion of "CPU statistics".) CPU time is usually (always?) available with an operating system call, but I don't see why the other DBMS distinguishes. (They also have a maximum IDLE_TIME but I'd say that's more comparable to MySQL's wait_timeout.) A DBMS which uses processes can get meaningful information from getrusage(). Recommendation: "time" will mean "elapsed time". A timeout on what? ------------------ Exactly what is it that times out? Perhaps: a transaction. But transaction time can include a lot of idle time, in which case there's no point in limiting it. Perhaps: a stored procedure. Per-Erik Martin said several times that there should be a time limit at the procedure level -- if every statement in the procedure is short, but the procedure is in a loop, then you need timeouts at the procedure level. There was some discussion of this in Zelenogorsk; one idea was to make timeout an optional "characteristic" clause. The timeout would only apply for the top level of a stored procedure, so a procedure that's invoked from another routine doesn't count. Perhaps: a compound statement within a stored procedure, that is, a BEGIN/END block. Peter suggested once that BEGIN ATOMIC should be the signal that a timeout applies for the whole block, rather than its individual statements. That statement is applicable for long_query_timeout too, the variable that controls whether writing happens to the slow log. Perhaps: a statement. The title of this worklog task says "queries", but many people use "query" as a synonym for "statement". Probably this is what most people will think of. Perhaps: a SELECT statement. Dmitri Lenev would like to limit timeouts to SELECT only, see his reasoning in WL#2814. But there are many statements that don't write (e.g. FETCH and SHOW). And there are some some storage engines which can abort a statement smoothly even if it's updating. Perhaps: a part of a statement. This would allow for timeouts on subqueries and on statements within triggers or functions -- "statements" in a sense, but not independent statements like those issued on a command line. Recommendation: timeout will be possible for any top-level statements; atomic compound statements count as statements; there is only one item to check: statement timeout. Starting when? -------------- For long_query_time, "the time to acquire the initial table locks is not counted as execution time", says the manual. If someone thinks this is a good idea, then statement timeout should start at that time too. Recommendation: start timing "immediately" after receiving the statement text. Where can the user set the timeout? ----------------------------------- 1. Dmitri Lenev's suggestion is to put the specification in the statement, for example (see WL#2814): SELECT SQL_TIMEOUT = 30 * FROM foo.bar; (Presumably it could be either a variable or a literal after '='.) This is perfectly okay as long as one rejects the recommendations in the last section. But if other statements can time out, there has to be a TIMEOUT clause in the syntax for all the other statements too, including BEGIN. 2. Another way to do it is with session and global variables, e.g.: SET @@STATEMENT_TIMEOUT = 30; This has the advantage that a user with SUPER privileges could control the timeout for all statements at arbitrary times. This task's high-level description says the customer wants timeouts for "a given user", so alternatives 1. and 2. should be unacceptable. 3. The way another DBMS does it is with CREATE PROFILE or ALTER PROFILE. These statements can have a setting for controlling maximum CPU time for each "profile", and a "profile" is an object that can be associated with a user. Thus each user or group of users can have a different number for the statement timeout. MySQL could have profiles too, but it doesn't plan to, one could accomplish the same effect by allowing a ROLE to have a statement timeout. 4. Yet another way to do it is with CREATE USER or ALTER USER. This has some of the advantages of the "profile" setting, and MySQL already has per-user maxima, like "max_updates" in mysql.user. Possible syntax is CREATE USER ... STATEMENT_TIMEOUT=30. (MaxDB also has CREATE USER ... TIMEOUT but that's for idle time not running time.) Or GRANT STATEMENT_TIMEOUT=30 .... But it's easiest to say: there's a new column in mysql.user named statement_timeout, and if you can edit it, you can change the timeout values. There is no guarantee that changing the value will affect statements that are already running. The timeout checking will occur for all statements in all sessions that this user is running; exact check on host name and user name is necessary; the check is for the running user and not for a user in the "DEFINER=" clause of a procedure. The name statement_timeout happens to be the name that PostgreSQL uses. There is no need to make it a reserved word. Recommendation: add a column in mysql.user. What values can be in statement_timeout? ---------------------------------------- Regardless of the way that you set it, you'll end up with some sort of numeric "variable" that has the timeout. Maybe it's an integer. That's probably what most people expect. Maybe it's decimal or float. This is useful if the unit is "seconds" but there's a desire for fractions of seconds. Fractions are good because: - Another DBMS can set timeout for 1/100s - PostgreSQL can set timeout for 1/1000s - This would be a step toward WL#1338. According to WL#1338 "Change timers in MySQL to have better resolution", We have another customer (not the same as the one mentioned in the HLD but again you can find the customer name in the progress report for 2009-02-09) which has asked for sub-second connect timeout (apparently). Such precision is also possible by saying the statement_timeout variable units are not seconds but fractions of seconds. There must be some way to represent "infinity" (no timeout). That's possible by saying 0 = infinity (see WL#1003), but that's contradictory: zero should mean zero. So let infinity be the maximum possible value for the data type, or let infinity be -1, in which case the variable is signed. Recommendation: the variable is a 64-bit signed integer, the units are cycles, default is -1, and -1 means infinity. What happens when timeout happens? ---------------------------------- Perhaps the statement should not terminate but go into a low priority, so that it sleeps if there are other jobs running. But nobody has suggested that seriously. So timeout will cause termination, statement rollback if possible (or rollback to an implicit savepoint if the termination is of a procedure), and an error message. The SQLSTATE value could be '61000' or 'S1T00'. For inspiration, look at another DBMS's error messages. (SQLSTATE is '61000' i.e. shared server and detached process errors.) This error should be subject to handling. That is, if the statement times out while in a stored procedure, and there is a handler, then the handler catches the error. Perhaps that seems obvious, but errors are not handled for a statement which is KILLed. Recommendation: there's an error, it's subject to handling. How does the server detect timeout? ----------------------------------- The best way of finding out "what time is it" is a subject of argument, let's just say that gettimeofday() can be slow but it's acceptable if it only occurs a few thousand times per second. See the timing charts for WL#2373. There are two main ways to detect, which I'll call inline checkers and separate-thread alarmists. With an inline checker, a piece of code appears in many places for the running thread, asking: has the time elapsed? If so, return error. Advantages of an inline checker are: - it can be very precise - checking code is outside critical+unstoppable sections - code exists anyway to check for 'kill' Disadvantages are: - code must appear in many places - MySQL can't force all storage engine makers to add code - a bug in the server code, with a tight loop, will be missed. With a separate-thread alarmist, the timing is done not within the code that's processing the statement, but in a thread for background maintenance, possibly a thread dedicated to timeout checking. This thread can stop the running process. Advantages of a separate-thread alarmist are: - only one piece of small non-repetitive code goes in - loops in the server can't be missed - Dmitri Lenev has already written some code Disadvantages are: - the thread that's running the process won't yield to the monitoring thread except (usually, vaguely) every 1/100 second, so it's impossible to use a timeout value with a precision smaller than 1/100 second - Holyfoot (Alexey Botchkov) once said: "Actually embedded server doesn't use alarm_thread at all. I don't know about compilations when one knowingly specifies DONT_USE_THR_ALARM, but there's the line #define DONT_USE_THR_ALARM 1 in include/config-netware.h. So probably it's not used on netware" And Currently MySQL's connection times aren't working for Darwin: http://dev.mysql.com/doc/refman/5.1/en/mac-os-x-10-x.html or for FreeBSD with LinuxThreads: http://dev.mysql.com/doc/refman/5.1/en/freebsd.html - stopping another thread can disrupt sections of code that the coders assumed were unstoppable, with unsafe consequence - see BUG#54135 "setQueryTimeout unsafe across VIP". Recommendation: use inline checking if the statement timeout value has fractional seconds, but have a separate-thread alarmist too, so that if the inline checking fails (doesn't stop for a full second after the timeout expiry), it can still be stopped. Should MySQL use the optimizer trick that other DBMSs use? ---------------------------------------------------------- Here's a clever thing: In another DBMS, you can set a variable named MAX_ESTIMATED_EXEC_TIME. Suppose it's 5 minutes. Then the statement starts, and the optimizer estimates that the statement will take 15 minutes. In that case: don't even bother to start. Ditto with SQL Server 2005: SET QUERY_GOVERNOR_COST_LIMIT = n; means "if the statement might take longer than this, don't even start". So if MySQL optimizer can decide firmly that a statement will take longer than the statement_timeout limit, abort immediately with the usual timeout error. The error message will say "timeout exceeded" when in truth it's only a guess that it would be exceeded. The statement won't be logged in the slow query log even though it's deemed "slow". Recommendation: ask the optimizer folks what's reliable. Replication ----------- By the way, a slave does not bother to check for timeout. Workaround ---------- If nobody wants to do this worklog task ... Anybody can create an event that checks SHOW PROCESSLIST and sends KILL to threads that are taking too long. MySQL could provide an example. References ---------- WL#1003 Allow wait_timeout to be unlimitted by setting to 0 WL#2814 Query timeouts (SQL_TIMEOUT option) (2814) WL#3824 Resource Limits Email "Re: Newly Added (by DmitriLenev): Query timeouts (SQL_TIMEOUT option) (2814)" [mysql intranet]/secure/mailarchive/mail.php?folder=5&mail=42031 Email "[Fwd: implementing query timeouts]" (7 messages in thread) [mysql intranet]/secure/mailarchive/mail.php?folder=5&mail=15473 Email "0 = infinity" [mysql intranet]/secure/mailarchive/mail.php?folder=4&mail=22243 Oracle CREATE PROFILE: http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/statements_6010.htm PostgreSQL Client Connection Defaults (including statement_timeout) http://www.postgresql.org/docs/8.2/interactive/runtime-config-client.html SQL Server's SET LOCK TIMEOUT http://msdn2.microsoft.com/en-us/library/ms189470.aspx SQL Server's SET QUERY_GOVERNOR_COST_LIMIT http://msdn2.microsoft.com/en-us/library/ms176100.aspx BUG#3654 ODBC timeouts doesn't work with MyODBC BUG#9906 Connector/J: QueryTimeout is ignored in Statement BUG#28481 No Query Timeouts (which also asks for timeouts per procedure) BUG#33414 Backup: DDL hangs indefinitely if ongoing backup
Copyright (c) 2000, 2016, Oracle Corporation and/or its affiliates. All rights reserved.