WL#4674: PERFORMANCE_SCHEMA Setup For Actors
Affects: Server-5.6
—
Status: Complete
For monitoring Performance Schema events, we want to pre-filter by threads and actors (users, roles). This involves: (a) make a THREADS table showing a thread-local instrumenting flag (b) make a SETUP_ACTORS table specifying initial THREADS settings.
THREADS table ------------- THREADS has a row for each active server thread. Column name Data type ----------- --------- THREAD_ID INTEGER NAME VARCHAR(128) TYPE VARCHAR(10) PROCESSLIST_ID INTEGER /* not BIGINT as in information_schema */ PROCESSLIST_USER CHAR(16) PROCESSLIST_HOST CHAR(60) PROCESSLIST_DB VARCHAR(64) PROCESSLIST_COMMAND VARCHAR(16) PROCESSLIST_TIME BIGINT PROCESSLIST_STATE VARCHAR(64) PROCESSLIST_INFO LONGTEXT PARENT_THREAD_ID INTEGER ROLE CHAR(16) value currently irrelevant INSTRUMENTED ENUM('YES','NO') THREAD_ID is the unique identifier for a THREADS row. If a thread dies, we don't expect its THREAD_ID value will be re-used. NAME is Marc's idea. Sample values are "thread/sql/OneConnection", "thread/sql/Main". A "OneConnection" thread will always have a non-null PROCESSLIST_ID value. Perhaps it will be useful to identify background threads with NAME. NAME is the only way to differentiate between background threads. TYPE is BACKGROUND or FOREGROUND. (Someday there might also be multiple subthreads, for example with partitions, but there is no need at this time for a MAIN / SUBTHREAD property.) The NAME column could also use naming like: "thread/foreground/sql/one_connection" "thread/background/sql/main" "thread/background/maria/check_point" so that the type is the second part of the name. PROCESSLIST_ columns are: (if name like '%OneConnection') all the same as what you'd get if you joined with INFORMATION_SCHEMA.PROCESSLIST. We want to avoid joins with PROCESSLIST. (otherwise / background thread) PROCESSLIST_ID is NULL, PROCESSLIST_USER and PROCESSLIST_HOST may be NULL, but some of the other columns PROCESSLIST_DB, PROCESSLIST_COMMAND, PROCESSLIST_TIME, PROCESSLIST_STATE, PROCESSLIST_INFO will very likely be populated. THREADS makes PROCESSLIST redundant unless performance schema is off. Re PROCESSLIST_USER: Preferably this refers to USER(), the user who connected, not the user we might be temporarily acting as (with definer's rights). What if something is a background thread? Then PROCESSLIST_ID is NULL, PROCESSLIST_USER is ''. (Well, Marc and Peter don't agree about what a background thread's PROCESSLIST_USER value is yet.) What if something is a subthread? Then, as noted earlier, PROCESSLIST_ID is NULL, but PROCESSLIST_USER and PROCESSLIST_HOST may be the same as in the parent thread. (This makes it easy to identify all threads associated with a given user, without search complexity.) Other columns -- PROCESSLIST_STATE PROCESSLIST_INFO PROCESSLISTS_TIME etc. -- may have values which differ from the values in the parent thread. Re ROLE: There are no roles. It seems necessary to mention roles because there's a chance WL#988 Roles will be in some future release. We only anticipate a ROLE requirement; we won't have it initially. INSTRUMENTED is 'YES' if monitoring is enabled for this thread because it is unchanged from its initial value (which is determined by an entry in SETUP_ACTORS), or because somebody said UPDATE THREADS SET INSTRUMENTED = 'YES'. This is the only updatable column in the THREADS table. (It would be confusing to use the word ENABLED rather than INSTRUMENTED, because a column THREADS.ENABLED with a value 'YES' should mean "the thread is enabled".) There are many other columns we could add here, taken from SHOW STATUS or taken from other PERFORMANCE_SCHEMA tables to show initial/current values. See discussion in WL#2360 about "pre-joined" columns. One thing we thought of adding initially was EVENT_NAME (or OBJECT_NAME), but we won't. As with other PERFORMANCE_SCHEMA tables, the implementor must ensure that accesses are done without mutexes. This will be an improvement over the way that INFORMATION_SCHEMA.PROCESSLIST works. SETUP_ACTORS table ------------------ SETUP_ACTORS contains essential connection characteristics which can be used to determine at connect time whether a thread's instrumentation should be enabled. `Host` char(60) COLLATE utf8_bin NOT NULL DEFAULT '%' `User` char(16) COLLATE utf8_bin NOT NULL DEFAULT '%' `ROLE` char(16) COLLATE utf8_bin DEFAULT '%' `Host` may identify a client's host computer in the usual way, or may be '%' meaning 'any host'. Data type is as in mysql.user. `User` may identify a client's login id in the usual way, or may be '%' meaning 'any user'. Data type is as in mysql.user. ROLE may be anything. We want to have it for when we add roles (WL#988), but we currently ignore it. The choice of utf8_bin is significant. It means that we will distinguish by case. User 'x' and user 'X' are not the same. The maximum number of rows is 10. (This is arbitrary but if it is not sufficient it is changeable at server startup time.) The initial number of rows is 1. The initial row always looks like this: mysql> select * from performance_schema.SETUP_ACTORS; +------+------+------+ | Host | User | ROLE | +------+------+------+ | % | % | % | +------+------+------+ 1 row in set (0.01 sec) As we will see, this means "all new connections should have instruments enabled". Users with appropriate privileges may INSERT, UPDATE, or DELETE rows. MySQL determines at "connect" time whether a new thread matches a row in SETUP_ACTORS. There's effectively a check on a new THREADS row's columns: CREATE FUNCTION matches (threads_host CHAR(60), threads_user CHAR(16)) BEGIN DECLARE setup_actors_host CHAR(60); DECLARE setup_actors_user CHAR(16); DECLARE c CURSOR FOR SELECT HOST, USER FROM SETUP_ACTORS; DECLARE EXIT CONDITION FOR NOT FOUND SET return FALSE; open c; WHILE TRUE DO FETCH c INTO setup_actors_host, setup_actors_user; IF threads_host LIKE setup_actors_host AND threads_user LIKE setup_user THEN RETURN TRUE; END WHILE; END This is only intended to be "effectively" and "approximately" true. It should not be interpreted as a requirement that there be a full-table scan; the implementor may search for a better means of searching. The illustration merely shows that the implementation should find a row in SETUP_ACTORS which matches both thread host and thread user (which means thread instrumentation should be enabled), or should fail to find such a row (which means thread instrumentation should not be enabled). Suppose that someone says DELETE FROM SETUP_ACTORS; INSERT INTO SETUP_ACTORS VALUES ('joe','localhost','%'); /* Row #1 */ INSERT INTO SETUP_ACTORS VALUES ('sam','%','%'); /* Row #2 */ There will be a match if user joe@localhost connects, due to Row #1. There will be a match if user sam@mysql.net connects, due to Row #2. There will be no match if joe@mysql.net or samuel@localhost connects. The effect is: * For any new connection, there is an effective SET clause UPDATE THREADS SET INSTRUMENTED = CASE WHEN matches(PROCESSLIST_HOST,PROCESSLIST_USER) THEN 'YES' ELSE 'NO' END WHERE THREAD_ID = thread_id_of_the_new_connection; * This applies for sub-threads too because they have the same PROCESSLIST_USER value as the main thread. Updating SETUP_ACTORS does not cause any change in existing THREADS rows. There is no need for such automatic action. If you want to change both existing and new threads, say (in this order): INSERT INTO performance_schema.SETUP_ACTORS VALUES ('Joe','localhost'); UPDATE performance_schema.THREADS SET INSTRUMENTED = CASE WHEN PROCESSLIST_HOST LIKE 'localhost' AND PROCESSLIST_USER LIKE 'Joe' THEN 'YES' ELSE 'NO' END; Putting it another way: Values of USER, HOST, and ROLE are known at connection time (that's a must). If they change later, that doesn't affect monitoring of existing connections. Notice that there is no need, while updating SETUP_ACTORS, to test whether there is a user named 'Joe'. He may come into existence later, or disappear. If a running thread has PROCESSLIST_USER='Joe', then MySQL does not care. If no running thread has these identifiers, that's not an error. However: TIP: one can make a fixed SETUP_ACTORS list by selecting Host and User from mysql.user. Overlap is possible, for example one row could have {host='%',user='Joe'}' and another row could have {host='localhost','user='%'}, which means joe@localhost could match either row. That's okay, the instrument will only count once. We don't need an ENABLED column to say whether matching causes monitoring this or not, because it's always ENABLED. There's no way via SETUP_ACTORS to say "if user=joe@localhost, DISABLE instrumenting". There is a limit on the size of SETUP_ACTORS, which is good, because there's some performance impact. The minimal impact occurs when there's one row with Host=User=Role=%. So that's the initial value, and it can't be changed with a mysqld --option, and it's what we recommend. Alternative names for SETUP_ACTORS table: AGENTS, TASKS, THREADS. Rejected Alternatives --------------------- Rejected: The original suggestion involved no THREADS table and a SETUP_ACTORS table which had a different view of the world. Its visible if you look at 'Progress' before a dae in March 2009. Rejected: Peter suggested a single-row table with a CASE clause, for example UPDATE performance_schema.THREADS_CASE SET CASE_CLAUSE = "CASE WHEN PROCESSLIST_USER LIKE 'JOE' THEN 'YES' ELSE 'NO' END"; but this could give users the wrong idea, that matching can happen on any expression. No matrix --------- There were four ways to merge "List of Actors" (THREADS+THREADS_CASE) with "List of Instruments" (SETUP_INSTRUMENTS). 1. List of Actors + List of Instruments 2. List of Actors and Instruments 3. List of Instruments 4. List of Instruments, with session/global flags See email thread [MySQL internal archive]/secure/mailarchive/mail.php?folder=4&mail=28053 Sergei Golubchik likes "2.". Peter Gulutzan and Marc Alff like "1.". Dean Ellis and Mark Leith will accept "1." now but want "2." later. Therefore we will do "1." and will treat "2." as a feature request (WL#4868 PERFORMANCE_SCHEMA Setup For Actors Matrix). As explained for SETUP_INSTRUMENTS, instruments only work if both: THREADS.INSTRUMENTED = 'YES', that is, this running task is being monitored, SETUP.INSTRUMENTS.ENABLED='YES', that is, this piece of code is being monitored. A way to think of the difference between actors and instruments is: for instruments the question is "where am I?", (possible answer "you're in a routine waiting for a mutex"), for actors the question is "who am I?", (possible answer "you're Joe the accountant who's causing thread #55"). Thus there is no way to: Set SETUP_INSTRUMENTS.TIMED='Yes' for one thread but not for another thread (that is, if thread 13 and thread 17 are both in SETUP_ACTORS, and INSTRUMENTS.TIMED applies to both). Set SETUP_INSTRUMENTS.TRACE_ENABLED='yes' for a particular thread (if we ever have such a column). See also support-private thread "WL#260 Performance session_waits, requirements" starting with [MySQL internal archive]/secure/mailarchive/mail.php?folder=52&mail=26044 See also dev-private threads starting with ""WL#2360 Performance session_waits, ..." for November/December 2008. Privileges ---------- Changing SETUP_ACTORS affects whether THREADS.INSTRUMENTED goes on for thread #x, which is another connection's thread. But it does not affect whether you can observe the results of the instrumenting. To observe other processes you need PROCESS. Effectively, there is a script which says GRANT SELECT ON THREADS TO [user]; GRANT UPDATE (INSTRUMENTED) ON THREADS TO [user]; GRANT INSERT,UPDATE,DELETE ON SETUP_ACTORS to [user]; Scenarios --------- 1. Joe's job is slow. We want to start monitoring his thread(s) to see what's happening. To maximize the information that we'll get specifically about Joe's job, we want to turn monitoring off for all other threads. UPDATE performance_schema.THREADS SET INSTRUMENTED = CASE WHEN PROCESSLIST_USER='Joe' THEN 'YES' ELSE 'NO' END; 2. We expect all Joe's jobs to be slow. Whenever he logs on, for the duration of his session, we want his low-level activity monitored. INSERT INTO performance_schema.SETUP_ACTORS VALUES ('Joe','%','%'); 3. We suspect that something's wrong with a background task, because most users are idle and yet some operating-system utility suggests CPU activity is high. Remember that background tasks don't show as 'connections' and not all are visible via SHOW PROCESSLIST. UPDATE performance_schema.THREADS SET INSTRUMENTED = CASE WHEN TYPE='BACKGROUND' THEN 'YES' ELSE 'NO' END; Remember these scenarios only matter for instruments whose SETUP_INSTRUMENTS.ENABLED column has 'YES'. 4. We want to restore the threads to their initial setting. (This is no longer possible due to specification changes 2010-05-07.) 5. We want to restore SETUP_ACTORS to its initial setting. DELETE FROM SETUP_ACTORS; INSERT INTO SETUP_ACTORS VALUES ('%','%','%'); The thread's real "instrumentation enabled" flag ------------------------------------------------ We use THREADS to set an internal flag for every thread. We'll call it 'ENABLED_FLAG'. This flag -- think of it as a "thread#x is enabled for instrumenting" flag -- can change in two circumstances and only in two circumstances: 1 The thread is starting, for example because a user is connecting or a thread is spawning a subthread. 2 The user is executing an "UPDATE THREADS ..." statement. These circumstances do not cause an immediate change to 'ENABLED_FLAG': * Identification is changing, for example SET ROLE happens. * The user is inserting, updating, or deleting any SETUP_ACTORS row. Such circumstances affect what 'ENABLED_FLAG' will be when a new connect happens, but they do not change an existing value. The method is up to the implementor. Effectively we expect this: If (user successfully updates SETUP_ACTORS) Keep a simplified set of conditions that apply for a new thread's PROCESSLIST_HOST and PROCESSLIST_USER columns. If (new connection is happening) or (re-connection is happening) If (matches(PROCESSLIST_HOST,PROCESSLIST_USER) returns TRUE) THREADS.INSTRUMENTED = 'YES' Turn on thread's 'ENABLED_FLAG' If (subthread is being spawned) Copy PROCESSLIST column values from the thread whence the spawning happens. If (user successfully updates THREADS) For (each changed thread) Set 'ENABLED_FLAG' = true iff INSTRUMENTED = 'YES'. /* So instrumenting via update statement overrides, supersedes any setting done at connection time as caused by SETUP_ACTORS settings. */ /* Enabling for a main thread doesn't automatically enable for its children. */ For example, when a thread starts, it will determine: "I am user u." Then it will scan the SETUP_ACTORS rows. If it finds "truth" anywhere, then it knows "I am instrumented", it sets a flag to remember that, and it ceases scanning. If (writing performance-structure fields), the instrumentation will have to include code like: if (performance monitoring is enabled) and (this instrument is enabled due to SETUP_INSTRUMENTS.ENABLED='YES') and (this thread's 'ENABLED_FLAG' is true) start instrumenting do code stop instrumenting else do code References ---------- Re: bzr commit into mysql-6.0-perfschema branch (marc.alff:3150) http://lists.mysql.com/commits/78433
The code in bzr mysql-next-mr-perfschema contains the low-level structures and main algorithms, commented and set apart from other code with if/endif, to support the situation as described in the HLS. That suffices as an LLD for a task of this nature.
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.