WL#4674: PERFORMANCE_SCHEMA Setup For Actors

Affects: Server-5.6   —   Status: Complete

For monitoring Performance Schema events, we want to pre-filter
by threads and actors (users, roles). This involves:
(a) make a THREADS table showing a thread-local instrumenting flag
(b) make a SETUP_ACTORS table specifying initial THREADS settings.

THREADS table
-------------

THREADS has a row for each active server thread.

Column name                  Data type
-----------                  ---------

THREAD_ID                    INTEGER
NAME                         VARCHAR(128)
TYPE                         VARCHAR(10)
PROCESSLIST_ID               INTEGER /* not BIGINT as in information_schema */
PROCESSLIST_USER             CHAR(16)
PROCESSLIST_HOST             CHAR(60)
PROCESSLIST_DB               VARCHAR(64)
PROCESSLIST_COMMAND          VARCHAR(16)
PROCESSLIST_TIME             BIGINT
PROCESSLIST_STATE            VARCHAR(64)
PROCESSLIST_INFO             LONGTEXT
PARENT_THREAD_ID             INTEGER
ROLE                         CHAR(16) value currently irrelevant
INSTRUMENTED                 ENUM('YES','NO')

THREAD_ID is the unique identifier for a THREADS row.
If a thread dies, we don't expect its THREAD_ID value will be re-used.

NAME is Marc's idea. Sample values are "thread/sql/OneConnection",
"thread/sql/Main". A "OneConnection" thread will always have a
non-null PROCESSLIST_ID value. Perhaps it will be useful to identify
background threads with NAME. NAME is the only way to differentiate
between background threads.

TYPE is BACKGROUND or FOREGROUND.
(Someday there might also be multiple subthreads, for example with
partitions, but there is no need at this time for a
MAIN / SUBTHREAD property.)
The NAME column could also use naming like:
"thread/foreground/sql/one_connection"
"thread/background/sql/main"
"thread/background/maria/check_point"
so that the type is the second part of the name.

PROCESSLIST_ columns are:
  (if name like '%OneConnection') all the same as what you'd get if you
  joined with INFORMATION_SCHEMA.PROCESSLIST. We want to avoid joins
  with PROCESSLIST.
  (otherwise / background thread) PROCESSLIST_ID is NULL, PROCESSLIST_USER
  and PROCESSLIST_HOST may be NULL, but some of the other columns
  PROCESSLIST_DB, PROCESSLIST_COMMAND, PROCESSLIST_TIME,
 PROCESSLIST_STATE, PROCESSLIST_INFO will very likely be populated.
THREADS makes PROCESSLIST redundant unless performance schema is off.

Re PROCESSLIST_USER: Preferably this refers to USER(), the user who
connected, not the user we might be temporarily acting as (with
definer's rights).
What if something is a background thread? Then PROCESSLIST_ID is NULL,
PROCESSLIST_USER is ''. (Well, Marc and Peter don't agree about what a
background thread's PROCESSLIST_USER value is yet.)

What if something is a subthread? Then, as noted earlier, PROCESSLIST_ID
is NULL, but PROCESSLIST_USER and PROCESSLIST_HOST may be the same as in
the parent thread. (This makes it easy to identify all threads
associated with a given user, without search complexity.)
Other columns -- PROCESSLIST_STATE PROCESSLIST_INFO PROCESSLISTS_TIME etc. --
may have values which differ from the values in the parent thread.

Re ROLE: There are no roles. It seems necessary to mention roles
because there's a chance WL#988 Roles will be in some future release.
We only anticipate a ROLE requirement; we won't have it initially.

INSTRUMENTED is 'YES' if monitoring is enabled for this thread
because it is unchanged from its initial value (which is
determined by an entry in SETUP_ACTORS), or because somebody said
UPDATE THREADS SET INSTRUMENTED = 'YES'.
This is the only updatable column in the THREADS table.
(It would be confusing to use the word ENABLED rather than INSTRUMENTED,
because a column THREADS.ENABLED with a value 'YES' should mean "the
thread is enabled".)

There are many other columns we could add here, taken from SHOW STATUS
or taken from other PERFORMANCE_SCHEMA tables to show initial/current
values. See discussion in WL#2360 about "pre-joined" columns. One thing
we thought of adding initially was EVENT_NAME (or OBJECT_NAME), but we won't.

As with other PERFORMANCE_SCHEMA tables, the implementor must
ensure that accesses are done without mutexes. This will be an
improvement over the way that INFORMATION_SCHEMA.PROCESSLIST works.

SETUP_ACTORS table
------------------

SETUP_ACTORS contains essential connection characteristics
which can be used to determine at connect time whether a
thread's instrumentation should be enabled.

`Host` char(60) COLLATE utf8_bin NOT NULL DEFAULT '%'      
`User` char(16) COLLATE utf8_bin NOT NULL DEFAULT '%'
`ROLE` char(16) COLLATE utf8_bin DEFAULT '%'

`Host` may identify a client's host computer in the usual way,
or may be '%' meaning 'any host'. Data type is as in mysql.user.

`User` may identify a client's login id in the usual way,
or may be '%' meaning 'any user'. Data type is as in mysql.user.

ROLE may be anything. We want to have it for when we add
roles (WL#988), but we currently ignore it.

The choice of utf8_bin is significant. It means that
we will distinguish by case. User 'x' and user 'X' are not the same.

The maximum number of rows is 10. (This is arbitrary but if it
is not sufficient it is changeable at server startup time.)

The initial number of rows is 1. The initial row always looks like this:
mysql> select * from performance_schema.SETUP_ACTORS;
+------+------+------+
| Host | User | ROLE |
+------+------+------+
| %    | %    | %    |
+------+------+------+
1 row in set (0.01 sec)
As we will see, this means "all new connections should have instruments
enabled".

Users with appropriate privileges may INSERT, UPDATE, or DELETE rows.

MySQL determines at "connect" time whether a new thread matches a
row in SETUP_ACTORS. There's effectively a check on a new THREADS row's
columns:

CREATE FUNCTION matches (threads_host CHAR(60), threads_user CHAR(16))
BEGIN
  DECLARE setup_actors_host CHAR(60);
  DECLARE setup_actors_user CHAR(16);
  DECLARE c CURSOR FOR SELECT HOST, USER FROM SETUP_ACTORS;
  DECLARE EXIT CONDITION FOR NOT FOUND SET return FALSE;
  open c;
  WHILE TRUE DO
    FETCH c INTO setup_actors_host, setup_actors_user;
    IF threads_host LIKE setup_actors_host
    AND threads_user LIKE setup_user
    THEN RETURN TRUE;
    END WHILE;
  END
This is only intended to be "effectively" and "approximately" true.
It should not be interpreted as a requirement that there be a full-table
scan; the implementor may search for a better means of searching.
The illustration merely shows that the implementation should find a row
in SETUP_ACTORS which matches both thread host and thread user (which means
thread instrumentation should be enabled), or should fail to find
such a row (which means thread instrumentation should not be enabled).

Suppose that someone says
DELETE FROM SETUP_ACTORS;
INSERT INTO SETUP_ACTORS VALUES ('joe','localhost','%'); /* Row #1 */
INSERT INTO SETUP_ACTORS VALUES ('sam','%','%');         /* Row #2 */
There will be a match if user joe@localhost connects, due to Row #1.
There will be a match if user sam@mysql.net connects, due to Row #2.
There will be no match if joe@mysql.net or samuel@localhost connects.

The effect is:
* For any new connection, there is an effective SET clause
  UPDATE THREADS
  SET INSTRUMENTED =
  CASE WHEN matches(PROCESSLIST_HOST,PROCESSLIST_USER) THEN 'YES'
  ELSE 'NO'
  END
  WHERE THREAD_ID = thread_id_of_the_new_connection;
* This applies for sub-threads too because they have the same
  PROCESSLIST_USER value as the main thread.

Updating SETUP_ACTORS does not cause any change in existing
THREADS rows. There is no need for such automatic action. If you
want to change both existing and new threads, say (in this order):
INSERT INTO performance_schema.SETUP_ACTORS VALUES ('Joe','localhost');
UPDATE performance_schema.THREADS
SET INSTRUMENTED =
  CASE WHEN PROCESSLIST_HOST LIKE 'localhost'
    AND PROCESSLIST_USER LIKE 'Joe' THEN 'YES'
  ELSE 'NO'
  END;

Putting it another way:
Values of USER, HOST, and ROLE are known at connection time
(that's a must). If they change later, that doesn't affect
monitoring of existing connections.

Notice that there is no need, while updating SETUP_ACTORS,
to test whether there is a user named 'Joe'.
He may come into existence later, or disappear.
If a running thread has PROCESSLIST_USER='Joe', then MySQL does not care.
If no running thread has these identifiers, that's not an error.
However: TIP: one can make a fixed SETUP_ACTORS list by selecting
Host and User from mysql.user.

Overlap is possible, for example one row could have {host='%',user='Joe'}'
and another row could have {host='localhost','user='%'}, which means
joe@localhost could match either row. That's okay, the
instrument will only count once.

We don't need an ENABLED column to say whether matching causes
monitoring this or not, because it's always ENABLED. There's no
way via SETUP_ACTORS to say "if user=joe@localhost, DISABLE instrumenting".

There is a limit on the size of SETUP_ACTORS, which is good, because
there's some performance impact. The minimal impact occurs when
there's one row with Host=User=Role=%. So that's the initial
value, and it can't be changed with a mysqld --option, and it's
what we recommend.

Alternative names for SETUP_ACTORS table:
AGENTS, TASKS, THREADS.

Rejected Alternatives
---------------------

Rejected:
The original suggestion involved no THREADS table and a SETUP_ACTORS
table which had a different view of the world.
Its visible if you look at 'Progress' before a dae in March 2009.

Rejected:
Peter suggested a single-row table with a CASE clause, for example
UPDATE performance_schema.THREADS_CASE
SET CASE_CLAUSE =
  "CASE WHEN PROCESSLIST_USER LIKE 'JOE' THEN 'YES'
  ELSE 'NO'
  END";
but this could give users the wrong idea, that matching can happen
on any expression.

No matrix
---------

There were four ways to merge "List of Actors" (THREADS+THREADS_CASE)
with "List of Instruments" (SETUP_INSTRUMENTS).
1. List of Actors + List of Instruments
2. List of Actors and Instruments
3. List of Instruments
4. List of Instruments, with session/global flags
See email thread
[MySQL internal archive]/secure/mailarchive/mail.php?folder=4&mail=28053
Sergei Golubchik likes "2.".
Peter Gulutzan and Marc Alff like "1.".
Dean Ellis and Mark Leith will accept "1." now but want "2." later.
Therefore we will do "1." and will treat "2." as a feature request
(WL#4868 PERFORMANCE_SCHEMA Setup For Actors Matrix).

As explained for SETUP_INSTRUMENTS, instruments only work if both:
THREADS.INSTRUMENTED = 'YES', that is, this running task is
being monitored,
SETUP.INSTRUMENTS.ENABLED='YES', that is, this piece of code is
being monitored.
A way to think of the difference between actors and instruments is:
for instruments the question is "where am I?",
(possible answer "you're in a routine waiting for a mutex"),
for actors the question is "who am I?",
(possible answer "you're Joe the accountant who's causing thread #55").

Thus there is no way to:
Set SETUP_INSTRUMENTS.TIMED='Yes' for one thread but not for another thread
(that is, if thread 13 and thread 17 are both in SETUP_ACTORS, and
INSTRUMENTS.TIMED applies to both).
Set SETUP_INSTRUMENTS.TRACE_ENABLED='yes' for a particular thread
(if we ever have such a column).

See also support-private thread
"WL#260 Performance session_waits, requirements"
starting with
[MySQL internal archive]/secure/mailarchive/mail.php?folder=52&mail=26044

See also dev-private threads starting with
""WL#2360 Performance session_waits, ..."
for November/December 2008.

Privileges
----------

Changing SETUP_ACTORS affects whether THREADS.INSTRUMENTED
goes on for thread #x, which is another connection's thread.
But it does not affect whether you can observe the results of
the instrumenting. To observe other processes you need PROCESS.

Effectively, there is a script which says
GRANT SELECT ON THREADS TO [user];
GRANT UPDATE (INSTRUMENTED) ON THREADS TO [user];
GRANT INSERT,UPDATE,DELETE ON SETUP_ACTORS to [user];

Scenarios
---------

1. Joe's job is slow.
We want to start monitoring his thread(s) to see what's happening.
To maximize the information that we'll get specifically about
Joe's job, we want to turn monitoring off for all other threads.
UPDATE performance_schema.THREADS
SET INSTRUMENTED =
CASE WHEN PROCESSLIST_USER='Joe' THEN 'YES' ELSE 'NO' END;

2. We expect all Joe's jobs to be slow. Whenever he logs on, for the
duration of his session, we want his low-level activity monitored.
INSERT INTO performance_schema.SETUP_ACTORS VALUES ('Joe','%','%');

3. We suspect that something's wrong with a background task, because
most users are idle and yet some operating-system utility suggests
CPU activity is high. Remember that background tasks don't show as
'connections' and not all are visible via SHOW PROCESSLIST.
UPDATE performance_schema.THREADS
SET INSTRUMENTED =
CASE WHEN TYPE='BACKGROUND' THEN 'YES' ELSE 'NO' END;

Remember these scenarios only matter for instruments whose
SETUP_INSTRUMENTS.ENABLED column has 'YES'.

4. We want to restore the threads to their initial setting.
(This is no longer possible due to specification changes 2010-05-07.)

5. We want to restore SETUP_ACTORS to its initial setting.
DELETE FROM SETUP_ACTORS;
INSERT INTO SETUP_ACTORS VALUES ('%','%','%');

The thread's real "instrumentation enabled" flag
------------------------------------------------

We use THREADS to set an internal flag for every thread.
We'll call it 'ENABLED_FLAG'.
This flag -- think of it as a "thread#x is enabled for instrumenting"
flag -- can change in two circumstances and only in two circumstances:
1 The thread is starting, for example because a user is connecting
  or a thread is spawning a subthread.
2 The user is executing an "UPDATE THREADS ..." statement.

These circumstances do not cause an immediate change to 'ENABLED_FLAG':
* Identification is changing, for example SET ROLE happens.
* The user is inserting, updating, or deleting any SETUP_ACTORS row.
Such circumstances affect what 'ENABLED_FLAG' will be when a new
connect happens, but they do not change an existing value.

The method is up to the implementor. Effectively we expect this:

If (user successfully updates SETUP_ACTORS)
  Keep a simplified set of conditions that apply for
  a new thread's PROCESSLIST_HOST and PROCESSLIST_USER columns.

If (new connection is happening) or (re-connection is happening)
  If (matches(PROCESSLIST_HOST,PROCESSLIST_USER) returns TRUE)
    THREADS.INSTRUMENTED = 'YES'
    Turn on thread's 'ENABLED_FLAG'

If (subthread is being spawned)
  Copy PROCESSLIST column values from the thread whence the spawning
  happens.

If (user successfully updates THREADS)
  For (each changed thread)
    Set 'ENABLED_FLAG' = true iff INSTRUMENTED = 'YES'.
    /* So instrumenting via update statement overrides, supersedes
       any setting done at connection time as caused by
       SETUP_ACTORS settings. */
    /* Enabling for a main thread doesn't automatically enable for
       its children. */

For example, when a thread starts, it will determine: "I am user u."
Then it will scan the SETUP_ACTORS rows. If it finds "truth" anywhere,
then it knows "I am instrumented", it sets a flag to remember that,
and it ceases scanning.

If (writing performance-structure fields), the instrumentation will
have to include code like:
if (performance monitoring is enabled)
and (this instrument is enabled due to SETUP_INSTRUMENTS.ENABLED='YES')
and (this thread's 'ENABLED_FLAG' is true)
  start instrumenting
  do code
  stop instrumenting
else
  do code

References
----------

Re: bzr commit into mysql-6.0-perfschema branch (marc.alff:3150)
http://lists.mysql.com/commits/78433

The code in bzr mysql-next-mr-perfschema contains the low-level
structures and main algorithms, commented and set
apart from other code with if/endif, to support the
situation as described in the HLS.
That suffices as an LLD for a task of this nature.