WL#4878: PERFORMANCE_SCHEMA Trace
Affects: Server-9.x — Status: Un-Assigned — Priority: Very High
The initial description here has been shifted from the "Trace" section of WL#2360 Performance Schema. This task depends on WL#2360. "Trace" is not a requirement. However, this task description must be here to show that, with WL#2360 as a basis, we could produce trace files if we wanted. Providers could dump to a trace file as well as copy to performance-structures. It's certain to be slow (10%?) so doing it from the start would be an error. The benefit is mostly to professional performance analysts. Enabling Trace -------------- There will be a new column in SETUP_INSTRUMENTS: TRACE_ENABLED ENUM('yes','no') It's in SETUP_INSTRUMENTS rather than SETUP_CONSUMERS because it affects the instrument code path, and because we want to enable/disable depending whether the event is a WAIT, a STATEMENT, etc. The default value is 'no'. It is an error to set TRACE_ENABLED='yes' if ENABLED='no'. There will be a new column in the SETUP_ACTORS table (see WL#4674): TRACE_ENABLED ENUM('yes','no') The default value is 'no'. Use of this column is affected by the FILE privilege and --secure-file-priv. Trace file writing will occur for a particular instrumentation if: SETUP_INSTRUMENTS.TRACE_ENABLED='yes', and SETUP_ACTORS.TRACE_ENABLED='yes'. There is no maximum length of a trace file. Creating a trace file --------------------- For each new thread, or for each existing thread when a user updates a TRACE_ENABLED value: If (this thread is a thread for this setup_actors value) Set thread file name = datadir/"thread".actor.thread. For example, for datadir 'd' actor 'a' thread 1, = d/thread.a.1 Create this file. If a thread is associated with multiple actors, for example host 'h' user 'a' role 'r', MySQL will combine all names that fit the thread. One File Per Thread ------------------- We turn on tracing "per actor" but we have one file "per thread". This is necessary because we don't want two threads writing to the same file at the same time -- either thread#1 would wait for thread#2 (which is bad), or thread#1 would not wait for thread#2 (which is bad). It should be simple to make a utility which combines all per-thread files and sorts the rows by one of the timer columns (but not EVENT_ID because EVENT_ID is now a per-thread counter). Writing ------- The instrument start_wait() and instrument end_wait() code (both!) will write to the trace file. There is no attempt to fsync, flush, or whatever. You crash, you lose some of the trace. The writing looks like write(buffer); This is a good possible spot for a plugin, that is, we could replace "write(buffer)" with "plugin(buffer)" if instead of writing we wanted to examine the contents, cause a crash, send an email, etc. We'll decide not to do this. File Errors ----------- All kinds of errors can occur. For example, the create fails due to privileges missing, or the write fails due to full disk. MySQL will not display an error message and stop the statement processing. That is not what the user, who is issuing the statement, expects. MySQL will disable the trace for all threads for the current actor. Thus, users will be able to discover that a failure happened by seeing that SETUP_ACTORS.TRACE_ENABLED becomes 'no'. Thus, any error will disable tracing permanently. Content and Format ------------------ This is a text file containing lines which are equivalent to "events" rows in PERFORMANCE_SCHEMA tables. Doubtless it would be quicker to dump raw data from underlying structures, but when you ask for a trace file you've given up on speed. Doubtless it would be interesting to output things in summary or state tables, but then it wouldn't be tracing. The header row is: "/* MySQL Trace File for (actor_name) on (datetime) */". This is redundant. Users can find out the time the file was created, the actor, and the thread, by saying 'ls -l'. Each detail row takes one line and looks like an INSERT statement, with all the columns of a row of EVENTS_WAITS_CURRENT, EVENTS_STATEMENTS_CURRENT, etc. Since different lines might be for rows of different 'tables', a simple CSV format is not possible. But it's easy to produce a database file from INSERT statements. Since writing occurs both before and after the instrumentation point, everything is duplicated. Example ------- Suppose user Harry has thread 5. Harry has privileges to update performance_schema tables. Harry says: UPDATE performance_schema.SETUP_INSTRUMENTS SET enabled='yes', trace_enabled='yes'; INSERT INTO performance_schema.SETUP_ACTORS VALUES ('%','%','%','yes'); This causes creation of trace files. Harry now executes a statement SET @a=5; which happens to cause one mutex lock, no stages. As a result, there is a trace file named /usr/local/mysql/var/trace.harry.5 It contains /* MySQL Trace File for Harry on 2009-04-07 14:18:00 */ INSERT INTO events_statements_current VALUES ('SET @a=5',...); INSERT INTO events_waits_current VALUES ('mutex',...); INSERT INTO events_waits_current VALUES ('mutex',...); INSERT INTO events_statements_current VALUES ('SET @a=5',...); To distinguish between 'start' and 'end' rows, check if TIMER_END is NULL. Overhead -------- The cost of taking a performance-struc and formatting it as an INSERT statement is between 100 and 200 cycles. Most of the time is string-move instructions. The cost of invoking a write function is about 20 cycles on Peter's x86. The writing is deferred so you don't immediately see the actual cost of writing to disk. We have Kelly Long's test for "how many transactions per second on a loaded system". We'll run it with "trace on and no update of performance schema tables" versus "trace off and update of performance schema tables". mysqld option ------------- Marc Alff pointed out: if one can enable a trace file with a mysqld option, there's no need to worry about adding dozens more mysqld options that would enable particular performance schema instruments for analyzing operations that happen before the first user connects, for example crash recoveries. For this, we could say that, if the server started with mysqld --performance-schema-trace-enabled=true, then any instrument which is 'enabled' is considered to be 'trace_enabled'. More settings ------------- We've said that there is no maximum length for a trace file, and said that the destination is datadir/"thread".actor.thread. But we know that users sometimes need to choose such things somehow. So we could say that these things are (a) specifiable in SETUP_INSTRUMENTS, (b) specifiable via global variables similar to Oracle 11g's USER_DUMP_DEST and MAX_DUMP_FILE_SIZE, (c) specifiable in one of the ways described by WL#3824 Resource Limits. We've said that errors will be visible because SETUP_ACTORS.TRACE_ENABLED becomes 'no'. But we can guess that users sometimes want to choose what action to take on error. They could want to increment a counter, or change file destination, or cause a warning from the SQL statement.
Copyright (c) 2000, 2017, Oracle Corporation and/or its affiliates. All rights reserved.