Make tables in PERFORMANCE_SCHEMA which summarize instrumented data. The initial description here was shifted from the "Summaries" section of WL#2360 Performance Schema, which is now complete. The description existed to show that, with WL#2360 as a basis, we can produce what we'll want. Now we commit to the plan.
[ ADVANCE PSEUDO GROUP BY ] Suppose the EVENTS_WAITS_HISTORY table contained rows, not just for the last 10 events, but for all events since the instance started. Then we could say: SELECT EVENT_NAME, COUNT(*) AS COUNT_STAR, SUM(TIMER_WAIT) AS SUM_TIMER_WAIT FROM EVENTS_WAITS_HISTORY WHERE /* filter_condition, e.g. "in a previous statement" */ GROUP BY EVENT_NAME And we'd have a lovely table like this: EVENT_NAME COUNT_STAR SUM_TIMER_WAIT ---------- ---------- -------------- wait/synch/mutex/sql/LOCK_open 14 140 wait/synch/rwlock/innodb/buf_block_lock 8 80 wait/io/file/csv/data 22 220 Terminology notes: here EVENT_NAME is the "grouping column", COUNT and SUM are "aggregates", TIMER_WAIT is an "aggregating column", the result set is a "summary table". You'll see those terms several times. All summary information is based on groupings and aggregatings. The only flaw is that it's "pseudo". There is no table that goes back forever. Shortly after producing an EVENTS_WAITS_HISTORY row, we risk overwriting it (see section "Ring Buffers ..." in WL#2360). So, to provide summary information, we'll have to do the grouping at the same time as, or shortly after, the production of the EVENTS_WAITS_CURRENT row. To some extent, that means we must guess what the user will want. But we can guess well. We can provide a lot of tables for the user to pick from (by enabling in SETUP_CONSUMERS). We can make sure that the tables include anything that's been requested before or provided before for MySQL or for some other DBMS. [ WHAT WE CAN USE FOR AGGREGATING ] These are the possible aggregations, in order of increasing difficulty. COUNT(*) Easy. If there's a row, then ++COUNT_STAR somewhere. SUM(TIMER_WAIT) Easy. Subtract TIMER_START from TIMER_END (or from current time if TIMER_END is still null), and add that to SUM_TIMER_WAIT somewhere. AVG(TIMER_WAIT) Easy. The consumer routine must calculate averages. MIN(TIMER_WAIT) A bit harder. The typical algorithm for calculating or minimum|maximum in a list will have a line like this: MAX(TIMER_WAIT) "if (curr_value < min_value) then min_value = curr_value;" but that might cause a 'jump' instruction at assembly level. See "#72 Replace the min+max calculations with tricky alternatives" in WL#5414. STDDEV(TIMER_WAIT) Impossible. We won't do equivalents for any of the other MySQL aggregating functions like BIT_XOR, GROUP_CONCAT, STDDEV, and the others listed in the reference manual http://dev.mysql.com/doc/refman/6.0/en/group-by-functions.html because even if we could do them, who would care? TIMER_WAIT is the basis for most aggregations. There were other possibilities: MIN|MAX(START_TIME|END_TIME) if the timer is wall-clock this makes some sense; SUM(SPINS) remembering that there might be other 'extra' columns like SPINS. We ignore these other possibilities. [ WHAT WE CAN USE FOR GROUPING ] These are the possible grouping columns: THREAD_ID Actually grouping "by thread" is not a problem because it's implicit. Almost all summarizations are done within a thread. We do not update counters of other threads, and we do not update global counters, except in two special cases (see later discussion of "global summaries", and notice that object summary tables are global). This is one of the reasons we won't need mutexes. EVENT_NAME The components of EVENT_NAME are all candidates for grouping. To answer questions like "how many mutex_locks occurred for LOCK_open" or "what is the average time for each mutex event", we'd need a summary table that has a row for each species. And specifies could be accumulated into genera which could be accumulated into orders, which could be accumulated into classes, all the way up the taxonomy line. But we'll leave the higher accumulation levels out of WL#4816, we'll go by species. OPERATION Originally this was part of EVENT_NAME, and originally the idea was to group by operations as well. But that idea ran into some opposition. Notice that EVENT_NAME, our favourite grouping column, has a fixed number of possible values and they are all knowable in advance. (Actually a few can be added by storage engines during server initialization but that's still "in advance" for most practical purposes.) This is an important consideration. It means that we can preallocate memory for the grouping permanently, and it means that we can point to the memory quickly, because it's fixed offset. We might get a request for a "per minute" grouping, based on ROUND(TIMER_START/60000000000) or something similar. We won't do it because we can't precalculate the possible values. But it's irrelevant anyway, because groupings by time periods can be done with "Samplings". (See WL#2360 section "Sampling".) So we won't use any timer columns as grouping columns. We might get a request for a grouping based on OBJECT_INSTANCE_BEGIN. This would be great for answering really detailed questions like "how many times did somebody access row#5" or "what was the time elapsed when we wrote to a file starting at byte#16384". Sorry, there are just too many possible values, Nobody saw a way to do it. Histograms for OBJECT_INSTANCE_BEGIN might be a substitute, but we won't have them either. I (Peter) will talk later about "per statement" and "per object" groupings. I think it's unlikely that people would want a grouping by EVENT_ID or SPINS. And when something's unlikely, it's out of the question. Combining the considerations about aggregating and grouping, we know therefore there is an 'aggregators' struc which will contain these fields: NAME LENGTH DESCRIPTION grouping_type 2 Identifies the event, correlates with event_name count_star 8 We ++count_star whenever we end a wait sum_time_elapsed 8 We add to sum_time_elapsed when we end a wait min_time_elapsed 8 We calculate min_time_elapsed when we end a wait max_time_elapsed 8 We calculate max_time_elapsed when we end a wait Total length = 34 bytes for one aggregators-struc occurrence. [ DEFINITION OF A TYPICAL TABLE ] (Note added 2010-06-30: Formerly there were references to "by genus" and EVENTS_WAITS_SUMMARY_BY_THREAD_BY_GENUS. Now it's "by species", that is, by the whole event name.) Table name = EVENTS_WAITS_SUMMARY_BY_THREAD_BY_EVENT_NAME List of columns and data types: THREAD_ID BIGINT /* or INTEGER */ NOT NULL EVENT_NAME VARCHAR(128) NOT NULL COUNT_STAR BIGINT UNSIGNED NOT NULL SUM_TIMER_WAIT BIGINT UNSIGNED NOT NULL MIN_TIMER_WAIT BIGINT UNSIGNED NOT NULL AVG_TIMER_WAIT BIGINT UNSIGNED NOT NULL MAX_TIMER_WAIT BIGINT UNSIGNED NOT NULL Compare the definition of EVENTS_WAITS_CURRENT: columns EVENT_ID + SOURCE + SPINS + OBJECT_SCHEMA + OBJECT_NAME + OBJECT_TYPE + OBJECT_INSTANCE_BEGIN + OPERATION + NUMBER_OF_BYTES are gone, column TIMER_WAIT is summarized, column COUNT_STAR is added. We prefer to have the columns for the effective GROUP BY clause together and in order according to the effective groupings, followed by the aggregate columns. We prefer the aggregate columns to follow the effective GROUP BY columns, in the order -- this is an arbitrary choice -- COUNT, SUM, MIN, AVG, MAX. [ NESTING-EVENT SUMMARIES ] We have moved the specification for nesting-event summaries to WL#4813 PERFORMANCE_SCHEMA Instrumenting Stages. At time of writing we believe that "Stages" will be the first nesting event which we implement, so nesting-event summaries fit better in that worklog task, which will come later. [ HISTOGRAMS ] We have moved the specification for histograms to WL#5384 PERFORMANCE_SCHEMA Histograms. At time of writing we believe that we will not implement histograms for a long time, but they're there for later consideration. [ SETUP_OBJECTS ] (2010-07-12 This section was moved out of OBJECT_SUMMARIES because it is not specifically for OBJECT_SUMMARIES any more.) We have a SETUP table. Call it SETUP_OBJECTS. It contains: OBJECT_TYPE ENUM('TABLE') NOT NULL DEFAULT 'TABLE' OBJECT_SCHEMA VARCHAR(64) DEFAULT '%' OBJECT_NAME VARCHAR(64) NOT NULL DEFAULT '%' TIMED ENUM('YES','NO') NOT NULL DEFAULT 'YES' The OBJECT_TYPE column may only contain 'TABLE'. 'TABLE' meaning base table is the only possible type but the OBJECT_TYPE column is just here for future use. The OBJECT_SCHEMA column identifies the schema that an object is in. It is nullable in case someday we decide to include objects which are not part of schemas. If the value is the wildcard '%', it means "any schema". The OBJECT_NAME column identifies the object. If the value is the wildcard '%', it means "any object". The TIMED column is 'YES' if events for the object are to be timed, rather than merely counted. One may SELECT, INSERT, UPDATE, or DELETE from SETUP_OBJECTS provided one has the appropriate privilege. The presence of a row in SETUP_OBJECTS means that we want to enable instrumentation for events which occur on the object identified by OBJECT_TYPE plus OBJECT_SCHEMA plus OBJECT_NAME. This affects instrumentation, not merely summaries. The enablement is combined (ANDed) with the settings in SETUP_INSTRUMENTS for the event names that begin with 'wait/io/table...' or 'wait/lock/table...'. Also assume that the thread is enabled, see WL#4674. Examples: 1. INSERT INTO SETUP_OBJECTS ('TABLE','TEST','T','YES'); This means "enable instrumentation for events on table TEST.T if SETUP_INSTRUMENTS.ENABLED = 'YES', and enable timing as well if SETUP_INSTRUMENTS.TIMED = 'YES'. 2. INSERT INTO SETUP_OBJECTS ('TABLE','TEST','%','NO'); This means "enable instrumentation for events on any table in schema TEST (again, provided SETUP_INSTRUMENTS also says enabled). Do not turn the timer on, even if SETUP_INSTRUMENTS.TIMED is 'YES'. 3. INSERT INTO SETUP_OBJECTS ('TABLE','TEST','T','YES'); INSERT INTO SETUP_OBJECTS ('TABLE','TEST','%','NO'); This enables just as indicated for examples (1.) and (2.). The question might arise what happens for table TEST.T since it is true for both rows that we're inserting, but one row says TIMED='YES' while the other row says TIMED='NO'. The answer is: it's still enabled, but whether it's timed or not is implementation-dependent. Implementor choice: 'TEST', 'T' is more specific than 'TEST', '%', so the more specific row prevails, and the table is timed in this example. Likewise, 'TEST', '%' is more specific than '%', '%' and takes precedence. 4. DELETE FROM SETUP_OBJECTS; When SETUP_OBJECTS is empty, instrumentation is off. Turning off instrumentation will not affect any summaries which already exist, however. Initially there is one row with ('TABLE','%','%','YES'). We won't instrument for tables in PERFORMANCE_SCHEMA itself, or in INFORMATION_SCHEMA, regardless of what values are in SETUP_OBJECTS. The maximum number of rows in SETUP_OBJECTS is (performance_schema_setup_objects_size) 100. We use the term "matching" when asking "how does MySQL compare the table name it's working on with the table name in SETUP_OBJECTS". The matching effectively takes place at the time the event happens (although in practice the matching only has to happen once unless SETUP_OBJECTS changes.) The matching uses an implementation-dependent collation. For tables, matching happens on both persistent and temporary tables. For a wildcard, matching is always true. Summaries only take place for events which are instrumented. [ OBJECT SUMMARIES ] A grouping "by object" is what people are asking for when they say "* Track blocks read per file * Track blocks written per file * Track I/O per second average per file" [mysql inside intranet address]/wiki/ServerMonitoringWishlist And one of the Google patches can give counts of reads and writes per table, or per index. I've also heard a request for "how often has a procedure been called". Generally we want to summarize for "database objects" (base tables, indexes, stored procedures, functions, events, triggers, modules, constraints) or named "session objects" (prepared statements, cursors, conditions, variables, savepoints). We don't worry about summarizing for things like mutexes because they'd be both easy and unnecessary. We can get summaries for all the lock_open mutexes without needing to know the address that was used for lock_open. At first glance it looks simple: just declare that OBJECT_NAME etc. are grouping columns, or do the same things that we did for "nesting-event summaries". But both those methods fail because: * the number of objects isn't fixed and might be large * the location of the object struc isn't known, it must be looked up * the object name columns are strings, so they take time to compare. So we need WL#2760."Super-Database (real Data Dictionary)". I'm assuming that WL#2760 will give us persistent in-memory descriptors for all schema objects, so we'll be able to use an address or offset whenever we're using the object. When we have that, object summaries will be easy. However, WL#2760 will not be available for a long time. In the meantime, we offer a restricted-functionality solution. This works only for base tables. MySQL sets up an internal list for each table that MySQL becomes aware of, in a way which the implementor defines. The maximum number of rows in this internal table is (performance_schema_max_table_handles) 10000. The internal table includes information to identify the object. If an event takes place which affects a table, if internal-table schema+name match schema+name in any SETUP_OBJECTS row, if instrumentation is enabled as described in section SETUP_OBJECTS, then instrumentation and aggregation takes place (MySQL adds to the aggregator fields, in the same manner as when it adds to nesting-event aggregator fields). So now we can describe what is in the summary table. Table name = OBJECTS_SUMMARY_GLOBAL_BY_TYPE List of columns and data types: OBJECT_TYPE VARCHAR(64) NOT NULL OBJECT_SCHEMA VARCHAR(64) NOT NULL OBJECT_NAME VARCHAR(64) NOT NULL COUNT_STAR BIGINT UNSIGNED NOT NULL SUM_TIMER_WAIT BIGINT UNSIGNED NOT NULL MIN_TIMER_WAIT BIGINT UNSIGNED NOT NULL AVG_TIMER_WAIT BIGINT UNSIGNED NOT NULL MAX_TIMER_WAIT BIGINT UNSIGNED NOT NULL The values for OBJECT_TYPE, OBJECT_SCHEMA, and OBJECT_NAME are as in SETUP_OBJECTS. The values for COUNT_STAR, SUM_TIMER_WAIT, MIN_TIMER_WAIT, AVG_TIMER_WAIT, MAX_TIMER_WAIT are as in other summary tables such as EVENTS_WAITS_SUMMARY_BY_EVENT_NAME. By searching the global table, you'll have counting and timing caused by all the table io done on the table. There are two matters which the implementor may decide: * If the SETUP_OBJECTS row is '%','%', and the event is on table TEST.T, shall the OBJECTS_SUMMARY_GLOBAL_BY_TYPE row be 'TABLE','TEST','T',1,2,3,4,5 or 'TABLE','%','%',1,2,3,4,5 Implementor choice: 'TABLE','TEST','T',1,2,3,4,5 The real object schema and object name is used. * If a row is dropped from the internal table, shall it still appear in OBJECTS_SUMMARY_GLOBAL_BY_TYPE, or shall it disappear at any implementation-dependent time, or shall its counters be added to a "summary summary" row. Implementor choice: When a row is dropped from the internal table, it is also dropped from OBJECTS_SUMMARY_GLOBAL_BY_TYPE. One may SELECT or TRUNCATE from OBJECTS_SUMMARY_GLOBAL_BY_TYPE provided one has the appropriate privileges. Only one data-change statement is legal: TRUNCATE OBJECTS_SUMMARY_GLOBAL_BY_TYPE; which will reset all the bigint columns to 0 or NULL, but will not actually remove any rows. The implementor will doubtless have some trouble with robustness, given that users might be updating SETUP_OBJECTS, and the contents of the internal table might be changing, or another thread might be working on the same object, while aggregating occurs. Once again, we will just say: very occasionally there will be erroneous information, we just document that's the way things are. Once WL#2760 is done and we add support for other object types besides 'TABLE', very little will change from the user's point of view. The table and column descriptions will be the same. For comparison with another DBMS, see dev-private email [ mysql intranet archive ] /secure/mailarchive/mail.php?folder=4&mail=35591 [ INFINITE HISTORIES ] The proposal is "no infinite histories", but let's consider them. At the outset, I said that we have to update summaries during the instrumentation because "There is no table that goes back forever." But what if there was? There are two possibilities: * Add 12GB of memory to your machine. Now there is enough space for making EVENTS_WAITS_HISTORY_LONG into a "long long long" table, by changing performance_schema_events_waits_history_long_size. * Enable file trace. Now there is a disk file instead of memory. We can convert it to a MySQL table. With such scenarios, the instrumentation doesn't need to do any summarizing at all. We can assume that users will do their own ad-hoc grouping and aggregating for everything. But, if we require lots of memory, we make the instrumentation obtrusive. Our objective is to make all users comfortable with monitoring always on. And benchmarks now show that the performance schema collects up to 800,000 events per second, which would overflow a 12GB history in one minute. Really, this indicates that EVENTS_WAITS_HISTORY_LONG is "mostly for debug". And trace files have high overhead, as mentioned elsewhere (WL#4878). So assume there are no infinite histories. [ HIGHER-LEVEL SUMMARIES ] This section is informative; it is not part of the WL#4816 requirement. Earlier we said that there could be multiple summary tables, each for a different EVENT_NAME level, for example EVENTS_WAITS_SUMMARY_BY_FAMILY and EVENTS_WAITS_SUMMARY_BY_GENUS. but there is only one underlying structure, the summaries for "by species". For all tables above the species level (genus, family, order, class, etc.), the instrumentation does nothing at all. We know that the Consumer can calculate the per-genus summaries by adding up the per-species summaries. Higher-level summaries are calculated, so they're disabled if the lower-level summaries are disabled. [ GLOBAL SUMMARIES ] A global summary would be processed thus: There are also occurrences of performance-structure-summary in global memory. But these global summaries are not totals for all current jobs. They are only for non-current jobs, that is, jobs that have terminated. Why don't we update the global summaries at the same time as the per-thread summaries? Because we fear conflicts, with two threads updating the global summaries at the same time. That might be harmless if the only operation is ADD. But the second reason is that we want the Provider's End procedure ("end_wait") to be as fast as possible, and so we'll pass extra work to the Consumer. When the Provider thread is about to terminate, it gets a mutex (call it Terminator mutex) and then adds all per-thread summaries to the global summaries. When a Consumer thread is about to display global information, for example "the total number of mutex acquisitions for all sessions since the server started", it must go through all the per-thread summaries and add them in its temporary storage, and then add the global summaries. So the important thing to understand is that all such summmaries are done "within a thread", thus removing contention problems, which is important because Providers never use mutexes. But if a thread is killed and its memory is deallocated, then all its tables re copied to a single "global summaries" area. This copying has to be under mutex control, and Consumers must see the mutex, but it's not an exception to our "Providers never use mutexes" rules. I mean the instrumentation has no mutexes. This does mean that monitoring a dead thread is impossible. But we have merely decided that information about dead threads is less important, so it ends up in summaries, samples, and user-generated histograms. Its details disappear. The consumer of a "global" summary table will have to go through all the threads and add them up, and then add the summaries from the dead-threads global area. Global summaries are calculated, so they're disabled if the by-thread summaries are disabled. For example, EVENTS_WAITS_SUMMARY_GLOBAL_BY_EVENT_NAME would have the same columns as EVENTS_WAITS_SUMMARY_BY_THREAD_BY_EVENT_NAME except for the first column, THREAD_ID. [ CLEARING ] Low-level summary tables can be cleared by saying TRUNCATE EVENTS_WAITS_SUMMARY_BY_THREAD_BY_EVENT_NAME (or some other summary-table name). Appropriate privileges on the table are necessary. The effect is zeroing of the aggregate columns, not removal of the rows. This functionality is apparently desirable -- the InnoDB metrics table has a "reset" feature, and Marc Alff says "we need a way to clean summaries after the configuration of SETUP_INSTRUMENTS / SETUP_CONSUMERS / etc has changed, so that the results can be meaningful and not tainted by previous aggregates made using different conditions." [ THE ROUTINE ] Simply put, a per-thread summary would be processed thus: In the End procedure ("end_wait"), the Provider takes values from performance-structure and adds them to a performance-structure-aggregator. The summary is "by class of event", still within the thread. For example, when we finish a mutex call, we calculate the total canonical time units waited (.timer_end - .timer_start) and add it to performance-structure-aggregators [mutex] . sum_timer_wait. And we increment performance-structure-summary [mutex] . count_star. We will need this summary for another table, too. Remember that intrumentation has a 'start' (before the instrumented code) and an 'end' (after the instrumented code). All summarizing is in the 'end' ("end_wait"). The pseudocode for a WAIT instrumentation is: /* Do the non-summary instrumentation code. */ if (summary table for statement is enabled) ++statement.summary.count_star; statement.summary.sum_elapsed+=elapsed-time-for-this-instrument; conditional-move(statement.summary.min_elapsed,elapsed-time-for-this-instrument); conditional-move(statement.summary.max_elapsed,elapsed-time-for-this-instrument); if (summary table for this wait is enabled) ++wait.summary[event].count_star; wait.summary[event].sum_elapsed+=elapsed-time-for-this-instrument; conditional-move(wait.summary[event].min_elapsed,elapsed-time-for-this-instrument); conditional-move(wait.summary[event].max_elapsed,elapsed-time-for-this-instrument); In assembler, with wonderful optimization, with all summaries enabled, this will take 2 not-taken conditional-jump instructions, 4 increment or add instructions, 1 multiplication to get canonical time units from the units that the timer returns, 4 conditional moves, and 1/5 cache miss. 55 cycles. Eventually there will be 2 other nesting-event classes besides 'statements', so multiply by 2, and say 110 cycles. This is a good time to estimate the total cost. It's 110 cycles overhead in speed terms, and in memory terms it's 34 bytes per 'aggregators' struc times 500 occurrences of 'aggregators' in genus-level summaries plus 2 occurrences of 'aggregators' in statement or stage summaries times 200 threads plus 500 occurrences of 'aggregators' in global summaries Typically, then, it will all add up to somewhere around 1MB. Our cost estimate, then, is 110 extra cycles per instrument and 1MB extra memory. (Peter did some tests to see whether the above estimating is realistic nowadays. The practice doesn't match the theory, but the architecture review box can be ticked anyway. Ignore the estimating, it is useless for review purposes.) [ TRICKS ] These are tricks; they are recommendations; they are not requirements. 1. Be cheap with SETUP_CONSUMERS. Effectively SETUP_CONSUMERS acts as a filter on production of summary tables. Disabling saves a tiny amount of time, but more importantly, it can save memory. Suppose that there are 1000 summary tables, but we have a fixed limit, we say that users cannot enable more than 100. Now, allocate memory for only 100 strucs, and one list of pointers from the enabled-summaries list to the strucs themselves. 2. Group by genus or family not by species. This is just because we might have too many species (so far we have about 200). So another way to accomplish the same purpose is to disable instrumenting for many species, permanently. See also WL#5414 PERFORMANCE_SCHEMA Optimize Instrumentation. [ EXAMPLES ] Assuming default schema = performance schema. "What is the total number of wait calls of any type since instance start?" SELECT SUM(COUNT_STAR) FROM EVENTS_WAITS_SUMMARY_GLOBAL_BY_EVENT_NAME; "For the current statement on thread 17, show the statement elapsed time, the total time spent in the last 10 mutexes, and the average time spent on all mutexes of type 'lock_open'". SELECT TIMER_WAIT FROM EVENTS_STATEMENTS_CURRENT WHERE THREAD_ID = 17 UNION ALL SELECT SUM(TIMER_WAIT) FROM EVENTS_WAITS_HISTORY WHERE THREAD_ID = 17 UNION ALL SELECT WAIT_SYNCH_MUTEX_AVG_TIMER_WAIT FROM EVENTS_STATEMENTS_CURRENT WHERE WAIT_SYNCH_MUTEX_EVENT_NAME = 'wait/synch/mutex/sql/LOCK_open'; "Show contents of all waits summary tables." (Note added 2010-06-30: removed some class/family/order/class examples.) SELECT * FROM EVENTS_WAITS_SUMMARY_BY_THREAD_BY_EVENT_NAME; ... SELECT * FROM EVENTS_STAGES_SUMMARY_BY_THREAD_BY_EVENT_NAME; ... SELECT * FROM EVENTS_STATEMENTS_SUMMARY_BY_THREAD_BY_EVENT_NAME; ... SELECT * FROM EVENTS_WAITS_SUMMARY_GLOBAL_BY_EVENT_NAME; ... SELECT * FROM EVENTS_STAGES_SUMMARY_GLOBAL_BY_EVENT_NAME; ... SELECT * FROM EVENTS_STATEMENTS_SUMMARY_GLOBAL_BY_EVENT_NAME; /* many tables in all */ [ REQUIREMENTS ] Note: This section is refining the general specification written in this HLS into more detailed technical or functional requirements, to help scope the tests. This is *not* yet another spec, just some logical consequences of the HLS sections written above. (1) Table performance_schema.setup_objects. ------------------------------------------- Func-Req (1.1): A fresh MySQL installation of 5.6 must create the table performance_schema.setup_objects. See the HLS for the table structure. Func-Req (1.2): An upgrade from 5.5 to 5.6 must create the table performance_schema.setup_objects. Func-Req (1.3): Security privileges for performance_schema.setup_objects are enforced. Legal operations are SELECT, INSERT, UPDATE, DELETE, TRUNCATE. CREATE TABLE and DROP TABLE are currently also legal, as they are used during install/upgrade. Func-Req (1.4): Table performance_schema.setup_objects is visible in the information_schema. Func-Req (1.5): Table performance_schema.setup_objects is visible in SHOW TABLES. Func-Req (1.6): Rows in performance_schema.setup_objects controls whether or not TABLE instrumentation is enabled and/or timed. See the HLS for details. (2) Table performance_schema.objects_summary_global_by_type. ------------------------------------------------------------ Func-Req (2.1): A fresh MySQL installation of 5.6 must create the table performance_schema.objects_summary_global_by_type. See the HLS for the table structure. Func-Req (2.2): An upgrade from 5.5 to 5.6 must create the table performance_schema.objects_summary_global_by_type. Func-Req (2.3): Security privileges for performance_schema.objects_summary_global_by_type are enforced. Legal operations are SELECT, and TRUNCATE. CREATE TABLE and DROP TABLE are currently also legal, as they are used during install/upgrade. Func-Req (2.4): Table performance_schema.objects_summary_global_by_type is visible in the information_schema. Func-Req (2.5): Table performance_schema.objects_summary_global_by_type is visible in SHOW TABLES. Func-Req (2.6): Rows in performance_schema.objects_summary_global_by_type are "per objects" aggregations. This aggregation is the logical equivalent of: SELECT object_type, object_schema, object_name, count(*) as count_star, sum(timer_wait) as sum_timer_wait, min(timer_wait) as min_timer_wait, avg(timer_wait) as avg_timer_wait, max(timer_wait) as max_timer_wait FROM performance_schema.events_waits_history_infinite where object_type='TABLE' group by object_type, object_schema, object_name. Func-Req (2.7): Truncate table performance_schema.objects_summary_global_by_type resets the statistics collected. (3) Existing performance schema tables. --------------------------------------- Func-Req (3.1): Table performance_schema.events_waits_summary_global_by_event_name shows aggregations for the table io instrument "wait/io/table/sql/handler". Func-Req (3.2): Table performance_schema.events_waits_summary_by_thread_by_event_name shows aggregations for the table io instrument "wait/io/table/sql/handler". (4) Server variables. --------------------- Func-Req (4.1): There is a new server variable performance_schema_setup_objects_size. This variable is global and readonly. Func-Req (4.2): There is a new server command line option --performance-schema-setup-objects-size. Server implementation. ---------------------- Tech-Req: The internal implementation of summaries in the performance schema has (significantly) changed with this WL. This is a *technical* requirement only that affects design, that can not be tested by a functional test. Note from Marc: I consider this one already covered by code review.
The detailed design is documented in doxygen format. See instructions in the file Doxyfile-perfschema. In particular, see the doxygen page about aggregation.