Monitoring Table and Index IO with PERFORMANCE_SCHEMA

Another new feature of performance_schema within 5.6 is the ability to monitor both table and index IO – this was added in two parts:

  • WL#4895, which instruments each table IO operation as a wait event
  • WL#5379, which aggregates those statistics per table and per index

If you read my blog on graphing statement wait times, some of the data that WL#4895 exposes can be seen within the examples there – it sits within the “light coral” coloured blocks, such as the following from the InnoDB PK lookup example I gave:

Index IO Example

This shows the level of detail you can get per table IO operation, per statement.

However, as Mark Callaghan noted recently, some of the more valuable insight on how tables / indexes are accessed can be gleaned by looking at these statistics, with some further aggregation, over time.

This is what WL#5379 covers, and what I’ll focus on here. It implements two new tables within performance_schema:

  • table_io_waits_summary_by_table
  • table_io_waits_summary_by_index_usage

Both tables have this basic structure (although table_io_waits_summary_by_index_usage also adds an INDEX_NAME column, and performs its aggregation grouping by the index name as well):

mysql> DESC table_io_waits_summary_by_table;
+------------------+---------------------+------+-----+---------+-------+
| Field            | Type                | Null | Key | Default | Extra |
+------------------+---------------------+------+-----+---------+-------+
| OBJECT_TYPE      | varchar(64)         | YES  |     | NULL    |       |
| OBJECT_SCHEMA    | varchar(64)         | YES  |     | NULL    |       |
| OBJECT_NAME      | varchar(64)         | YES  |     | NULL    |       |
| COUNT_STAR       | bigint(20) unsigned | NO   |     | NULL    |       |
| SUM_TIMER_WAIT   | bigint(20) unsigned | NO   |     | NULL    |       |
| MIN_TIMER_WAIT   | bigint(20) unsigned | NO   |     | NULL    |       |
| AVG_TIMER_WAIT   | bigint(20) unsigned | NO   |     | NULL    |       |
| MAX_TIMER_WAIT   | bigint(20) unsigned | NO   |     | NULL    |       |
| COUNT_READ       | bigint(20) unsigned | NO   |     | NULL    |       |
| SUM_TIMER_READ   | bigint(20) unsigned | NO   |     | NULL    |       |
| MIN_TIMER_READ   | bigint(20) unsigned | NO   |     | NULL    |       |
| AVG_TIMER_READ   | bigint(20) unsigned | NO   |     | NULL    |       |
| MAX_TIMER_READ   | bigint(20) unsigned | NO   |     | NULL    |       |
| COUNT_WRITE      | bigint(20) unsigned | NO   |     | NULL    |       |
| SUM_TIMER_WRITE  | bigint(20) unsigned | NO   |     | NULL    |       |
| MIN_TIMER_WRITE  | bigint(20) unsigned | NO   |     | NULL    |       |
| AVG_TIMER_WRITE  | bigint(20) unsigned | NO   |     | NULL    |       |
| MAX_TIMER_WRITE  | bigint(20) unsigned | NO   |     | NULL    |       |
| COUNT_FETCH      | bigint(20) unsigned | NO   |     | NULL    |       |
| SUM_TIMER_FETCH  | bigint(20) unsigned | NO   |     | NULL    |       |
| MIN_TIMER_FETCH  | bigint(20) unsigned | NO   |     | NULL    |       |
| AVG_TIMER_FETCH  | bigint(20) unsigned | NO   |     | NULL    |       |
| MAX_TIMER_FETCH  | bigint(20) unsigned | NO   |     | NULL    |       |
| COUNT_INSERT     | bigint(20) unsigned | NO   |     | NULL    |       |
| SUM_TIMER_INSERT | bigint(20) unsigned | NO   |     | NULL    |       |
| MIN_TIMER_INSERT | bigint(20) unsigned | NO   |     | NULL    |       |
| AVG_TIMER_INSERT | bigint(20) unsigned | NO   |     | NULL    |       |
| MAX_TIMER_INSERT | bigint(20) unsigned | NO   |     | NULL    |       |
| COUNT_UPDATE     | bigint(20) unsigned | NO   |     | NULL    |       |
| SUM_TIMER_UPDATE | bigint(20) unsigned | NO   |     | NULL    |       |
| MIN_TIMER_UPDATE | bigint(20) unsigned | NO   |     | NULL    |       |
| AVG_TIMER_UPDATE | bigint(20) unsigned | NO   |     | NULL    |       |
| MAX_TIMER_UPDATE | bigint(20) unsigned | NO   |     | NULL    |       |
| COUNT_DELETE     | bigint(20) unsigned | NO   |     | NULL    |       |
| SUM_TIMER_DELETE | bigint(20) unsigned | NO   |     | NULL    |       |
| MIN_TIMER_DELETE | bigint(20) unsigned | NO   |     | NULL    |       |
| AVG_TIMER_DELETE | bigint(20) unsigned | NO   |     | NULL    |       |
| MAX_TIMER_DELETE | bigint(20) unsigned | NO   |     | NULL    |       |
+------------------+---------------------+------+-----+---------+-------+

We aggregate usage data at three different levels within these tables, per table / index, for your convenience:

  • All ops, with the COUNT_STAR and *_WAIT columns
    • Read ops, with the *_READ columns
      • Fetch ops, with the *_FETCH columns
    • Write ops, with the *_WRITE columns
      • Insert ops, with the *_INSERT columns
      • Update ops, with the *_UPDATE columns
      • Delete ops, with the *_DELETE columns
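Given this hierarchy, the higher level counters should simply be the sum of the counters beneath them. A quick sanity check (a sketch only – the test.t1 table here is the one used in the later examples):

```sql
-- COUNT_STAR should equal COUNT_READ + COUNT_WRITE,
-- COUNT_WRITE should equal COUNT_INSERT + COUNT_UPDATE + COUNT_DELETE,
-- and COUNT_READ should equal COUNT_FETCH (the only read operation)
SELECT object_schema,
       object_name,
       count_star,
       count_read + count_write                   AS read_plus_write,
       count_insert + count_update + count_delete AS sum_of_write_ops
  FROM performance_schema.table_io_waits_summary_by_table
 WHERE object_schema = 'test'
   AND object_name = 't1';
```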

For those of you who are used to the TABLE_STATISTICS / INDEX_STATISTICS patches from Google, or one of our “siblings” that has incorporated these, you may see the similarity in some of these statistics.

Mark also noted about these patches:

It also is very easy to use — select * from easy_to_remember_table_name. I hope the P_S can also provide that.

I think people will get used to the names of the tables over time (I certainly have, in the testing that I have been doing). However, if these tables are too forgettable (erm, SHOW TABLES, or use a client with auto-completion), provide too much data by default, or you already have tooling built around the Google patches, I would suggest using something that databases are good at – Views.

Here’s how to return the same data as the TABLE_STATISTICS patch:

CREATE DATABASE IF NOT EXISTS ps_helper;
USE ps_helper;

DROP VIEW IF EXISTS table_statistics;

CREATE VIEW table_statistics AS 
SELECT pst.object_schema AS table_schema, 
       pst.object_name AS table_name, 
       pst.count_read AS rows_read, 
       pst.count_write AS rows_changed,
       (pst.count_write * COUNT(psi.index_name)) AS rows_changed_x_indexes
  FROM performance_schema.table_io_waits_summary_by_table AS pst
  LEFT JOIN performance_schema.table_io_waits_summary_by_index_usage AS psi 
    ON pst.object_schema = psi.object_schema AND pst.object_name = psi.object_name
   AND psi.index_name IS NOT NULL
 GROUP BY pst.object_schema, pst.object_name
 ORDER BY pst.sum_timer_wait DESC;

mysql> SELECT * FROM ps_helper.table_statistics LIMIT 5;
+--------------+-----------------------+-----------+--------------+------------------------+
| table_schema | table_name            | rows_read | rows_changed | rows_changed_x_indexes |
+--------------+-----------------------+-----------+--------------+------------------------+
| mysql        | time_zone_leap_second |        23 |            0 |                      0 |
| test         | t1                    |         4 |            1 |                      2 |
| mysql        | slave_master_info     |         1 |            0 |                      0 |
| mysql        | user                  |         5 |            0 |                      0 |
| mysql        | db                    |         3 |            0 |                      0 |
+--------------+-----------------------+-----------+--------------+------------------------+
5 rows in set (0.00 sec)

This solution also orders the results by the SUM_TIMER_WAIT column for each table behind the scenes, which is where performance_schema offers more insight into table usage.

Not only do we count the operations against the tables, we can also time them (if timing is enabled), allowing you to track the most contended tables, not just the most heavily used ones – the result always returns the highest overhead tables at the top, making “top N” queries much easier than with TABLE_STATISTICS (which appears to return rows in some random order).
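Incidentally, the TIMER columns are measured in picoseconds by default, so they are best converted before human consumption. A sketch, dividing by 10^9 to get milliseconds:

```sql
-- SUM/AVG_TIMER_WAIT are picoseconds by default; 10^9 ps = 1 ms
SELECT object_schema,
       object_name,
       count_star,
       ROUND(sum_timer_wait / 1000000000, 2) AS total_latency_ms,
       ROUND(avg_timer_wait / 1000000000, 2) AS avg_latency_ms
  FROM performance_schema.table_io_waits_summary_by_table
 ORDER BY sum_timer_wait DESC
 LIMIT 5;
```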

Personally I’d be tempted to add more columns from our tables though – breaking down by operation type for a start.
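For illustration, such a view might look like the following (table_statistics_with_ops is a name of my own invention, following the ps_helper convention above):

```sql
-- A sketch: the TABLE_STATISTICS-style view, with the write counter
-- broken down by operation type
CREATE VIEW ps_helper.table_statistics_with_ops AS
SELECT object_schema AS table_schema,
       object_name   AS table_name,
       count_read    AS rows_read,
       count_write   AS rows_changed,
       count_insert  AS rows_inserted,
       count_update  AS rows_updated,
       count_delete  AS rows_deleted
  FROM performance_schema.table_io_waits_summary_by_table
 ORDER BY sum_timer_wait DESC;
```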

And for INDEX_STATISTICS:

DROP VIEW IF EXISTS index_statistics;

CREATE VIEW index_statistics AS
SELECT object_schema AS table_schema,
       object_name AS table_name,
       index_name,
       count_read AS rows_read
  FROM performance_schema.table_io_waits_summary_by_index_usage
 WHERE index_name IS NOT NULL
 ORDER BY sum_timer_wait DESC;

mysql> select * from index_statistics limit 5;
+--------------+-----------------------+------------+-----------+
| table_schema | table_name            | index_name | rows_read |
+--------------+-----------------------+------------+-----------+
| mysql        | time_zone_leap_second | PRIMARY    |        23 |
| mysql        | slave_master_info     | PRIMARY    |         1 |
| test         | t1                    | PRIMARY    |         1 |
| test         | t1                    | j          |         1 |
| mysql        | tables_priv           | PRIMARY    |         1 |
+--------------+-----------------------+------------+-----------+
5 rows in set (0.01 sec)

Again, we order by the SUM_TIMER_WAIT column, to find the highest overhead indexes first, for an easy “top N”.

However, there is another WHERE clause here – WHERE index_name IS NOT NULL – which is worth mentioning. Within the table_io_waits_summary_by_index_usage table, not only do we aggregate operations per index, we also aggregate how many operations did not use indexes when accessing the tables. We do this with an extra row per table, where the INDEX_NAME column is NULL:

mysql> SHOW CREATE TABLE test.t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `i` int(11) NOT NULL AUTO_INCREMENT,
  `j` varchar(10) DEFAULT NULL,
  `k` int(11) DEFAULT NULL,
  PRIMARY KEY (`i`),
  KEY `j` (`j`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> SELECT object_schema, 
    ->        object_name, 
    ->        index_name, 
    ->        count_fetch,
    ->        count_insert,
    ->        count_update,
    ->        count_delete
    ->   FROM performance_schema.table_io_waits_summary_by_index_usage 
    ->  WHERE object_schema = 'test' 
    ->    AND object_name = 't1';
+---------------+-------------+------------+-------------+--------------+--------------+--------------+
| object_schema | object_name | index_name | count_fetch | count_insert | count_update | count_delete |
+---------------+-------------+------------+-------------+--------------+--------------+--------------+
| test          | t1          | PRIMARY    |           9 |            0 |            0 |            0 |
| test          | t1          | j          |       30803 |            0 |            0 |            0 |
| test          | t1          | NULL       |         101 |        30722 |            4 |            3 |
+---------------+-------------+------------+-------------+--------------+--------------+--------------+
3 rows in set (0.01 sec)

INSERT statements will always be counted against the “null row” for each table (an INSERT never directly uses an index, although it may add to them). Currently the *_UPDATE and *_DELETE columns also seem to put all of their aggregation into the “null row” – though I think this is a bug (it is at least undefined in the WL), so I’ve opened Bug#60905 for it. Fixing it would allow us to also get more information on which specific access methods, per table, are causing the scans.
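In the meantime, the “null row” is still useful as-is – since INSERTs (and, per the above, currently UPDATEs and DELETEs too) are all aggregated there, a rough per-table DML summary can be pulled from it. A sketch:

```sql
-- Per-table DML counts, taken from the "null row" described above
SELECT object_schema,
       object_name,
       count_insert AS rows_inserted,
       count_update AS rows_updated,
       count_delete AS rows_deleted
  FROM performance_schema.table_io_waits_summary_by_index_usage
 WHERE index_name IS NULL
 ORDER BY count_insert + count_update + count_delete DESC;
```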

Like the Google patches, it is now easy to find out which indexes are not being used. However, the performance_schema implementation allows this in a much more elegant way. Previously you would have had to LEFT JOIN against the INFORMATION_SCHEMA.STATISTICS table, as Vadim shows in his example – which incurs a huge penalty on systems with a large number of tables; just look at the statement runtime in the examples there. performance_schema, by contrast, has one row per index in each table, whether it has been read or not, and does not need to hit the disk to gather information about the tables – all of its data is entirely in memory, and no high overhead JOIN operations are necessary.

To give a view example again:

DROP VIEW IF EXISTS unused_indexes;
 
CREATE VIEW unused_indexes AS
SELECT object_schema,
       object_name,
       index_name
  FROM performance_schema.table_io_waits_summary_by_index_usage 
 WHERE index_name IS NOT NULL
   AND count_star = 0
 ORDER BY object_schema, object_name;

mysql> SELECT * FROM ps_helper.unused_indexes WHERE object_schema = 'test';
+---------------+-------------+------------+
| object_schema | object_name | index_name |
+---------------+-------------+------------+
| test          | t2          | PRIMARY    |
| test          | t2          | j          |
+---------------+-------------+------------+
2 rows in set (0.00 sec)
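One caveat if you use this as a “drop candidate” list: a PRIMARY key with no reads yet is usually not something you want to drop, so it may be worth filtering those out. A hypothetical variant:

```sql
-- As unused_indexes, but excluding PRIMARY keys, which are rarely
-- drop candidates even when unread
CREATE VIEW unused_secondary_indexes AS
SELECT object_schema,
       object_name,
       index_name
  FROM performance_schema.table_io_waits_summary_by_index_usage
 WHERE index_name IS NOT NULL
   AND index_name != 'PRIMARY'
   AND count_star = 0
 ORDER BY object_schema, object_name;
```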

But given the extra data within performance_schema, it is also possible to find the tables which cause full table scans:


DROP VIEW IF EXISTS table_scans;

CREATE VIEW table_scans AS
SELECT object_schema, 
       object_name,
       count_read AS rows_full_scanned
  FROM performance_schema.table_io_waits_summary_by_index_usage 
 WHERE index_name IS NULL
   AND count_read > 0
 ORDER BY count_read DESC;

mysql> SELECT * FROM ps_helper.table_scans WHERE object_schema = 'test';
+---------------+-------------+-------------------+
| object_schema | object_name | rows_full_scanned |
+---------------+-------------+-------------------+
| test          | t1          |                 5 |
| test          | t2          |                 3 |
+---------------+-------------+-------------------+
2 rows in set (0.01 sec)

We also track temporary tables within this instrumentation:

mysql> SELECT * 
    ->   FROM performance_schema.table_io_waits_summary_by_table 
    ->  WHERE  object_type = 'TEMPORARY TABLE'\G
*************************** 1. row ***************************
     OBJECT_TYPE: TEMPORARY TABLE
   OBJECT_SCHEMA: test
     OBJECT_NAME: t1_temp
      COUNT_STAR: 18
  SUM_TIMER_WAIT: 117369000
  MIN_TIMER_WAIT: 117180
  AVG_TIMER_WAIT: 6520500
  MAX_TIMER_WAIT: 31952340
      COUNT_READ: 12
  SUM_TIMER_READ: 94311000
  MIN_TIMER_READ: 117180
  AVG_TIMER_READ: 7858998
  MAX_TIMER_READ: 31952340
     COUNT_WRITE: 6
 SUM_TIMER_WRITE: 23058000
 MIN_TIMER_WRITE: 147420
 AVG_TIMER_WRITE: 3842748
 MAX_TIMER_WRITE: 13400100
...

All on top of the per-wait information that can be tied to threads, stages, and statements! Possibilities abound! :)

MySQL system status snapshots using the MySQL Enterprise Monitor Agent

MySQL Enterprise Monitor collects a huge amount of information about MySQL and the systems that it runs on. To do this, it employs an “Agent” to collect these statistics.

This can either sit local to the database server, or on a remote host – perhaps even the same host as the Dashboard server, if you decide you don’t want to distribute it to many boxes – and it checks whether it is local or remote to decide whether to collect OS statistics. A single agent can monitor either a single instance or multiple instances from the same process.

Not all of the information that it can collect is fully exposed within the UI yet, however (there are so many things to do lately!), so some people don’t realize the kind of data that we can gather on top of what is exposed by default. Another thing that is even less obvious is that the MEM Agent also collects all of the data for the various things it has scheduled to run (either statements against MySQL, calls to SIGAR for abstracted OS data, or data from MySQL Proxy for Query Analysis), even if it isn’t actually required to send all of that data back to the Dashboard server.

It keeps all of this as an internal “inventory”, in what we term the “item hash”. A few months ago, whilst trying to debug some memory issues, Jan Kneschke (of MySQL Proxy and lighttpd fame) implemented a way to dump this item hash in full, using YAML, to a file local to the agent – by sending a SIGUSR2 signal to the running process:

 
$ killall -USR2 mysql-monitor-agent

The agent then dumps its item hash structure to a file in the /tmp directory, with the date and time in the name, such as /tmp/mysql-monitor-agent-items.dump-20110414-124656.txt.

I’d been playing with this recently, and realized that this also allows you to maintain a local set of snapshots of performance and configuration of the entire system on a regular basis as well. Having the MEM UI is wonderful, but sometimes it’s handy to have something to {grep/sed/awk/your favorite} on the shell of the system you are trying to diagnose as well.

So a quick cron job later:

$ crontab -l
* * * * * killall -USR2 mysql-monitor-agent
$ ll /tmp/mysql-monitor-agent-items.dump*
-rw-rw-rw-  1 mark  wheel  88332 14 Apr 13:05 /tmp/mysql-monitor-agent-items.dump-20110414-130500.txt
-rw-rw-rw-  1 mark  wheel  88331 14 Apr 13:06 /tmp/mysql-monitor-agent-items.dump-20110414-130600.txt
-rw-rw-rw-  1 mark  wheel  88332 14 Apr 13:07 /tmp/mysql-monitor-agent-items.dump-20110414-130700.txt
-rw-rw-rw-  1 mark  wheel  88332 14 Apr 13:08 /tmp/mysql-monitor-agent-items.dump-20110414-130800.txt
...

Here’s an example of one of the above files, which is monitoring a stand alone 5.5.10 instance on my macbook.

The file starts off with some info on the agent process itself, followed by the output of various SQL statements that we run (those in the items-mysql-monitor.xml or custom.xml files), and then the main data collection items for the various in-built collectors. To point out some of the interesting parts, you can get:

  • All of the parsed variables from SHOW ENGINE INNODB STATUS
      "innodbstatus": 
        attributes:
          "innodb_bp_add_alloc": int
          "innodb_bp_created_per_sec": float
          "innodb_bp_db_pages": int
          "innodb_bp_dictionary_alloc": int
    ...
        instances:
          "5a41ac04-115d-448f-a0ca-144eeb5ac8b5": 
    ...
            values:
              "innodb_bp_add_alloc": 0
              "innodb_bp_created_per_sec": 0.000000
              "innodb_bp_db_pages": 191
              "innodb_bp_dictionary_alloc": 63928
    
  • SHOW GLOBAL STATUS
      "status": 
        attributes:
    ...
          "Bytes_received": int
          "Bytes_sent": int
    ...
        instances:
    ...
            values:
    ...
              "Bytes_received": 6073208
              "Bytes_sent": 70190554
    
  • SHOW GLOBAL VARIABLES
      "variables": 
        attributes:
          "auto_increment_increment": int
          "auto_increment_offset": int
    ...
        instances:
    ...
            values:
              "auto_increment_increment": 1
              "auto_increment_offset": 1
    
  • SHOW PROCESSLIST
      "processlist": 
    ...
        instances:
    ...
            values:
              "Command": "Query"
              "Host": "localhost:51682"
              "Id": 4
              "Info": "SHOW /*!40000 FULL */ PROCESSLIST"
              "State": undefined
              "Time": 0
              "User": "msandbox"
              "db": undefined
    ...
    
  • CPU details (the counters are CPU ticks), for each CPU in the system (counting from 0 upwards)
    "os": 
      "cpu": 
        attributes:
          "cpu_cache_size": int
          "cpu_idle": int
    ...
        instances:
          "ssh:{86:17:98:e8:aa:c5:55:11:f8:a4:76:07:78:fb:5a:1e}.0": 
    ...
            values:
              "cpu_cache_size": 6144
              "cpu_idle": 445505030
    ...
          "ssh:{86:17:98:e8:aa:c5:55:11:f8:a4:76:07:78:fb:5a:1e}.1": 
    ...
            values:
              "cpu_cache_size": 6144
              "cpu_idle": 442250880
    ...
    
  • Disk IO information
      "disk": 
        attributes:
          "disk_bytes_read": int
          "disk_bytes_written": int
    ...
        instances:
          "ssh:{86:17:98:e8:aa:c5:55:11:f8:a4:76:07:78:fb:5a:1e}./dev/disk0s2": 
    ...
            values:
              "disk_bytes_read": 43988611584
              "disk_bytes_written": 101020259328
    ...
          "ssh:{86:17:98:e8:aa:c5:55:11:f8:a4:76:07:78:fb:5a:1e}./dev/disk2s3": 
    ...
            values:
              "disk_bytes_read": 12241350656
              "disk_bytes_written": 18404140032
    

Without even showing the output you can get from SHOW SLAVE STATUS, SHOW MASTER STATUS, SHOW MASTER LOGS, mountpoint space usage information, memory usage, network interface information, load averages, OS details, all the info from the various SQL statement results, Query Analysis data (if enabled) – all in a single snapshot of data.

So now if you’re interested in how something has been running over time, and you can’t get to a MEM UI for some reason – or want to do some local analysis using your own scripts:

$ grep -ih com_show_variables /tmp/mysql-monitor-agent-items* | grep -v int
          "Com_show_variables": 2781
          "Com_show_variables": 2782
          "Com_show_variables": 2783
          "Com_show_variables": 2784
          "Com_show_variables": 2785
...

Or if somebody tells you some hiccup happened at 12:46 today – you can go and take a look at this file as well!

Most of the data is collected at one minute intervals – especially all of the OS data, and the output of SHOW [VARIABLES | STATUS | INNODB STATUS | MASTER STATUS | SLAVE STATUS]. Other items, like some of the SQL statements, are run at different intervals (some up to 6 hours or more apart), so these may not be fully up to date in the current snapshot.

SHOW PROCESSLIST by default only runs once at startup and is not scheduled for anything further by the MEM UI yet, so it will not auto-update. You can force this, however, by adding a rule that uses one of the items from its output, such as mysql:processlist:time, with a dummy “instance” for the variable such as 0 (the instance in the processlist case is the connection id from SHOW PROCESSLIST), so that no instances are ever pulled back to MEM, wasting space there. Schedule that dummy rule for once a minute as well, and you should get the full output of SHOW PROCESSLIST in your item dump.
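Since most of these counters are cumulative, what you usually want from a series of dumps is the per-interval delta. A sketch of that with awk – the two dump files created here are synthetic stand-ins for the real /tmp/mysql-monitor-agent-items.dump-*.txt files:

```shell
# Two synthetic snapshots, mimicking the counter lines in real agent dumps
printf '          "Com_show_variables": 2781\n' > dump-1.txt
printf '          "Com_show_variables": 2795\n' > dump-2.txt

# Pull the counter out of each snapshot, then print the change between them
grep -h '"Com_show_variables"' dump-1.txt dump-2.txt \
  | awk -F': ' 'NR > 1 { print $2 - prev } { prev = $2 }'
```

With the values above this prints 14 – the number of SHOW VARIABLES statements executed between the two snapshots.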

What a difference Atomics can make

Following up to my previous blog on graphing statement execution in performance_schema, Sunny Bains on the InnoDB team pointed out that, looking at the INSERT graph, he didn’t think I had atomic operations enabled within my build.

Particularly here (from trunk):

/******************************************************************//**
Increments lock_word the specified amount and returns new value.
@return lock->lock_word after increment */
UNIV_INLINE
lint
rw_lock_lock_word_incr(
/*===================*/
        rw_lock_t*      lock,           /*!< in/out: rw-lock */
        ulint           amount)         /*!< in: amount of increment */
{
#ifdef INNODB_RW_LOCKS_USE_ATOMICS
        return(os_atomic_increment_lint(&lock->lock_word, amount));
#else /* INNODB_RW_LOCKS_USE_ATOMICS */
        lint local_lock_word;

        mutex_enter(&(lock->mutex));    /* <=== See this a lot in his graph */

        lock->lock_word += amount;
        local_lock_word = lock->lock_word;

        mutex_exit(&(lock->mutex));

        return(local_lock_word);
#endif /* INNODB_RW_LOCKS_USE_ATOMICS */
}

He was of course right – I was building on OS X 10.5, which ships with this gcc version by default:

Cerberus:lp-mysql-trunk mark$ gcc --version
i686-apple-darwin9-gcc-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5493)

InnoDB supports atomic operations with GCC 4.1.2 or later, so a quick update of gcc using ports (sudo port install gcc45) later:

Cerberus:lp-mysql-trunk mark$ export CC=/opt/local/bin/gcc-mp-4.5
Cerberus:lp-mysql-trunk mark$ export CXX=/opt/local/bin/gcc-mp-4.5
Cerberus:lp-mysql-trunk mark$ cmake . && make
-- The C compiler identification is GNU
-- The CXX compiler identification is GNU
...
-- Performing Test HAVE_GCC_ATOMIC_BUILTINS
-- Performing Test HAVE_GCC_ATOMIC_BUILTINS - Success
...
-- Performing Test HAVE_IB_GCC_ATOMIC_BUILTINS
-- Performing Test HAVE_IB_GCC_ATOMIC_BUILTINS - Success
-- Performing Test HAVE_IB_ATOMIC_PTHREAD_T_GCC
-- Performing Test HAVE_IB_ATOMIC_PTHREAD_T_GCC - Success
...

Re-running the same tests that I had done before to generate the graphs really does tell a story, even when just looking at the thumbnail view:

InnoDB INSERT no atomics

InnoDB INSERT atomics

The timings may not be a true reflection of the consistent overhead (this is my macbook, which does a lot of other work whilst I’m testing this stuff out), but just looking at the difference in the numbers of synchronization points, it’s clear that there will be far less contention (and hence potential wait time) in the atomics case.

So what’s the lesson we can learn from this?

If you see a large number of waits for either the wait/synch/mutex/innodb/rw_lock_mutex or wait/synch/mutex/innodb/thr_local_mutex mutexes, you may want to investigate whether the build you are using for your platform has atomic operations enabled. If not, then consider finding a platform or build that does support them.

These are instrumented within performance_schema in the current 5.5 GA release as well, so you can tell whether these are among your top waits by simply looking at the summary of all mutex waits for InnoDB globally:

Cerberus:~ mark$ ~/sandboxes/msb_5_5_10/use
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.5.10 MySQL Community Server (GPL)

Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> SELECT * 
    ->   FROM performance_schema.events_waits_summary_global_by_event_name 
    ->  WHERE event_name like 'wait/synch/mutex/innodb/%' 
    ->  ORDER BY sum_timer_wait DESC;
+-------------------------------------------------------+------------+----------------+----------------+----------------+----------------+
| EVENT_NAME                                            | COUNT_STAR | SUM_TIMER_WAIT | MIN_TIMER_WAIT | AVG_TIMER_WAIT | MAX_TIMER_WAIT |
+-------------------------------------------------------+------------+----------------+----------------+----------------+----------------+
| wait/synch/mutex/innodb/mutex_list_mutex              |       8367 |      382312980 |          34020 |          45692 |         158760 |
| wait/synch/mutex/innodb/rw_lock_list_mutex            |       8286 |      380714040 |          34020 |          45946 |        3549420 |
| wait/synch/mutex/innodb/kernel_mutex                  |        848 |      326206440 |          34020 |         384677 |       93589020 |
| wait/synch/mutex/innodb/log_sys_mutex                 |       1658 |      222532380 |          34020 |         134217 |        3961440 |
| wait/synch/mutex/innodb/rseg_mutex                    |       2847 |      204581160 |          34020 |          71858 |         283500 |
| wait/synch/mutex/innodb/buf_pool_mutex                |       3204 |      160389180 |          34020 |          50059 |         298620 |
| wait/synch/mutex/innodb/fil_system_mutex              |       2022 |      119584080 |          34020 |          59141 |         298620 |
| wait/synch/mutex/innodb/flush_list_mutex              |        773 |       29253420 |          34020 |          37844 |         136080 |
| wait/synch/mutex/innodb/recv_sys_mutex                |        166 |       10069920 |          45360 |          60662 |         147420 |
| wait/synch/mutex/innodb/log_flush_order_mutex         |        147 |        6486480 |          34020 |          44125 |         173880 |
| wait/synch/mutex/innodb/purge_sys_mutex               |         62 |        3938760 |          41580 |          63528 |         124740 |
| wait/synch/mutex/innodb/trx_undo_mutex                |         31 |        3424680 |          34020 |         110473 |         181440 |
| wait/synch/mutex/innodb/dict_sys_mutex                |         37 |        3073140 |          41580 |          83057 |         321300 |
| wait/synch/mutex/innodb/trx_doublewrite_mutex         |         59 |        2683800 |          34020 |          45488 |         158760 |
| wait/synch/mutex/innodb/autoinc_mutex                 |         24 |        2540160 |          56700 |         105840 |         215460 |
| wait/synch/mutex/innodb/thr_local_mutex               |         11 |         774900 |          41580 |          70445 |         136080 |
| wait/synch/mutex/innodb/srv_innodb_monitor_mutex      |         10 |         627480 |          41580 |          62748 |          94500 |
| wait/synch/mutex/innodb/innobase_share_mutex          |          3 |         340200 |          60480 |         113400 |         173880 |
| wait/synch/mutex/innodb/file_format_max_mutex         |          5 |         272160 |          34020 |          54432 |          86940 |
| wait/synch/mutex/innodb/ibuf_mutex                    |          3 |         117180 |          34020 |          39060 |          41580 |
| wait/synch/mutex/innodb/rw_lock_mutex                 |          0 |              0 |              0 |              0 |              0 |
| wait/synch/mutex/innodb/syn_arr_mutex                 |          0 |              0 |              0 |              0 |              0 |
| wait/synch/mutex/innodb/srv_dict_tmpfile_mutex        |          0 |              0 |              0 |              0 |              0 |
...

This is a standard MySQL 5.5.10 build for OS X, which has had similar kinds of statements run against it. We can see that wait/synch/mutex/innodb/rw_lock_mutex has had 0 waits, and wait/synch/mutex/innodb/thr_local_mutex has only had 11 waits globally. Neither is anywhere near the top wait events. This build most certainly does have atomics enabled.

So for those of you running 5.5 in production now:

What are your top waits?

    A Big Bag of Epic Awesomeness

    I tried to come up with a number of topics for this post, but none seemed to really convey what I really feel.. And really this blog is about all of them..

    • A look at PERFORMANCE_SCHEMA.STAGES and PERFORMANCE_SCHEMA.STATEMENTS
    • Graphing MySQL Statement Execution
    • A MySQL visual EXPLAIN plan with PERFORMANCE_SCHEMA
    • Tracing Sessions with PERFORMANCE_SCHEMA

    If that doesn’t whet your appetite (and trust me, I need to, this post is long, but I feel is worth reading all the way to the end), then let me start out by asking the question:

    Wouldn’t you like to be able to trace what a SQL statement did, either in the same or another session, on a production instance, after the fact? Wouldn’t you like to know where all of it’s time was spent, and some statistics on what it was doing specifically?

    I know, I know, you can do this with the slow query log to some extent, you can do it with external tools like MySQL Enterprise Monitor to some extent… But these are all a very high level. How about something like:

    InnoDB PK Lookup

    InnoDB PK Lookup

    Yes? Not sure what you’re seeing here anyway? Think this is an April Fools? Well then read on…

    As Marc Alff announced recently, performance_schema development within mysql-trunk (5.6.x) has come on in leaps and bounds, with 4 new levels of instrumentation added. The first two that Marc lists, table IO and table lock instrumentation, are in the same class as previous instrumentation – wait/. This means that they are instantly tracked and aggregated in the same way as previous 5.5 instrumentation, in the wait events and events waits summary tables (although they too have new summary tables as well, I won’t talk about those here).

    The next two however, stage and statement instrumentation, are each new classes of instrumentation in their own right. This means each gets a new set of tables tracking its events in both raw and aggregated form. Here’s a recent table listing from a launchpad build (for comparison, performance_schema in 5.5 has 17 tables):

    mysql> show tables;
    +---------------------------------------------------+
    | Tables_in_performance_schema                      |
    +---------------------------------------------------+
    | cond_instances                                    |
    | events_stages_current                             |
    | events_stages_history                             |
    | events_stages_history_long                        |
    | events_stages_summary_by_thread_by_event_name     |
    | events_stages_summary_global_by_event_name        |
    | events_statements_current                         |
    | events_statements_history                         |
    | events_statements_history_long                    |
    | events_statements_summary_by_thread_by_event_name |
    | events_statements_summary_global_by_event_name    |
    | events_waits_current                              |
    | events_waits_history                              |
    | events_waits_history_long                         |
    | events_waits_summary_by_instance                  |
    | events_waits_summary_by_thread_by_event_name      |
    | events_waits_summary_global_by_event_name         |
    | file_instances                                    |
    | file_summary_by_event_name                        |
    | file_summary_by_instance                          |
    | mutex_instances                                   |
    | objects_summary_global_by_type                    |
    | performance_timers                                |
    | rwlock_instances                                  |
    | setup_actors                                      |
    | setup_consumers                                   |
    | setup_instruments                                 |
    | setup_objects                                     |
    | setup_timers                                      |
    | table_io_waits_summary_by_index_usage             |
    | table_io_waits_summary_by_table                   |
    | table_lock_waits_summary_by_table                 |
    | threads                                           |
    +---------------------------------------------------+
    33 rows in set (0.00 sec)
    
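    One configuration note before going further: in the builds I have been testing, events are only collected when both their instruments and the relevant consumers are enabled, and the defaults can vary between builds. A minimal sketch to switch everything on for stages and statements (check setup_instruments and setup_consumers on your own build first, the names here are taken from the listing above):

```sql
-- Enable timing for the new stage and statement instruments
UPDATE performance_schema.setup_instruments
   SET enabled = 'YES', timed = 'YES'
 WHERE name LIKE 'stage/%'
    OR name LIKE 'statement/%';

-- Enable the consumers that record those events to the
-- current / history / history_long tables
UPDATE performance_schema.setup_consumers
   SET enabled = 'YES'
 WHERE name LIKE 'events_stages%'
    OR name LIKE 'events_statements%';
```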

    Each of the non-summary “events_stages” tables has this structure:

    mysql> desc events_stages_history_long;
    +--------------------+----------------------------------+------+-----+---------+-------+
    | Field              | Type                             | Null | Key | Default | Extra |
    +--------------------+----------------------------------+------+-----+---------+-------+
    | THREAD_ID          | int(11)                          | NO   |     | NULL    |       |
    | EVENT_ID           | bigint(20) unsigned              | NO   |     | NULL    |       |
    | EVENT_NAME         | varchar(128)                     | NO   |     | NULL    |       |
    | SOURCE             | varchar(64)                      | YES  |     | NULL    |       |
    | TIMER_START        | bigint(20) unsigned              | YES  |     | NULL    |       |
    | TIMER_END          | bigint(20) unsigned              | YES  |     | NULL    |       |
    | TIMER_WAIT         | bigint(20) unsigned              | YES  |     | NULL    |       |
    | NESTING_EVENT_ID   | bigint(20) unsigned              | YES  |     | NULL    |       |
    | NESTING_EVENT_TYPE | enum('STATEMENT','STAGE','WAIT') | YES  |     | NULL    |       |
    +--------------------+----------------------------------+------+-----+---------+-------+
    9 rows in set (0.01 sec)
    
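    Because the _current tables expose each thread’s in-flight events, you can already join these up into a processlist-style view. A rough sketch I have been playing with (treat the joins as illustrative; the exact columns may differ between builds):

```sql
-- What is each foreground thread running right now, and in which stage?
SELECT t.processlist_id,
       st.sql_text,
       sg.event_name AS current_stage,
       ROUND(sg.timer_wait/1000000000, 2) AS stage_ms  -- picoseconds -> ms
  FROM performance_schema.threads t
  LEFT JOIN performance_schema.events_statements_current st
    ON st.thread_id = t.thread_id
  LEFT JOIN performance_schema.events_stages_current sg
    ON sg.thread_id = t.thread_id
 WHERE t.type = 'FOREGROUND';
```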

    And each of the non-summary “events_statements” tables has this structure:

    mysql> desc events_statements_history_long;
    +-------------------------+----------------------------------+------+-----+---------+-------+
    | Field                   | Type                             | Null | Key | Default | Extra |
    +-------------------------+----------------------------------+------+-----+---------+-------+
    | THREAD_ID               | int(11)                          | NO   |     | NULL    |       |
    | EVENT_ID                | bigint(20) unsigned              | NO   |     | NULL    |       |
    | EVENT_NAME              | varchar(128)                     | NO   |     | NULL    |       |
    | SOURCE                  | varchar(64)                      | YES  |     | NULL    |       |
    | TIMER_START             | bigint(20) unsigned              | YES  |     | NULL    |       |
    | TIMER_END               | bigint(20) unsigned              | YES  |     | NULL    |       |
    | TIMER_WAIT              | bigint(20) unsigned              | YES  |     | NULL    |       |
    | LOCK_TIME               | bigint(20) unsigned              | NO   |     | NULL    |       |
    | SQL_TEXT                | longtext                         | YES  |     | NULL    |       |
    | CURRENT_SCHEMA          | varchar(64)                      | YES  |     | NULL    |       |
    | OBJECT_TYPE             | varchar(64)                      | YES  |     | NULL    |       |
    | OBJECT_SCHEMA           | varchar(64)                      | YES  |     | NULL    |       |
    | OBJECT_NAME             | varchar(64)                      | YES  |     | NULL    |       |
    | OBJECT_INSTANCE_BEGIN   | bigint(20)                       | YES  |     | NULL    |       |
    | MYSQL_ERRNO             | int(11)                          | YES  |     | NULL    |       |
    | RETURNED_SQLSTATE       | varchar(5)                       | YES  |     | NULL    |       |
    | MESSAGE_TEXT            | varchar(128)                     | YES  |     | NULL    |       |
    | ERRORS                  | bigint(20) unsigned              | NO   |     | NULL    |       |
    | WARNINGS                | bigint(20) unsigned              | NO   |     | NULL    |       |
    | ROWS_AFFECTED           | bigint(20) unsigned              | NO   |     | NULL    |       |
    | ROWS_SENT               | bigint(20) unsigned              | NO   |     | NULL    |       |
    | ROWS_EXAMINED           | bigint(20) unsigned              | NO   |     | NULL    |       |
    | CREATED_TMP_DISK_TABLES | bigint(20) unsigned              | NO   |     | NULL    |       |
    | CREATED_TMP_TABLES      | bigint(20) unsigned              | NO   |     | NULL    |       |
    | SELECT_FULL_JOIN        | bigint(20) unsigned              | NO   |     | NULL    |       |
    | SELECT_FULL_RANGE_JOIN  | bigint(20) unsigned              | NO   |     | NULL    |       |
    | SELECT_RANGE            | bigint(20) unsigned              | NO   |     | NULL    |       |
    | SELECT_RANGE_CHECK      | bigint(20) unsigned              | NO   |     | NULL    |       |
    | SELECT_SCAN             | bigint(20) unsigned              | NO   |     | NULL    |       |
    | SORT_MERGE_PASSES       | bigint(20) unsigned              | NO   |     | NULL    |       |
    | SORT_RANGE              | bigint(20) unsigned              | NO   |     | NULL    |       |
    | SORT_ROWS               | bigint(20) unsigned              | NO   |     | NULL    |       |
    | SORT_SCAN               | bigint(20) unsigned              | NO   |     | NULL    |       |
    | NO_INDEX_USED           | bigint(20) unsigned              | NO   |     | NULL    |       |
    | NO_GOOD_INDEX_USED      | bigint(20) unsigned              | NO   |     | NULL    |       |
    | NESTING_EVENT_ID        | bigint(20) unsigned              | YES  |     | NULL    |       |
    | NESTING_EVENT_TYPE      | enum('STATEMENT','STAGE','WAIT') | YES  |     | NULL    |       |
    +-------------------------+----------------------------------+------+-----+---------+-------+
    37 rows in set (0.01 sec)

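    All of those per-statement statistics mean you can fish for badly behaving statements directly. Something along these lines (a sketch, not a tuned query) pulls recent statements that tripped the classic warning signs:

```sql
-- Recent statements that used no index, or spilled tmp tables to disk
SELECT sql_text, rows_examined, created_tmp_disk_tables, no_index_used
  FROM performance_schema.events_statements_history_long
 WHERE no_index_used = 1
    OR created_tmp_disk_tables > 0
 ORDER BY timer_wait DESC
 LIMIT 10;
```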
    So what are “stages”? As Marc says, they are “Major execution phases of a statement”. To classic MySQL users, they are what you know as the “State” column within SHOW PROCESSLIST for a running session, or the “Status” column within a SHOW PROFILE snapshot when run within your own session.

    However, in performance_schema they are not only named differently (yes, who knows why), they are also tracked individually and over time for every instrumented thread (as Marc also noted, we can now filter instrumentation by thread as well, though I won’t talk about that in this post).

    So, for example, if I wanted to see the stages that thread 15 went through:

    mysql> select * from events_stages_history_long where thread_id = 15;
    +-----------+----------+--------------------------------+--------------------+-----------------+-----------------+------------+------------------+--------------------+
    | THREAD_ID | EVENT_ID | EVENT_NAME                     | SOURCE             | TIMER_START     | TIMER_END       | TIMER_WAIT | NESTING_EVENT_ID | NESTING_EVENT_TYPE |
    +-----------+----------+--------------------------------+--------------------+-----------------+-----------------+------------+------------------+--------------------+
    |        15 |        8 | stage/sql/init                 | sql_parse.cc:936   | 169201734212160 | 169202108901150 |  374688990 |                7 | STATEMENT          |
    |        15 |       16 | stage/sql/checking permissions | sql_parse.cc:4718  | 169202108901150 | 169202121817170 |   12916020 |                7 | STATEMENT          |
    |        15 |       17 | stage/sql/Opening tables       | sql_base.cc:4805   | 169202121817170 | 169202185009910 |   63192740 |                7 | STATEMENT          |
    |        15 |       18 | stage/sql/init                 | sql_select.cc:3584 | 169202185009910 | 169202259037630 |   74027720 |                7 | STATEMENT          |
    |        15 |       20 | stage/sql/optimizing           | sql_select.cc:1761 | 169202259037630 | 169202307953380 |   48915750 |                7 | STATEMENT          |
    |        15 |       21 | stage/sql/executing            | sql_select.cc:2778 | 169202307953380 | 169202352895550 |   44942170 |                7 | STATEMENT          |
    |        15 |       23 | stage/sql/end                  | sql_select.cc:3622 | 169202352895550 | 169202374889730 |   21994180 |                7 | STATEMENT          |
    |        15 |       24 | stage/sql/query end            | sql_parse.cc:4408  | 169202374889730 | 169202381928320 |    7038590 |                7 | STATEMENT          |
    |        15 |       25 | stage/sql/closing tables       | sql_parse.cc:4460  | 169202381928320 | 169202385916980 |    3988660 |                7 | STATEMENT          |
    |        15 |       26 | stage/sql/freeing items        | sql_parse.cc:5614  | 169202385916980 | 169202440796870 |   54879890 |                7 | STATEMENT          |
    |        15 |       27 | stage/sql/cleaning up          | sql_parse.cc:1482  | 169202440796870 | 169202453663880 |   12867010 |                7 | STATEMENT          |
    ...
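    With the raw stage events in hand, a quick aggregation shows where a statement’s time went. Timers are in picoseconds, so I divide down to milliseconds here (statement event id 7 is the example from above):

```sql
-- Total time per stage for the statement with EVENT_ID = 7
SELECT event_name,
       COUNT(*) AS times_entered,
       ROUND(SUM(timer_wait)/1000000000, 2) AS total_ms
  FROM performance_schema.events_stages_history_long
 WHERE nesting_event_id = 7
   AND nesting_event_type = 'STATEMENT'
 GROUP BY event_name
 ORDER BY total_ms DESC;
```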

    If you scroll to the right, you can see something that hasn’t been talked about much so far with performance_schema – its concept of “event nesting”. Each of the tables has a NESTING_EVENT_ID column, which links to its parent event, and a NESTING_EVENT_TYPE column, which defines the parent event class as either STATEMENT, STAGE or WAIT.

    5.5 has the NESTING_EVENT_ID column in the wait tables, but it is always NULL. Now, however, it is used throughout, and the stages themselves point to a parent event – 7 in the above case, of the STATEMENT class. So let’s take a look at what that statement was:

    mysql> select * from events_statements_history_long where event_id = 7\G
    *************************** 1. row ***************************
                  THREAD_ID: 15
                   EVENT_ID: 7
                 EVENT_NAME: statement/sql/select
                     SOURCE: sql_parse.cc:935
                TIMER_START: 410446612073000
                  TIMER_END: 410447340515000
                 TIMER_WAIT: 728442000
                  LOCK_TIME: 0
                   SQL_TEXT: select @@version_comment limit 1
             CURRENT_SCHEMA: NULL
                OBJECT_TYPE: NULL
              OBJECT_SCHEMA: NULL
                OBJECT_NAME: NULL
      OBJECT_INSTANCE_BEGIN: NULL
                MYSQL_ERRNO: 0
          RETURNED_SQLSTATE: NULL
               MESSAGE_TEXT: NULL
                     ERRORS: 0
                   WARNINGS: 0
              ROWS_AFFECTED: 0
                  ROWS_SENT: 1
              ROWS_EXAMINED: 0
    CREATED_TMP_DISK_TABLES: 0
         CREATED_TMP_TABLES: 0
           SELECT_FULL_JOIN: 0
     SELECT_FULL_RANGE_JOIN: 0
               SELECT_RANGE: 0
         SELECT_RANGE_CHECK: 0
                SELECT_SCAN: 0
          SORT_MERGE_PASSES: 0
                 SORT_RANGE: 0
                  SORT_ROWS: 0
                  SORT_SCAN: 0
              NO_INDEX_USED: 0
         NO_GOOD_INDEX_USED: 0
           NESTING_EVENT_ID: NULL
         NESTING_EVENT_TYPE: NULL
    1 row in set (0.00 sec)
    

    We use the adjacency list model to link the events together, so each event just points to its parent. From a pure SQL perspective this has some downsides, but given that we don’t update this data with SQL, the only real downside is not being able to represent the tree view nicely in pure SQL output – it’s hard to find the depth of the tree, so you can’t indent the output in columns.
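    You can still walk one level of the hierarchy at a time with a plain join, of course. For example, a sketch tying each wait event back to the stage it occurred within, for the statement above:

```sql
-- Stages of statement 7, and the wait events nested within each stage
SELECT s.event_name AS stage,
       w.event_name AS wait,
       ROUND(w.timer_wait/1000000, 2) AS wait_us  -- picoseconds -> microseconds
  FROM performance_schema.events_stages_history_long s
  JOIN performance_schema.events_waits_history_long w
    ON w.nesting_event_id = s.event_id
   AND w.nesting_event_type = 'STAGE'
 WHERE s.nesting_event_id = 7
   AND s.nesting_event_type = 'STATEMENT'
 ORDER BY s.event_id, w.event_id;
```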

    So I had to find another way to represent this nicely, otherwise it’s just not obvious how freaking awesome an advancement in MySQL statement tracing this is. Yet I love making SQL do things for me without having to wrap a program around it, so I thought I’d do another take on Shlomi Noach’s graphs with SQL.

    My approach is slightly different though, I wanted to be able to generate a true graph, like the picture above. The DOT language is perfect for representing that hierarchy, and is supported by a number of graphing libraries and tools – including Graphviz, which I used to generate my own images here.

    And so, without further ado, here’s how to generate an entire hierarchical stack trace of what a thread has done, with what is essentially a single SQL statement, in DOT format:

    CREATE DATABASE IF NOT EXISTS ps_helper;
    USE ps_helper;
    
    DROP PROCEDURE IF EXISTS dump_thread_stack;
    
    DELIMITER $$
    
    CREATE PROCEDURE dump_thread_stack(thd_id INT, debug INT)
    BEGIN
    
    	SELECT 'digraph events { rankdir=LR; nodesep=0.10;';
    	SELECT CONCAT('// Stack created: ', NOW());
    	SELECT CONCAT('// MySQL version: ', VERSION());
    	SELECT CONCAT('// MySQL user: ', CURRENT_USER());
    
    	SELECT CONCAT(IF(nesting_event_id IS NOT NULL, CONCAT(nesting_event_id, ' -> '), ''), 
    	              event_id, '; ', event_id, ' [label="',
    	              '(',ROUND(timer_wait/1000000, 2),'μ) ',
    	              IF (event_name NOT LIKE 'wait/io%', SUBSTRING_INDEX(event_name, '/', -2), event_name),
    	              IF (event_name LIKE 'statement/%', IFNULL(CONCAT('\n', wait_info), ''), ''),
    	              IF (debug AND event_name LIKE 'wait%', wait_info, ''),
    	              '", ', 
    	              CASE WHEN event_name LIKE 'wait/io/file%' THEN 
    	                     'shape=box, style=filled, color=red'
    	                   WHEN event_name LIKE 'wait/io/table%' THEN 
    	                     'shape=box, style=filled, color=lightcoral'
    	                   WHEN event_name LIKE 'wait/synch/mutex%' THEN
    	                     'style=filled, color=lightskyblue'
    	                   WHEN event_name LIKE 'wait/synch/cond%' THEN
    	                     'style=filled, color=darkseagreen3'
    	                   WHEN event_name LIKE 'wait/synch/rwlock%' THEN
    	                     'style=filled, color=orchid'
    	                   WHEN event_name LIKE 'wait/lock%' THEN
    	                     'shape=box, style=filled, color=tan'
    	                   WHEN event_name LIKE 'statement/%' THEN
    	                     CONCAT('shape=box, style=bold', 
    	                            IF((timer_wait/1000000000000) > @@long_query_time, ' style=filled, color=red', ''))
    	                   WHEN event_name LIKE 'stage/%' THEN
    	                     'style=filled, color=slategray3'
    	                   ELSE '' END,
    	               '];') event	
    	  FROM (
    	       (SELECT thread_id, event_id, event_name, timer_wait, timer_start, nesting_event_id, 
    	               CONCAT(sql_text, '\n',
    	                      'errors: ', errors, '\n',
    	                      'warnings: ', warnings, '\n',
    	                      'lock time: ', ROUND(lock_time/1000000, 2),'μ\n',
    	                      'rows affected: ', rows_affected, '\n',
    	                      'rows sent: ', rows_sent, '\n',
    	                      'rows examined: ', rows_examined, '\n',
    	                      'tmp tables: ', created_tmp_tables, '\n',
    	                      'tmp disk tables: ', created_tmp_disk_tables, '\n',
    	                      'select scan: ', select_scan, '\n',
    	                      'select full join: ', select_full_join, '\n',
    	                      'select full range join: ', select_full_range_join, '\n',
    	                      'select range: ', select_range, '\n',
    	                      'select range check: ', select_range_check, '\n', 
    	                      'sort merge passes: ', sort_merge_passes, '\n',
    	                      'sort rows: ', sort_rows, '\n',
    	                      'sort range: ', sort_range, '\n',
    	                      'sort scan: ', sort_scan, '\n',
    	                      'no index used: ', IF(no_index_used, 'TRUE', 'FALSE'), '\n',
    	                      'no good index used: ', IF(no_good_index_used, 'TRUE', 'FALSE'), '\n'
    	                      ) AS wait_info
    	          FROM performance_schema.events_statements_history_long)
    	       UNION 
    	       (SELECT thread_id, event_id, event_name, timer_wait, timer_start, nesting_event_id, null AS wait_info
    	          FROM performance_schema.events_stages_history_long) 
    	       UNION 
    	       (SELECT thread_id, event_id, 
    	               CONCAT(event_name, 
    	                      IF(event_name NOT LIKE 'wait/synch/mutex%', IFNULL(CONCAT(' - ', operation), ''), ''), 
    	                      IF(number_of_bytes IS NOT NULL, CONCAT(' ', number_of_bytes, ' bytes'), ''),
    	                      IF(event_name LIKE 'wait/io/file%', '\n', ''),
    	                      IF(object_schema IS NOT NULL, CONCAT('\nObject: ', object_schema, '.'), ''), IFNULL(object_name, ''),
    	                      IF(index_name IS NOT NULL, CONCAT(' Index: ', index_name), ''), '\n'
    	                      ) AS event_name,
    	               timer_wait, timer_start, nesting_event_id, source AS wait_info
    	          FROM performance_schema.events_waits_history_long)) events 
    	 WHERE thread_id = thd_id
    	 ORDER BY event_id;
    	 
    	 SELECT '}';
    	
    END$$
    
    DELIMITER ;
    

    The first parameter to the procedure is the thread id that you want to dump the stack for. The second is a debug flag, which includes the source file and line number that wait events came from for each node.

    To find the thread that you are interested in, you should check the performance_schema.threads table, which includes everything you should need to link this back to a running connection:

    mysql> select thread_id, processlist_id, processlist_user, processlist_host
        ->   from threads
        ->  where type = 'foreground';
    +-----------+----------------+------------------+------------------+
    | thread_id | processlist_id | processlist_user | processlist_host |
    +-----------+----------------+------------------+------------------+
    |        18 |              4 | root             | localhost        |
    +-----------+----------------+------------------+------------------+
    1 row in set (0.01 sec)
    
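    And if the thread you are interested in is your own connection, CONNECTION_ID() maps straight onto processlist_id:

```sql
SELECT thread_id
  FROM performance_schema.threads
 WHERE processlist_id = CONNECTION_ID();
```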

    You can either call this stored procedure directly and write all of its result sets out to a file in your own language, or be like me, and be lazy, and call it like this:

    $ mysql -BN -u root -e "call ps_helper.dump_thread_stack(17, 0)" > ~/Desktop/stack.dot

    Load the dot file into your favorite graphing solution, and you will get the full stack of event information that is available within performance_schema for that thread. If you’re lucky enough to capture a full trace of what the session did since it started, you might get a thread stack looking something like this:

    Full MyISAM Thread Stack

    Full Stack

    Statements at the top of the graph are the most recent, with the earliest statement executed at the bottom of the graph. The legend for the graph would be:

    • Grey – Stages
    • Blue – Mutexes
    • Orchid – RW Locks
    • Tan – Table Locks
    • Light Coral – Table IO
    • Red – File IO

    If the statement box is a solid red colour, then the statement ran for longer than the long_query_time setting for the instance the trace was generated on.

    You will not always be this lucky, however; wait information within performance_schema ages out very quickly, depending on how you have it configured, especially on busy systems. Each table holds this many events of each class by default:

    
    mysql> show variables like 'perf%events%size';
    +--------------------------------------------------------+-------+
    | Variable_name                                          | Value |
    +--------------------------------------------------------+-------+
    | performance_schema_events_stages_history_long_size     | 1000  |
    | performance_schema_events_stages_history_size          | 10    |
    | performance_schema_events_statements_history_long_size | 1000  |
    | performance_schema_events_statements_history_size      | 10    |
    | performance_schema_events_waits_history_long_size      | 10000 |
    | performance_schema_events_waits_history_size           | 10    |
    +--------------------------------------------------------+-------+
    6 rows in set (0.00 sec)
    
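    These sizes are fixed at server startup, so raising them means a configuration change and a restart; something along these lines in my.cnf (the values here are purely illustrative, and each row retained consumes memory):

```
[mysqld]
performance_schema_events_statements_history_long_size = 10000
performance_schema_events_stages_history_long_size     = 100000
performance_schema_events_waits_history_long_size      = 1000000
```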

    Each statement can have 10-15 or more stages, and tens to thousands of individual wait events, so you will need to use some good judgement on how to size these, depending on your use cases.

    However, when you do have a degraded output, this still gives you quite a bit of information for the statements anyway:

    Degraded Stack

    Degraded Stack

    I’ve found it to be quite an eye opener in some respects, to see how many times certain things are locked in the background.

    Here are a few examples:

    For the eagle-eyed and in the know: these graphs actually show a bug in the underlying performance_schema instrumentation in the release I am testing against, in two areas. First, some wait events are getting linked to the wrong stages. Second, we seem to double-account for IO within InnoDB in some cases. We’re tracking those things down at the moment…

    But even with those wrinkles, I think it’s obvious what kind of value this is going to provide in 5.6. And there’s more – I’ll have to talk about filtering, and the other summary views, at some point as well!

    I’m pretty sure my topic was the right choice now.. :)