A Big Bag of Epic Awesomeness

I tried to come up with a number of topics for this post, but none seemed to really convey what I really feel.. And really this blog is about all of them..

  • A look at PERFORMANCE_SCHEMA.STAGES and PERFORMANCE_SCHEMA.STATEMENTS
  • Graphing MySQL Statement Execution
  • A MySQL visual EXPLAIN plan with PERFORMANCE_SCHEMA
  • Tracing Sessions with PERFORMANCE_SCHEMA

If that doesn’t whet your appetite (and trust me, I need to, this post is long, but I feel is worth reading all the way to the end), then let me start out by asking the question:

Wouldn’t you like to be able to trace what a SQL statement did, either in the same or another session, on a production instance, after the fact? Wouldn’t you like to know where all of it’s time was spent, and some statistics on what it was doing specifically?

I know, I know, you can do this with the slow query log to some extent, you can do it with external tools like MySQL Enterprise Monitor to some extent… But these are all a very high level. How about something like:

InnoDB PK Lookup

InnoDB PK Lookup

Yes? Not sure what you’re seeing here anyway? Think this is an April Fools? Well then read on…

As Marc Alff announced recently, performance_schema development within mysql-trunk (5.6.x) has come on in leaps and bounds, with 4 new levels of instrumentation added. The first two that Marc lists, table IO and table lock instrumentation, are in the same class as previous instrumentation – wait/. This means that they are instantly tracked and aggregated in the same way as previous 5.5 instrumentation, in the wait events and events waits summary tables (although they too have new summary tables as well, I won’t talk about those here).

The next two however, stages and statement instrumentation are each individually new classes of instrumentation on their own. This means they get a new set of tables that track their events in both raw and aggregate form. Here’s a recent table listing of a build from launchpad (for comparison, performance_schema in 5.5 has 17 tables):

mysql> show tables;
+---------------------------------------------------+
| Tables_in_performance_schema                      |
+---------------------------------------------------+
| cond_instances                                    |
| events_stages_current                             |
| events_stages_history                             |
| events_stages_history_long                        |
| events_stages_summary_by_thread_by_event_name     |
| events_stages_summary_global_by_event_name        |
| events_statements_current                         |
| events_statements_history                         |
| events_statements_history_long                    |
| events_statements_summary_by_thread_by_event_name |
| events_statements_summary_global_by_event_name    |
| events_waits_current                              |
| events_waits_history                              |
| events_waits_history_long                         |
| events_waits_summary_by_instance                  |
| events_waits_summary_by_thread_by_event_name      |
| events_waits_summary_global_by_event_name         |
| file_instances                                    |
| file_summary_by_event_name                        |
| file_summary_by_instance                          |
| mutex_instances                                   |
| objects_summary_global_by_type                    |
| performance_timers                                |
| rwlock_instances                                  |
| setup_actors                                      |
| setup_consumers                                   |
| setup_instruments                                 |
| setup_objects                                     |
| setup_timers                                      |
| table_io_waits_summary_by_index_usage             |
| table_io_waits_summary_by_table                   |
| table_lock_waits_summary_by_table                 |
| threads                                           |
+---------------------------------------------------+
33 rows in set (0.00 sec)

Each of the non-summary “events_stages” tables have this structure:

mysql> desc events_stages_history_long;
+--------------------+----------------------------------+------+-----+---------+-------+
| Field              | Type                             | Null | Key | Default | Extra |
+--------------------+----------------------------------+------+-----+---------+-------+
| THREAD_ID          | int(11)                          | NO   |     | NULL    |       |
| EVENT_ID           | bigint(20) unsigned              | NO   |     | NULL    |       |
| EVENT_NAME         | varchar(128)                     | NO   |     | NULL    |       |
| SOURCE             | varchar(64)                      | YES  |     | NULL    |       |
| TIMER_START        | bigint(20) unsigned              | YES  |     | NULL    |       |
| TIMER_END          | bigint(20) unsigned              | YES  |     | NULL    |       |
| TIMER_WAIT         | bigint(20) unsigned              | YES  |     | NULL    |       |
| NESTING_EVENT_ID   | bigint(20) unsigned              | YES  |     | NULL    |       |
| NESTING_EVENT_TYPE | enum('STATEMENT','STAGE','WAIT') | YES  |     | NULL    |       |
+--------------------+----------------------------------+------+-----+---------+-------+
9 rows in set (0.01 sec)

And each of the non-summary “events_statements” tables have this structure:

mysql> desc events_statements_history_long;
+-------------------------+----------------------------------+------+-----+---------+-------+
| Field                   | Type                             | Null | Key | Default | Extra |
+-------------------------+----------------------------------+------+-----+---------+-------+
| THREAD_ID               | int(11)                          | NO   |     | NULL    |       |
| EVENT_ID                | bigint(20) unsigned              | NO   |     | NULL    |       |
| EVENT_NAME              | varchar(128)                     | NO   |     | NULL    |       |
| SOURCE                  | varchar(64)                      | YES  |     | NULL    |       |
| TIMER_START             | bigint(20) unsigned              | YES  |     | NULL    |       |
| TIMER_END               | bigint(20) unsigned              | YES  |     | NULL    |       |
| TIMER_WAIT              | bigint(20) unsigned              | YES  |     | NULL    |       |
| LOCK_TIME               | bigint(20) unsigned              | NO   |     | NULL    |       |
| SQL_TEXT                | longtext                         | YES  |     | NULL    |       |
| CURRENT_SCHEMA          | varchar(64)                      | YES  |     | NULL    |       |
| OBJECT_TYPE             | varchar(64)                      | YES  |     | NULL    |       |
| OBJECT_SCHEMA           | varchar(64)                      | YES  |     | NULL    |       |
| OBJECT_NAME             | varchar(64)                      | YES  |     | NULL    |       |
| OBJECT_INSTANCE_BEGIN   | bigint(20)                       | YES  |     | NULL    |       |
| MYSQL_ERRNO             | int(11)                          | YES  |     | NULL    |       |
| RETURNED_SQLSTATE       | varchar(5)                       | YES  |     | NULL    |       |
| MESSAGE_TEXT            | varchar(128)                     | YES  |     | NULL    |       |
| ERRORS                  | bigint(20) unsigned              | NO   |     | NULL    |       |
| WARNINGS                | bigint(20) unsigned              | NO   |     | NULL    |       |
| ROWS_AFFECTED           | bigint(20) unsigned              | NO   |     | NULL    |       |
| ROWS_SENT               | bigint(20) unsigned              | NO   |     | NULL    |       |
| ROWS_EXAMINED           | bigint(20) unsigned              | NO   |     | NULL    |       |
| CREATED_TMP_DISK_TABLES | bigint(20) unsigned              | NO   |     | NULL    |       |
| CREATED_TMP_TABLES      | bigint(20) unsigned              | NO   |     | NULL    |       |
| SELECT_FULL_JOIN        | bigint(20) unsigned              | NO   |     | NULL    |       |
| SELECT_FULL_RANGE_JOIN  | bigint(20) unsigned              | NO   |     | NULL    |       |
| SELECT_RANGE            | bigint(20) unsigned              | NO   |     | NULL    |       |
| SELECT_RANGE_CHECK      | bigint(20) unsigned              | NO   |     | NULL    |       |
| SELECT_SCAN             | bigint(20) unsigned              | NO   |     | NULL    |       |
| SORT_MERGE_PASSES       | bigint(20) unsigned              | NO   |     | NULL    |       |
| SORT_RANGE              | bigint(20) unsigned              | NO   |     | NULL    |       |
| SORT_ROWS               | bigint(20) unsigned              | NO   |     | NULL    |       |
| SORT_SCAN               | bigint(20) unsigned              | NO   |     | NULL    |       |
| NO_INDEX_USED           | bigint(20) unsigned              | NO   |     | NULL    |       |
| NO_GOOD_INDEX_USED      | bigint(20) unsigned              | NO   |     | NULL    |       |
| NESTING_EVENT_ID        | bigint(20) unsigned              | YES  |     | NULL    |       |
| NESTING_EVENT_TYPE      | enum('STATEMENT','STAGE','WAIT') | YES  |     | NULL    |       |
+-------------------------+----------------------------------+------+-----+---------+-------+
37 rows in set (0.01 sec)

So what are “stages”? As Marc says, they are “Major execution phases of a statement”, to classic MySQL users, they are what you know as the “State” column within SHOW PROCESSLIST for a running session, or the “Status” within a SHOW PROFILE snapshot when run within your own session.

However, in performance_schema they not only named differently (yea, who knows why), they are tracked individually and over time for every instrumented thread (as Marc also noted, we can now filter instrumentation on threads as well, though I won’t talk about that in this post).

So, for example if I wanted to see the stages that “thread 15″ went through:

mysql> select * from events_stages_history_long where thread_id = 15;
+-----------+----------+--------------------------------+--------------------+-----------------+-----------------+------------+------------------+--------------------+
| THREAD_ID | EVENT_ID | EVENT_NAME                     | SOURCE             | TIMER_START     | TIMER_END       | TIMER_WAIT | NESTING_EVENT_ID | NESTING_EVENT_TYPE |
+-----------+----------+--------------------------------+--------------------+-----------------+-----------------+------------+------------------+--------------------+
|        15 |        8 | stage/sql/init                 | sql_parse.cc:936   | 169201734212160 | 169202108901150 |  374688990 |                7 | STATEMENT          |
|        15 |       16 | stage/sql/checking permissions | sql_parse.cc:4718  | 169202108901150 | 169202121817170 |   12916020 |                7 | STATEMENT          |
|        15 |       17 | stage/sql/Opening tables       | sql_base.cc:4805   | 169202121817170 | 169202185009910 |   63192740 |                7 | STATEMENT          |
|        15 |       18 | stage/sql/init                 | sql_select.cc:3584 | 169202185009910 | 169202259037630 |   74027720 |                7 | STATEMENT          |
|        15 |       20 | stage/sql/optimizing           | sql_select.cc:1761 | 169202259037630 | 169202307953380 |   48915750 |                7 | STATEMENT          |
|        15 |       21 | stage/sql/executing            | sql_select.cc:2778 | 169202307953380 | 169202352895550 |   44942170 |                7 | STATEMENT          |
|        15 |       23 | stage/sql/end                  | sql_select.cc:3622 | 169202352895550 | 169202374889730 |   21994180 |                7 | STATEMENT          |
|        15 |       24 | stage/sql/query end            | sql_parse.cc:4408  | 169202374889730 | 169202381928320 |    7038590 |                7 | STATEMENT          |
|        15 |       25 | stage/sql/closing tables       | sql_parse.cc:4460  | 169202381928320 | 169202385916980 |    3988660 |                7 | STATEMENT          |
|        15 |       26 | stage/sql/freeing items        | sql_parse.cc:5614  | 169202385916980 | 169202440796870 |   54879890 |                7 | STATEMENT          |
|        15 |       27 | stage/sql/cleaning up          | sql_parse.cc:1482  | 169202440796870 | 169202453663880 |   12867010 |                7 | STATEMENT          |
...

If you scroll to the right, you can see something that hasn’t been talked about a lot so far with performance_schema – it’s concept of “event nesting”. Each of the tables has a NESTING_EVENT_ID column, which links to it’s parent, and a NESTING_EVENT_TYPE column, which defines the parent event class – as either STATEMENT, STAGE or WAIT.

5.5 has the NESTING_EVENT_ID column in the wait tables, but it is always NULL. Now however this is used throughout, and the stages themselves point to a parent event – 7 in the above case, of a STATEMENT class. So let’s take a look at what that statement was:

mysql> select * from events_statements_history_long where event_id = 7\G
*************************** 1. row ***************************
              THREAD_ID: 15
               EVENT_ID: 7
             EVENT_NAME: statement/sql/select
                 SOURCE: sql_parse.cc:935
            TIMER_START: 410446612073000
              TIMER_END: 410447340515000
             TIMER_WAIT: 728442000
              LOCK_TIME: 0
               SQL_TEXT: select @@version_comment limit 1
         CURRENT_SCHEMA: NULL
            OBJECT_TYPE: NULL
          OBJECT_SCHEMA: NULL
            OBJECT_NAME: NULL
  OBJECT_INSTANCE_BEGIN: NULL
            MYSQL_ERRNO: 0
      RETURNED_SQLSTATE: NULL
           MESSAGE_TEXT: NULL
                 ERRORS: 0
               WARNINGS: 0
          ROWS_AFFECTED: 0
              ROWS_SENT: 1
          ROWS_EXAMINED: 0
CREATED_TMP_DISK_TABLES: 0
     CREATED_TMP_TABLES: 0
       SELECT_FULL_JOIN: 0
 SELECT_FULL_RANGE_JOIN: 0
           SELECT_RANGE: 0
     SELECT_RANGE_CHECK: 0
            SELECT_SCAN: 0
      SORT_MERGE_PASSES: 0
             SORT_RANGE: 0
              SORT_ROWS: 0
              SORT_SCAN: 0
          NO_INDEX_USED: 0
     NO_GOOD_INDEX_USED: 0
       NESTING_EVENT_ID: NULL
     NESTING_EVENT_TYPE: NULL
1 row in set (0.00 sec)

We use the adjacency list model to link the events together, so each event just points to it’s parent. From a pure SQL perspective this has some downsides, but given that we don’t update this data with SQL, the only real downside is not being able to represent the tree view nicely in a pure SQL output – it’s hard to find the depth of the tree, to be able to indent your output in the columns.

So I had to find another way to represent this nicely, otherwise it’s just not that obvious how freaking awesome of an advancement in MySQL statement tracing this is.. Yet I love making SQL do things for me without having to wrap a program around it, so I thought I’d do another take on Shlomi Noach’s graphs with SQL.

My approach is slightly different though, I wanted to be able to generate a true graph, like the picture above. The DOT language is perfect for representing that hierarchy, and is supported by a number of graphing libraries and tools – including Graphviz, which I used to generate my own images here.

And so, without further ado, here’s how to generate an entire hierarchical stack trace of what a thread has done, with what is essentially a single SQL statement, in DOT format:

CREATE DATABASE IF NOT EXISTS ps_helper;
USE ps_helper;

DROP PROCEDURE IF EXISTS dump_thread_stack;

DELIMITER $$

CREATE PROCEDURE dump_thread_stack(thd_id INT, debug INT)
BEGIN

	SELECT 'digraph events { rankdir=LR; nodesep=0.10;';
	SELECT CONCAT('// Stack created: ', NOW());
	SELECT CONCAT('// MySQL version: ', VERSION());
	SELECT CONCAT('// MySQL user: ', CURRENT_USER());

	SELECT CONCAT(IF(nesting_event_id IS NOT NULL, CONCAT(nesting_event_id, ' -> '), ''), 
	              event_id, '; ', event_id, ' [label="',
	              '(',ROUND(timer_wait/1000000, 2),'μ) ',
	              IF (event_name NOT LIKE 'wait/io%', SUBSTRING_INDEX(event_name, '/', -2), event_name),
	              IF (event_name LIKE 'statement/%', IFNULL(CONCAT('\n', wait_info), ''), ''),
	              IF (debug AND event_name LIKE 'wait%', wait_info, ''),
	              '", ', 
	              CASE WHEN event_name LIKE 'wait/io/file%' THEN 
	                     'shape=box, style=filled, color=red'
	                   WHEN event_name LIKE 'wait/io/table%' THEN 
	                     'shape=box, style=filled, color=lightcoral'
	                   WHEN event_name LIKE 'wait/synch/mutex%' THEN
	                     'style=filled, color=lightskyblue'
	                   WHEN event_name LIKE 'wait/synch/cond%' THEN
	                     'style=filled, color=darkseagreen3'
	                   WHEN event_name LIKE 'wait/synch/rwlock%' THEN
	                     'style=filled, color=orchid'
	                   WHEN event_name LIKE 'wait/lock%' THEN
	                     'shape=box, style=filled, color=tan'
	                   WHEN event_name LIKE 'statement/%' THEN
	                     CONCAT('shape=box, style=bold', 
	                            IF((timer_wait/1000000000000) > @@long_query_time, ' style=filled, color=red', ''))
	                   WHEN event_name LIKE 'stage/%' THEN
	                     'style=filled, color=slategray3'
	                   ELSE '' END,
	               '];') event	
	  FROM (
	       (SELECT thread_id, event_id, event_name, timer_wait, timer_start, nesting_event_id, 
	               CONCAT(sql_text, '\n',
	                      'errors: ', errors, '\n',
	                      'warnings: ', warnings, '\n',
	                      'lock time: ', ROUND(lock_time/1000000, 2),'μ\n',
	                      'rows affected: ', rows_affected, '\n',
	                      'rows sent: ', rows_sent, '\n',
	                      'rows examined: ', rows_examined, '\n',
	                      'tmp tables: ', created_tmp_tables, '\n',
	                      'tmp disk tables: ', created_tmp_disk_tables, '\n'
	                      'select scan: ', select_scan, '\n',
	                      'select full join: ', select_full_join, '\n',
	                      'select full range join: ', select_full_range_join, '\n',
	                      'select range: ', select_range, '\n',
	                      'select range check: ', select_range_check, '\n', 
	                      'sort merge passes: ', sort_merge_passes, '\n',
	                      'sort rows: ', sort_rows, '\n',
	                      'sort range: ', sort_range, '\n',
	                      'sort scan: ', sort_scan, '\n',
	                      'no index used: ', IF(no_index_used, 'TRUE', 'FALSE'), '\n',
	                      'no good index used: ', IF(no_good_index_used, 'TRUE', 'FALSE'), '\n'
	                      ) AS wait_info
	          FROM performance_schema.events_statements_history_long)
	       UNION 
	       (SELECT thread_id, event_id, event_name, timer_wait, timer_start, nesting_event_id, null AS wait_info
	          FROM performance_schema.events_stages_history_long) 
	       UNION 
	       (SELECT thread_id, event_id, 
	               CONCAT(event_name, 
	                      IF(event_name NOT LIKE 'wait/synch/mutex%', IFNULL(CONCAT(' - ', operation), ''), ''), 
	                      IF(number_of_bytes IS NOT NULL, CONCAT(' ', number_of_bytes, ' bytes'), ''),
	                      IF(event_name LIKE 'wait/io/file%', '\n', ''),
	                      IF(object_schema IS NOT NULL, CONCAT('\nObject: ', object_schema, '.'), ''), IFNULL(object_name, ''),
	                      IF(index_name IS NOT NULL, CONCAT(' Index: ', index_name), ''), '\n'
	                      ) AS event_name,
	               timer_wait, timer_start, nesting_event_id, source AS wait_info
	          FROM performance_schema.events_waits_history_long)) events 
	 WHERE thread_id = thd_id
	 ORDER BY event_id;
	 
	 SELECT '}';
	
END$$

DELIMITER ;

The first parameter to the procedure is the thread id that you want to dump the stack for. The second is a debug flag, which includes the source file and line number that wait events came from for each node.

To find the thread that you are interested in, you should check the performance_schema.threads table, which includes everything you should need to link this back to a running connection:

mysql> select thread_id, processlist_id, processlist_user, processlist_host
    ->   from threads
    ->  where type = 'foreground';
+-----------+----------------+------------------+------------------+
| thread_id | processlist_id | processlist_user | processlist_host |
+-----------+----------------+------------------+------------------+
|        18 |              4 | root             | localhost        |
+-----------+----------------+------------------+------------------+
1 row in set (0.01 sec)

You can either call this stored procedure directly and output all of it’s result sets directly to a file in your own language, or be like me, and be lazy, and call it like this:

$ mysql -BN -u root -e "call ps_helper.dump_thread_stack(17, 0)" > ~/Desktop/stack.dot

Load the dot file in to your favorite graphing solution, and you will get the full stack of event information that is available within performance_schema for that thread. If you’re lucky enough to capture a full trace of what the session did since it started, you might get a thread stack looking something like this:

Full MyISAM Thread Stack

Full Stack

Statements at the top of the graph are the most recent, with the earliest statement executed at the bottom of the graph. The legend for the graph would be:

  • Grey – Stages
  • Blue – Mutexes
  • Orchid – RW Locks
  • Tan – Table Locks
  • Light Coral – Table IO
  • Red – File IO

If the statement box is a solid red colour, then the statement ran for longer than the long_query_time setting for the instance the trace was generated on.

You will not always be this lucky however, wait information within performance_schema degrades very quickly depending how you have it configured, especially on busy systems. Each table holds this many events of each class by default:


mysql> show variables like 'perf%events%size';
+--------------------------------------------------------+-------+
| Variable_name                                          | Value |
+--------------------------------------------------------+-------+
| performance_schema_events_stages_history_long_size     | 1000  |
| performance_schema_events_stages_history_size          | 10    |
| performance_schema_events_statements_history_long_size | 1000  |
| performance_schema_events_statements_history_size      | 10    |
| performance_schema_events_waits_history_long_size      | 10000 |
| performance_schema_events_waits_history_size           | 10    |
+--------------------------------------------------------+-------+
6 rows in set (0.00 sec)

The each statement can have 10-15 or more stages per statement, and 10′s to 1000′s of wait events individually, so you will need to use some good judgement on how to size this, depending on your use cases.

However, when you do have a degraded output, this still gives you quite a bit of information for the statements anyway:

Degraded Stack

Degraded Stack

I’ve found it to be quite an eye opener in some respects, to see how many times certain things are locked in the background.

Here’s a few examples:

For the eagle eyed and in the know – these graphs actually show a bug in the underlying performance_schema instrumentation in the release I am testing against, in two areas. First, some wait events are getting linked up to the wrong stages. We also seem to double account for IO in InnoDB in some cases. We’re tracking those things down at the moment..

But even with those wrinkles, I think it’s obvious what kind of value this is going to provide in 5.6. And there’s more – I’ll have to talk about filtering, and the other summary views, at some point as well!

I’m pretty sure my topic was the right choice now.. :)

12 comments

  1. Very nice! We’re approaching what I have often said I miss: the insane amount of instrumentation you can get from SQL Server. Now all we need is some Maatkit tools to make it easy to use :-)

Got something to say?