Monitoring MySQL IO Latency with performance_schema

Baron recently pointed to two really good articles by Brendan Gregg on Filesystem Latency (part 1 / part 2), which I too would recommend everybody read.

They rightly point out that trying to correlate IO statistics from tools such as iostat to a database's workload, especially when that database is on a shared resource, is fraught with complications – and even on a dedicated system it is not the full picture anyway. Brendan points out that to get the full picture, you really need to instrument your profiling at the read/write layer of the application, before getting to per-filesystem / disk statistics.

Brendan has followed up on his posts with a third on how to track these filesystem calls from a running MySQL process with DTrace, which again I recommend. In this post Brendan says the following, linking to my blog on graphing statement execution in 5.6 (thanks Brendan):

If you are on MySQL 5.6 and later, it looks like you may also be able to 
get this information from the wait/io events from the new performance schema 
additions (I haven't tried).

This is true.

However, I'd like to show how filesystem latency can be monitored from the current MySQL 5.5 GA version, with performance_schema, on all platforms.

I've written about tracking IO with performance_schema in the past, and in retrospect realize that I should have talked more about the latency aspects of IO as well – my bad. :) That post looked at the file IO summary tables.

Now let us take a look at what we can get from the wait event summary tables for IO.

When event timing is enabled within performance_schema, every single wait event is a measure of latency, whether that be to grab a synchronization point, or perform a file IO operation, or even later in 5.6 to take a table lock, perform table IO or network IO etc. – up to executing an entire statement or stage (including all locks, file IO, synch points etc. below them).
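If event timing is not already enabled on your instance, the file IO instruments and wait event consumers can be switched on dynamically. This is a sketch against the default 5.5 setup tables (on a default install the file IO instruments are already enabled and timed):

```sql
-- Enable and time all file IO instruments
UPDATE performance_schema.setup_instruments
   SET enabled = 'YES', timed = 'YES'
 WHERE name LIKE 'wait/io/file/%';

-- Make sure the wait event consumers are collecting
UPDATE performance_schema.setup_consumers
   SET enabled = 'YES'
 WHERE name LIKE 'events_waits%';
```

Note that these settings are not persistent – they revert to the server defaults on restart.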

From the file IO perspective, all of these are instrumented around the layer that Brendan points to – interactions with the filesystem, not the disks, from within MySQL. You can see exactly which file IO events are instrumented within the performance_schema.setup_instruments table:

mysql> select * from setup_instruments where name like 'wait/io/file/%';
+--------------------------------------+---------+-------+
| NAME                                 | ENABLED | TIMED |
+--------------------------------------+---------+-------+
| wait/io/file/sql/map                 | YES     | YES   |
| wait/io/file/sql/binlog              | YES     | YES   |
| wait/io/file/sql/binlog_index        | YES     | YES   |
| wait/io/file/sql/relaylog            | YES     | YES   |
| wait/io/file/sql/relaylog_index      | YES     | YES   |
...
| wait/io/file/myisam/data_tmp         | YES     | YES   |
| wait/io/file/myisam/dfile            | YES     | YES   |
| wait/io/file/myisam/kfile            | YES     | YES   |
| wait/io/file/myisam/log              | YES     | YES   |
| wait/io/file/myisammrg/MRG           | YES     | YES   |
| wait/io/file/innodb/innodb_data_file | YES     | YES   |
| wait/io/file/innodb/innodb_log_file  | YES     | YES   |
| wait/io/file/innodb/innodb_temp_file | YES     | YES   |
+--------------------------------------+---------+-------+
42 rows in set (0.01 sec)

Each instrument of the wait/io/file type tracks a certain form of IO that is performed by MySQL, such as wait/io/file/innodb/innodb_data_file tracking datafile IO for InnoDB, and wait/io/file/sql/binlog tracking IO to binary logs at the SQL layer.

We do this in much the same way that Brendan does with DTrace, though instead of external probes we wrap the original functions, such as os_file_read(), os_file_write() etc., with functions that do the timing around them.

Each of these original functions is now a macro within 5.5. Each calls a performance_schema specific inline function that records the start and end times of the call, and collects other data specific to the type of wait event, such as which file we are operating on, the kind of file operation, the number of bytes involved, and where in the code the operation was initiated.

These in turn then call the original, now renamed functions, such as os_file_read_func() and os_file_write_func() (meaning that if you want to use Brendan’s solution with 5.5+, you should probably switch it to use these functions directly). Here’s InnoDB’s pfs_os_file_read_func() as an inline example:

ibool
pfs_os_file_read_func(
/*==================*/
	os_file_t	file,	/*!< in: handle to a file */
	void*		buf,	/*!< in: buffer where to read */
	ulint		offset,	/*!< in: least significant 32 bits of file
				offset where to read */
	ulint		offset_high,/*!< in: most significant 32 bits of
				offset */
	ulint		n,	/*!< in: number of bytes to read */
	const char*	src_file,/*!< in: file name where func invoked */
	ulint		src_line)/*!< in: line where the func invoked */	
{	
	ibool	result;
	struct PSI_file_locker*	locker = NULL;
	PSI_file_locker_state	state;
	
	register_pfs_file_io_begin(&state, locker, file, n, PSI_FILE_READ,
				   src_file, src_line);
	
	result = os_file_read_func(file, buf, offset, offset_high, n);
	
	register_pfs_file_io_end(locker, n);

	return(result);
}

Now that we have the background out of the way, we can take a look at how the data is exposed.

I'm going to start with the raw stream of data exposed within the wait event tables – specifically the events_waits_history_long table, which holds the 10,000 most recent events within performance_schema (by default), giving a pretty up-to-date snapshot of what is happening "now".

As with some of my previous examples, I'm going to use a helper view to do this. The following gets the latest IO events, also showing the originating thread, operation type, size and latency of each request (wait timers are recorded in picoseconds, hence the division to get microseconds):

DROP VIEW IF EXISTS latest_file_io;

CREATE VIEW latest_file_io AS
SELECT IF(id IS NULL, 
             CONCAT(SUBSTRING_INDEX(name, '/', -1), ':', thread_id), 
             CONCAT(user, '@', host, ':', id)
          ) thread, 
       SUBSTRING_INDEX(object_name, '/', -1) file, 
       timer_wait/1000000 latency_usec, 
       operation, 
       number_of_bytes bytes
  FROM performance_schema.events_waits_history_long 
  JOIN performance_schema.threads USING (thread_id)
  LEFT JOIN information_schema.processlist ON processlist_id = id
 WHERE object_name IS NOT NULL
 ORDER BY timer_start;

mysql> select * from latest_file_io;
+------------------------+--------------------+--------------+-----------+---------+
| thread                 | file               | latency_usec | operation | bytes   |
+------------------------+--------------------+--------------+-----------+---------+
| msandbox@localhost:119 | ib_logfile1        |     834.7232 | write     | 2035712 |
| msandbox@localhost:119 | ib_logfile1        |     833.2492 | write     | 2035712 |
| msandbox@localhost:119 | ib_logfile0        |      11.2195 | write     |     512 |
| msandbox@localhost:119 | ib_logfile0        |      10.8199 | write     |     512 |
| msandbox@localhost:119 | ib_logfile0        |     716.1718 | write     | 1697792 |
| msandbox@localhost:119 | ib_logfile0        |     715.5535 | write     | 1697792 |
| msandbox@localhost:119 | ib_logfile0        |    8649.3640 | sync      |    NULL |
| msandbox@localhost:119 | ib_logfile1        |   35845.0394 | sync      |    NULL |
| srv_master_thread:15   | ibdata1            |     242.0453 | write     |  540672 |
| srv_master_thread:15   | ibdata1            |     241.2235 | write     |  540672 |
| srv_master_thread:15   | ibdata1            |    3337.6979 | sync      |    NULL |
| srv_master_thread:15   | ibdata1            |       6.3638 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.4863 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.5542 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.4524 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.6032 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.6070 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.6145 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.5240 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.5391 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.5429 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.6899 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.4675 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.5316 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.5881 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.6107 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.5203 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.5391 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.6409 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.5730 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.5655 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.6070 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.7465 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.6371 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.7238 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.7578 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.7615 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.7917 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.6899 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.7276 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.8219 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.8483 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.7465 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.7804 | write     |   16384 |
| io_handler_thread:3    | ibdata1            |      27.4456 | write     |   16384 |
| io_handler_thread:6    | ibdata1            |      20.4560 | write     |   16384 |
| io_handler_thread:7    | ibdata1            |      25.5342 | write     |   16384 |
| io_handler_thread:3    | ibdata1            |      17.5757 | write     |   16384 |
| io_handler_thread:6    | ibdata1            |      18.2317 | write     |   16384 |
| io_handler_thread:3    | ibdata1            |      19.8604 | write     |   16384 |
| io_handler_thread:3    | ibdata1            |      16.9989 | write     |   16384 |
| io_handler_thread:3    | ibdata1            |      15.9320 | write     |   16384 |
| io_handler_thread:3    | ibdata1            |      16.9801 | write     |   16384 |
| io_handler_thread:3    | ibdata1            |      14.6351 | write     |   16384 |
| io_handler_thread:3    | ibdata1            |      15.5362 | write     |   16384 |
| io_handler_thread:7    | ibdata1            |     250.6145 | write     |  360448 |
| srv_master_thread:15   | ibdata1            |    5418.0932 | sync      |    NULL |
| srv_master_thread:15   | ibdata1            |      45.0779 | write     |   49152 |
| srv_master_thread:15   | ibdata1            |      44.1052 | write     |   49152 |
| srv_master_thread:15   | ibdata1            |     468.7128 | sync      |    NULL |
| srv_master_thread:15   | ibdata1            |       3.9283 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.6107 | write     |   16384 |
| srv_master_thread:15   | ibdata1            |       0.5429 | write     |   16384 |
| io_handler_thread:6    | ibdata1            |      22.2656 | write     |   16384 |
| io_handler_thread:4    | ibdata1            |      33.1119 | write     |   32768 |
| srv_master_thread:15   | ibdata1            |     757.6984 | sync      |    NULL |
| msandbox@localhost:92  | latest_file_io.frm |       6.9217 | stat      |    NULL |
| msandbox@localhost:92  | latest_file_io.frm |      13.3496 | open      |    NULL |
| msandbox@localhost:92  | latest_file_io.frm |      18.1299 | read      |    3222 |
| msandbox@localhost:92  | latest_file_io.frm |      50.2993 | close     |    NULL |
| msandbox@localhost:92  | #sql3a48_5c_64.MYI |     278.9498 | create    |    NULL |
| msandbox@localhost:92  | #sql3a48_5c_64.MYD |     145.4617 | create    |    NULL |
| msandbox@localhost:92  | #sql3a48_5c_64.MYI |      48.4030 | write     |     176 |
| msandbox@localhost:92  | #sql3a48_5c_64.MYI |       7.2120 | write     |     100 |
| msandbox@localhost:92  | #sql3a48_5c_64.MYI |       3.6456 | write     |       7 |
| msandbox@localhost:92  | #sql3a48_5c_64.MYI |       3.6833 | write     |       7 |
| msandbox@localhost:92  | #sql3a48_5c_64.MYI |       3.5099 | write     |       7 |
| msandbox@localhost:92  | #sql3a48_5c_64.MYI |       3.4797 | write     |       7 |
| msandbox@localhost:92  | #sql3a48_5c_64.MYI |       3.4722 | write     |       7 |
| msandbox@localhost:92  | #sql3a48_5c_64.MYI |       3.4722 | write     |       7 |
| msandbox@localhost:92  | #sql3a48_5c_64.MYI |       3.4722 | write     |       7 |
| msandbox@localhost:92  | #sql3a48_5c_64.MYI |       3.4420 | write     |       7 |
| msandbox@localhost:92  | #sql3a48_5c_64.MYI |       3.4722 | write     |       7 |
| msandbox@localhost:92  | #sql3a48_5c_64.MYI |       9.8058 | chsize    |    1024 |
| msandbox@localhost:92  | #sql3a48_5c_64.MYD |      15.6907 | close     |    NULL |
| msandbox@localhost:92  | #sql3a48_5c_64.MYI |      98.9889 | close     |    NULL |
| msandbox@localhost:92  | #sql3a48_5c_64.MYI |      13.2063 | open      |    NULL |
| msandbox@localhost:92  | #sql3a48_5c_64.MYI |     150.3212 | read      |      24 |
| msandbox@localhost:92  | #sql3a48_5c_64.MYI |       1.9265 | seek      |    NULL |
| msandbox@localhost:92  | #sql3a48_5c_64.MYI |       6.6427 | read      |     339 |
| msandbox@localhost:92  | #sql3a48_5c_64.MYD |      13.3496 | open      |    NULL |
| msandbox@localhost:92  | #sql3a48_5c_64.MYD |      69.0853 | write     |     696 |
| msandbox@localhost:92  | #sql3a48_5c_64.MYD |       9.8020 | write     |     728 |
| msandbox@localhost:92  | #sql3a48_5c_64.MYD |       6.2620 | write     |     696 |
| msandbox@localhost:92  | #sql3a48_5c_64.MYD |       4.2488 | tell      |    NULL |
| msandbox@localhost:92  | #sql3a48_5c_64.MYD |       3.9736 | seek      |    NULL |
| msandbox@localhost:92  | #sql3a48_5c_64.MYD |       1.2743 | seek      |    NULL |
| msandbox@localhost:92  | #sql3a48_5c_64.MYD |      26.1676 | read      |    2120 |
+------------------------+--------------------+--------------+-----------+---------+
98 rows in set (0.16 sec)

Just for comparison, the first thread in the output was executing this:

mysql> insert into t1 (select * from t1);
Query OK, 65536 rows affected (0.78 sec)
Records: 65536  Duplicates: 0  Warnings: 0

mysql> select connection_id();
+-----------------+
| connection_id() |
+-----------------+
|             119 |
+-----------------+
1 row in set (0.00 sec)

In the version that I am using here (5.5.12) there is a bug that double-counts some InnoDB IO (such as writes to the InnoDB log files), so you have to do some manual work to exclude some rows (the second events of the same size but lower times) to find out just how much time was really spent. That annoys me. Luckily, we have since tracked down the problem; it has been fixed and should be released in 5.5.13.

That results in 35845.0394 + 8649.3640 + 716.1718 + 11.2195 + 834.7232 = 46056.5179 microseconds, or roughly 0.046 seconds, out of the 0.78 seconds reported to the client. Most of the time at the filesystem layer was spent in fsync() on the log files (which forces the data to hit the platters), rather than in write().

The rest was spent doing "other things" in memory, which are then synced to the data files by the background threads, outside of the query execution's runtime. This is visible with srv_master_thread first doing a big write to the doublewrite buffer, and then merging those changes to the datafiles in 16K pages.

It's also worth noting what is happening within the msandbox@localhost:92 thread at the end of the result. That is actually the file IO that was required to return the result of the query I was executing. There will also be a few events after the statement completed, to clean the temporary files up etc.

The first 4 events against latest_file_io.frm are to get the structure of the view I was selecting against, and the rest are a result of the JOIN against INFORMATION_SCHEMA.PROCESSLIST, which in this case forces a temporary table to be created for the processlist result. In 5.6 there is no longer any need for this JOIN, as all of the columns are available within the threads table, so the IO would be reduced to just reading the view definition.
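For reference, on 5.6 the equivalent view could drop the processlist JOIN entirely, using the new processlist_* columns in the threads table. This is an untested sketch against the early 5.6 column names:

```sql
CREATE VIEW latest_file_io AS
SELECT IF(processlist_id IS NULL,
             CONCAT(SUBSTRING_INDEX(name, '/', -1), ':', thread_id),
             CONCAT(processlist_user, '@', processlist_host, ':', processlist_id)
          ) thread,
       SUBSTRING_INDEX(object_name, '/', -1) file,
       timer_wait/1000000 latency_usec,
       operation,
       number_of_bytes bytes
  FROM performance_schema.events_waits_history_long
  JOIN performance_schema.threads USING (thread_id)
 WHERE object_name IS NOT NULL
 ORDER BY timer_start;
```

With no INFORMATION_SCHEMA table in the picture, selecting from the view no longer forces a temporary table to be materialized, so the view stops polluting its own results.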

This section also helps to point out one of the deficiencies of using Brendan's script (at least without extending it) to really see when filesystem interactions are the cause of overall latency – a large portion of the time spent there was in creating the temporary table files, not just writing to or reading from them. Likewise for opening and closing the view definition.

This also helps to illustrate a point that Ronald Bradford has asked me in the past:

My work with the PERFORMANCE_SCHEMA has been limited to experimenting such as 
described by Tracking IO with performance_schema however a recent discussion 
with Mark Leith highlighted that with the performance_schema it will be possible 
to determine the exact size of disk based temporary tables created by the MySQL 
kernel for example.

Sorry for the delay Ronald, but I hope this shows a good example of exactly what kind of IO we track against temporary tables as well. If not, here’s a better one:

mysql> select file, round(sum(bytes)/1024/1024, 2) size_mb, round(sum(latency_usec), 2) latency_usec
    ->   from latest_file_io
    ->  where file like '#sql%' and operation = 'write'
    -> group by file, operation
    -> order by file;
+---------------------+---------+--------------+
| file                | size_mb | latency_usec |
+---------------------+---------+--------------+
| #sql3a48_575_2a.MYI |    0.00 |        10.11 |
| #sql3a48_575_2b.MYD |    2.11 |      7435.86 |
| #sql3a48_575_2b.MYI |    0.00 |        10.81 |
| #sql3a48_575_2c.MYD |    0.00 |        46.16 |
| #sql3a48_575_2c.MYI |    0.00 |        61.27 |
| #sql3a48_575_2d.MYD |    2.30 |      7638.26 |
| #sql3a48_575_2d.MYI |    0.00 |        63.24 |
+---------------------+---------+--------------+
7 rows in set (0.12 sec)

This may be a bit of a hit-and-miss approach, as this table only stores 10,000 events by default, and when you're playing with temporary tables the number of file IO events can easily exceed this. Temporary tables are also very transient – MySQL doesn't hold the file handles open, nor keep much metadata around once they are removed, so we don't aggregate their data within the file summary tables at this time.
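If you need a bigger window to catch these transient events, the history size can be raised – though note it is a startup-only option in 5.5, not a dynamic variable. For example, in my.cnf:

```
[mysqld]
# Default is 10000; raise to keep a longer window of recent wait events.
# Each row costs a small fixed amount of memory, allocated at startup.
performance_schema_events_waits_history_long_size = 100000
```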

We do however summarize the events for regular table files, and the other files held open by the instance such as binary logs and log files etc. as shown above.

Each wait is summarized at three different levels within 5.5 – here are some examples, starting each with the raw output for the InnoDB file IO events:

Globally by event – events_waits_summary_global_by_event_name

mysql> select * from events_waits_summary_global_by_event_name
    ->  where event_name like 'wait/io/file/innodb/%';
+--------------------------------------+------------+----------------+----------------+----------------+----------------+
| EVENT_NAME                           | COUNT_STAR | SUM_TIMER_WAIT | MIN_TIMER_WAIT | AVG_TIMER_WAIT | MAX_TIMER_WAIT |
+--------------------------------------+------------+----------------+----------------+----------------+----------------+
| wait/io/file/innodb/innodb_data_file |        364 |   369575256440 |         395850 |     1015316638 |   241104845800 |
| wait/io/file/innodb/innodb_log_file  |         53 |   130454237420 |        2386410 |     2461400706 |    58808150830 |
| wait/io/file/innodb/innodb_temp_file |          0 |              0 |              0 |              0 |              0 |
+--------------------------------------+------------+----------------+----------------+----------------+----------------+
3 rows in set (0.00 sec)

These allow you to find your top IO consumers by latency, and can easily be joined against the file_summary_by_event_name table, to also show the number of bytes read and written per type of IO:

DROP VIEW IF EXISTS top_global_io_consumers_by_latency;

CREATE VIEW top_global_io_consumers_by_latency AS
SELECT event_name,
       count_star,
       ROUND(sum_timer_wait/1000000000000, 2) total_sec,
       ROUND(min_timer_wait/1000000, 2) min_usec,
       ROUND(avg_timer_wait/1000000, 2) avg_usec,
       ROUND(max_timer_wait/1000000, 2) max_usec,
       count_read,
       ROUND(sum_number_of_bytes_read/1024/1024, 2) read_mb,
       count_write,
       ROUND(sum_number_of_bytes_write/1024/1024, 2) write_mb
  FROM performance_schema.events_waits_summary_global_by_event_name 
  JOIN performance_schema.file_summary_by_event_name USING (event_name) 
 WHERE event_name LIKE 'wait/io/file/%' 
 ORDER BY sum_timer_wait DESC LIMIT 5;

mysql> select * from ps_helper.top_global_io_consumers_by_latency;
+--------------------------------------+------------+-----------+----------+----------+-----------+------------+---------+-------------+----------+
| event_name                           | count_star | total_sec | min_usec | avg_usec | max_usec  | count_read | read_mb | count_write | write_mb |
+--------------------------------------+------------+-----------+----------+----------+-----------+------------+---------+-------------+----------+
| wait/io/file/myisam/dfile            |    1828445 |     10.82 |     0.74 |     5.92 | 361525.96 |    1250257 | 2094.98 |      577762 |  1019.34 |
| wait/io/file/myisam/kfile            |       1212 |      0.34 |     1.12 |   283.54 | 109499.59 |        132 |    0.03 |         696 |     0.02 |
| wait/io/file/innodb/innodb_data_file |        407 |      0.27 |     7.36 |   652.37 |  17684.12 |        397 |   10.14 |           4 |     0.06 |
| wait/io/file/sql/file_parser         |        108 |      0.15 |     1.25 |  1355.50 |  41132.89 |         20 |    0.03 |           5 |     0.02 |
| wait/io/file/sql/FRM                 |       1027 |      0.12 |     1.08 |   115.64 |  41406.17 |        431 |    0.14 |           0 |     0.00 |
+--------------------------------------+------------+-----------+----------+----------+-----------+------------+---------+-------------+----------+
5 rows in set (0.11 sec)

This is my test instance, so there is no great amount of IO going on. It shows in this example that MyISAM data file IO is the top consumer, followed by MyISAM index IO, InnoDB data file IO, view parsing, and frm file parsing respectively.

Per Thread by event – events_waits_summary_by_thread_by_event_name

mysql> select * from events_waits_summary_by_thread_by_event_name 
    ->  where event_name like 'wait/io/file/innodb/%';
+-----------+--------------------------------------+------------+----------------+----------------+----------------+----------------+
| THREAD_ID | EVENT_NAME                           | COUNT_STAR | SUM_TIMER_WAIT | MIN_TIMER_WAIT | AVG_TIMER_WAIT | MAX_TIMER_WAIT |
+-----------+--------------------------------------+------------+----------------+----------------+----------------+----------------+
|         0 | wait/io/file/innodb/innodb_data_file |        297 |    17541591340 |        9183720 |       59062597 |     2244695700 |
|         0 | wait/io/file/innodb/innodb_log_file  |         25 |      594694880 |        2386410 |       23787795 |      291044000 |
|         0 | wait/io/file/innodb/innodb_temp_file |          0 |              0 |              0 |              0 |              0 |
|         3 | wait/io/file/innodb/innodb_data_file |          0 |              0 |              0 |              0 |              0 |
|         3 | wait/io/file/innodb/innodb_log_file  |          0 |              0 |              0 |              0 |              0 |
|         3 | wait/io/file/innodb/innodb_temp_file |          0 |              0 |              0 |              0 |              0 |
...
|         1 | wait/io/file/innodb/innodb_data_file |          0 |              0 |              0 |              0 |              0 |
|         1 | wait/io/file/innodb/innodb_log_file  |          0 |              0 |              0 |              0 |              0 |
|         1 | wait/io/file/innodb/innodb_temp_file |          0 |              0 |              0 |              0 |              0 |
+-----------+--------------------------------------+------------+----------------+----------------+----------------+----------------+
51 rows in set (0.00 sec)

These are the events summarized by thread, so that you can trace which threads – either foreground or background – are likely to be consuming the most IO. This is from a latency perspective only, however, and cannot easily be correlated back to the file summary tables in a meaningful way. This means you cannot easily correlate the amount of IO done with the time taken per thread (unless you are polling the current / history tables, if they are enabled).

However, it can provide insight into who is suffering the most from filesystem latency:

DROP VIEW IF EXISTS top_thread_io_consumers_by_latency;

CREATE VIEW top_thread_io_consumers_by_latency AS
SELECT IF(id IS NULL, 
             CONCAT(SUBSTRING_INDEX(name, '/', -1), ':', thread_id), 
             CONCAT(user, '@', host, ':', id)
          ) thread, 
       SUM(count_star) count_star,
       ROUND(SUM(sum_timer_wait)/1000000000000, 2) total_sec,
       ROUND(MIN(min_timer_wait)/1000000, 2) min_usec,
       ROUND(AVG(avg_timer_wait)/1000000, 2) avg_usec,
       ROUND(MAX(max_timer_wait)/1000000, 2) max_usec
  FROM performance_schema.events_waits_summary_by_thread_by_event_name 
  LEFT JOIN performance_schema.threads USING (thread_id) 
  LEFT JOIN information_schema.processlist ON processlist_id = id
 WHERE event_name LIKE 'wait/io/file/%'
   AND sum_timer_wait > 0
 GROUP BY thread
 ORDER BY total_sec DESC LIMIT 5;

mysql>  select * from top_thread_io_consumers_by_latency;
+----------------------+------------+-----------+----------+----------+----------+
| thread               | count_star | total_sec | min_usec | avg_usec | max_usec |
+----------------------+------------+-----------+----------+----------+----------+
| msandbox@localhost:3 |       2159 |      0.98 |     0.05 |  1271.01 | 56968.61 |
| main:0               |        971 |      0.24 |     1.08 |   257.33 | 14270.73 |
| srv_master_thread:15 |         54 |      0.03 |     0.46 |  1104.93 |  5547.38 |
| io_handler_thread:1  |          8 |      0.02 |     9.90 |  2285.61 | 10963.86 |
| io_handler_thread:7  |          3 |      0.00 |    45.94 |    67.70 |   102.48 |
+----------------------+------------+-----------+----------+----------+----------+
5 rows in set (0.15 sec)

Per Instance – events_waits_summary_by_instance

mysql> select * from events_waits_summary_by_instance 
    ->  where event_name like 'wait/io/file/innodb/%';
+--------------------------------------+-----------------------+------------+----------------+----------------+----------------+----------------+
| EVENT_NAME                           | OBJECT_INSTANCE_BEGIN | COUNT_STAR | SUM_TIMER_WAIT | MIN_TIMER_WAIT | AVG_TIMER_WAIT | MAX_TIMER_WAIT |
+--------------------------------------+-----------------------+------------+----------------+----------------+----------------+----------------+
| wait/io/file/innodb/innodb_data_file |            4680873728 |        364 |   369575256440 |         395850 |     1015316638 |   241104845800 |
| wait/io/file/innodb/innodb_log_file  |            4681729232 |         49 |   130426765430 |        2386410 |     2661770723 |    58808150830 |
| wait/io/file/innodb/innodb_log_file  |            4685789600 |          4 |       27471990 |        5877430 |        6867997 |        8271380 |
+--------------------------------------+-----------------------+------------+----------------+----------------+----------------+----------------+
3 rows in set (0.04 sec)

Finally, we end with all of the events summarized by instance. Within performance_schema, if there are multiple instances of a certain event type – such as two InnoDB log files, multiple InnoDB data files, or multiple instances of a specific mutex in memory – each of those is aggregated individually within this table.

In the above example, there is a single InnoDB datafile (ibdata1 – 4680873728), and two InnoDB log files (ib_logfile0 and ib_logfile1).

However, we have a bit of a disconnect here within performance_schema – it is not currently possible to join the events_waits_summary_by_instance table to the file_summary_by_instance table. file_summary_by_instance uses the file_name column as its key, whilst events_waits_summary_by_instance only exposes the object_instance_begin column. We hope to rectify this within the 5.6 release, by adding the object_instance_begin column to the file_summary_by_instance table (and possibly aggregating more right inline within the other file summary tables).
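In the meantime, file_summary_by_instance can at least show the IO volume per file on its own, keyed by file name. A sketch against the 5.5 columns:

```sql
-- Top 5 files by total bytes transferred (no latency available here in 5.5)
SELECT SUBSTRING_INDEX(file_name, '/', -1) file,
       count_read,
       ROUND(sum_number_of_bytes_read/1024/1024, 2) read_mb,
       count_write,
       ROUND(sum_number_of_bytes_write/1024/1024, 2) write_mb
  FROM performance_schema.file_summary_by_instance
 ORDER BY sum_number_of_bytes_read + sum_number_of_bytes_write DESC
 LIMIT 5;
```

This gets you the volume side per file; the latency side has to come from the events_waits_* tables until the two can be joined.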

This would then allow us to drill in to both the total amount of IO done, and the total/min/max/avg latency involved in doing so, for each file accessed by the database. I'm putting together a worklog for this, and will post it to the comments once it is public.

I've also now created a ps_helper page dedicated to collecting various views and procedures that help with looking into performance_schema data. I've still got a number of things that I'm playing with, which I hope to blog about and publish soon as well. :)