Monitoring Table and Index IO with PERFORMANCE_SCHEMA

Another new feature of performance_schema within 5.6 is the ability to monitor both table and index IO. This was added in two parts:

- WL#4895, which instruments the individual table IO wait events themselves
- WL#5379, which aggregates those waits per table and per index

If you read my blog on graphing statement wait times, you will have seen some of the data that WL#4895 exposes within the examples there – it is within the “light coral” coloured blocks, such as the following from the InnoDB PK lookup example I gave:

[Image: Index IO Example]

This shows the level of detail you can get per table IO operation, per statement.

However, as Mark Callaghan noted recently, some of the more valuable insight into how tables and indexes are accessed can be gleaned by looking at these statistics, with some further aggregation, over time.

This is what WL#5379 covers, and what I’ll focus on here. It implements two new tables within performance_schema:

- table_io_waits_summary_by_table
- table_io_waits_summary_by_index_usage

Both tables have this basic structure (although table_io_waits_summary_by_index_usage also adds an INDEX_NAME column, and performs its aggregation grouping by the index name as well):

mysql> DESC table_io_waits_summary_by_table;
+------------------+---------------------+------+-----+---------+-------+
| Field            | Type                | Null | Key | Default | Extra |
+------------------+---------------------+------+-----+---------+-------+
| OBJECT_TYPE      | varchar(64)         | YES  |     | NULL    |       |
| OBJECT_SCHEMA    | varchar(64)         | YES  |     | NULL    |       |
| OBJECT_NAME      | varchar(64)         | YES  |     | NULL    |       |
| COUNT_STAR       | bigint(20) unsigned | NO   |     | NULL    |       |
| SUM_TIMER_WAIT   | bigint(20) unsigned | NO   |     | NULL    |       |
| MIN_TIMER_WAIT   | bigint(20) unsigned | NO   |     | NULL    |       |
| AVG_TIMER_WAIT   | bigint(20) unsigned | NO   |     | NULL    |       |
| MAX_TIMER_WAIT   | bigint(20) unsigned | NO   |     | NULL    |       |
| COUNT_READ       | bigint(20) unsigned | NO   |     | NULL    |       |
| SUM_TIMER_READ   | bigint(20) unsigned | NO   |     | NULL    |       |
| MIN_TIMER_READ   | bigint(20) unsigned | NO   |     | NULL    |       |
| AVG_TIMER_READ   | bigint(20) unsigned | NO   |     | NULL    |       |
| MAX_TIMER_READ   | bigint(20) unsigned | NO   |     | NULL    |       |
| COUNT_WRITE      | bigint(20) unsigned | NO   |     | NULL    |       |
| SUM_TIMER_WRITE  | bigint(20) unsigned | NO   |     | NULL    |       |
| MIN_TIMER_WRITE  | bigint(20) unsigned | NO   |     | NULL    |       |
| AVG_TIMER_WRITE  | bigint(20) unsigned | NO   |     | NULL    |       |
| MAX_TIMER_WRITE  | bigint(20) unsigned | NO   |     | NULL    |       |
| COUNT_FETCH      | bigint(20) unsigned | NO   |     | NULL    |       |
| SUM_TIMER_FETCH  | bigint(20) unsigned | NO   |     | NULL    |       |
| MIN_TIMER_FETCH  | bigint(20) unsigned | NO   |     | NULL    |       |
| AVG_TIMER_FETCH  | bigint(20) unsigned | NO   |     | NULL    |       |
| MAX_TIMER_FETCH  | bigint(20) unsigned | NO   |     | NULL    |       |
| COUNT_INSERT     | bigint(20) unsigned | NO   |     | NULL    |       |
| SUM_TIMER_INSERT | bigint(20) unsigned | NO   |     | NULL    |       |
| MIN_TIMER_INSERT | bigint(20) unsigned | NO   |     | NULL    |       |
| AVG_TIMER_INSERT | bigint(20) unsigned | NO   |     | NULL    |       |
| MAX_TIMER_INSERT | bigint(20) unsigned | NO   |     | NULL    |       |
| COUNT_UPDATE     | bigint(20) unsigned | NO   |     | NULL    |       |
| SUM_TIMER_UPDATE | bigint(20) unsigned | NO   |     | NULL    |       |
| MIN_TIMER_UPDATE | bigint(20) unsigned | NO   |     | NULL    |       |
| AVG_TIMER_UPDATE | bigint(20) unsigned | NO   |     | NULL    |       |
| MAX_TIMER_UPDATE | bigint(20) unsigned | NO   |     | NULL    |       |
| COUNT_DELETE     | bigint(20) unsigned | NO   |     | NULL    |       |
| SUM_TIMER_DELETE | bigint(20) unsigned | NO   |     | NULL    |       |
| MIN_TIMER_DELETE | bigint(20) unsigned | NO   |     | NULL    |       |
| AVG_TIMER_DELETE | bigint(20) unsigned | NO   |     | NULL    |       |
| MAX_TIMER_DELETE | bigint(20) unsigned | NO   |     | NULL    |       |
+------------------+---------------------+------+-----+---------+-------+

We aggregate usage data at three different levels within these tables, per table / index, for your convenience:

- All ops, with the COUNT_STAR and *_WAIT columns
  - Read ops, with the *_READ columns
    - Fetch ops, with the *_FETCH columns
  - Write ops, with the *_WRITE columns
    - Insert ops, with the *_INSERT columns
    - Update ops, with the *_UPDATE columns
    - Delete ops, with the *_DELETE columns
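
To put that another way, the read and write counters should always roll up into COUNT_STAR, and the per-operation counters into their parent class. A quick sanity check you can run yourself (using test.t1 here purely as an example table):

SELECT object_schema,
       object_name,
       count_star,
       count_read + count_write AS read_plus_write,   -- should equal count_star
       count_read - count_fetch AS read_minus_fetch,  -- should be 0, fetch is the only read op
       count_write - (count_insert + count_update + count_delete) AS write_minus_ops  -- should be 0
  FROM performance_schema.table_io_waits_summary_by_table
 WHERE object_schema = 'test'
   AND object_name = 't1';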

Those of you who are used to the TABLE_STATISTICS / INDEX_STATISTICS patches from Google, or one of our “siblings” that has incorporated them, may see the similarity in some of these statistics.

Mark also noted about these patches:

It also is very easy to use — select * from easy_to_remember_table_name. I hope the P_S can also provide that.

I think people will get used to the names of the tables over time (I certainly have, in the testing that I have been doing). However, if the table names really are forgettable (erm, SHOW TABLES, or use a client with auto-completion), or the tables provide too much data by default, or you already have tooling built around the Google patches, I would suggest using something that databases are good at – Views.

Here’s how to return the same data as the TABLE_STATISTICS patch:

CREATE DATABASE IF NOT EXISTS ps_helper;
USE ps_helper;

DROP VIEW IF EXISTS table_statistics;

CREATE VIEW table_statistics AS 
SELECT pst.object_schema AS table_schema, 
       pst.object_name AS table_name, 
       pst.count_read AS rows_read, 
       pst.count_write AS rows_changed,
       (pst.count_write * COUNT(psi.index_name)) AS rows_changed_x_indexes
  FROM performance_schema.table_io_waits_summary_by_table AS pst
  LEFT JOIN performance_schema.table_io_waits_summary_by_index_usage AS psi 
    ON pst.object_schema = psi.object_schema AND pst.object_name = psi.object_name
   AND psi.index_name IS NOT NULL
 GROUP BY pst.object_schema, pst.object_name
 ORDER BY pst.sum_timer_wait DESC;

mysql> SELECT * FROM ps_helper.table_statistics LIMIT 5;
+--------------+-----------------------+-----------+--------------+------------------------+
| table_schema | table_name            | rows_read | rows_changed | rows_changed_x_indexes |
+--------------+-----------------------+-----------+--------------+------------------------+
| mysql        | time_zone_leap_second |        23 |            0 |                      0 |
| test         | t1                    |         4 |            1 |                      2 |
| mysql        | slave_master_info     |         1 |            0 |                      0 |
| mysql        | user                  |         5 |            0 |                      0 |
| mysql        | db                    |         3 |            0 |                      0 |
+--------------+-----------------------+-----------+--------------+------------------------+
5 rows in set (0.00 sec)

This solution also orders the results by the SUM_TIMER_WAIT column for each table in the background, which is where we offer more insight into table usage with performance_schema.

Not only do we count the operations against the tables, we also time them (if timing is enabled), allowing you to track the most contended tables, not just the most heavily used ones – the result always returns the highest overhead tables at the top, making “top N” queries much easier than with TABLE_STATISTICS (which appears to return rows in some random order).

Personally I’d be tempted to add more columns from our tables though – breaking down by operation type for a start.
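
As a purely illustrative sketch of what that could look like (the view name and column aliases here are my own invention), the per-operation counters can simply be pulled straight through from the summary table:

USE ps_helper;

DROP VIEW IF EXISTS table_statistics_with_ops;

CREATE VIEW table_statistics_with_ops AS
SELECT object_schema  AS table_schema,
       object_name    AS table_name,
       count_read     AS rows_read,
       count_write    AS rows_changed,
       count_fetch    AS rows_fetched,
       count_insert   AS rows_inserted,
       count_update   AS rows_updated,
       count_delete   AS rows_deleted,
       sum_timer_wait AS total_latency
  FROM performance_schema.table_io_waits_summary_by_table
 ORDER BY sum_timer_wait DESC;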

And for INDEX_STATISTICS:

DROP VIEW IF EXISTS index_statistics;

CREATE VIEW index_statistics AS
SELECT object_schema AS table_schema,
       object_name AS table_name,
       index_name,
       count_read AS rows_read
  FROM performance_schema.table_io_waits_summary_by_index_usage
 WHERE index_name IS NOT NULL
 ORDER BY sum_timer_wait DESC;

mysql> select * from index_statistics limit 5;
+--------------+-----------------------+------------+-----------+
| table_schema | table_name            | index_name | rows_read |
+--------------+-----------------------+------------+-----------+
| mysql        | time_zone_leap_second | PRIMARY    |        23 |
| mysql        | slave_master_info     | PRIMARY    |         1 |
| test         | t1                    | PRIMARY    |         1 |
| test         | t1                    | j          |         1 |
| mysql        | tables_priv           | PRIMARY    |         1 |
+--------------+-----------------------+------------+-----------+
5 rows in set (0.01 sec)

Again, we order by the SUM_TIMER_WAIT column, to find the highest overhead indexes first, for an easy “top N”.

However, the WHERE clause here – WHERE index_name IS NOT NULL – is worth mentioning. Within the table_io_waits_summary_by_index_usage table, not only do we aggregate operations per index, we also aggregate how many operations did not use any index when accessing the tables. We do this with an extra row per table, where the INDEX_NAME column is NULL:

mysql> SHOW CREATE TABLE test.t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `i` int(11) NOT NULL AUTO_INCREMENT,
  `j` varchar(10) DEFAULT NULL,
  `k` int(11) DEFAULT NULL,
  PRIMARY KEY (`i`),
  KEY `j` (`j`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> SELECT object_schema, 
    ->        object_name, 
    ->        index_name, 
    ->        count_fetch,
    ->        count_insert,
    ->        count_update,
    ->        count_delete
    ->   FROM performance_schema.table_io_waits_summary_by_index_usage 
    ->  WHERE object_schema = 'test' 
    ->    AND object_name = 't1';
+---------------+-------------+------------+-------------+--------------+--------------+--------------+
| object_schema | object_name | index_name | count_fetch | count_insert | count_update | count_delete |
+---------------+-------------+------------+-------------+--------------+--------------+--------------+
| test          | t1          | PRIMARY    |           9 |            0 |            0 |            0 |
| test          | t1          | j          |       30803 |            0 |            0 |            0 |
| test          | t1          | NULL       |         101 |        30722 |            4 |            3 |
+---------------+-------------+------------+-------------+--------------+--------------+--------------+
3 rows in set (0.01 sec)

INSERT statements will always be counted against the “null row” per table (an INSERT never directly uses an index, although it may add to them). Currently the *_UPDATE and *_DELETE columns also seem to put all of their aggregation into the “null row” – though I think this is a bug (it is at least undefined in the WL), so I’ve opened Bug#60905 for it. Fixing that would also give us more information on which specific access methods, per table, are causing the scans.

Like the Google patches, it is now easy to find out which indexes are not being used, but the performance_schema implementation allows this in a much more elegant way. Previously you would have had to LEFT JOIN against the INFORMATION_SCHEMA.STATISTICS table, as Vadim shows in his example – which incurs a huge penalty on systems with a large number of tables; just look at the statement runtime in the examples there. performance_schema maintains one row per index in each table, whether or not the index has been read, and it does not need to hit disk to gather information about the tables – all of its data is entirely in memory, and no high overhead JOIN operations are necessary.

To give a view example again:

DROP VIEW IF EXISTS unused_indexes;
 
CREATE VIEW unused_indexes AS
SELECT object_schema,
       object_name,
       index_name
  FROM performance_schema.table_io_waits_summary_by_index_usage 
 WHERE index_name IS NOT NULL
   AND count_star = 0
 ORDER BY object_schema, object_name;

mysql> SELECT * FROM ps_helper.unused_indexes WHERE object_schema = 'test';
+---------------+-------------+------------+
| object_schema | object_name | index_name |
+---------------+-------------+------------+
| test          | t2          | PRIMARY    |
| test          | t2          | j          |
+---------------+-------------+------------+
2 rows in set (0.00 sec)
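
One refinement you may want here (purely my own preference) is to filter out PRIMARY keys, since a PRIMARY KEY that happens never to have been read is not usually a candidate for dropping:

DROP VIEW IF EXISTS unused_secondary_indexes;

CREATE VIEW unused_secondary_indexes AS
SELECT object_schema,
       object_name,
       index_name
  FROM performance_schema.table_io_waits_summary_by_index_usage
 WHERE index_name IS NOT NULL
   AND index_name != 'PRIMARY'
   AND count_star = 0
 ORDER BY object_schema, object_name;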

But given the extra data within performance_schema, it is also possible to find the tables which cause full table scans:


DROP VIEW IF EXISTS table_scans;

CREATE VIEW table_scans AS
SELECT object_schema, 
       object_name,
       count_read AS rows_full_scanned
  FROM performance_schema.table_io_waits_summary_by_index_usage 
 WHERE index_name IS NULL
   AND count_read > 0
 ORDER BY count_read DESC;

mysql> SELECT * FROM ps_helper.table_scans WHERE object_schema = 'test';
+---------------+-------------+-------------------+
| object_schema | object_name | rows_full_scanned |
+---------------+-------------+-------------------+
| test          | t1          |                 5 |
| test          | t2          |                 3 |
+---------------+-------------+-------------------+
2 rows in set (0.01 sec)
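
If you also want to see how much of each table’s read traffic comes from those scans, the “null row” can be compared with the table level totals. A rough sketch (again, the view and column names are just my own):

DROP VIEW IF EXISTS table_scan_ratio;

CREATE VIEW table_scan_ratio AS
SELECT t.object_schema,
       t.object_name,
       i.count_read AS rows_full_scanned,
       t.count_read AS rows_read_total,
       ROUND(100 * i.count_read / t.count_read, 2) AS pct_full_scanned
  FROM performance_schema.table_io_waits_summary_by_table AS t
  JOIN performance_schema.table_io_waits_summary_by_index_usage AS i
    ON i.object_schema = t.object_schema
   AND i.object_name = t.object_name
 WHERE i.index_name IS NULL
   AND t.count_read > 0
 ORDER BY pct_full_scanned DESC;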

We also track temporary tables within this instrumentation:

mysql> SELECT * 
    ->   FROM performance_schema.table_io_waits_summary_by_table 
    ->  WHERE  object_type = 'TEMPORARY TABLE'\G
*************************** 1. row ***************************
     OBJECT_TYPE: TEMPORARY TABLE
   OBJECT_SCHEMA: test
     OBJECT_NAME: t1_temp
      COUNT_STAR: 18
  SUM_TIMER_WAIT: 117369000
  MIN_TIMER_WAIT: 117180
  AVG_TIMER_WAIT: 6520500
  MAX_TIMER_WAIT: 31952340
      COUNT_READ: 12
  SUM_TIMER_READ: 94311000
  MIN_TIMER_READ: 117180
  AVG_TIMER_READ: 7858998
  MAX_TIMER_READ: 31952340
     COUNT_WRITE: 6
 SUM_TIMER_WRITE: 23058000
 MIN_TIMER_WRITE: 147420
 AVG_TIMER_WRITE: 3842748
 MAX_TIMER_WRITE: 13400100
...

All on top of the per-wait information that can be tied to threads, stages, and statements! Possibilities abound! :)
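
Finally, coming back to Mark’s point about looking at these statistics over time: the counters in these tables are cumulative, so one simple approach is to keep your own snapshot of the summary table and diff against it at each interval. A minimal sketch (the snapshot table below is my own naming, nothing built in):

-- One-off: create the snapshot table with an initial copy of the data
CREATE TABLE IF NOT EXISTS ps_helper.table_io_prev AS
SELECT * FROM performance_schema.table_io_waits_summary_by_table;

-- Activity since the last snapshot
SELECT cur.object_schema,
       cur.object_name,
       cur.count_read  - IFNULL(prev.count_read, 0)  AS reads_delta,
       cur.count_write - IFNULL(prev.count_write, 0) AS writes_delta
  FROM performance_schema.table_io_waits_summary_by_table AS cur
  LEFT JOIN ps_helper.table_io_prev AS prev
    ON prev.object_type   = cur.object_type
   AND prev.object_schema = cur.object_schema
   AND prev.object_name   = cur.object_name
 ORDER BY reads_delta DESC
 LIMIT 10;

-- Refresh the snapshot ready for the next interval
TRUNCATE TABLE ps_helper.table_io_prev;
INSERT INTO ps_helper.table_io_prev
SELECT * FROM performance_schema.table_io_waits_summary_by_table;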

One comment

  1. Hi Mark.

    Great post, as usual.

    There is also an Easter egg in the performance schema table io instrumentation …

    The statistics reported in the performance schema are updated … LIVE.

    Try a statement that runs for a long time, for example:

    mysql> insert into test.t1 (select rand() from test.t1);
    Query OK, 2097152 rows affected (2 min 25.96 sec)
    Records: 2097152  Duplicates: 0  Warnings: 0
    

    While it runs, execute this statement many times:

    select object_type, object_schema, object_name, count_star
      from table_io_waits_summary_by_table where object_name='t1';
    

    And see how the statistics collected are changing:

    +-------------+---------------+-------------+------------+
    | object_type | object_schema | object_name | count_star |
    +-------------+---------------+-------------+------------+
    | TABLE       | test          | t1          |    8201455 |
    +-------------+---------------+-------------+------------+
    1 row in set (0.01 sec)
    
    +-------------+---------------+-------------+------------+
    | object_type | object_schema | object_name | count_star |
    +-------------+---------------+-------------+------------+
    | TABLE       | test          | t1          |    8454169 |
    +-------------+---------------+-------------+------------+
    1 row in set (0.00 sec)
    

    — Marc
