MySQL sys version 1.0.1 released

I’ve just released the 1.0.1 version of the MySQL sys schema. It contains a few bug fixes (including a contribution from Joe Grasse – thanks Joe!), and a number of new helper procedures for viewing and altering the configuration of consumers and instruments, contributed by the MySQL QA team, who have started to incorporate more Performance Schema trace data into their automated testing.

Next up is the 1.1.0 release, which also includes a bunch of new host summary views contributed by Arnaud Adant (of the MySQL Support team). I have a number of new things in development to add before then as well.

Let me know if there are things you’d like to see as well, maybe I can find time to work on those too.

Enjoy.

MySQL Connect and Oracle Open World Presentations Online

After a slight delay (travel and catching up with “real work”), I’ve now uploaded the talks that I gave at MySQL Connect and Oracle Open World.

They are available on my Presentations Page, and inline below for convenience. The “Introduction to MySQL Enterprise Monitor” talk was actually a full demo, but there are some screenshots of MEM 3.0 in there if you’re interested in seeing a high level picture of what it looks like now.

Thanks to all that attended my talks, I got a lot of good questions and feedback!

A MySQL Replication Load Average with Performance Schema

Monitoring MySQL’s replication has always been a bit hit and miss, especially when trying to detect whether a slave is becoming overloaded or not. There have been a few ways to do this in the past:

  • Monitor the Seconds_Behind_Master variable from SHOW SLAVE STATUS, which is the difference between the timestamp of the event as recorded on the master and the time that the slave SQL thread started executing the event.
  • Use something like mk-heartbeat, which operates in the same way as Seconds_Behind_Master (trying to show you the actual time difference between the master and slave), but is a little more robust in complex replication chains, and in other situations where Seconds_Behind_Master falls down (such as when the IO thread is lagging).

These are fine for knowing how far out of date your slave is getting, but they don’t really tell you that your slave is becoming overloaded. In other words, this only lets you detect that the slave is lagging, not that it is going to lag.

You can also attempt to monitor replication lag according to the differences in where the two threads are reading / writing to, based on the binary log and relay log positions compared against the master. This allows you to dive a little deeper into where the delays may be happening in the chain, rather than relying on a single time-behind metric.

Here’s a (somewhat simplified, and not entirely accurate – the IO thread reads from the master) representation of how an event moves through replication, and the points that can be monitored with current instrumentation (including time delayed replication within 5.6):

Now, when I say “Delay” above, that’s not entirely accurate: most of those metrics actually record the file, and the position within that file, that each stage has reached. They allow you to measure the number of bytes of difference at each stage, not the latency of each of the stages.

This allows you to generate these metrics:

  • IO Thread Bytes Behind Master:
    • Master (File / Position) – Slave (Master_Log_File / Read_Master_Log_Pos)
  • SQL Thread Bytes Behind IO Thread:
    • Slave (Master_Log_File / Read_Master_Log_Pos) – Slave (Relay_Log_File / Relay_Log_Pos)
  • SQL Thread Bytes Behind Master:
    • Master (File / Position) – Slave (Relay_Log_File / Relay_Log_Pos)
  • Slave Bytes Behind Master:
    • Master (File / Position) – Slave (Relay_Master_Log_File / Exec_Master_Log_Pos)
  • SQL Thread Current Event Size:
    • Slave (Relay_Log_File / Relay_Log_Pos) – Slave (Relay_Master_Log_File / Exec_Master_Log_Pos)
  • Seconds Behind Overall (maybe):
    • Slave (Seconds_Behind_Master)

Actually generating this data consistently is problematic, however. Because it requires two statements to be executed on two entirely different systems, you need to run them as close together in time as possible – so ntp needs to be installed on each system, and the collections need to be kicked off on a precise clock-based schedule on each server. Even then, because of this, they can really only be used as an indicator, not a source of truth.

The expressions above are slightly simplified as well. Logs roll over, so you also need to know how large the logs can get – using either max_binlog_size or max_relay_log_size (if non-zero), as appropriate – and factor in the number of logs between each pair of positions when calculating the difference. Not entirely simple.
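To make the rollover arithmetic concrete, here’s a minimal Python sketch. The bytes_behind helper is hypothetical (not part of MySQL or this post’s tooling), and it assumes every rolled log reached exactly its maximum size before rotating, which real logs don’t quite do:

```python
def bytes_behind(ahead_file, ahead_pos, behind_file, behind_pos, max_log_size):
    """Estimate the bytes between two (file, position) log coordinates.

    Illustrative approximation only: assumes each rolled log grew to
    max_log_size before rotating. File names look like 'mysql-bin.000123',
    positions are byte offsets within the file.
    """
    # Extract the numeric suffix of each log file name
    ahead_seq = int(ahead_file.rsplit('.', 1)[1])
    behind_seq = int(behind_file.rsplit('.', 1)[1])
    # Whole logs between the two coordinates, plus the offset difference
    return (ahead_seq - behind_seq) * max_log_size + (ahead_pos - behind_pos)

# Example: master writing to mysql-bin.000012 at position 4000, while the
# IO thread has only read up to mysql-bin.000010 at 900000, with 1GB logs
lag_bytes = bytes_behind('mysql-bin.000012', 4000,
                         'mysql-bin.000010', 900000,
                         1024 * 1024 * 1024)
```

Within a single file the estimate is exact; across rollovers it is only as good as the assumption that each log filled up before rotating.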

But even with these metrics in hand, they don’t really tell us much – the size of an event within the replication stream differs depending on whether statement-based or row-based binary logging is enabled. A “small” event could be a statement that takes a very long time to run, or a single row-based event that executes in no time at all. YMMV.

So we can use these to profile the system, and graphing them over time probably wouldn’t hurt, but relying on them as an indicator that you should start looking at a slave for performance issues is tough – unless you rely on something that tells you that you are already behind, which is reactive rather than proactive.

What we really need is an indicator that tells us that we should be looking in to performance problems before the slave even starts to lag at all.

And so we come back to the title of this blog. The idea is simple: we need a “Replication Load Average” that can tell us how busy the slave really is. If it’s not executing 100% of the time, then the chances are it’s not really falling behind; if it’s busy 90-95% of the time, then the chances are it could start to lag pretty soon. It’s not a new idea either – I’ve seen a couple of prominent people wishing for such a thing:

Peter Zaitsev: Idea: MySQL Should add “Replication Load Average” metric which would tell what time replication thread was busy processing events compared to being idle. This would help a lot to see when you’re close to the limit.

Jeremy Cole: I agree however that getting saturation information is important and is something we will potentially implement in Twitter MySQL. It would be useful to get essentially a %busy/%idle stat for the SQL thread.

I’m happy to say gentlemen, this is already within 5.5, and has been since it was released.

When the SQL thread has executed all events from the relay log, it enters MYSQL_BIN_LOG::wait_for_update_relay_log(), which then waits for a condition called update_cond to be raised. This is exactly where the slave goes “idle”. And luckily, within Performance Schema in 5.5, we can track conditions. Conditions are timed within Performance Schema for the period they spent waiting for the condition to be raised by another thread – in the above case, the total time that the SQL thread was idle.

The condition waited for within wait_for_update_relay_log() can be mapped to the wait/synch/cond/sql/MYSQL_RELAY_LOG::update_cond event within Performance Schema. Here’s a snapshot from a 5.5 server that is in the middle of a replication tree (it’s both a slave and a master, so has both relay logs and binary logs enabled – wait/synch/cond/sql/MYSQL_BIN_LOG::update_cond is the similar condition used on a master to signal the IO threads of connected slaves that new events have been written to the binary log):

mysql> select * from events_waits_summary_global_by_event_name order by sum_timer_wait desc limit 5;
+------------------------------------------------------------+------------+--------------------+----------------+----------------+----------------+
| EVENT_NAME                                                 | COUNT_STAR | SUM_TIMER_WAIT     | MIN_TIMER_WAIT | AVG_TIMER_WAIT | MAX_TIMER_WAIT |
+------------------------------------------------------------+------------+--------------------+----------------+----------------+----------------+
| wait/synch/cond/sql/MYSQL_RELAY_LOG::update_cond           |     595360 | 677596417470900396 |       15021936 |  1138128892553 | 20142735786288 |
| wait/synch/cond/sql/MYSQL_BIN_LOG::update_cond             |      60466 | 167695645597814052 |      187123608 |  2773387450762 | 84468739641228 |
| wait/io/file/myisam/kfile                                  |    3237163 |    844640545485132 |         696600 |      260919992 |  1122678434952 |
| wait/io/file/myisam/dfile                                  |    4156623 |    377275756973940 |         206712 |       90764968 |  2911791051108 |
| wait/synch/cond/sql/COND_queue_state                       |         11 |    321025543065288 |   166075717824 | 29184140278662 | 59224373303796 |
+------------------------------------------------------------+------------+--------------------+----------------+----------------+----------------+

OK, so we can get to the raw data. Great. But how do we turn that into a load average? The key is to read the values on a set interval, compute the delta of the SUM_TIMER_WAIT statistic – which is the idle time within the interval – and compare it against the overall interval.
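As a rough sketch of that delta arithmetic, here’s a hypothetical busy_pct helper in Python. Performance Schema timer values are in picoseconds, and a single long wait can span more than one collection interval, so this simplified version just clamps the idle delta to the interval (the stored procedure below handles that case differently):

```python
def busy_pct(idle_sum_prev_ps, idle_sum_now_ps, interval_seconds):
    """Turn two SUM_TIMER_WAIT snapshots of the update_cond event
    (cumulative idle time, in picoseconds) into a busy percentage
    for the interval between them. Illustrative sketch only."""
    interval_ps = interval_seconds * 1_000_000_000_000  # seconds -> picoseconds
    idle_delta = idle_sum_now_ps - idle_sum_prev_ps
    # A wait raised during this interval may have started in an earlier
    # one, so the recorded idle time can exceed the interval: clamp it
    idle_delta = min(idle_delta, interval_ps)
    return 100.0 - (idle_delta / interval_ps) * 100.0

# 4 seconds idle out of a 5 second interval -> 20% busy
print(busy_pct(0, 4_000_000_000_000, 5))
```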

As a quick proof of concept, I’ll use an event to do this, to call a procedure to gather the data, compute the deltas, and generate some extra statistics. Before boring you with the details of the implementation, let me first give a snapshot of the end result that we are striving for:

mysql> select tstamp, busy_pct, one_min_avg, five_min_avg, fifteen_min_avg
    ->   from slave_sql_load_average
    ->  order by tstamp desc limit 10;
+---------------------+----------+-------------+--------------+-----------------+
| tstamp              | busy_pct | one_min_avg | five_min_avg | fifteen_min_avg |
+---------------------+----------+-------------+--------------+-----------------+
| 2012-07-24 14:00:29 |    79.94 |       67.10 |        66.92 |           66.25 |
| 2012-07-24 14:00:24 |    39.97 |       67.10 |        66.93 |           66.15 |
| 2012-07-24 14:00:19 |    79.92 |       67.11 |        66.92 |           66.34 |
| 2012-07-24 14:00:14 |    84.31 |       67.09 |        66.93 |           66.24 |
| 2012-07-24 14:00:09 |    38.62 |       66.98 |        66.93 |           66.11 |
| 2012-07-24 14:00:04 |    83.26 |       67.03 |        66.95 |           66.31 |
| 2012-07-24 13:59:59 |    79.30 |       66.73 |        66.90 |           66.19 |
| 2012-07-24 13:59:54 |    39.97 |       66.81 |        66.91 |           66.09 |
| 2012-07-24 13:59:49 |    79.94 |       66.81 |        66.91 |           66.28 |
| 2012-07-24 13:59:44 |    79.95 |       66.80 |        66.91 |           66.18 |
+---------------------+----------+-------------+--------------+-----------------+

This gives what both Jeremy and Peter were asking for – a single “busy” counter, with some load-average-like statistics over 3 different intervals, with a snapshot every 5 seconds in this example. This allows us to see the peaks and troughs in the load of the slave on the shorter interval, whilst the larger averaged intervals flatten the overall load out, to give you a better baseline of how busy the slave really is over time.

So first, let’s start with the procedure that generates this data:

DROP PROCEDURE IF EXISTS compute_slave_load_average;

DELIMITER $$

CREATE PROCEDURE compute_slave_load_average()
BEGIN
    DECLARE v_ps_enabled VARCHAR(3);
    DECLARE v_update_cond_enabled VARCHAR(3);
    DECLARE v_update_cond_timed VARCHAR(3);

    DECLARE v_wait_count BIGINT DEFAULT 0;
    DECLARE v_last_wait_count BIGINT DEFAULT 0;
    DECLARE v_wait_count_delta BIGINT DEFAULT 0;

    DECLARE v_wait_sum BIGINT DEFAULT 0;
    DECLARE v_last_wait_sum BIGINT DEFAULT 0;
    DECLARE v_wait_delta BIGINT DEFAULT 0;

    DECLARE v_last_tstamp DATETIME;
    DECLARE v_wait_sum_tstamp DATETIME;
    DECLARE v_time_diff BIGINT;

    DECLARE v_current_wait VARCHAR(128);
    DECLARE v_current_timer_end BIGINT;

    DECLARE v_busy_pct DECIMAL(5,2);
    DECLARE v_one_min_avg DECIMAL(5,2);
    DECLARE v_five_min_avg DECIMAL(5,2);
    DECLARE v_fifteen_min_avg DECIMAL(5,2);

    DECLARE v_insert_id BIGINT;
    
    /* Disable binary logging */
    SET sql_log_bin = 0;

    /* Check Performance Schema is enabled properly */
    SELECT VARIABLE_VALUE INTO v_ps_enabled
      FROM INFORMATION_SCHEMA.GLOBAL_VARIABLES 
     WHERE VARIABLE_NAME = 'performance_schema';

    SELECT enabled, timed
      INTO v_update_cond_enabled, v_update_cond_timed
      FROM performance_schema.setup_instruments
     WHERE name = 'wait/synch/cond/sql/MYSQL_RELAY_LOG::update_cond';

    IF v_ps_enabled = 'OFF' THEN
        INSERT INTO ps_helper_logs (module, message)
        VALUES ('compute_slave_load_average', 'performance_schema is disabled');
    ELSEIF v_update_cond_enabled = 'NO' OR v_update_cond_timed = 'NO' THEN
        INSERT INTO ps_helper_logs (module, message)
        VALUES ('compute_slave_load_average', 
                CONCAT('performance_schema is not configured properly, 
                        the wait/synch/cond/sql/MYSQL_RELAY_LOG::update_cond event is currently
                        set to enabled: ', v_update_cond_enabled, ', timed: ', v_update_cond_timed, 
                        ' within the setup_instruments table'));
    ELSE 
        /* Get the latest MYSQL_RELAY_LOG::update_cond wait info for the slave SQL thread */
        SELECT his.sum_timer_wait, his.count_star, cur.event_name, cur.timer_end, SYSDATE() 
          INTO v_wait_sum, v_wait_count, v_current_wait, v_current_timer_end, v_wait_sum_tstamp 
          FROM performance_schema.events_waits_summary_by_thread_by_event_name his
          JOIN performance_schema.threads thr USING (thread_id)
          JOIN performance_schema.events_waits_current cur USING (thread_id)
         WHERE his.event_name = 'wait/synch/cond/sql/MYSQL_RELAY_LOG::update_cond'
           AND name = 'thread/sql/slave_sql';

        /* Get the last summary for comparison */
        SELECT idle_sum, events_sum, tstamp 
          INTO v_last_wait_sum, v_last_wait_count, v_last_tstamp
          FROM slave_sql_load_average
         ORDER BY id DESC LIMIT 1;

        /* Compute the interval and event count deltas */
        SET v_time_diff = (UNIX_TIMESTAMP(v_wait_sum_tstamp) - UNIX_TIMESTAMP(v_last_tstamp)) * 1000000000000;
        SET v_wait_count_delta = v_wait_count - v_last_wait_count;

        /* Compute the delta busy percentages */
        IF (v_wait_sum != v_last_wait_sum AND v_wait_count_delta > 0) THEN
            /* There have been waits during the period, calculate the stats */
            SET v_wait_delta = v_wait_sum - v_last_wait_sum;

            IF (v_wait_delta > v_time_diff) THEN
                /* The last wait was longer than our current period, estimate waits in period */
                SET v_wait_delta = v_wait_delta % (v_time_diff * FLOOR(v_wait_delta/v_time_diff));
                SET v_busy_pct = 100 - ((v_wait_delta / v_time_diff) * 100);
            ELSE 
                /* In a normal period, calculate using raw wait delta */
                SET v_busy_pct = 100 - ((v_wait_delta / v_time_diff) * 100);
            END IF;
        ELSEIF (v_current_wait = 'wait/synch/cond/sql/MYSQL_RELAY_LOG::update_cond' 
                AND v_current_timer_end IS NULL) THEN
            /* Waiting 100% on a single event for the entire period, i.e. 100% idle */
            SET v_wait_delta = v_time_diff;
            SET v_busy_pct = 0.00;
        ELSE 
            /* Waiting 100% on a single event for the entire period that is not update_cond */
            SET v_wait_delta = v_time_diff;
            SET v_busy_pct = 100.00;
        END IF;

        /* Log the initial stats */
        INSERT INTO slave_sql_load_average 
               (idle_sum, idle_delta, idle_avg, idle_delta_formatted, 
                events_sum, events_delta, busy_pct, tstamp, current_wait)
        VALUES (v_wait_sum, v_wait_delta, format_time(v_wait_delta / v_wait_count_delta),
                format_time(v_wait_delta), v_wait_count, v_wait_count_delta, 
                v_busy_pct, v_wait_sum_tstamp, v_current_wait);

        SELECT LAST_INSERT_ID() INTO v_insert_id;

        /* Compute the averages taking the last interval in to account */
        SELECT SUM(busy_pct)/COUNT(*) INTO v_one_min_avg
          FROM slave_sql_load_average
         WHERE busy_pct IS NOT NULL 
           AND tstamp > SYSDATE() - INTERVAL 1 MINUTE;

        SELECT SUM(busy_pct)/COUNT(*) INTO v_five_min_avg
          FROM slave_sql_load_average
         WHERE busy_pct IS NOT NULL 
           AND tstamp > SYSDATE() - INTERVAL 5 MINUTE;

        SELECT SUM(busy_pct)/COUNT(*) INTO v_fifteen_min_avg
          FROM slave_sql_load_average
         WHERE busy_pct IS NOT NULL 
           AND tstamp > SYSDATE() - INTERVAL 15 MINUTE;

        UPDATE slave_sql_load_average SET 
               one_min_avg = v_one_min_avg,
               five_min_avg = v_five_min_avg,
               fifteen_min_avg = v_fifteen_min_avg
         WHERE id = v_insert_id;

        /* Purge anything older than 2 hours */            
        DELETE FROM slave_sql_load_average 
         WHERE tstamp < NOW() - INTERVAL 2 HOUR;
    END IF;

    /* Re-enable binary logging */
    SET sql_log_bin = 1;
END$$

DELIMITER ; 

At a high level, this gathers the wait data for the wait/synch/cond/sql/MYSQL_RELAY_LOG::update_cond event from the events_waits_summary_by_thread_by_event_name table for only the slave SQL thread (which is exposed as thread/sql/slave_sql), then grabs the last snapshot of data for comparison. It then looks at the current and previous wait info to determine how to calculate the actual busy percentage based on how the event data has changed over the period, computes the deltas and busy percentage for the last interval appropriately, and inserts that into a table.

Finally, it computes the moving averages for the 3 defined intervals and updates the just-inserted row with that data (so the moving average also takes the current snapshot into account), then purges anything older than two hours, to keep a rolling two hour window of 5 second snapshots.

It also disables binary logging (because these are local statistics that should not be propagated to downstream systems), checks whether Performance Schema is enabled and configured properly, and logs to a logging table if not.

The table structures needed for the procedure are as follows:

DROP TABLE IF EXISTS slave_sql_load_average;

CREATE TABLE slave_sql_load_average (
  id BIGINT AUTO_INCREMENT PRIMARY KEY,
  tstamp TIMESTAMP,
  idle_avg VARCHAR(12),
  idle_delta_formatted VARCHAR(12),
  busy_pct DECIMAL(5,2),
  one_min_avg DECIMAL(5,2),
  five_min_avg DECIMAL(5,2),
  fifteen_min_avg DECIMAL(5,2),
  idle_sum BIGINT,
  idle_delta BIGINT,
  events_sum INT,
  events_delta INT,
  current_wait VARCHAR(128),
  KEY (tstamp)
) ENGINE = InnoDB;

DROP TABLE IF EXISTS ps_helper_logs;

CREATE TABLE ps_helper_logs (
  id BIGINT AUTO_INCREMENT PRIMARY KEY,
  tstamp TIMESTAMP,
  module VARCHAR(64),
  message TEXT
) ENGINE = InnoDB;

The full output from the table looks like this:

mysql> select * from slave_sql_load_average order by tstamp desc limit 5;
+-----+---------------------+-----------+----------------------+----------+-------------+--------------+-----------------+--------------------+---------------+------------+--------------+--------------------------------------------------+
| id  | tstamp              | idle_avg  | idle_delta_formatted | busy_pct | one_min_avg | five_min_avg | fifteen_min_avg | idle_sum           | idle_delta    | events_sum | events_delta | current_wait                                     |
+-----+---------------------+-----------+----------------------+----------+-------------+--------------+-----------------+--------------------+---------------+------------+--------------+--------------------------------------------------+
| 106 | 2012-07-24 13:20:34 | 491.11 ms | 982.22 ms            |    80.36 |       67.07 |        65.29 |           65.98 | 689661649765993320 |  982222907152 |     605561 |            2 | wait/synch/cond/sql/MYSQL_RELAY_LOG::update_cond |
| 105 | 2012-07-24 13:20:29 | 500.80 ms | 1.00 s               |    79.97 |       67.07 |        65.28 |           65.84 | 689655667543086168 | 1001609993416 |     605559 |            2 | wait/synch/cond/sql/MYSQL_RELAY_LOG::update_cond |
| 104 | 2012-07-24 13:20:24 | 1.50 s    | 3.01 s               |    39.90 |       67.05 |        65.28 |           65.70 | 689649665933092752 | 3005124404328 |     605557 |            2 | wait/synch/cond/sql/MYSQL_RELAY_LOG::update_cond |
| 103 | 2012-07-24 13:20:19 | 331.17 ms | 993.52 ms            |    80.13 |       67.07 |        65.28 |           65.96 | 689646660808688424 |  993523754020 |     605555 |            3 | wait/synch/cond/sql/MYSQL_RELAY_LOG::update_cond |
| 102 | 2012-07-24 13:20:14 | 192.27 ms | 769.07 ms            |    84.62 |       67.04 |        65.29 |           65.82 | 689640667284934404 |  769065072628 |     605552 |            4 | wait/synch/cond/sql/MYSQL_RELAY_LOG::update_cond |
+-----+---------------------+-----------+----------------------+----------+-------------+--------------+-----------------+--------------------+---------------+------------+--------------+--------------------------------------------------+

And the event that kicks all of this off:

DROP EVENT IF EXISTS monitor_slave_load_average;

DELIMITER $$

CREATE EVENT IF NOT EXISTS monitor_slave_load_average 
ON SCHEDULE EVERY 5 SECOND DO
BEGIN
    CALL compute_slave_load_average();
END$$

DELIMITER ;

You’ll also need the format_time() function from ps_helper.

Here’s a graph of the data over time, including a period where the slave IO thread was turned off, where you can see the moving average windows decay more slowly than the last intervals statistics:

Finally, some notes on this method:

  • Importantly, this doesn’t require any knowledge of timestamps across systems, or require any synchronization for collection across systems
  • Yes, this is not exactly like the Linux load average – which uses an Exponential Moving Average that weights snapshots less the older they get – but that would also be possible
  • The time delta recorded within 5.5 is only possible at a 1 second granularity, which loses quite a bit of precision compared to the picosecond precision wait information. Within 5.6 we can use microsecond precision with SYSDATE(6), and get even more accurate reporting. Note also that SYSDATE() is used instead of NOW(), because within procedures NOW() returns the timestamp for when the procedure started, not for when the function is called
  • The first 15 minutes worth of aggregate data should be taken with a pinch of salt, wait for the data set to “warm up”
  • This doesn’t necessarily negate the need for the other statistics described above – which can all be used to trace where latency might be, and in the case of Seconds_Behind_Master or a heartbeating implementation can still give information on how far behind you are getting, once your slave does really start to lag – as well as using something like Jeremy’s velocity metric (linked via his name above), which can give an idea of the general direction of a slave once it’s started lagging.
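For the curious, the Linux-style exponential moving average mentioned above could be sketched like this (a hypothetical ema_load helper, using the classic exp(-interval/window) decay factor; not part of the procedure above):

```python
import math

def ema_load(busy_samples, interval_s, window_s):
    """Exponentially weighted moving average of busy percentages, in the
    style of the Linux load average: each new sample is blended with the
    previous average, with older samples decaying geometrically."""
    decay = math.exp(-interval_s / window_s)
    avg = 0.0
    for busy in busy_samples:
        avg = avg * decay + busy * (1.0 - decay)
    return avg

# A slave busy 100% of the time, sampled every 5s with a 1 minute window,
# converges towards 100 as the window warms up
one_min = ema_load([100.0] * 50, 5, 60)
```

The same stream of busy_pct snapshots would feed three of these, with 1, 5 and 15 minute windows, instead of the simple windowed mean used in the procedure.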

With all that said though, I think this is the *key* metric that anybody should be tracking first and foremost. It is far better to fix problems or spread load before you start lagging.

The joys of digging through the different events in Performance Schema to see what they actually record! There’s gold in there still to be mined, but I think this one is a genuine 14 oz nugget.

Monitoring Processes with Performance Schema in MySQL 5.6

I’ve written before about how you can monitor the last wait per connection with Performance Schema in MySQL 5.5.

That solution joined the performance_schema.events_waits_current with the performance_schema.threads and INFORMATION_SCHEMA.PROCESSLIST tables.

Joining to INFORMATION_SCHEMA.PROCESSLIST is a necessary evil within MySQL 5.5, to be able to get the user and host information etc. However, it is certainly not optimal, because using INFORMATION_SCHEMA.PROCESSLIST (and its SHOW [FULL] PROCESSLIST cousin) causes a number of mutexes to be locked: the fill_schema_processlist() function attempts to lock LOCK_thread_count globally (which blocks new connections whilst the entire processlist output is generated), as well as tmp->LOCK_thd_data for each thread it iterates over (which could in turn stall each of those threads if they try to access it), and mysys_var->mutex, which protects the mysys_var struct (used around killing connections).

If you are polling the processlist output frequently (I’ve seen some solutions out there poll it once a second, or even more often), then the chances are you are stalling real application connections as well. This is why we don’t use these commands within MySQL Enterprise Monitor for things like Query Analysis.

With that in mind, we have now extended the performance_schema.threads table in MySQL 5.6 to include all of the information that is available within the INFORMATION_SCHEMA.PROCESSLIST table:

mysql> desc performance_schema.threads;
+---------------------+------------------+------+-----+---------+-------+
| Field               | Type             | Null | Key | Default | Extra |
+---------------------+------------------+------+-----+---------+-------+
| THREAD_ID           | int(11)          | NO   |     | NULL    |       |
| NAME                | varchar(128)     | NO   |     | NULL    |       |
| TYPE                | varchar(10)      | NO   |     | NULL    |       |
| PROCESSLIST_ID      | int(11)          | YES  |     | NULL    |       |
| PROCESSLIST_USER    | varchar(16)      | YES  |     | NULL    |       |
| PROCESSLIST_HOST    | varchar(60)      | YES  |     | NULL    |       |
| PROCESSLIST_DB      | varchar(64)      | YES  |     | NULL    |       |
| PROCESSLIST_COMMAND | varchar(16)      | YES  |     | NULL    |       |
| PROCESSLIST_TIME    | bigint(20)       | YES  |     | NULL    |       |
| PROCESSLIST_STATE   | varchar(64)      | YES  |     | NULL    |       |
| PROCESSLIST_INFO    | longtext         | YES  |     | NULL    |       |
| PARENT_THREAD_ID    | int(11)          | YES  |     | NULL    |       |
| ROLE                | varchar(64)      | YES  |     | NULL    |       |
| INSTRUMENTED        | enum('YES','NO') | NO   |     | NULL    |       |
+---------------------+------------------+------+-----+---------+-------+
14 rows in set (0.02 sec)

As with all other instrumentation within Performance Schema, this is entirely non-blocking. When you upgrade to MySQL 5.6, it’s time to start forgetting about SHOW PROCESSLIST, or INFORMATION_SCHEMA.PROCESSLIST. They’ve served us well for a long time, but it’s time to wave goodbye to their contentious ways.

There’s another bonus in the threads table within MySQL 5.6 as well, and that is the newly added INSTRUMENTED column. This allows you to turn instrumentation on and off, per thread, dynamically. You do this by simply running an UPDATE:

mysql> select name, type, instrumented from threads;
+----------------------------------------+------------+--------------+
| name                                   | type       | instrumented |
+----------------------------------------+------------+--------------+
| thread/sql/main                        | BACKGROUND | YES          |
| thread/innodb/io_handler_thread        | BACKGROUND | YES          |
| thread/innodb/io_handler_thread        | BACKGROUND | YES          |
| thread/innodb/io_handler_thread        | BACKGROUND | YES          |
| thread/innodb/io_handler_thread        | BACKGROUND | YES          |
| thread/innodb/io_handler_thread        | BACKGROUND | YES          |
| thread/innodb/io_handler_thread        | BACKGROUND | YES          |
| thread/innodb/io_handler_thread        | BACKGROUND | YES          |
| thread/innodb/io_handler_thread        | BACKGROUND | YES          |
| thread/innodb/io_handler_thread        | BACKGROUND | YES          |
| thread/innodb/io_handler_thread        | BACKGROUND | YES          |
| thread/innodb/srv_lock_timeout_thread  | BACKGROUND | YES          |
| thread/innodb/srv_error_monitor_thread | BACKGROUND | YES          |
| thread/innodb/srv_monitor_thread       | BACKGROUND | YES          |
| thread/innodb/srv_master_thread        | BACKGROUND | YES          |
| thread/innodb/srv_purge_thread         | BACKGROUND | YES          |
| thread/innodb/page_cleaner_thread      | BACKGROUND | YES          |
| thread/sql/shutdown                    | BACKGROUND | YES          |
| thread/sql/manager                     | BACKGROUND | YES          |
| thread/sql/con_sockets                 | BACKGROUND | YES          |
| thread/sql/one_connection              | FOREGROUND | YES          |
| thread/sql/one_connection              | FOREGROUND | YES          |
| thread/sql/one_connection              | FOREGROUND | YES          |
| thread/sql/one_connection              | FOREGROUND | YES          |
| thread/sql/one_connection              | FOREGROUND | YES          |
| thread/sql/one_connection              | FOREGROUND | YES          |
| thread/sql/one_connection              | FOREGROUND | YES          |
+----------------------------------------+------------+--------------+
27 rows in set (0.00 sec)

mysql> update threads set instrumented = 'NO' where name != 'thread/sql/one_connection';
Query OK, 20 rows affected (0.00 sec)
Rows matched: 20  Changed: 20  Warnings: 0

mysql> select name, type, instrumented from threads;
+----------------------------------------+------------+--------------+
| name                                   | type       | instrumented |
+----------------------------------------+------------+--------------+
| thread/sql/main                        | BACKGROUND | NO           |
| thread/innodb/io_handler_thread        | BACKGROUND | NO           |
| thread/innodb/io_handler_thread        | BACKGROUND | NO           |
| thread/innodb/io_handler_thread        | BACKGROUND | NO           |
| thread/innodb/io_handler_thread        | BACKGROUND | NO           |
| thread/innodb/io_handler_thread        | BACKGROUND | NO           |
| thread/innodb/io_handler_thread        | BACKGROUND | NO           |
| thread/innodb/io_handler_thread        | BACKGROUND | NO           |
| thread/innodb/io_handler_thread        | BACKGROUND | NO           |
| thread/innodb/io_handler_thread        | BACKGROUND | NO           |
| thread/innodb/io_handler_thread        | BACKGROUND | NO           |
| thread/innodb/srv_lock_timeout_thread  | BACKGROUND | NO           |
| thread/innodb/srv_error_monitor_thread | BACKGROUND | NO           |
| thread/innodb/srv_monitor_thread       | BACKGROUND | NO           |
| thread/innodb/srv_master_thread        | BACKGROUND | NO           |
| thread/innodb/srv_purge_thread         | BACKGROUND | NO           |
| thread/innodb/page_cleaner_thread      | BACKGROUND | NO           |
| thread/sql/shutdown                    | BACKGROUND | NO           |
| thread/sql/manager                     | BACKGROUND | NO           |
| thread/sql/con_sockets                 | BACKGROUND | NO           |
| thread/sql/one_connection              | FOREGROUND | YES          |
| thread/sql/one_connection              | FOREGROUND | YES          |
| thread/sql/one_connection              | FOREGROUND | YES          |
| thread/sql/one_connection              | FOREGROUND | YES          |
| thread/sql/one_connection              | FOREGROUND | YES          |
+----------------------------------------+------------+--------------+
25 rows in set (0.00 sec)

This then enables/disables all instrumentation within Performance Schema for each thread appropriately.

Now that we have all of this data directly within the threads table, we can also start joining it to some of the other tables within Performance Schema, to get even more data on what threads are currently doing within the database. In particular, we can join to the new events_statements_current table to get extra information on statement execution (you will need to ensure that the events_statements_current consumer is enabled within the setup_consumers table).
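Checking and enabling that consumer is a single statement each – a quick sketch against the 5.6 setup_consumers table:

-- Check whether the statement event consumer is enabled
SELECT * FROM performance_schema.setup_consumers
 WHERE NAME = 'events_statements_current';

-- Enable it if not
UPDATE performance_schema.setup_consumers
   SET ENABLED = 'YES'
 WHERE NAME = 'events_statements_current';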

Here’s an example of another view I’m about to add to ps_helper:

DROP VIEW IF EXISTS processlist_full;

CREATE VIEW processlist_full AS
SELECT pps.thread_id AS thd_id,
       pps.processlist_id AS conn_id,
       IF(pps.name = 'thread/sql/one_connection', 
          CONCAT(pps.processlist_user, '@', pps.processlist_host), 
          REPLACE(pps.name, 'thread/', '')) user,
       pps.processlist_db AS db,
       pps.processlist_command AS command,
       pps.processlist_state AS state,
       pps.processlist_time AS time,
       format_statement(pps.processlist_info) AS current_statement,
       IF(esc.timer_wait IS NOT NULL,
          format_statement(esc.sql_text),
          NULL) AS last_statement,
       IF(esc.timer_wait IS NOT NULL,
          format_time(esc.timer_wait),
          NULL) as last_statement_latency,
       format_time(esc.lock_time) AS lock_latency,
       esc.rows_examined,
       esc.rows_sent,
       esc.rows_affected,
       esc.created_tmp_tables AS tmp_tables,
       esc.created_tmp_disk_tables as tmp_disk_tables,
       IF(esc.no_good_index_used > 0 OR esc.no_index_used > 0, 
          'YES', 'NO') AS full_scan,
       ewc.event_name AS last_wait,
       IF(ewc.timer_wait IS NULL AND ewc.event_name IS NOT NULL, 
          'Still Waiting', 
          format_time(ewc.timer_wait)) last_wait_latency,
       ewc.source
  FROM performance_schema.threads AS pps
  LEFT JOIN performance_schema.events_waits_current AS ewc USING (thread_id)
  LEFT JOIN performance_schema.events_statements_current as esc USING (thread_id)
ORDER BY pps.processlist_time DESC, last_wait_latency DESC;

And some sample output:

...
*************************** 3. row ***************************
                thd_id: 11617
               conn_id: 11596
                  user: root@localhost
                    db: mysql
               command: Sleep
                 state: cleaning up
                  time: 0
     current_statement: NULL
        last_statement: SELECT CAST(SUM_NUMBER_OF_BYTE ... nnodb/innodb_log_file' limit 1
last_statement_latency: 2.02 ms
          lock_latency: 0 ps
         rows_examined: 39
             rows_sent: 0
         rows_affected: 0
            tmp_tables: 0
       tmp_disk_tables: 0
             full_scan: YES
             last_wait: idle
     last_wait_latency: Still Waiting
                source: mysqld.cc:848
*************************** 4. row ***************************
                thd_id: 12474
               conn_id: 12453
                  user: root@localhost
                    db: mem
               command: Sleep
                 state: cleaning up
                  time: 0
     current_statement: NULL
        last_statement: commit
last_statement_latency: 952.49 µs
          lock_latency: 0 ps
         rows_examined: 0
             rows_sent: 0
         rows_affected: 0
            tmp_tables: 0
       tmp_disk_tables: 0
             full_scan: NO
             last_wait: idle
     last_wait_latency: Still Waiting
                source: mysqld.cc:848
*************************** 5. row ***************************
                thd_id: 12400
               conn_id: 12379
                  user: root@localhost
                    db: ps_helper
               command: Query
                 state: Copying to tmp table
                  time: 0
     current_statement: SELECT pps.thread_id AS thd_id ... e DESC, last_wait_latency DESC
        last_statement: NULL
last_statement_latency: NULL
          lock_latency: 1.00 ms
         rows_examined: 0
             rows_sent: 0
         rows_affected: 0
            tmp_tables: 1
       tmp_disk_tables: 0
             full_scan: YES
             last_wait: wait/synch/mutex/sql/THD::LOCK_thd_data
     last_wait_latency: 77.11 ns
                source: sql_class.h:3843
...

Listed here are two idle connections, and one active connection (which was the one gathering the data).

The events_statements_current table stores the currently executing, or last executed (if nothing is currently executing), statement per thread. This means that this new form of processlist can not only tell you what is currently executing, but also what last executed for sleeping threads.

All of the statistics listed, such as lock latency, row statistics, and temporary table statistics, are for either the current or last statement too. So with “conn_id: 12453” we can see that it last executed a COMMIT (in the last_statement column), which took 952.49 µs (in the last_statement_latency column).

I can’t count how many times I wanted to know the last statement that executed for sleeping connections (particularly those holding open a transaction), nor how many times customers and users asked that question whilst I was in the MySQL Support group, so this is a great step forward.
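If all you want is that one answer – the last statement per sleeping connection – a minimal query against the raw tables might look like this (a sketch, without the formatting helpers used in the view above):

SELECT t.processlist_id AS conn_id,
       t.processlist_user AS user,
       esc.sql_text AS last_statement
  FROM performance_schema.threads AS t
  JOIN performance_schema.events_statements_current AS esc
       USING (thread_id)
 WHERE t.processlist_command = 'Sleep';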

For idle connections, we now have a new “idle” wait type, which records how long sessions sit idle for as well. For sleeping connections this will always be the current wait (unless you disable the idle instrument). However for active connections, we show the last wait that happened within that statement’s execution.
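The idle instrument itself can be inspected (or switched off, if you do not want idle time recorded) through setup_instruments – a quick sketch:

-- See whether idle waits are being collected and timed
SELECT NAME, ENABLED, TIMED
  FROM performance_schema.setup_instruments
 WHERE NAME = 'idle';

-- Disable it if per-session idle tracking is unwanted
UPDATE performance_schema.setup_instruments
   SET ENABLED = 'NO'
 WHERE NAME = 'idle';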

You can see that the last session listed is copying to a temporary table, had lock latency of 1 ms, and last waited on the wait/synch/mutex/sql/THD::LOCK_thd_data event, which basically corresponds to the LOCK_thd_data mutex I was talking about above.

But hang on – I said above that SHOW PROCESSLIST etc. lock this in a blocking way as well, right? And this is also grabbing this mutex? What gives? Well, performance_schema is a true table storage engine, and the statement executed is still grabbing a read lock against the performance schema table. The statement itself is also creating a temporary table, doing its own thing within its own thread, and whilst doing that, threads tend to grab their own LOCK_thd_data mutex (there is one per thread) to protect access from other threads – for instance, we take LOCK_thd_data to set the current query string, which protects SHOW FULL PROCESSLIST from reading a half written query string for its INFO column. That doesn’t change for Performance Schema access – but it does help to show just the kind of contention that SHOW FULL PROCESSLIST can cause!

The above view only gets a subset of the available data per statement available within the events_statements_current table as well – I’ve omitted all of the sort / scan information for instance. You can add these to the view as you see fit.

I’ll be adding some views like this to ps_helper shortly. If you have any feedback in the meantime feel free to comment!

Helper Functions for ps_helper

I love our community.

Not long after posting my update on ps_helper, I had a couple of comments around the formatting of values within the output. Daniël van Eeden gave the suggestion that I could add a couple of Stored Functions, for formatting byte and time based values.

Of course, this was a great idea – not least for myself, because I no longer have to worry about how to format certain columns in the output.

I’ve added the following:

  • format_bytes()
  • format_time()
  • format_path()
  • format_statement()
  • extract_schema_from_file_name()
  • extract_table_from_file_name()
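As a quick illustration, the formatting functions can be called directly (assuming ps_helper is the current schema); note that the time-based ones take the raw picosecond values that Performance Schema stores, so exact output depends on the magnitude of the input:

-- format_bytes() takes a byte count,
-- format_time() takes a picosecond timer value
SELECT format_bytes(1121627) AS bytes,
       format_time(442720000000) AS latency;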

I’ve updated all of the views within ps_helper to use these new functions as well (I may have missed some).

Along the way, I caught a couple of bugs (a missing SUM on the COUNT columns of the file IO summaries), and added a couple of updates.

The first was to update the output for the schema_table_statistics view, to include more of the data available within Performance Schema, and also to start using some of the data from the INFORMATION_SCHEMA.INNODB_BUFFER_PAGE table that is now available within MySQL 5.6.

Now the output looks like this:

mysql> select * from schema_table_statistics limit 1\G
*************************** 1. row ***************************
                 table_schema: mem
                   table_name: mysqlserver
                 rows_fetched: 27087
                fetch_latency: 442.72 ms
                rows_inserted: 2
               insert_latency: 185.04 µs 
                 rows_updated: 5096
               update_latency: 1.39 s
                 rows_deleted: 0
               delete_latency: 0 ps
             io_read_requests: 2565
                io_read_bytes: 1121627
              io_read_latency: 10.07 ms
            io_write_requests: 1691
               io_write_bytes: 128383
             io_write_latency: 14.17 ms
             io_misc_requests: 2698
              io_misc_latency: 433.66 ms
          innodb_buffer_pages: 19
   innodb_buffer_pages_hashed: 19
      innodb_buffer_pages_old: 19
innodb_buffer_bytes_allocated: 311296
     innodb_buffer_bytes_data: 1924
    innodb_buffer_rows_cached: 2
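The innodb_buffer_* columns above come from the new INFORMATION_SCHEMA.INNODB_BUFFER_PAGE table, which you can also query directly – a sketch (beware, scanning this table can be expensive with a large buffer pool):

-- Pages and data cached in the buffer pool, per table
SELECT TABLE_NAME,
       COUNT(*) AS pages,
       SUM(DATA_SIZE) AS data_bytes
  FROM information_schema.INNODB_BUFFER_PAGE
 WHERE TABLE_NAME IS NOT NULL
 GROUP BY TABLE_NAME
 ORDER BY pages DESC
 LIMIT 5;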

I’ve also added another view within the 5.6 set, called statements_with_runtimes_in_95th_percentile.

This outputs all statement digests whose average run time (in microseconds) is above the 95th percentile of all statements. Its output is similar to the statement_analysis view:

mysql> select * from statements_with_runtimes_in_95th_percentile where query not like 'show%';
+-------------------------------------------------------------------+-----------+------------+-----------+------------+---------------+-------------+-------------+-----------+---------------+--------------+----------------------------------+
| query                                                             | full_scan | exec_count | err_count | warn_count | total_latency | max_latency | avg_latency | rows_sent | rows_sent_avg | rows_scanned | digest                           |
+-------------------------------------------------------------------+-----------+------------+-----------+------------+---------------+-------------+-------------+-----------+---------------+--------------+----------------------------------+
| SELECT plugin_name FROM inform ... tus = ? ORDER BY plugin_name   | *         |        169 |         0 |          0 | 2.37 s        | 64.45 ms    | 14.03 ms    |      4394 |            26 |        10816 | 23234b56a0b1f1e350bf51bef3050747 |
| SELECT `this_` . `target` AS ` ... D `this_` . `timestamp` <= ?   |           |        118 |         0 |          0 | 170.08 ms     | 5.68 ms     | 1.44 ms     |     13582 |           115 |        13582 | 34694223091aee1380c565076b7dfece |
| SELECT CAST ( SUM_NUMBER_OF_BY ... WHERE EVENT_NAME = ? LIMIT ?   | *         |        566 |         0 |          0 | 779.56 ms     | 2.93 ms     | 1.38 ms     |       342 |             1 |        17286 | 58d34495d29ad818e68c859e778b0dcb |
| SELECT `this_` . `target` AS ` ... D `this_` . `timestamp` <= ?   |           |        118 |         0 |          0 | 153.35 ms     | 3.06 ms     | 1.30 ms     |     13228 |           112 |        13228 | b816579565d5a2882cb8bd496193dc00 |
| SELECT `this_` . `target` AS ` ... D `this_` . `timestamp` <= ?   |           |        118 |         0 |          0 | 143.31 ms     | 4.57 ms     | 1.21 ms     |     13646 |           116 |        13646 | 27ff8681eb2c8cf999233e7507b439fe |
| SELECT `this_` . `target` AS ` ... D `this_` . `timestamp` <= ?   |           |        118 |         0 |          0 | 143.04 ms     | 7.22 ms     | 1.21 ms     |     13584 |           115 |        13584 | 10b863f20e83dcd9c7782dac249acbb0 |
| SELECT `this_` . `target` AS ` ... D `this_` . `timestamp` <= ?   |           |        118 |         0 |          0 | 137.46 ms     | 16.73 ms    | 1.16 ms     |     13922 |           118 |        13922 | 351ebc26af6babb67570843bcc97f6b0 |
| UPDATE `mem30__inventory` . `R ... mestamp` = ? WHERE `hid` = ?   |           |        114 |         0 |          0 | 127.64 ms     | 30.33 ms    | 1.12 ms     |         0 |             0 |          114 | f4ecf2aebe212e7ed250a0602d86c389 |
| UPDATE `mem30__inventory` . `I ... ` = ? , `hasOldBlocksTime` ... |           |         56 |         0 |          0 | 61.05 ms      | 16.41 ms    | 1.09 ms     |         0 |             0 |           56 | cdc78c70d83c505c5708847ba810d035 |
| SELECT `this_` . `target` AS ` ... D `this_` . `timestamp` <= ?   |           |        118 |         0 |          0 | 121.76 ms     | 1.95 ms     | 1.03 ms     |     13936 |           118 |        13936 | 20f97c53c2a59f5eadc06b2fa90fbe75 |
| UPDATE `mem30__inventory` . `M ... mpileOs` = ? WHERE `hid` = ?   |           |        114 |         0 |          0 | 114.16 ms     | 22.34 ms    | 1.00 ms     |         0 |             0 |          114 | c5d4a65f3f308f4869807e730739af6d |
| CALL `dc_string_insert` (...)                                     |           |         80 |         0 |          0 | 79.89 ms      | 2.62 ms     | 998.50 µs   |         0 |             0 |          240 | 93eb9cab8ced45cf3b98400e8803f8af |
| SELECT `this_` . `target` AS ` ... D `this_` . `timestamp` <= ?   |           |        118 |         0 |          0 | 116.19 ms     | 1.32 ms     | 984.60 µs   |     13484 |           114 |        13484 | bd23afed9a41367591e2b71dac76f334 |
+-------------------------------------------------------------------+-----------+------------+-----------+------------+---------------+-------------+-------------+-----------+---------------+--------------+----------------------------------+

And to give an example using some of the bytes formatting as well, here’s an example for the latest_file_io view:

mysql> select * from latest_file_io limit 10;
+----------------------+----------------------------------------+------------+-----------+-----------+
| thread               | file                                   | latency    | operation | requested |
+----------------------+----------------------------------------+------------+-----------+-----------+
| msandbox@localhost:1 | @@tmpdir/#sqlcf28_1_4e.MYI             | 9.26 µs    | write     | 124 bytes |
| msandbox@localhost:1 | @@tmpdir/#sqlcf28_1_4e.MYI             | 4.00 µs    | write     | 2 bytes   |
| msandbox@localhost:1 | @@tmpdir/#sqlcf28_1_4e.MYI             | 56.34 µs   | close     | NULL      |
| msandbox@localhost:1 | @@tmpdir/#sqlcf28_1_4e.MYD             | 53.93 µs   | close     | NULL      |
| msandbox@localhost:1 | @@tmpdir/#sqlcf28_1_4e.MYI             | 104.05 ms  | delete    | NULL      |
| msandbox@localhost:1 | @@tmpdir/#sqlcf28_1_4e.MYD             | 661.18 µs  | delete    | NULL      |
| msandbox@localhost:1 | @@datadir/Cerberus.log                 | 35.99 ms   | write     | 57 bytes  |
| msandbox@localhost:1 | @@datadir/ps_helper/latest_file_io.frm | 7.40 µs    | stat      | NULL      |
| msandbox@localhost:1 | @@datadir/ps_helper/latest_file_io.frm | 9.81 µs    | open      | NULL      |
| msandbox@localhost:1 | @@datadir/ps_helper/latest_file_io.frm | 16.06 µs   | read      | 3.17 KiB  |
+----------------------+----------------------------------------+------------+-----------+-----------+

Keep the feedback coming! :)

MySQL Performance Schema Statement Digests

MySQL 5.6.5 was released recently, with many great blogs written by the various developers involved in making it such an awesome release. The list of changes is large – I’d recommend reading the change log. The MySQL 5.6 What Is New page provides a great overview of the 5.6 release so far as well.

The changelog includes this nugget, which I’ll expand upon here:

The Performance Schema now maintains statement digest information. This normalizes and groups statements with the same “signature” and permits questions to be answered about the types of statements the server is executing and how often they occur.

  • A statement_digest consumer in the setup_consumers table controls whether the Performance Schema maintains digest information.
  • The statement event tables (events_statements_current, events_statements_history, and events_statements_history_long) have DIGEST and DIGEST_TEXT columns that contain digest MD5 values and the corresponding normalized statement text strings.
  • A events_statements_summary_by_digest table provides aggregated statement digest information.
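The quickest way to see digests in action is to query the new summary table directly – for instance, the five most frequently executed normalized statements:

SELECT DIGEST,
       LEFT(DIGEST_TEXT, 64) AS digest_text,
       COUNT_STAR AS exec_count
  FROM performance_schema.events_statements_summary_by_digest
 ORDER BY COUNT_STAR DESC
 LIMIT 5;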

Many of you should be familiar with this view now:

It is the Query Analysis view from MySQL Enterprise Monitor. This was first released at the end of 2008, and was conceived (actually in 2006, presented and much discussed on a snowy day at a MEM team meeting in Amsterdam) from the need to easily monitor what statements are running within the database instance.

Yes, there were the Slow Query Log and General Query Log, but they had their downfalls when we were evaluating this – not fine grained enough on the one side, not enough info with too much overhead on the other, and both require access at the OS level to read the log files – not always a problem, but certainly is in some cases. The only tool that did any kind of analysis and statistics aggregation on queries that MySQL had been executing, at the time, was mysqldumpslow.

And so MySQL Proxy was born, and hooked up with MEM to collect statistics on queries that were funneled through it. It was revolutionary (for the MySQL world, at least). Hot on its heels we saw new GUI tools from the community trying to do the same thing. We’ve also seen extensions to the old ways of doing things, with the extended slow query logging, and command line scripts such as mk-query-digest being born (and re-born). We then extended Query Analysis into the Connectors as well, to share the load of statistics gathering across application servers, and not have to funnel everything through MySQL Proxy (among other benefits).

These are all still great solutions today. Yet they all have one major downfall – they skirt around the issue of the database instance itself … actually generating this data and having it available with a SQL interface. That’s what databases are supposed to be good at … right?

Now it’s 2012, and we finally have a solution for this – Performance Schema Statement Digests. So with the picture above in mind, now take a look at:

Strikingly similar, no? And it’s being returned by the database directly. It gets better.

One of the problems with trying to gather statistics on statement traffic from within the protocol stream (done by Proxy, Connectors, any tcpdump solution, etc.) is that there are certain statistics not exposed within the MySQL protocol – things like the number of rows examined, temporary table usage, select type etc., which can be seen in SHOW SESSION STATUS for example, are practically impossible to correlate (we would have to inject that into each connection to wrap each “real” statement).

The Slow Query Log does not have this problem – and the extensions there in the community have proved that it’s useful to be able to aggregate more at the statement level too.

Statement Digests most certainly do not have this problem. Here’s the current structure of the new events_statements_summary_by_digest table:

mysql> DESC performance_schema.events_statements_summary_by_digest;
+-----------------------------+---------------------+------+-----+---------------------+-------+
| Field                       | Type                | Null | Key | Default             | Extra |
+-----------------------------+---------------------+------+-----+---------------------+-------+
| DIGEST                      | varchar(32)         | YES  |     | NULL                |       |
| DIGEST_TEXT                 | longtext            | YES  |     | NULL                |       |
| COUNT_STAR                  | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_TIMER_WAIT              | bigint(20) unsigned | NO   |     | NULL                |       |
| MIN_TIMER_WAIT              | bigint(20) unsigned | NO   |     | NULL                |       |
| AVG_TIMER_WAIT              | bigint(20) unsigned | NO   |     | NULL                |       |
| MAX_TIMER_WAIT              | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_LOCK_TIME               | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_ERRORS                  | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_WARNINGS                | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_ROWS_AFFECTED           | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_ROWS_SENT               | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_ROWS_EXAMINED           | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_CREATED_TMP_DISK_TABLES | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_CREATED_TMP_TABLES      | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_SELECT_FULL_JOIN        | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_SELECT_FULL_RANGE_JOIN  | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_SELECT_RANGE            | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_SELECT_RANGE_CHECK      | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_SELECT_SCAN             | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_SORT_MERGE_PASSES       | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_SORT_RANGE              | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_SORT_ROWS               | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_SORT_SCAN               | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_NO_INDEX_USED           | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_NO_GOOD_INDEX_USED      | bigint(20) unsigned | NO   |     | NULL                |       |
| FIRST_SEEN                  | timestamp           | NO   |     | 0000-00-00 00:00:00 |       |
| LAST_SEEN                   | timestamp           | NO   |     | 0000-00-00 00:00:00 |       |
+-----------------------------+---------------------+------+-----+---------------------+-------+
28 rows in set (0.02 sec)

Possibilities abound!

Before diving in to some of the use cases, first it’s worth describing exactly what a digest is.

As with Query Analysis in MEM, we track and aggregate statistics by normalizing the statements. Performance Schema does this slightly differently to how the MEM components do, however both follow roughly the same process, working on a tokenized representation of the statement.

Performance Schema does this by tying into the lexer within the MySQL server, which spits out the tokenized statement to be parsed, recording its token output stream and doing some of its own normalization on top.

A normalized statement may have any of the following done to it:

  • Stripping whitespace (done in the lexer layer)
  • Stripping comments (done in the lexer layer)
  • Replacing literals (integer and string inputs) with a “?” placeholder
    • SELECT foo, bar FROM foobar WHERE foo = 100; becomes SELECT foo, bar FROM foobar WHERE foo = ?
    • For a single value INSERT statement, INSERT INTO foobar VALUES (100); becomes INSERT INTO foobar VALUES (?);
  • Lists of values are folded
    • Folding lists of IN values
      • SELECT foo FROM foobar WHERE bar IN (1, 2, 3) becomes SELECT foo FROM foobar WHERE bar IN ()
    • Folding multi-row INSERTs with single values
      • INSERT INTO t1 VALUES (1), (2) becomes INSERT INTO t1 VALUES (?) /* , … */
    • Folding multi-row INSERTs with many values
      • INSERT INTO t1 VALUES (1, 2, 3), (4, 5, 6) becomes INSERT INTO t1 VALUES (…) /* , … */
    • Lists of values in the SELECT list
      • SELECT 1, 2, 3, foo, bar FROM foobar becomes SELECT ?, … , foo, bar FROM foobar
  • Long normalized statements are truncated, with “...” appended at the end
    • “long” is currently defined as 1024 bytes, set by PSI_MAX_DIGEST_STORAGE_SIZE in psi.h. Unfortunately this is not configurable yet; note, however, that the limit applies to the normalized statement – so large string values etc. are not a concern here

MEM does not fold lists of values down, i.e. “INSERT INTO foobar VALUES (1,2,3,4,5)” will be normalized to “INSERT INTO foobar VALUES (?,?,?,?,?)” within MEM. Performance Schema does this to save memory, by discarding the value tokens along the way and using the above forms for normalization.
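You can see the folding for yourself by issuing a couple of statements that differ only in their literals, then checking the summary table – both should land on the same digest row (a sketch; the table and column names here are made up):

-- Two executions that differ only in literal values...
SELECT foo FROM foobar WHERE bar IN (1, 2, 3);
SELECT foo FROM foobar WHERE bar IN (4, 5);

-- ...aggregate under a single normalized digest
SELECT DIGEST_TEXT, COUNT_STAR
  FROM performance_schema.events_statements_summary_by_digest
 WHERE DIGEST_TEXT LIKE 'SELECT%foobar%';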

With that description out of the way, now let’s take a look at some of the value that you can get from the data!

First, let’s get the obvious out the way:

A high level overview of the statements like Query Analysis, sorted by those queries with the highest latency

SELECT IF(LENGTH(DIGEST_TEXT) > 64, CONCAT(LEFT(DIGEST_TEXT, 30), ' ... ', RIGHT(DIGEST_TEXT, 30)), DIGEST_TEXT) AS query,
       IF(SUM_NO_GOOD_INDEX_USED > 0 OR SUM_NO_INDEX_USED > 0, '*', '') AS full_scan,
       COUNT_STAR AS exec_count,
       SUM_ERRORS AS err_count,
       SUM_WARNINGS AS warn_count,
       SEC_TO_TIME(SUM_TIMER_WAIT/1000000000000) AS exec_time_total,
       SEC_TO_TIME(MAX_TIMER_WAIT/1000000000000) AS exec_time_max,
       (AVG_TIMER_WAIT/1000000000) AS exec_time_avg_ms,
       SUM_ROWS_SENT AS rows_total,
       ROUND(SUM_ROWS_SENT / COUNT_STAR) AS rows_avg,
       SUM_ROWS_EXAMINED AS rows_scanned,
       DIGEST AS digest
  FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC LIMIT 5;

+-------------------------------------------------------------------+-----------+------------+-----------+------------+-----------------+---------------+------------------+------------+----------+--------------+----------------------------------+
| query                                                             | full_scan | exec_count | err_count | warn_count | exec_time_total | exec_time_max | exec_time_avg_ms | rows_total | rows_avg | rows_scanned | digest                           |
+-------------------------------------------------------------------+-----------+------------+-----------+------------+-----------------+---------------+------------------+------------+----------+--------------+----------------------------------+
| SELECT `mysqlserve0_` . `hid`  ... ` . `performanceSchema` AS ... | *         |       7747 |         0 |          0 | 00:00:02.5492   | 00:00:00.0009 |           0.3288 |       5200 |        1 |         5200 | 3176fe79ae8ce631eb328feaaafcf972 |
| SELECT `agent0_` . `hid` AS `h ... ent0_` . `id` = ? FOR UPDATE   |           |       7706 |         0 |          0 | 00:00:02.4067   | 00:00:00.0008 |           0.3122 |       7706 |        1 |         7706 | cbcf9ed706ce0974f1ef91b7c2d690eb |
| SELECT `os0_` . `hid` AS `hid1 ... ion` AS `hasVersion121_` , ... | *         |       5098 |         0 |          0 | 00:00:02.2526   | 00:00:00.0047 |           0.4416 |       5098 |        1 |         5098 | 1677026fd320b4876261b1270d28be38 |
| SELECT `replicatio0_` . `hid`  ... ess` AS `hasMast18_107_` , ... | *         |       2550 |         0 |          0 | 00:00:02.0543   | 00:00:00.0016 |           0.8055 |       2550 |        1 |         2550 | 90b8d082e28985953ff558546e00eb33 |
| SELECT `environmen0_` . `hid`  ... ck130_` , `environmen0_` . ... | *         |       2550 |         0 |          0 | 00:00:01.6082   | 00:00:00.0047 |           0.6304 |       2550 |        1 |         2550 | f0ba68f44b6d9357f1cff9670dc4ff2a |
+-------------------------------------------------------------------+-----------+------------+-----------+------------+-----------------+---------------+------------------+------------+----------+--------------+----------------------------------+

My test data is a little bland (this is a MEM schema monitoring a single MySQL instance), but you get the idea. You can instantly get a picture of your top statements, ordered in this case by latency, along with some extra statistics like whether it caused a full scan, their average latency, how many rows were both scanned and returned etc.

But you could slice and dice this data any way you want – it’s just SQL.
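For instance, another simple slice – the normalized statements raising the most errors or warnings:

SELECT LEFT(DIGEST_TEXT, 64) AS query,
       COUNT_STAR AS exec_count,
       SUM_ERRORS AS err_count,
       SUM_WARNINGS AS warn_count
  FROM performance_schema.events_statements_summary_by_digest
 WHERE SUM_ERRORS > 0
    OR SUM_WARNINGS > 0
 ORDER BY SUM_ERRORS DESC, SUM_WARNINGS DESC
 LIMIT 5;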

Here’s a few more use cases:

List all normalized statements that use temporary tables ordered by number of on disk temporary tables descending first, then by the number of memory tables.

SELECT IF(LENGTH(DIGEST_TEXT) > 64, CONCAT(LEFT(DIGEST_TEXT, 30), ' ... ', RIGHT(DIGEST_TEXT, 30)), DIGEST_TEXT) AS query,
       COUNT_STAR AS exec_count,
       SUM_CREATED_TMP_TABLES AS memory_tmp_tables,
       SUM_CREATED_TMP_DISK_TABLES AS disk_tmp_tables,
       ROUND(SUM_CREATED_TMP_TABLES / COUNT_STAR) AS avg_tmp_tables_per_query,
       ROUND((SUM_CREATED_TMP_DISK_TABLES / SUM_CREATED_TMP_TABLES) * 100) AS tmp_tables_to_disk_pct,
       DIGEST AS digest
  FROM performance_schema.events_statements_summary_by_digest
 WHERE SUM_CREATED_TMP_TABLES > 0
ORDER BY SUM_CREATED_TMP_DISK_TABLES DESC, SUM_CREATED_TMP_TABLES DESC LIMIT 5;

+-------------------------------------------------------------------+------------+-------------------+-----------------+--------------------------+------------------------+
| query                                                             | exec_count | memory_tmp_tables | disk_tmp_tables | avg_tmp_tables_per_query | tmp_tables_to_disk_pct |
+-------------------------------------------------------------------+------------+-------------------+-----------------+--------------------------+------------------------+
| SELECT DISTINCTROW `hibalarm0_ ... testeval2_` . `alarm_id` = ... |          5 |                15 |               5 |                        3 |                     33 |
| SELECT DISTINCTROW `hibalarm0_ ... testeval2_` . `alarm_id` = ... |          2 |                 6 |               2 |                        3 |                     33 |
| SELECT * FROM latest_file_io                                      |          2 |                 4 |               2 |                        2 |                     50 |
| SELECT * FROM PROCESSLIST                                         |          2 |                 2 |               2 |                        1 |                    100 |
| SELECT SQL_CALC_FOUND_ROWS `st ...  , MIN ( `min_exec_time` ) ... |          1 |                 3 |               1 |                        3 |                     33 |
+-------------------------------------------------------------------+------------+-------------------+-----------------+--------------------------+------------------------+

List all normalized statements that have done sorts, ordered by sort_merge_passes, sort_scans and sort_rows, all descending.

SELECT IF(LENGTH(DIGEST_TEXT) > 64, CONCAT(LEFT(DIGEST_TEXT, 30), ' ... ', RIGHT(DIGEST_TEXT, 30)), DIGEST_TEXT) AS query,
       COUNT_STAR AS exec_count,
       SUM_SORT_MERGE_PASSES AS sort_merge_passes,
       ROUND(SUM_SORT_MERGE_PASSES / COUNT_STAR) AS avg_sort_merges,
       SUM_SORT_SCAN AS sorts_using_scans,
       SUM_SORT_RANGE AS sort_using_range,
       SUM_SORT_ROWS AS rows_sorted,
       ROUND(SUM_SORT_ROWS / COUNT_STAR) AS avg_rows_sorted,
       DIGEST AS digest
  FROM performance_schema.events_statements_summary_by_digest
 WHERE SUM_SORT_ROWS > 0
 ORDER BY SUM_SORT_MERGE_PASSES DESC, SUM_SORT_SCAN DESC, SUM_SORT_ROWS DESC LIMIT 5;

+-------------------------------------------------------------------+------------+-------------------+-----------------+-------------------+------------------+-------------+-----------------+----------------------------------+
| query                                                             | exec_count | sort_merge_passes | avg_sort_merges | sorts_using_scans | sort_using_range | rows_sorted | avg_rows_sorted | digest                           |
+-------------------------------------------------------------------+------------+-------------------+-----------------+-------------------+------------------+-------------+-----------------+----------------------------------+
| SELECT * FROM ps_helper . statements_with_sorting                 |          7 |                 0 |               0 |                 7 |                0 |          31 |               4 | 635d19e3e652972b3267ada0bf9c7b36 |
| SELECT * FROM statement_analysis                                  |          4 |                 0 |               0 |                 4 |                0 |          89 |              22 | 10f918a1a410f4fa0fc2602cff02deb7 |
| SELECT table_schema , SUM ( da ... tables GROUP BY table_schema   |          2 |                 0 |               0 |                 2 |                0 |          24 |              12 | 27fecd44f0bf5c0fc4e46f547083a09d |
| SELECT * FROM statements_with_sorting                             |          2 |                 0 |               0 |                 2 |                0 |           3 |               2 | dc117dd0eb81394322e3d4144a997ffc |
+-------------------------------------------------------------------+------------+-------------------+-----------------+-------------------+------------------+-------------+-----------------+----------------------------------+

List all normalized statements that have done a full table scan, ordered by the percentage of executions that did a full scan, then by the number of times the statement was executed.

SELECT IF(LENGTH(DIGEST_TEXT) > 64, CONCAT(LEFT(DIGEST_TEXT, 30), ' ... ', RIGHT(DIGEST_TEXT, 30)), DIGEST_TEXT) AS query,
       COUNT_STAR AS exec_count,
       SUM_NO_INDEX_USED AS no_index_used_count,
       SUM_NO_GOOD_INDEX_USED AS no_good_index_used_count,
       ROUND((SUM_NO_INDEX_USED / COUNT_STAR) * 100) no_index_used_pct,
       DIGEST AS digest
  FROM performance_schema.events_statements_summary_by_digest
 WHERE SUM_NO_INDEX_USED > 0
    OR SUM_NO_GOOD_INDEX_USED > 0
ORDER BY no_index_used_pct DESC, exec_count DESC LIMIT 5;

+-------------------------------------------------------------------+------------+---------------------+--------------------------+-------------------+
| query                                                             | exec_count | no_index_used_count | no_good_index_used_count | no_index_used_pct |
+-------------------------------------------------------------------+------------+---------------------+--------------------------+-------------------+
| SELECT `hibadvisor0_` . `sched ... _` . `advisorClassId` IN (?)   |        679 |                 679 |                        0 |               100 |
| SELECT `hibalarm0_` . `alarm_i ... ` . `fixed_time` < ? LIMIT ?   |        365 |                 365 |                        0 |               100 |
| SELECT `id` , `millis_stamp` , ... s` ORDER BY `id` ASC LIMIT ?   |        353 |                 353 |                        0 |               100 |
| SELECT `agent0_` . `hid` AS `h ... ventory` . `Agent` `agent0_`   |        112 |                 112 |                        0 |               100 |
| SELECT COUNT ( * ) AS `col_0_0 ... entry0_` . `fetchedDate` > ?   |         18 |                  18 |                        0 |               100 |
+-------------------------------------------------------------------+------------+---------------------+--------------------------+-------------------+

List all normalized statements that have raised errors or warnings.

SELECT IF(LENGTH(DIGEST_TEXT) > 64, CONCAT(LEFT(DIGEST_TEXT, 30), ' ... ', RIGHT(DIGEST_TEXT, 30)), DIGEST_TEXT) AS query,
       COUNT_STAR AS exec_count,
       SUM_ERRORS AS errors,
       (SUM_ERRORS / COUNT_STAR) * 100 as error_pct,
       SUM_WARNINGS AS warnings,
       (SUM_WARNINGS / COUNT_STAR) * 100 as warning_pct,
       DIGEST AS digest
  FROM performance_schema.events_statements_summary_by_digest
 WHERE SUM_ERRORS > 0
    OR SUM_WARNINGS > 0
ORDER BY SUM_ERRORS DESC, SUM_WARNINGS DESC;

+-------------------------------------------------------------------+------------+--------+-----------+----------+-------------+----------------------------------+
| query                                                             | exec_count | errors | error_pct | warnings | warning_pct | digest                           |
+-------------------------------------------------------------------+------------+--------+-----------+----------+-------------+----------------------------------+
| CREATE PROCEDURE currently_ena ... w_instruments BOOLEAN DEFAULT  |          2 |      2 |  100.0000 |        0 |      0.0000 | ad6024cfc2db562ae268b25e65ef27c0 |
| CREATE PROCEDURE currently_ena ... ents WHERE enabled = ? ; END   |          2 |      1 |   50.0000 |        0 |      0.0000 | 4aac3ab9521a432ff03313a69cfcc58f |
| CREATE PROCEDURE currently_enabled ( BOOLEAN show_instruments     |          1 |      1 |  100.0000 |        0 |      0.0000 | c6df6711da3d1a26bc136dc8b354f6eb |
| CREATE PROCEDURE disable_backg ... d = ? WHERE TYPE = ? ; END IF  |          1 |      1 |  100.0000 |        0 |      0.0000 | 12e0392402780424c736c9555bcc9703 |
| DROP PROCEDURE IF EXISTS currently_enabled                        |         12 |      0 |    0.0000 |        6 |     50.0000 | 44cc7e655d08f430e0dd8f3110ed816c |
| DROP PROCEDURE IF EXISTS disable_background_threads               |          3 |      0 |    0.0000 |        2 |     66.6667 | 0153b7158dae80672bda6181c73f172c |
| CREATE SCHEMA IF NOT EXISTS ps_helper                             |          2 |      0 |    0.0000 |        1 |     50.0000 | a12cabd32d1507c758c71478075f5290 |
+-------------------------------------------------------------------+------------+--------+-----------+----------+-------------+----------------------------------+

All of these examples show the flexibility of having this instrumentation directly within the server, accessible via the server itself. If you haven’t done it already, now is the time to take a little breath, and then shout Hurrah.

Now, it’s worth noting that all of this is stored in memory. By default, the maximum number of rows in this table in 5.6.5 is 200, which effectively lets you monitor 199 normalized statements, because one row is reserved for a NULL digest. That row acts as a catch-all: once the table is full, Performance Schema aggregates the statistics for any statement whose digest is not already in the table into the NULL row.
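If you want to see how much statement traffic is falling into that catch-all row, you can query it directly; something along these lines (the column choice here is just a suggestion):

```sql
-- How much traffic has overflowed into the catch-all NULL digest row?
SELECT COUNT_STAR, SUM_TIMER_WAIT
  FROM performance_schema.events_statements_summary_by_digest
 WHERE DIGEST IS NULL;
```

A large and growing COUNT_STAR here is a good hint that the digest table is too small for your workload.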

In the MEM team we’ve found in practice that people often have many more than this, frequently 500+ distinct kinds of statements being executed. You can increase the number of digests that are tracked with the performance_schema_digests_size system variable, and you can see how much memory this uses with the SHOW ENGINE PERFORMANCE_SCHEMA STATUS command (here I have the number of digests to track set to 400, which takes roughly 490KB of memory):

mysql> show engine performance_schema status;
+--------------------+--------------------------------------------------------------+-----------+
| Type               | Name                                                         | Status    |
+--------------------+--------------------------------------------------------------+-----------+
...
| performance_schema | events_statements_summary_by_digest.row_size                 | 1256      |
| performance_schema | events_statements_summary_by_digest.row_count                | 400       |
| performance_schema | events_statements_summary_by_digest.memory                   | 502400    |
...
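
The arithmetic behind that estimate is simply row_size multiplied by row_count; a quick sanity check in plain Python, using the numbers reported above:

```python
# Memory used by events_statements_summary_by_digest is simply
# row_size * row_count, as reported by SHOW ENGINE PERFORMANCE_SCHEMA STATUS.
def digest_table_memory(row_size, row_count):
    """Bytes consumed by the digest summary table."""
    return row_size * row_count

bytes_used = digest_table_memory(1256, 400)
print(bytes_used)                      # -> 502400 bytes
print(round(bytes_used / 1024), "KB")  # -> 491 KB, i.e. roughly 490KB
```

So each extra digest you track costs a little over 1KB of memory, which makes raising performance_schema_digests_size a fairly cheap way to avoid the catch-all NULL row swallowing your statistics.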

In the current development version (5.6.6), this limit has been raised to 1000, and Performance Schema (along with the instrumentation I have talked about here) is now enabled by default!

As with other aggregate tables within Performance Schema, you can reset the statistics within the digest table with:

TRUNCATE TABLE performance_schema.events_statements_summary_by_digest;

Any scripts or tools that are used to analyze this table will need to take this into account.
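For instance, a monitoring tool that samples COUNT_STAR periodically and computes deltas has to cope with the counters going backwards after a TRUNCATE. A minimal sketch of one way to handle that (this is my illustration, not code from Performance Schema itself):

```python
# Sketch: delta between two samples of a monotonically increasing
# Performance Schema counter, tolerating a TRUNCATE between samples.
def counter_delta(previous, current):
    if current >= previous:
        return current - previous
    # Counter went backwards: the table was truncated between samples,
    # so counting restarted from zero and the new value IS the delta
    # (minus anything that happened just before the truncate, which is lost).
    return current

print(counter_delta(100, 150))  # -> 50 (normal growth)
print(counter_delta(100, 30))   # -> 30 (table truncated mid-interval)
```

Note that any activity between the last sample and the truncate is unavoidably lost, so frequent sampling keeps that error small.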

We’re just scratching the surface of possibilities with this table at the moment. I have a number of things that I’d like to see added (such as a summary of each of the major classes of lower-level wait events, when enabled: “SUM_FILE_IO_WAIT”, “SUM_SOCKET_IO_WAIT”, “SUM_MUTEX_WAIT” and so on), and I’m sure many of you out there in the community will have your own ideas as well.

Welcome to the future of statement diagnostics within MySQL, we hope you enjoy it! Tell us what you’d like to see next!