5 years and counting

Yesterday marked my 5th anniversary since I signed on the dotted line, and starting working with MySQL AB! I celebrated yesterday with a BBQ with the family, and a few vodka shots for old times sake. :) Interestingly, it was also the 5 year anniversary of the first 5.0 RC being announced.

Since that time I’ve transferred through to Sun, and now find my home to be Oracle (funnily, the user community that I left to join MySQL).

I’ve met a huge number of fantastically talented people, made a lot of new friends, seen a lot of new colleagues come on board as we expanded so quickly, and sadly seen a number of those colleagues leave to go onwards (and pretty much in all cases, upwards) within the IT world. Happily, a large number remain, and continue to amaze me with the work they have been doing, and focus that they have.

I’ve also gotten to know a number of users and customers pretty well, and have to say they are among the most passionate IT community that I’ve ever come across. Your will for MySQL to keep improving is amazing, and remains my inspiration.

It’s also taken me to new places, I’ve gotten to see New York, Las Vegas, San Francisco, Santa Cruz, Washington DC, Santa Clara, Cupertino, Austin, Montreal, Amsterdam, Frankfurt, Heidelberg, Hamburg, and Riga (the less said the better on that one).

For my part, I’m still as jazzed as ever to be working with the MySQL group within Oracle. We’re taking great strides at the moment, with performance, instrumentation, usability, and high availability within MySQL 5.5, and the roadmap ahead has a very bright future for both the Server and the tools that I get to spend my day to day life on (MEM and MySQL Proxy).

And so I raise my coffee cup (hey, I’m at work now), to the next 5 years, and to all the great people that have made my working life so enjoyable!

Tracking mutex locks in a process list, MySQL 5.5′s PERFORMANCE_SCHEMA

Internally MySQL uses various methods to manage (or, block) concurrent access to shared structures within the server – the most common being mutexes (or Mutual Exclusion locks), RW Locks, or Semaphores. Each have slightly different properties on how different threads are allowed to interact when trying to access these synchronization points.

There has been much written on the various hot spots in the server around these locking/blocking constructs over the past couple of years, so I won’t go further in to that here. See the excellent blogs from Mikael or Dimitri from within Oracle, or those from the likes of Mark or Domas over at Facebook, for examples.

Visibility in to what was happening in the server at this level has been very lacking however, with only SHOW MUTEX STATUS available in the standard server distribution, which only tracked InnoDB mutexes and not those at the server level – which could often be the blocking factors for scalability.

I’ve blogged about PERFORMANCE_SCHEMA before, but never really shown what you can now do whilst trying to track down what sessions are doing, and what they are currently waiting for.

I also saw a hack recently that put the fact that a connection was trying to grab a Query Cache mutex in to the thread state (“Waiting on query cache mutex”), to track whether the query cache was causing too much contention. I thought that was entirely the wrong way to track this kind of detail (it misuses the thread state, which should be stages of execution, not each synchronization point, and doesn’t give you any meaningful way to track the entire overhead). Hopefully I can now show why I thought that.

So with that, let’s dive in to an example of how it should really be done, (you will probably have to scroll this to the right):

SELECT NAME, 
       IF(PPS.ID = 0, 'Internal Thread', CONCAT(IPS.USER, '@', IPS.HOST)) USER,
       DB,
       COMMAND,
       STATE,
       TIME,
       EVENT_NAME LAST_WAIT,
       IF(TIMER_WAIT IS NULL , 'Still Waiting', TIMER_WAIT/1000000000) LAST_WAIT_MS
  FROM performance_schema.EVENTS_WAITS_CURRENT 
  JOIN performance_schema.THREADS PPS USING (THREAD_ID) 
  LEFT JOIN INFORMATION_SCHEMA.PROCESSLIST IPS USING (ID);

+----------------------------------------+--------------------+--------------------+---------+--------------+------+--------------------------------------------------+---------------+
| NAME                                   | USER               | DB                 | COMMAND | STATE        | TIME | LAST_WAIT                                        | LAST_WAIT_MS  |
+----------------------------------------+--------------------+--------------------+---------+--------------+------+--------------------------------------------------+---------------+
| thread/sql/main                        | Internal Thread    | NULL               | NULL    | NULL         | NULL | wait/synch/mutex/sql/LOCK_thread_count           | 0.0001        |
| thread/innodb/io_handler_thread        | Internal Thread    | NULL               | NULL    | NULL         | NULL | wait/synch/mutex/innodb/buf_pool_mutex           | 0.0001        |
| thread/innodb/io_handler_thread        | Internal Thread    | NULL               | NULL    | NULL         | NULL | wait/synch/mutex/innodb/buf_pool_mutex           | 0.0001        |
| thread/innodb/io_handler_thread        | Internal Thread    | NULL               | NULL    | NULL         | NULL | wait/synch/mutex/innodb/buf_pool_mutex           | 0.0001        |
| thread/innodb/io_handler_thread        | Internal Thread    | NULL               | NULL    | NULL         | NULL | wait/synch/mutex/innodb/buf_pool_mutex           | 0.0001        |
| thread/innodb/io_handler_thread        | Internal Thread    | NULL               | NULL    | NULL         | NULL | wait/synch/mutex/innodb/buf_pool_mutex           | 0.0001        |
| thread/innodb/io_handler_thread        | Internal Thread    | NULL               | NULL    | NULL         | NULL | wait/synch/mutex/innodb/buf_pool_mutex           | 0.0002        |
| thread/innodb/io_handler_thread        | Internal Thread    | NULL               | NULL    | NULL         | NULL | wait/io/file/innodb/innodb_log_file              | Still Waiting |
| thread/innodb/io_handler_thread        | Internal Thread    | NULL               | NULL    | NULL         | NULL | wait/synch/mutex/innodb/buf_pool_mutex           | 0.0001        |
| thread/innodb/io_handler_thread        | Internal Thread    | NULL               | NULL    | NULL         | NULL | wait/synch/mutex/innodb/buf_pool_mutex           | 0.0004        |
| thread/innodb/srv_error_monitor_thread | Internal Thread    | NULL               | NULL    | NULL         | NULL | wait/synch/mutex/innodb/srv_innodb_monitor_mutex | 0.0002        |
| thread/innodb/srv_master_thread        | Internal Thread    | NULL               | NULL    | NULL         | NULL | wait/synch/mutex/innodb/fil_system_mutex         | 0.0028        |
| thread/innodb/srv_lock_timeout_thread  | Internal Thread    | NULL               | NULL    | NULL         | NULL | wait/synch/mutex/innodb/kernel_mutex             | 0.0024        |
| thread/sql/one_connection              | msandbox@localhost | performance_schema | Query   | executing    |    0 | wait/io/file/myisam/dfile                        | 0.0053        |
| thread/sql/signal_handler              | Internal Thread    | NULL               | NULL    | NULL         | NULL | wait/synch/mutex/sql/LOCK_thread_count           | 0.0030        |
| thread/sql/one_connection              | msandbox@localhost | test               | Query   | Sending data |  133 | wait/synch/rwlock/innodb/checkpoint_lock         | Still Waiting |
+----------------------------------------+--------------------+--------------------+---------+--------------+------+--------------------------------------------------+---------------+
16 rows in set (0.78 sec)

The first thing to notice is that not only can we see what external connections are doing – we can now see what internal threads are doing as well. All threads are easily identified by the PERFORMANCE_SCHEMA.THREADS.NAME column, with all client connections having the same name of thread/sql/one_connection (the internal function to manage database connections is handle_one_connection()), and each internal thread within the server and storage engines are tagged with their own identifiers, such as thread/sql/main, the main MySQL Server thread, or thread/innodb/srv_master_thread, the main InnoDB storage engine thread.

So we show the name of each thread, and then figure out the user@host if the thread is an external client connection. We then show all of the normal output that you would expect of SHOW PROCESSLIST, with DB, COMMAND, STATE and TIME (I emit the INFO column, but you could add that if you like to see the statement executing as well).

The last two columns come from the new EVENTS_WAITS_CURRENT table. As you can see, we can now show exactly what the current, or last completed, wait for each session was, and for exactly how long they waited (down to picosecond precision).

In my example above, the last thread listed is waiting for checkpointing to complete within InnoDB, whilst an internal thread is waiting on file IO, on the InnoDB transaction logs (i.e, he is the guy working on the checkpoint) – ooops, I was running a huge INSERT … SELECT on an instance with default settings for innodb_log_file_size. :)

If you start to see lots of lots of connections waiting on wait/synch/rwlock/innodb/checkpoint_lock like above, then you too know that you should probably be doing something about your log file sizes (or maybe check whether something else is causing too aggressive checkpointing, such as a very low innodb_max_dirty_pages_pct value).

Keep in mind this can only give you an indication that there are contention points that can be dealt with, as these statistics are very transient in nature.

Here’s an example of tracking down exactly what the contention points in your server are:

SELECT EVENT_NAME,
       SUM_TIMER_WAIT/1000000000 WAIT_MS,
       COUNT_STAR
  FROM performance_schema.EVENTS_WAITS_SUMMARY_GLOBAL_BY_EVENT_NAME 
 ORDER BY SUM_TIMER_WAIT DESC, COUNT_STAR DESC LIMIT 30;

+---------------------------------------------------------+-------------+------------+
| EVENT_NAME                                              | WAIT_MS     | COUNT_STAR |
+---------------------------------------------------------+-------------+------------+
| wait/io/file/innodb/innodb_data_file                    | 150562.0345 |     182750 |
| wait/io/file/innodb/innodb_log_file                     |  77795.5425 |       8913 |
| wait/synch/rwlock/innodb/checkpoint_lock                |  11636.7894 |       1284 |
| wait/synch/rwlock/innodb/btr_search_latch               |   7429.2204 |   39677465 |
| wait/io/file/myisam/kfile                               |   7353.2737 |       2011 |
| wait/synch/mutex/innodb/kernel_mutex                    |   4115.0534 |   59198510 |
| wait/synch/mutex/innodb/fil_system_mutex                |   3473.8341 |    1263809 |
| wait/io/file/myisam/dfile                               |   2940.5701 |        641 |
| wait/synch/mutex/innodb/buf_pool_mutex                  |   2650.8666 |   33982979 |
| wait/synch/mutex/innodb/rw_lock_mutex                   |   1261.8025 |   18877546 |
| wait/io/file/sql/FRM                                    |    116.6419 |        863 |
| wait/synch/cond/sql/COND_thread_count                   |      1.4073 |          1 |
| wait/io/file/sql/pid                                    |      1.2654 |          3 |
| wait/synch/mutex/innodb/mutex_list_mutex                |      0.7675 |      16727 |
| wait/synch/mutex/innodb/rw_lock_list_mutex              |      0.4023 |       8331 |
| wait/io/file/sql/dbopt                                  |      0.2745 |         12 |
| wait/io/file/sql/casetest                               |      0.2041 |          5 |
| wait/synch/mutex/innodb/thr_local_mutex                 |      0.2009 |       2050 |
| wait/synch/mutex/mysys/THR_LOCK_open                    |      0.1993 |        989 |
| wait/synch/rwlock/innodb/trx_purge_latch                |      0.1436 |        255 |
| wait/io/file/sql/ERRMSG                                 |      0.1432 |          5 |
| wait/synch/rwlock/sql/LOCK_grant                        |      0.1375 |        188 |
| wait/synch/rwlock/sql/MDL_lock::rwlock                  |      0.1013 |        481 |
| wait/synch/mutex/sql/Query_cache::structure_guard_mutex |      0.0923 |        628 |
| wait/synch/mutex/mysys/THR_LOCK_myisam                  |      0.0781 |        283 |
| wait/synch/rwlock/sql/Query_cache_query::lock           |      0.0676 |        198 |
| wait/io/file/mysys/charset                              |      0.0561 |          3 |
| wait/synch/mutex/sql/LOCK_open                          |      0.0529 |        543 |
| wait/synch/mutex/mysys/THR_LOCK_lock                    |      0.0504 |        244 |
| wait/synch/mutex/innodb/srv_innodb_monitor_mutex        |      0.0424 |         75 |
+---------------------------------------------------------+-------------+------------+
30 rows in set (0.00 sec)

All of the above is aggregated server wide, since server start. Of course IO is my largest issue – I was doing write heavy work.

But then you can see exactly what I was waiting on after that – the first culprit was as above, way too much checkpointing. Next is waiting for wait/synch/rwlock/innodb/btr_search_latch, the synchronization point for the InnoDB Adaptive Hash Index. I might also do well to disable it with innodb_adaptive_hash_index = 0 in my case.

I can tell immediately that the query cache is not a contention point as well for example – just look for high values for wait/synch/mutex/sql/Query_cache::structure_guard_mutex for that.

5.5 is coming people, get downloading and trying! We want your feedback! :)

UPDATE: Note that in 5.5.6 and above, you should actually use performance_schema.THREADS instead of PROCESSLIST, and performance_schema.EVENTS_WAITS_SUMMARY_GLOBAL_BY_EVENT_NAME instead of EVENTS_WAITS_SUMMARY_BY_EVENT_NAME, these were renamed within Bug#55416.

Improving InnoDB Transaction Reporting

Everybody knows that parsing the output of SHOW ENGINE INNODB STATUS is hard, especially when you want to track the information historically, or want to aggregate any of the more dynamic sections such as the TRANSACTIONS one.

Within the InnoDB plugin the INFORMATION_SCHEMA.INNODB_TRX table was added, which allowed you to at least get some of the information on each transaction, but not the full breadth of information that SHOW ENGINE INNODB STATUS provided.

“This is nice..” I thought “..but why not go the whole hog..?”.. And so I set about doing that, and opened up Bug#53336. In a very short time, I was in a review process with the InnoDB team, the patch was cleaned up, and (after a little hiccup) everything has been pushed for the next milestone.

Here’s the docs notes that I wrote for it today (with a little more annotation):

This patch adds the following columns:

mysql> desc innodb_trx;
+----------------------------+---------------------+------+-----+---------------------+-------+
| Field                      | Type                | Null | Key | Default             | Extra |
+----------------------------+---------------------+------+-----+---------------------+-------+
...
| trx_operation_state        | varchar(64)         | YES  |     | NULL                |       |
| trx_tables_in_use          | bigint(21) unsigned | NO   |     | 0                   |       |
| trx_tables_locked          | bigint(21) unsigned | NO   |     | 0                   |       |
| trx_lock_structs           | bigint(21) unsigned | NO   |     | 0                   |       |
| trx_lock_memory_bytes      | bigint(21) unsigned | NO   |     | 0                   |       |
| trx_rows_locked            | bigint(21) unsigned | NO   |     | 0                   |       |
| trx_rows_modified          | bigint(21) unsigned | NO   |     | 0                   |       |
| trx_concurrency_tickets    | bigint(21) unsigned | NO   |     | 0                   |       |
| trx_isolation_level        | varchar(16)         | NO   |     |                     |       |
| trx_unique_checks          | int(1)              | NO   |     | 0                   |       |
| trx_foreign_key_checks     | int(1)              | NO   |     | 0                   |       |
| trx_last_foreign_key_error | varchar(256)        | YES  |     | NULL                |       |
| trx_adaptive_hash_latched  | int(1)              | NO   |     | 0                   |       |
| trx_adaptive_hash_timeout  | bigint(21) unsigned | NO   |     | 0                   |       |
+----------------------------+---------------------+------+-----+---------------------+-------+
22 rows in set (0.05 sec)

Most of these are self explanatory, a lot of them duplicate the information within SHOW ENGINE INNODB STATUS (so you now no longer need to parse the output of that to get complete transaction information).

Below are some of my notes. For comparison, here’s an example of a transaction from the SHOW ENGINE INNODB STATUS statement:

---TRANSACTION 517, ACTIVE 1 sec, OS thread id 2958520320 inserting
mysql tables in use 2, locked 2
189 lock struct(s), heap size 27968, 54389 row lock(s), undo log entries 2406
MySQL thread id 2, query id 36 localhost root Sending data
insert ignore into t1 (select (i * rand())*100, 1 from t1)

trx_operation_state – corresponds to “inserting”, InnoDB’s internal transaction state
trx_tables_in_use – corresponds to “mysql tables in use 2″, the number of table locks requested by MySQL via external_lock()
trx_tables_locked – corresponds to “locked 2″, the number of actual table locks taken via external_lock()
trx_lock_structs – corresponds to “189 lock struct(s)”, the size of the lock struct list
trx_lock_memory_bytes – corresponds to “heap size 27968″, the number of bytes allocated to locks structs
trx_rows_locked – corresponds to “54389 row lock(s)”, an estimation of the number of rows locked (delete marked rows may make it imprecise)
trx_rows_modified – corresponds to “undo log entries 2406″, the number of rows modified in the transaction (inserted, updated, deleted)

Not seen in the transaction output above:

trx_concurrency_tickets – corresponds to “thread declared inside InnoDB 89″ for a transaction in SHOW ENGINE INNODB STATUS, the number of concurrency tickets remaining for the transaction when innodb_thread_concurrency != 0
trx_isolation_level – the transactions isolation level
trx_unique_checks – whether the transaction has “SET UNIQUE_CHECKS = 0″
trx_foreign_key_checks – whether the transaction has “SET FOREIGN_KEY_CHECKS = 0″
trx_last_foreign_key_error – if the last statement in the transaction resulted in an FK error, the error text is printed here
trx_adaptive_hash_latched – corresponds to “holds adaptive hash latch” being printed for a transaction in SHOW ENGINE INNODB STATUS
trx_adaptive_hash_timeout – when innodb_adapative_hash_index is enabled (default), statements that try to get the adapative hash latch spin 10000 (BTR_SEA_TIMEOUT) times, re-trying getting the adaptive hash latch, before giving up. lower numbers here for a lot of transactions may indicate contention on the adaptive hash latch

In summary – no more having to parse SHOW ENGINE INNODB STATUS output for transaction information, now you can just do it with SQL (with all it’s aggregation goodness as well if you want to)!

Starting a new job!

I’ve had a wild ride over the past ~4.5 years, starting with MySQL AB as a “Support Engineer”, and working through to “Senior Support Engineer”, and then “Regional Support Manager, Americas” with the MySQL Support Team – truly one of the best product support teams I’ve ever known in the IT industry, even if I am biased.

I’ve always had a passion for helping people, which is why I think I did “OK” in Support. However I’ve always also had a second passion which has been bubbling away for me too – building solutions for diagnosing database issues. I started in the database world in the Oracle market, working on monitoring and management tools. MySQL “poached” me from there whilst I was building a MySQL monitoring module for the cross database monitoring tool that we had, as well as working in a supporting/consulting role for our customers.

Given my background when I joined, I was an obvious person to be put in to an internal coordinator role to manage interaction between the team that develops MySQL Enterprise Monitor and Support back then (before MEM reached it’s first GA), so I also had the pleasure of working with that team pretty closely for my time in Support.

It satisfied my second passion for a long time. Now it’s time to take the leap back to that full time!

At the start of this week I started in my new role, as a “Software Development Manager” for the Enterprise Tools team, still within MySQL/Oracle, working full time on MySQL Enterprise Monitor and MySQL Proxy.

I’m pretty excited about the new role – I want to help make monitoring and management of MySQL easy, both by getting more of the “right” data from the server, and by representing it all in coherent ways for DBAs in our tools, so that they too can be more productive in making the right choices when managing their systems. I feel pretty well placed to do that now. :)

If you’re a customer that has used MEM and wants to give any feedback, or even just an interested user in the community – I’m all ears, either leave your comments here, or ping me by email (first_name.last_name@oracle.com) – I’d love to have some discussions with you about your needs, likes, or dislikes.

I thank every person in Support for my time with them, however I owe all of that time to one fairly anonymous guy in the “external” MySQL world, but a man mountain “internally” – Dean Ellis, the man who convinced me to join MySQL AB all those years ago, and has been a fantastic mentor and manager over the years to me. Mr Ellis – I salute you. Thanks for bringing me on the ride, I still don’t regret it, no matter how much you tried to make me! ;)

Enterprise Tools team – Look Out, you have me to whine at you full time now!

A morning hack – Com_change_user

So after I published my patch last night, another of my colleagues – the esteemed Shane Bester – pointed out that there is a related bug – Bug#28405 – which requests that Com_change_user is also split out from Com_admin_commands.

So I extended my patch this morning, to kill two birds with one stone:

=== modified file 'sql/mysqld.cc'
--- sql/mysqld.cc       revid:alik@sun.com-20100114090008-3rsdmlp1w2mqgrhg
+++ sql/mysqld.cc       2010-03-03 09:57:40 +0000
@@ -3131,6 +3131,7 @@
   {"call_procedure",       (char*) offsetof(STATUS_VAR, com_stat[(uint) SQLCOM_CALL]), SHOW_LONG_STATUS},
   {"change_db",            (char*) offsetof(STATUS_VAR, com_stat[(uint) SQLCOM_CHANGE_DB]), SHOW_LONG_STATUS},
   {"change_master",        (char*) offsetof(STATUS_VAR, com_stat[(uint) SQLCOM_CHANGE_MASTER]), SHOW_LONG_STATUS},
+  {"change_user",          (char*) offsetof(STATUS_VAR, com_change_user), SHOW_LONG_STATUS},
   {"check",                (char*) offsetof(STATUS_VAR, com_stat[(uint) SQLCOM_CHECK]), SHOW_LONG_STATUS},
   {"checksum",             (char*) offsetof(STATUS_VAR, com_stat[(uint) SQLCOM_CHECKSUM]), SHOW_LONG_STATUS},
   {"commit",               (char*) offsetof(STATUS_VAR, com_stat[(uint) SQLCOM_COMMIT]), SHOW_LONG_STATUS},
@@ -3174,6 +3175,7 @@
   {"load",                 (char*) offsetof(STATUS_VAR, com_stat[(uint) SQLCOM_LOAD]), SHOW_LONG_STATUS},
   {"lock_tables",          (char*) offsetof(STATUS_VAR, com_stat[(uint) SQLCOM_LOCK_TABLES]), SHOW_LONG_STATUS},
   {"optimize",             (char*) offsetof(STATUS_VAR, com_stat[(uint) SQLCOM_OPTIMIZE]), SHOW_LONG_STATUS},
+  {"ping",                 (char*) offsetof(STATUS_VAR, com_ping), SHOW_LONG_STATUS},
   {"preload_keys",         (char*) offsetof(STATUS_VAR, com_stat[(uint) SQLCOM_PRELOAD_KEYS]), SHOW_LONG_STATUS},
   {"prepare_sql",          (char*) offsetof(STATUS_VAR, com_stat[(uint) SQLCOM_PREPARE]), SHOW_LONG_STATUS},
   {"purge",                (char*) offsetof(STATUS_VAR, com_stat[(uint) SQLCOM_PURGE]), SHOW_LONG_STATUS},
@@ -3350,11 +3352,13 @@
     We have few debug-only commands in com_status_vars, only visible in debug
     builds. for simplicity we enable the assert only in debug builds
 
-    There are 8 Com_ variables which don't have corresponding SQLCOM_ values:
+    There are 10 Com_ variables which don't have corresponding SQLCOM_ values:
     (TODO strictly speaking they shouldn't be here, should not have Com_ prefix
     that is. Perhaps Stmt_ ? Comstmt_ ? Prepstmt_ ?)
 
       Com_admin_commands       => com_other
+      Com_change_user          => com_change_user
+      Com_ping                 => com_ping
       Com_stmt_close           => com_stmt_close
       Com_stmt_execute         => com_stmt_execute
       Com_stmt_fetch           => com_stmt_fetch
@@ -3368,7 +3372,7 @@
     of SQLCOM_ constants.
   */
   compile_time_assert(sizeof(com_status_vars)/sizeof(com_status_vars[0]) - 1 ==
-                     SQLCOM_END + 8);
+                     SQLCOM_END + 10);
 #endif
 
   if (get_options(&remaining_argc, &remaining_argv))

=== modified file 'sql/sql_class.h'
--- sql/sql_class.h     revid:alik@sun.com-20100114090008-3rsdmlp1w2mqgrhg
+++ sql/sql_class.h     2010-03-03 09:56:18 +0000
@@ -443,6 +443,8 @@
   ulong ha_discover_count;
   ulong ha_savepoint_count;
   ulong ha_savepoint_rollback_count;
+  ulong com_ping;
+  ulong com_change_user;
 
   /* KEY_CACHE parts. These are copies of the original */
   ulong key_blocks_changed;

=== modified file 'sql/sql_parse.cc'
--- sql/sql_parse.cc    revid:alik@sun.com-20100114090008-3rsdmlp1w2mqgrhg
+++ sql/sql_parse.cc    2010-03-03 09:56:19 +0000
@@ -979,7 +979,7 @@
 #endif
   case COM_CHANGE_USER:
   {
-    status_var_increment(thd->status_var.com_other);
+    status_var_increment(thd->status_var.com_change_user);
     char *user= (char*) packet, *packet_end= packet + packet_length;
     /* Safe because there is always a trailing \0 at the end of the packet */
     char *passwd= strend(user)+1;
@@ -1409,7 +1409,7 @@
     break;
   }
   case COM_PING:
-    status_var_increment(thd->status_var.com_other);
+    status_var_increment(thd->status_var.com_ping);
     my_ok(thd);                                // Tell client we are alive
     break;
   case COM_PROCESS_INFO:

Again, a quick test script:

< ?php

$conn = mysqli_connect('127.0.0.1','root','msandbox', 'test', 5550);

if (!mysqli_change_user($conn, 'root', 'msandbox', 'mysql')) {
   echo "Change user failed!\n";
} else {
   echo "User changed!\n";
}

?>

And test:

Cerberus:msb_5_5_5 mark$ mysql -u root -pmsandbox -h 127.0.0.1 -P5550 -e "show global status like 'com_change_user'"
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| Com_change_user | 0     |
+-----------------+-------+
Cerberus:msb_5_5_5 mark$ php ~/Dev/tests/com_change_user.php
User changed!
Cerberus:msb_5_5_5 mark$ mysql -u root -pmsandbox -h 127.0.0.1 -P5550 -e "show global status like 'com_change_user'"
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| Com_change_user | 1     |
+-----------------+-------+
Cerberus:msb_5_5_5 mark$ php ~/Dev/tests/com_change_user.php
User changed!
Cerberus:msb_5_5_5 mark$ mysql -u root -pmsandbox -h 127.0.0.1 -P5550 -e "show global status like 'com_change_user'"
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| Com_change_user | 2     |
+-----------------+-------+

Yet Another Disclaimer: Not sure if/when/where this might get in! It is a change in behavior with regards to Com_admin_commands

An evening hack – Com_ping

My boss vented about the lack of a Com_ping SHOW GLOBAL STATUS variable earlier, and I figured it would be dead easy to hack in.

A few minutes later:

Cerberus:mysql-next-mr mark$ bzr diff ./sql
=== modified file 'sql/mysqld.cc'
--- sql/mysqld.cc       revid:alik@sun.com-20100114090008-3rsdmlp1w2mqgrhg
+++ sql/mysqld.cc       2010-03-02 22:58:45 +0000
@@ -3174,6 +3174,7 @@
   {"load",                 (char*) offsetof(STATUS_VAR, com_stat[(uint) SQLCOM_LOAD]), SHOW_LONG_STATUS},
   {"lock_tables",          (char*) offsetof(STATUS_VAR, com_stat[(uint) SQLCOM_LOCK_TABLES]), SHOW_LONG_STATUS},
   {"optimize",             (char*) offsetof(STATUS_VAR, com_stat[(uint) SQLCOM_OPTIMIZE]), SHOW_LONG_STATUS},
+  {"ping",                 (char*) offsetof(STATUS_VAR, com_ping), SHOW_LONG_STATUS},
   {"preload_keys",         (char*) offsetof(STATUS_VAR, com_stat[(uint) SQLCOM_PRELOAD_KEYS]), SHOW_LONG_STATUS},
   {"prepare_sql",          (char*) offsetof(STATUS_VAR, com_stat[(uint) SQLCOM_PREPARE]), SHOW_LONG_STATUS},
   {"purge",                (char*) offsetof(STATUS_VAR, com_stat[(uint) SQLCOM_PURGE]), SHOW_LONG_STATUS},
@@ -3350,11 +3351,12 @@
     We have few debug-only commands in com_status_vars, only visible in debug
     builds. for simplicity we enable the assert only in debug builds
 
-    There are 8 Com_ variables which don't have corresponding SQLCOM_ values:
+    There are 9 Com_ variables which don't have corresponding SQLCOM_ values:
     (TODO strictly speaking they shouldn't be here, should not have Com_ prefix
     that is. Perhaps Stmt_ ? Comstmt_ ? Prepstmt_ ?)
 
       Com_admin_commands       => com_other
+      Com_ping                 => com_ping
       Com_stmt_close           => com_stmt_close
       Com_stmt_execute         => com_stmt_execute
       Com_stmt_fetch           => com_stmt_fetch
@@ -3368,7 +3370,7 @@
     of SQLCOM_ constants.
   */
   compile_time_assert(sizeof(com_status_vars)/sizeof(com_status_vars[0]) - 1 ==
-                     SQLCOM_END + 8);
+                     SQLCOM_END + 9);
 #endif
 
   if (get_options(&remaining_argc, &remaining_argv))

=== modified file 'sql/sql_class.h'
--- sql/sql_class.h     revid:alik@sun.com-20100114090008-3rsdmlp1w2mqgrhg
+++ sql/sql_class.h     2010-03-02 22:56:05 +0000
@@ -443,6 +443,7 @@
   ulong ha_discover_count;
   ulong ha_savepoint_count;
   ulong ha_savepoint_rollback_count;
+  ulong com_ping;
 
   /* KEY_CACHE parts. These are copies of the original */
   ulong key_blocks_changed;

=== modified file 'sql/sql_parse.cc'
--- sql/sql_parse.cc    revid:alik@sun.com-20100114090008-3rsdmlp1w2mqgrhg
+++ sql/sql_parse.cc    2010-03-02 23:23:41 +0000
@@ -1409,7 +1409,7 @@
     break;
   }
   case COM_PING:
-    status_var_increment(thd->status_var.com_other);
+    status_var_increment(thd->status_var.com_ping);
     my_ok(thd);                                // Tell client we are alive
     break;
   case COM_PROCESS_INFO:

COM_PING has always historically been recorded under Com_admin_commands (which is tracked with com_other), however this also includes a number of other commands:

COM_TABLE_DUMP
COM_CHANGE_USER
COM_BINLOG_DUMP
COM_SHUTDOWN
COM_DEBUG

COM_CHANGE_USER can also be used frequently with connection pools, as can COM_PING. It would be nice to differentiate them when tracking down issues within such environments.

Here’s the bug that was opened:

http://bugs.mysql.com/bug.php?id=51667

And a quick test:

Cerberus:msb_5_5_5 mark$ mysql -u root -pmsandbox -h 127.0.0.1 -P 5550 -e "show global status like 'com_ping'"
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Com_ping      | 0     |
+---------------+-------+
Cerberus:msb_5_5_5 mark$ php ~/Dev/tests/com_ping.php
Connected!
Cerberus:msb_5_5_5 mark$ mysql -u root -pmsandbox -h 127.0.0.1 -P 5550 -e "show global status like 'com_ping'"
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Com_ping      | 1     |
+---------------+-------+
Cerberus:msb_5_5_5 mark$ php ~/Dev/tests/com_ping.php
Connected!
Cerberus:msb_5_5_5 mark$ mysql -u root -pmsandbox -h 127.0.0.1 -P 5550 -e "show global status like 'com_ping'"
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Com_ping      | 2     |
+---------------+-------+

The test script:

< ?php

$conn = mysql_connect('127.0.0.1:5550','root','msandbox');
mysql_select_db('test',$conn);

if (!mysql_ping ($conn)) {
   echo "Connection Dead!\n";
} else {
   echo "Connected!\n";
}

?>

Disclaimer: Not sure if/when/where this might get in! It is a change in behavior with regards to Com_admin_commands

PERFORMANCE_SCHEMA hits Prime Time!

I’ve talked about PERFORMANCE_SCHEMA in the past – almost a year ago now.

Back then the feature was just coming together in to something useable, and was on the cusp of moving towards code review. It entered code review, and went around, and around for 9 months, whilst various refinements were made.

Never the less, Marc Alff persevered (much respect!), and yesterday pushed his final merge in to the mysql-next-mr bzr tree. PERFORMANCE_SCHEMA is now awaiting the next milestone release, and will be a part of the next GA release of MySQL!

We have the initial documentation ready:

http://dev.mysql.com/doc/performance-schema/en/index.html

This first round adds the infrastructure to take monitoring of the MySQL Server to the next level, initially adding in instrumentation for sync points (mutexes, rw locks, etc.) and file IO, in the SQL layer, and most of the default storage engines (all those controlled by MySQL/Sun).

Here’s a shout out to the other storage engine developers – we’d love you to start looking at instrumenting your own engines as well. Ask away on the internals@ list – I’m sure Marc will be more than willing to help.

And now that we have the above in – what’s next? It’s a good question! Here’s a list of the major outstanding worklogs:

WL#4674 PERFORMANCE_SCHEMA Setup For Actors
WL#4895 PERFORMANCE_SCHEMA Instrumenting Table IO
WL#4896 PERFORMANCE_SCHEMA Instrumenting Net IO
WL#4878 PERFORMANCE_SCHEMA Trace

Personally, I’m voting for WL#4895 next (well, I’d like to see InnoDB instrumentation too!), but you can vote for what ever you are most interested in via the votes in Worklog as well.

Thanks for all the hard work Marc!

Scientists say dolphins should be treated as ‘non-human persons’

Being an employee of MySQL (past and present), every time I see something related to dolphins it catches my eye – and I just came across this:

Scientists say dolphins should be treated as ‘non-human persons’

The treatment of dolphins has long been an issue for me, I utterly despise some of the things I have seen, and read about. MySQL’s dolphin – Sakila – has always been seen as a symbol of freedom. This is marred by reality, hopefully this will change:

“The researchers argue that their work shows it is morally unacceptable to keep such intelligent animals in amusement parks or to kill them for food or by accident when fishing. Some 300,000 whales, dolphins and porpoises die in this way each year.”

Some other things in the article also ring true for me, in other ways. :)

“The studies show how dolphins have distinct personalities, a strong sense of self and can think about the future.”

“It has also become clear that they are â??culturalâ? animals, meaning that new types of behaviour can quickly be picked up by one dolphin from another.”

“Other research has shown dolphins can solve difficult problems, while those living in the wild co-operate in ways that imply complex social structures and a high level of emotional sophistication.”

FWIW, I’ve donated to both http://www.orcaweb.org.uk/ and http://www.wdcs.org/ in the past, they are both wonderfully worthy causes.

Grouping by Arbitrary Time Ranges (Graphing What You Can See)

First, the back story. One of the MEM developers asked me today about an interesting problem:

We have graphs in MEM that can be generated against an arbitrary time range in the UI – even for a year or more if the data is not purged. Currently MEM does not do any kind of rolling up of the data (in an RRD style), and pulls graph data from each agent/instance on a 1 minute interval. So if you wanted to pull, for instance, the last 3 months worth of data in to a graph – the server back end basically goes back to the database and requests all of the rows – all ~43,829 of them, oh, and that’s for each series – and then calculate deltas on the server side if need be (we store raw values), and stuffs the data in to a graphing library to draw the graph.

Further, graphs are only of a limited (but adjustable) size on the screen – I run MEM with a 900 pixel wide graph personally – so there’s only ~900 points (maybe a few less, with the axis etc. being shown) that can be filled with data.

Trying to fill 900 points with ~43,000 points is an exercise in futility. Damn those pixels. Why doesn’t everybody have a cinema in their ops centers? Filling up your memory with all that data on the app server side is also another issue.

So we’ve been looking at various ways of a) rolling this data in storage, and b) only selecting what can be shown on the screen if possible. The MEM dev hit me up about the latter today, and basically asked – “Given a time range, and a number of pixels, how we can we group a set of rows together to get an aggregated result that only returns the same number of rows as we have pixels?”

Math to the rescue!

Luckily we store the timestamps for the data we collect in a BIGINT column – we store a unix style timestamp with millisecond resolution times. If you’re not doing this, then you should convert your DATETIME / TIMESTAMP etc. to an integer (with UNIX_TIMESTAMP) first.

So I started off with the three known constants:

SET @start_time = 1254325343000; // About a month ago
SET @end_time = 1256908959000; // About now
SET @pixels = 900;

Now, to get the interval that you need to group, you have to first get the full integer range you are dealing with, and then divide it by the number of pixels:

SET @intervals = ((@end_time - @start_time) / @pixels);

Now that we know the interval size that we are going to GROUP together, how do we group it? Math to the rescue again – to generate the time slices you round the whole timestamp number to the nearest interval, and then convert it back again, basically:

FLOOR( (timestamp_col) /  @intervals) *  @intervals

To generate the range, I used a little CONCAT magic (and note that you add one before converting back for the end of the range), and then GROUP BY the output, here’s an example based on the MEM data set – which stores it’s timestamp in “end_time”:

SELECT instance_attribute_id, 
       SUM(value) sum_value,
       MIN(value) min_value,
       MAX(value) max_value,
       AVG(value) avg_value,
       COUNT(value) agg_points,
       CONCAT( ROUND( FLOOR( (end_time) /  @intervals) *  @intervals), ' - ', 
               ROUND( (FLOOR( (end_time) /  @intervals ) + 1) *  @intervals)) AS time_range
  FROM dc_ng_long_now 
 WHERE instance_attribute_id IN (73, 76, 77, 79)  
   AND end_time BETWEEN @start_time AND @end_time
 GROUP BY instance_attribute_id, time_range

The above is what could be used to generate the CPU Usage graph (the instance ID’s are each of the series) in the MEM UI, for a month of data. Some sample output:

+-----------------------+----------------+--------------+--------------+-------------------+------------+-------------------------------+
| instance_attribute_id | sum_value      | min_value    | max_value    | avg_value         | agg_points | time_range                    |
+-----------------------+----------------+--------------+--------------+-------------------+------------+-------------------------------+
|                    73 |   884176993340 | 442088260860 | 442088732480 | 442088496670.0000 |          2 | 1254322602524 - 1254325473209 | 
|                    73 | 21220811440010 | 442089201580 | 442111284920 | 442100238333.5417 |         48 | 1254325473209 - 1254328343893 | 
|                    73 | 21221895300670 | 442111747690 | 442133893930 | 442122818763.9583 |         48 | 1254328343893 - 1254331214578 | 
|                    73 | 21222981255310 | 442134365100 | 442156523250 | 442145442818.9583 |         48 | 1254331214578 - 1254334085262 | 
|                    73 | 20781887896940 | 442156993950 | 442178660650 | 442167827594.4681 |         47 | 1254334085262 - 1254336955947 | 
|                    73 | 21225129132600 | 442179133270 | 442201248030 | 442190190262.5000 |         48 | 1254336955947 - 1254339826631 | 
|                    73 | 21226213026150 | 442201720960 | 442223827720 | 442212771378.1250 |         48 | 1254339826631 - 1254342697316 | 
|                    73 | 21227293251850 | 442224296570 | 442246191720 | 442235276080.2083 |         48 | 1254342697316 - 1254345568000 | 
|                    73 | 21228371431870 | 442246663300 | 442268791290 | 442257738163.9583 |         48 | 1254345568000 - 1254348438684 | 
|                    73 | 21229455531730 | 442269262370 | 442291404770 | 442280323577.7083 |         48 | 1254348438684 - 1254351309369 | 
|                    73 | 20788226878750 | 442291875550 | 442313515300 | 442302699547.8723 |         47 | 1254351309369 - 1254354180053 | 
|                    73 | 21231576621020 | 442313990190 | 442333971060 | 442324512937.9167 |         48 | 1254354180053 - 1254357050738 | 
|                    73 | 21232474653460 | 442334346060 | 442352121880 | 442343221947.0833 |         48 | 1254357050738 - 1254359921422 | 
|                    73 | 21233360433480 | 442352500510 | 442371671120 | 442361675697.5000 |         48 | 1254359921422 - 1254362792107 | 
|                    73 | 21234392588940 | 442372144460 | 442394227610 | 442383178936.2500 |         48 | 1254362792107 - 1254365662791 | 
|                    73 | 21235472883050 | 442394698990 | 442416721710 | 442405685063.5417 |         48 | 1254365662791 - 1254368533476 | 
|                    73 | 21236557207050 | 442417196420 | 442439347780 | 442428275146.8750 |         48 | 1254368533476 - 1254371404160 | 
|                    73 | 20795177199970 | 442439809050 | 442461261420 | 442450578722.7660 |         47 | 1254371404160 - 1254374274844 | 
|                    73 | 21238693926960 | 442461733210 | 442483851610 | 442472790145.0000 |         48 | 1254374274844 - 1254377145529 | 
|                    73 | 21239778454900 | 442484323030 | 442506442450 | 442495384477.0833 |         48 | 1254377145529 - 1254380016213 | 

We can see how many rows have actually been aggregated in to the interval too (scroll to the right) – so that we can decide if there were enough intervals during the first row to be a good enough average (sometimes the first and last intervals may not be, and should perhaps be discarded), and the the interval that each row is computed for, i.e “1254325473209 – 1254328343893″

And a little verification:


mysql> select count(*) 
    ->   from dc_ng_long_now 
    ->  where instance_attribute_id in (73, 76, 77, 79) 
    ->    and end_time between @start_time and @end_time;
+----------+
| count(*) |
+----------+
|   172240 | 
+----------+
1 row in set (0.12 sec)

mysql> select count(*) from(
    -> select instance_attribute_id, 
    ->        MIN(value),
    ->        MAX(value),
    ->        AVG(value),
    ->        CONCAT( ROUND(FLOOR( end_time /  @intervals) *  @intervals), ' - ', 
    ->                ROUND(( FLOOR( end_time /  @intervals ) + 1 ) *  @intervals)) AS time_range
    ->   from dc_ng_long_now 
    ->  where instance_attribute_id in (73, 76, 77, 79) 
    ->    and end_time between @start_time and @end_time
    ->  group by instance_attribute_id, time_range
    -> ) s1;
+----------+
| count(*) |
+----------+
|     3604 | 
+----------+
1 row in set (1.01 sec)

Note, I’m getting 4 sets of 901 rows, in the above case I’m getting roughly a 48 to 1 compression ratio for a month of data – going from 172,240 rows to 3604!.

For MEM, there’s still work to do, but we’re on a roll now! Hopefully graph performance will be greatly improved in coming versions! :)

MySQL University Session – Customizing MySQL Enterprise Monitor

Just a quick note to let the masses know that I will be hosting a MySQL University session tomorrow, based on the talk that I gave at the MySQL UC in April – Customizing MySQL Enterprise Monitor.

It will be at 14:00 UTC – so if you are at all interested in MEM, and want to know how to bend it towards your needs, then come along! I’ll see you there.

EDIT: OOOPS, it’s 13:00 UTC