PERFORMANCE_SCHEMA, Maria and the Tokutek Challenge

As Peter Gulutzan just announced – we’re opening up a “new” worklog that we’ve been working on, Worklog #2360. PERFORMANCE_SCHEMA.

In fact – it’s not “new”, it’s something that has been in the worklog system for a long time, and has had much much much discussion internally between some of the brightest engineers in the group.

The astute among you out there that read my post on the benchmark with Maria for the the tokutek challenge, may have noted this in the configure line that I used:

./configure –prefix=/usr/local/mysql –localstatedir=/data0/mysqldata \
–without-query-cache –with-extra-charsets=complex –with-pic –with-fast-mutexes \
–enable-thread-safe-client –with-readline –with-big-tables –with-zlib-dir=bundled \
–with-plugin-maria –with-plugin-partition –with-plugin-perfschema \
–without-innodb –without-falcon –without-ndbcluster –without-archive-storage-engine \
–without-example-storage-engine –without-federated-storage-engine –without-backhole-storage-engine

I don’t wish to steal Peter’s thunder (he’s worked enormously hard with this project), but I’m going to take this opportunity to post a couple of snippets that I took away from the first benchmark, and let Peter explain them, and the other tables, in his coming posts.

First, let’s check what are the hottest parts of the server where mutexes, IO (there has been more added with Maria in the tree since my tests here, in fact as a result of them I think), conditions or rw locks etc. are concerned whilst I was running the benchmark:

mysql> SELECT *
    ->   FROM events_waits_summary_by_event_name
    ->  WHERE sum_timer_wait != 0
    -> ORDER BY sum_timer_wait DESC;
+----------------------------------------------------------+------------+----------------+----------------+----------------+----------------+
| EVENT_NAME                                               | COUNT_WAIT | SUM_TIMER_WAIT | MIN_TIMER_WAIT | AVG_TIMER_WAIT | MAX_TIMER_WAIT |
+----------------------------------------------------------+------------+----------------+----------------+----------------+----------------+
| wait/synch/mutex/maria/PAGECACHE::cache_lock             | 2147483647 | 13466583113436 |            138 |           1678 |      544893048 |
| wait/synch/cond/maria/translog_d::log_flush              |       5769 |  2961643469364 |          48942 |      513372069 |     4294177986 |
| wait/synch/cond/mysys/my_thread_var::suspend             |        776 |   805743293206 |          12936 |     1038328986 |     4283692164 |
| wait/synch/mutex/maria/translog_buffer::mutex            | 2147483647 |   577561613332 |            138 |            234 |     4231638106 |
| wait/synch/mutex/maria/translog_d:file_header            | 2147483647 |   422453242530 |            138 |            171 |       10497066 |
| wait/synch/mutex/maria/MARIA_FILE_BITMAP::bitmap_lock    |  735984803 |   169563172188 |            144 |            230 |     1244470326 |
| wait/synch/rwlock/maria/MARIA_KEYDEF::root_lock          |  248027518 |    65520201900 |            162 |            264 |       10019208 |
| wait/io/file/sql/FRM                                     |        801 |    12940295718 |            150 |       16155175 |      515501940 |
| wait/synch/mutex/maria/MARIA_SHARE::intern_lock          |    3854083 |     4986469182 |            150 |           1293 |     2181619680 |
| wait/synch/mutex/sql/LOCK_global_system_variables        |    5208428 |      863439342 |            144 |            165 |         559086 |
| wait/synch/mutex/maria/translog_d::sent_to_disk          |     735999 |      508699002 |            150 |            691 |        1163784 |
| wait/synch/rwlock/maria/translog_d::open_files           |     491046 |      434394126 |            168 |            884 |         473808 |
| wait/synch/mutex/maria/MARIA_SHARE::key_del_lock         |    1665267 |      429800430 |            150 |            258 |        1077558 |
| wait/io/file/maria/Control                               |         92 |      429294504 |          12552 |        4666244 |       31523556 |
| wait/synch/mutex/mysys/LOCK_alarm                        |     882996 |      350870472 |            150 |            397 |         479982 |
| wait/synch/mutex/sql/LOCK_mdl                            |     758523 |      312468192 |            150 |            411 |         147180 |
| wait/synch/mutex/sql/LOCK_open                           |     736908 |      308840712 |            150 |            419 |         475176 |
| wait/synch/mutex/sql/LOCK_thread_count                   |    1036629 |      244505298 |            144 |            235 |         375186 |
| wait/synch/mutex/maria/translog_d::log_flush             |     565581 |      227091810 |            138 |            401 |         177606 |
| wait/synch/mutex/sql/LOCK_global_read_lock               |     490683 |      198332490 |            144 |            404 |         476472 |
| wait/synch/mutex/sql/LOCK_plugin                         |     588674 |      158860836 |            144 |            269 |          56688 |
| wait/synch/rwlock/sql/LOCK_grant                         |     267378 |      157255554 |            180 |            588 |        2143872 |
| wait/synch/mutex/maria/TRN::state_lock                   |     735985 |      147520044 |            150 |            200 |         154188 |
| wait/synch/mutex/maria/translog_d::dirty_buffer_mask     |     490790 |      110804592 |            162 |            225 |         135702 |
| wait/synch/mutex/maria/LOCK_trn_list                     |     490678 |      106710882 |            156 |            217 |         474672 |
| wait/synch/mutex/mysys/THD_LOCK_mutex                    |     490848 |       97715760 |            162 |            199 |          65340 |
| wait/synch/cond/maria/translog_b::waiting_filling_buffer |          5 |       20083674 |        1284888 |        4016734 |        6963702 |
| wait/synch/mutex/mysys/THR_LOCK_lock                     |      43560 |       19417026 |            162 |            445 |          25980 |
| wait/io/file/maria/DataFile                              |         20 |       14346036 |           4596 |         717301 |       14193714 |
| wait/synch/mutex/mysys/BITMAP_mutex                      |      43538 |       11462268 |            156 |            263 |          16278 |
| wait/synch/mutex/sql/LOCK_status                         |      21718 |       10181124 |            198 |            468 |           7518 |
| wait/synch/mutex/sql/LOCK_active_mi                      |      43440 |        9080496 |            180 |            209 |          17550 |
| wait/synch/mutex/sql/LOCK_event_metadata                 |      10804 |        4669272 |            210 |            432 |           8166 |
| wait/synch/mutex/sql/RLI_data_lock                       |      21718 |        4548996 |            180 |            209 |          17376 |
| wait/synch/rwlock/sql/LOCK_system_variables_hash         |      10866 |        4383222 |            186 |            403 |           7410 |
| wait/synch/mutex/sql/LOCK_prepared_stmt_count            |      10877 |        2389752 |            186 |            219 |          14400 |
| wait/synch/mutex/mysys/THR_LOCK_open                     |       1260 |         451236 |            156 |            358 |           2622 |
| wait/io/file/maria/KeyFile                               |         50 |         317220 |           2676 |           6344 |          30624 |
| wait/io/file/sql/ERRMSG                                  |          3 |         115410 |          13980 |          38470 |          81966 |
| wait/synch/mutex/sql/hash_filo::lock                     |        557 |         105720 |            150 |            189 |            654 |
| wait/io/file/mysys/charset                               |          1 |          46644 |          46644 |          46644 |          46644 |
| wait/io/file/sql/DbOpt                                   |          1 |          40734 |          40734 |          40734 |          40734 |
| wait/synch/mutex/mysys/KEY_CACHE::cache_lock             |        158 |          32664 |            150 |            206 |            516 |
| wait/io/file/sql/pid                                     |          1 |          32382 |          32382 |          32382 |          32382 |
| wait/synch/mutex/mysys/my_thread_var::mutex              |        115 |          31194 |            162 |            271 |            834 |
| wait/synch/mutex/mysys/THR_LOCK_charset                  |        157 |          28620 |            150 |            182 |            528 |
| wait/synch/mutex/mysys/THR_LOCK_threads                  |         22 |          25920 |            498 |           1178 |           5838 |
| wait/synch/mutex/myisam/MYISAM_SHARE::intern_lock        |         82 |          20502 |            150 |            250 |            480 |
| wait/synch/rwlock/myisam/MYISAM_SHARE::key_root_lock     |         66 |          19386 |            174 |            293 |            900 |
| wait/synch/mutex/maria/THR_LOCK_maria                    |         42 |          17574 |            204 |            418 |            798 |
| wait/synch/mutex/sql/LOCK_connection_count               |         39 |           9258 |            192 |            237 |            468 |
| wait/synch/mutex/sql/THR_LOCK_DDL_blocker                |         22 |           5664 |            156 |            257 |            534 |
| wait/synch/mutex/sql/LOCK_xid_cache                      |         18 |           5178 |            234 |            287 |            456 |
| wait/synch/mutex/sql/LOCK_delete                         |         18 |           4932 |            192 |            274 |            534 |
| wait/synch/mutex/sql/LOCK_ha_data                        |         12 |           3954 |            222 |            329 |            546 |
| wait/synch/mutex/sql/THR_LOCK_DDL_is_blocked             |         11 |           3294 |            210 |            299 |            528 |
| wait/synch/mutex/maria/MARIA_SHARE::close_lock           |         15 |           3114 |            150 |            207 |            234 |
| wait/synch/mutex/sql/LOCK_lock_db                        |         10 |           3036 |            198 |            303 |            444 |
| wait/synch/mutex/mysys/THR_LOCK_myisam                   |          3 |           1560 |            402 |            520 |            708 |
| wait/synch/mutex/sql/LOCK_user_conn                      |          2 |            696 |            240 |            348 |            456 |
| wait/synch/mutex/sql/LOCK_mysql_create_db                |          2 |            522 |            258 |            261 |            264 |
+----------------------------------------------------------+------------+----------------+----------------+----------------+----------------+
61 rows in set (0.00 sec)

The times that are being recorded above are actually CPU cycles, if you know the power of your CPUs you could convert those to microseconds fairly easily – or you can tell the PERFORMANCE_SCHEMA to record in other formats too.

mysql> select * from setup_timers;
+------+------------+
| NAME | TIMER_NAME |
+------+------------+
| Wait | CYCLE      |
+------+------------+
1 row in set (0.00 sec)

Next, let’s take a look at what the thread that was inserting the 1 billion rows had been doing. Yes folks, this takes SHOW PROFILES and SHOW ENGINE INNODB MUTEX to a whole different level (it will pretty much make them defunct, imho, if we can get InnoDB using this instrumentation).

mysql> SELECT *
    ->   FROM events_waits_summary_by_thread_by_event_name
    ->  WHERE thread_id = 12
    ->    AND sum_timer_wait != 0
    -> ORDER BY sum_timer_wait DESC;
+-----------+----------------------------------------------------------+------------+----------------+----------------+----------------+----------------+
| THREAD_ID | EVENT_NAME                                               | COUNT_WAIT | SUM_TIMER_WAIT | MIN_TIMER_WAIT | AVG_TIMER_WAIT | MAX_TIMER_WAIT |
+-----------+----------------------------------------------------------+------------+----------------+----------------+----------------+----------------+
|        12 | wait/synch/mutex/maria/PAGECACHE::cache_lock             | 2147483647 | 13461313032030 |            138 |           1684 |      544893048 |
|        12 | wait/synch/cond/maria/translog_d::log_flush              |       5758 |  2961367306584 |          48942 |      514304846 |     4294177986 |
|        12 | wait/synch/cond/mysys/my_thread_var::suspend             |        769 |   805570319284 |          12936 |     1047555681 |     4283692164 |
|        12 | wait/synch/mutex/maria/translog_buffer::mutex            | 2147483647 |   575354215426 |            138 |            234 |     4231638106 |
|        12 | wait/synch/mutex/maria/translog_d:file_header            | 2147483647 |   420424031916 |            138 |            171 |       10497066 |
|        12 | wait/synch/mutex/maria/MARIA_FILE_BITMAP::bitmap_lock    |  732428925 |   168888242736 |            144 |            230 |     1244470326 |
|        12 | wait/synch/rwlock/maria/MARIA_KEYDEF::root_lock          |  246828873 |    65195270814 |            162 |            264 |       10019208 |
|        12 | wait/synch/mutex/maria/MARIA_SHARE::intern_lock          |    3836359 |     4981866294 |            150 |           1298 |     2181619680 |
|        12 | wait/synch/mutex/maria/translog_d::sent_to_disk          |     732420 |      506173524 |            150 |            691 |        1163784 |
|        12 | wait/synch/rwlock/maria/translog_d::open_files           |     488660 |      432319110 |            168 |            884 |         473808 |
|        12 | wait/io/file/maria/Control                               |         92 |      429294504 |          12552 |        4666244 |       31523556 |
|        12 | wait/synch/mutex/maria/MARIA_SHARE::key_del_lock         |    1658211 |      428136198 |            150 |            258 |        1077558 |
|        12 | wait/synch/mutex/mysys/LOCK_alarm                        |     834892 |      311987064 |            150 |            373 |         479982 |
|        12 | wait/synch/mutex/sql/LOCK_open                           |     732430 |      307217022 |            150 |            419 |         475176 |
|        12 | wait/synch/mutex/sql/LOCK_mdl                            |     732432 |      299536968 |            162 |            408 |         147180 |
|        12 | wait/synch/mutex/sql/LOCK_thread_count                   |     976577 |      230754018 |            144 |            236 |         375186 |
|        12 | wait/synch/mutex/maria/translog_d::log_flush             |     563195 |      226177674 |            138 |            401 |         177606 |
|        12 | wait/synch/mutex/sql/LOCK_global_read_lock               |     488287 |      197541498 |            180 |            404 |         476472 |
|        12 | wait/synch/mutex/maria/TRN::state_lock                   |     732430 |      146788518 |            150 |            200 |         154188 |
|        12 | wait/synch/rwlock/sql/LOCK_grant                         |     244146 |      137635878 |            180 |            563 |        2143872 |
|        12 | wait/synch/mutex/sql/LOCK_plugin                         |     488290 |      136693146 |            150 |            279 |          56688 |
|        12 | wait/synch/mutex/maria/translog_d::dirty_buffer_mask     |     488404 |      110289804 |            162 |            225 |         135702 |
|        12 | wait/synch/mutex/maria/LOCK_trn_list                     |     488288 |      106188246 |            156 |            217 |         474672 |
|        12 | wait/synch/mutex/mysys/THD_LOCK_mutex                    |     488287 |       97045002 |            180 |            198 |          65340 |
|        12 | wait/synch/cond/maria/translog_b::waiting_filling_buffer |          5 |       20083674 |        1284888 |        4016734 |        6963702 |
|        12 | wait/synch/mutex/mysys/THR_LOCK_open                     |         92 |         114222 |            414 |           1241 |           2622 |
|        12 | wait/io/file/maria/KeyFile                               |          3 |          20400 |           4374 |           6800 |           9606 |
|        12 | wait/synch/rwlock/sql/LOCK_system_variables_hash         |          3 |           1752 |            288 |            584 |            750 |
|        12 | wait/synch/mutex/mysys/THR_LOCK_threads                  |          1 |            852 |            852 |            852 |            852 |
|        12 | wait/synch/mutex/sql/LOCK_global_system_variables        |          2 |            498 |            234 |            249 |            264 |
|        12 | wait/synch/mutex/sql/LOCK_ha_data                        |          1 |            414 |            414 |            414 |            414 |
|        12 | wait/synch/mutex/sql/hash_filo::lock                     |          1 |            228 |            228 |            228 |            228 |
|        12 | wait/synch/mutex/sql/LOCK_connection_count               |          1 |            204 |            204 |            204 |            204 |
+-----------+----------------------------------------------------------+------------+----------------+----------------+----------------+----------------+
33 rows in set (0.00 sec)

Bonus points to those that spot the obvious “bug” in the above! πŸ™‚

There is so much more to this worklog, there’s a number more tables, there’s a lot more information, and a lot more ways to look at it. The future of monitoring MySQL just got much brighter.

Stay tuned! I’m sure Peter is itching to spread more information!

16 thoughts on “PERFORMANCE_SCHEMA, Maria and the Tokutek Challenge”

  1. Wow, they used 32-bit signed int for the counter or while sending the result. Please tell me that no global mutexes are locked while collecting this data.

    I want perf stats at a higher level. We already have much of this data for InnoDB via SHOW MUTEX STATUS but how do you tune an app when a certain mutex is hot? And with the new data I already know which mutexes will be hot: LOCK_open, LOCK_mdl, the HEAP engine mutex and the MyISAM key buffer mutex. I need to know which user, client IP address or table is responsible for much of this load and then I might be able to isolate the statements and get the users to change. When will the equivalent of SHOW USER_STATISTICS and SHOW TABLE_STATISTICS be provided? I hope that is in tomorrow’s announcement.

    1. Yea, just a bit of an oversight. πŸ™‚ I had a chat with Marc Alff (the core developer for PERFORMANCE_SCHEMA), and he’ll be changing this.

      PERFORMANCE_SCHEMA doesn’t have a single mutex within it, it’s entirely non-locking. This *may* mean that you get slightly inconsistent data if you’re polling it often, but that’s a performance trade off that we decided to make.

      We don’t have something akin to SHOW USER_STATISTICS right now with PERFORMANCE_SCHEMA (although I’d like to push that all to MEM via QUAN personally), but PERFORAMANCE_SCHEMA can tell you what IO you are doing on tables and indexes etc. to give you some focus in to those that have high waits, or high IO. This was in a little bit of flux with the tree I was trying on, so the stats are not complete, but…:

      mysql> select * from events_waits_summary_by_instance where object_instance_begin like '%purchases%';
      +-----------------------------+------------------------------------------------+------------+----------------+----------------+----------------+----------------+
      | EVENT_NAME                  | OBJECT_INSTANCE_BEGIN                          | COUNT_WAIT | SUM_TIMER_WAIT | MIN_TIMER_WAIT | AVG_TIMER_WAIT | MAX_TIMER_WAIT |
      +-----------------------------+------------------------------------------------+------------+----------------+----------------+----------------+----------------+
      | wait/io/file/maria/DataFile | ./iiBench/purchases_noindex.MAD                |          0 |              0 |              0 |              0 |              0 | 
      | wait/io/file/maria/DataFile | ./iiBench/purchases_index.MAD                  |          0 |              0 |              0 |              0 |              0 | 
      | wait/io/file/sql/FRM        | ./iiBench/purchases_index.frm                  |        101 |       36548628 |            150 |         361867 |       35934966 | 
      | wait/io/file/sql/FRM        | ./iiBench/purchases_noindex.frm                |        101 |       12265596 |            150 |         121441 |       11722698 | 
      | wait/io/file/maria/KeyFile  | /data0/mysqldata/iiBench/purchases_index.MAI   |         13 |         114378 |           3192 |           8798 |          30624 | 
      | wait/io/file/maria/KeyFile  | /data0/mysqldata/iiBench/purchases_noindex.MAI |          6 |          37788 |           2922 |           6298 |           9714 | 
      +-----------------------------+------------------------------------------------+------------+----------------+----------------+----------------+----------------+
      6 rows in set (0.00 sec)
      mysql> select * from file_summary_by_instance where file_name like '%purchases%';
      +------------------------------------------------+-----------------------------+------------+-------------+--------------------------+---------------------------+
      | FILE_NAME                                      | EVENT_NAME                  | COUNT_READ | COUNT_WRITE | SUM_NUMBER_OF_BYTES_READ | SUM_NUMBER_OF_BYTES_WRITE |
      +------------------------------------------------+-----------------------------+------------+-------------+--------------------------+---------------------------+
      | ./iiBench/purchases_noindex.MAD                | wait/io/file/maria/DataFile |          0 |           0 |                        0 |                         0 | 
      | ./iiBench/purchases_index.MAD                  | wait/io/file/maria/DataFile |          0 |           0 |                        0 |                         0 | 
      | ./iiBench/purchases_index.frm                  | wait/io/file/sql/FRM        |         37 |          64 |                     3638 |                      9952 | 
      | ./iiBench/purchases_noindex.frm                | wait/io/file/sql/FRM        |         37 |          64 |                     3118 |                      9692 | 
      | /data0/mysqldata/iiBench/purchases_index.MAI   | wait/io/file/maria/KeyFile  |          6 |           7 |                     2616 |                      2421 | 
      | /data0/mysqldata/iiBench/purchases_noindex.MAI | wait/io/file/maria/KeyFile  |          6 |           0 |                     1680 |                         0 | 
      +------------------------------------------------+-----------------------------+------------+-------------+--------------------------+---------------------------+
      6 rows in set (0.00 sec)

      There’s some more work to be done – I’d like to see global query identifiers, and the query log being moved to use a performance_schema table (it’s a storage engine itself), then we can really start doing some great stuff with this!

  2. I have done the same with some of my counters (use 32 instead of 64 bit). We also had too much mutex contention in the implementation of stats collection for SHOW USER_STATS and SHOW TABLE_STATS — a fix for that will be published soon.

    Are there IO stats for InnoDB? Is there an API so that storage engines can report stats? InnoDB collects mutex wait stats for its rw-mutex implementation.

  3. Mark, I see a couple of things others haven’t mentioned. First, wait/synch/mutex/maria/translog_d:file_header has only a single colon. I assume this is because of a typo in the code, not your blog post, and that the slashes and colons are only a naming convention, without real semantics tied to the actual code executing (i.e. I’m assuming it’s up to the coder to put the correct name for each event).

    Secondly, you selected * from the table, ordering by a column that doesn’t show up in the results. I assume this is because you edited the results to keep them from line-wrapping πŸ™‚

  4. Mark, I’ve clicked through to the worklog and it doesn’t look open. “Unfortunately, either a task with that ID does not exist, or it is not a publicly available Worklog entry.”

  5. @Mark Callaghan

    Mark, would a backtrace of where the mutex is being locked be useful? So you could perhaps then narrow down the callpath that is the cause of contention (query is pretty easy to get from that… go up to the DRIZZLEparse….errr.. MYSQLparse call and grab the query string).

    It’s pretty easy to programatically get call stack….

    For me, this is what I’d want if merging something like this into drizzle.

  6. Great to see this project is moving along. Discussions for it started long before I left MySQL πŸ™‚

    It is interesting how flexible is it – for example is it single level or there are multiple blocks which can be one within another ?

    I’m also wondering about instances of the objects – for example for Innodb has munch of mutexes in the buffer pool – are waits scattered or it is couple of hot blocks ?

    There are also multiple goals which may need a bit different approaches – performance profiling for developers to make code more optimal and profiling to understand where query response time comes from – how much was CPU time, how much was waiting on row level locks, mutexes, disk io…

    The backtrace as well us understanfing who held the lock (causing wait) is another important data.

  7. @Stewart – I don’t think we will need call stacks for a while at least when debugging Innodb and the code above it. It will be nice when we fix enough things and hot mutexes are infrequent that we need call stacks. Most of the locks are hot from one place and the hot locks are well known. Changes are needed to improve the propagation of the mutex name from the creator to the function that does the creation, otherwise the common mutex creation function in Innodb is listed as the owner of too many mutexes.

Got something to say?