PERFORMANCE_SCHEMA, Maria and the Tokutek Challenge

As Peter Gulutzan just announced – we’re opening up a “new” worklog that we’ve been working on, Worklog #2360. PERFORMANCE_SCHEMA.

In fact – it’s not “new”, it’s something that has been in the worklog system for a long time, and has had much much much discussion internally between some of the brightest engineers in the group.

The astute among you out there that read my post on the benchmark with Maria for the the tokutek challenge, may have noted this in the configure line that I used:

./configure –prefix=/usr/local/mysql –localstatedir=/data0/mysqldata \
–without-query-cache –with-extra-charsets=complex –with-pic –with-fast-mutexes \
–enable-thread-safe-client –with-readline –with-big-tables –with-zlib-dir=bundled \
–with-plugin-maria –with-plugin-partition –with-plugin-perfschema \
–without-innodb –without-falcon –without-ndbcluster –without-archive-storage-engine \
–without-example-storage-engine –without-federated-storage-engine –without-backhole-storage-engine

I don’t wish to steal Peter’s thunder (he’s worked enormously hard with this project), but I’m going to take this opportunity to post a couple of snippets that I took away from the first benchmark, and let Peter explain them, and the other tables, in his coming posts.

First, let’s check what are the hottest parts of the server where mutexes, IO (there has been more added with Maria in the tree since my tests here, in fact as a result of them I think), conditions or rw locks etc. are concerned whilst I was running the benchmark:

mysql> SELECT *
    ->   FROM events_waits_summary_by_event_name
    ->  WHERE sum_timer_wait != 0
    -> ORDER BY sum_timer_wait DESC;
+----------------------------------------------------------+------------+----------------+----------------+----------------+----------------+
| EVENT_NAME                                               | COUNT_WAIT | SUM_TIMER_WAIT | MIN_TIMER_WAIT | AVG_TIMER_WAIT | MAX_TIMER_WAIT |
+----------------------------------------------------------+------------+----------------+----------------+----------------+----------------+
| wait/synch/mutex/maria/PAGECACHE::cache_lock             | 2147483647 | 13466583113436 |            138 |           1678 |      544893048 |
| wait/synch/cond/maria/translog_d::log_flush              |       5769 |  2961643469364 |          48942 |      513372069 |     4294177986 |
| wait/synch/cond/mysys/my_thread_var::suspend             |        776 |   805743293206 |          12936 |     1038328986 |     4283692164 |
| wait/synch/mutex/maria/translog_buffer::mutex            | 2147483647 |   577561613332 |            138 |            234 |     4231638106 |
| wait/synch/mutex/maria/translog_d:file_header            | 2147483647 |   422453242530 |            138 |            171 |       10497066 |
| wait/synch/mutex/maria/MARIA_FILE_BITMAP::bitmap_lock    |  735984803 |   169563172188 |            144 |            230 |     1244470326 |
| wait/synch/rwlock/maria/MARIA_KEYDEF::root_lock          |  248027518 |    65520201900 |            162 |            264 |       10019208 |
| wait/io/file/sql/FRM                                     |        801 |    12940295718 |            150 |       16155175 |      515501940 |
| wait/synch/mutex/maria/MARIA_SHARE::intern_lock          |    3854083 |     4986469182 |            150 |           1293 |     2181619680 |
| wait/synch/mutex/sql/LOCK_global_system_variables        |    5208428 |      863439342 |            144 |            165 |         559086 |
| wait/synch/mutex/maria/translog_d::sent_to_disk          |     735999 |      508699002 |            150 |            691 |        1163784 |
| wait/synch/rwlock/maria/translog_d::open_files           |     491046 |      434394126 |            168 |            884 |         473808 |
| wait/synch/mutex/maria/MARIA_SHARE::key_del_lock         |    1665267 |      429800430 |            150 |            258 |        1077558 |
| wait/io/file/maria/Control                               |         92 |      429294504 |          12552 |        4666244 |       31523556 |
| wait/synch/mutex/mysys/LOCK_alarm                        |     882996 |      350870472 |            150 |            397 |         479982 |
| wait/synch/mutex/sql/LOCK_mdl                            |     758523 |      312468192 |            150 |            411 |         147180 |
| wait/synch/mutex/sql/LOCK_open                           |     736908 |      308840712 |            150 |            419 |         475176 |
| wait/synch/mutex/sql/LOCK_thread_count                   |    1036629 |      244505298 |            144 |            235 |         375186 |
| wait/synch/mutex/maria/translog_d::log_flush             |     565581 |      227091810 |            138 |            401 |         177606 |
| wait/synch/mutex/sql/LOCK_global_read_lock               |     490683 |      198332490 |            144 |            404 |         476472 |
| wait/synch/mutex/sql/LOCK_plugin                         |     588674 |      158860836 |            144 |            269 |          56688 |
| wait/synch/rwlock/sql/LOCK_grant                         |     267378 |      157255554 |            180 |            588 |        2143872 |
| wait/synch/mutex/maria/TRN::state_lock                   |     735985 |      147520044 |            150 |            200 |         154188 |
| wait/synch/mutex/maria/translog_d::dirty_buffer_mask     |     490790 |      110804592 |            162 |            225 |         135702 |
| wait/synch/mutex/maria/LOCK_trn_list                     |     490678 |      106710882 |            156 |            217 |         474672 |
| wait/synch/mutex/mysys/THD_LOCK_mutex                    |     490848 |       97715760 |            162 |            199 |          65340 |
| wait/synch/cond/maria/translog_b::waiting_filling_buffer |          5 |       20083674 |        1284888 |        4016734 |        6963702 |
| wait/synch/mutex/mysys/THR_LOCK_lock                     |      43560 |       19417026 |            162 |            445 |          25980 |
| wait/io/file/maria/DataFile                              |         20 |       14346036 |           4596 |         717301 |       14193714 |
| wait/synch/mutex/mysys/BITMAP_mutex                      |      43538 |       11462268 |            156 |            263 |          16278 |
| wait/synch/mutex/sql/LOCK_status                         |      21718 |       10181124 |            198 |            468 |           7518 |
| wait/synch/mutex/sql/LOCK_active_mi                      |      43440 |        9080496 |            180 |            209 |          17550 |
| wait/synch/mutex/sql/LOCK_event_metadata                 |      10804 |        4669272 |            210 |            432 |           8166 |
| wait/synch/mutex/sql/RLI_data_lock                       |      21718 |        4548996 |            180 |            209 |          17376 |
| wait/synch/rwlock/sql/LOCK_system_variables_hash         |      10866 |        4383222 |            186 |            403 |           7410 |
| wait/synch/mutex/sql/LOCK_prepared_stmt_count            |      10877 |        2389752 |            186 |            219 |          14400 |
| wait/synch/mutex/mysys/THR_LOCK_open                     |       1260 |         451236 |            156 |            358 |           2622 |
| wait/io/file/maria/KeyFile                               |         50 |         317220 |           2676 |           6344 |          30624 |
| wait/io/file/sql/ERRMSG                                  |          3 |         115410 |          13980 |          38470 |          81966 |
| wait/synch/mutex/sql/hash_filo::lock                     |        557 |         105720 |            150 |            189 |            654 |
| wait/io/file/mysys/charset                               |          1 |          46644 |          46644 |          46644 |          46644 |
| wait/io/file/sql/DbOpt                                   |          1 |          40734 |          40734 |          40734 |          40734 |
| wait/synch/mutex/mysys/KEY_CACHE::cache_lock             |        158 |          32664 |            150 |            206 |            516 |
| wait/io/file/sql/pid                                     |          1 |          32382 |          32382 |          32382 |          32382 |
| wait/synch/mutex/mysys/my_thread_var::mutex              |        115 |          31194 |            162 |            271 |            834 |
| wait/synch/mutex/mysys/THR_LOCK_charset                  |        157 |          28620 |            150 |            182 |            528 |
| wait/synch/mutex/mysys/THR_LOCK_threads                  |         22 |          25920 |            498 |           1178 |           5838 |
| wait/synch/mutex/myisam/MYISAM_SHARE::intern_lock        |         82 |          20502 |            150 |            250 |            480 |
| wait/synch/rwlock/myisam/MYISAM_SHARE::key_root_lock     |         66 |          19386 |            174 |            293 |            900 |
| wait/synch/mutex/maria/THR_LOCK_maria                    |         42 |          17574 |            204 |            418 |            798 |
| wait/synch/mutex/sql/LOCK_connection_count               |         39 |           9258 |            192 |            237 |            468 |
| wait/synch/mutex/sql/THR_LOCK_DDL_blocker                |         22 |           5664 |            156 |            257 |            534 |
| wait/synch/mutex/sql/LOCK_xid_cache                      |         18 |           5178 |            234 |            287 |            456 |
| wait/synch/mutex/sql/LOCK_delete                         |         18 |           4932 |            192 |            274 |            534 |
| wait/synch/mutex/sql/LOCK_ha_data                        |         12 |           3954 |            222 |            329 |            546 |
| wait/synch/mutex/sql/THR_LOCK_DDL_is_blocked             |         11 |           3294 |            210 |            299 |            528 |
| wait/synch/mutex/maria/MARIA_SHARE::close_lock           |         15 |           3114 |            150 |            207 |            234 |
| wait/synch/mutex/sql/LOCK_lock_db                        |         10 |           3036 |            198 |            303 |            444 |
| wait/synch/mutex/mysys/THR_LOCK_myisam                   |          3 |           1560 |            402 |            520 |            708 |
| wait/synch/mutex/sql/LOCK_user_conn                      |          2 |            696 |            240 |            348 |            456 |
| wait/synch/mutex/sql/LOCK_mysql_create_db                |          2 |            522 |            258 |            261 |            264 |
+----------------------------------------------------------+------------+----------------+----------------+----------------+----------------+
61 rows in set (0.00 sec)

The times that are being recorded above are actually CPU cycles, if you know the power of your CPUs you could convert those to microseconds fairly easily – or you can tell the PERFORMANCE_SCHEMA to record in other formats too.

mysql> select * from setup_timers;
+------+------------+
| NAME | TIMER_NAME |
+------+------------+
| Wait | CYCLE      |
+------+------------+
1 row in set (0.00 sec)

Next, let’s take a look at what the thread that was inserting the 1 billion rows had been doing. Yes folks, this takes SHOW PROFILES and SHOW ENGINE INNODB MUTEX to a whole different level (it will pretty much make them defunct, imho, if we can get InnoDB using this instrumentation).

mysql> SELECT *
    ->   FROM events_waits_summary_by_thread_by_event_name
    ->  WHERE thread_id = 12
    ->    AND sum_timer_wait != 0
    -> ORDER BY sum_timer_wait DESC;
+-----------+----------------------------------------------------------+------------+----------------+----------------+----------------+----------------+
| THREAD_ID | EVENT_NAME                                               | COUNT_WAIT | SUM_TIMER_WAIT | MIN_TIMER_WAIT | AVG_TIMER_WAIT | MAX_TIMER_WAIT |
+-----------+----------------------------------------------------------+------------+----------------+----------------+----------------+----------------+
|        12 | wait/synch/mutex/maria/PAGECACHE::cache_lock             | 2147483647 | 13461313032030 |            138 |           1684 |      544893048 |
|        12 | wait/synch/cond/maria/translog_d::log_flush              |       5758 |  2961367306584 |          48942 |      514304846 |     4294177986 |
|        12 | wait/synch/cond/mysys/my_thread_var::suspend             |        769 |   805570319284 |          12936 |     1047555681 |     4283692164 |
|        12 | wait/synch/mutex/maria/translog_buffer::mutex            | 2147483647 |   575354215426 |            138 |            234 |     4231638106 |
|        12 | wait/synch/mutex/maria/translog_d:file_header            | 2147483647 |   420424031916 |            138 |            171 |       10497066 |
|        12 | wait/synch/mutex/maria/MARIA_FILE_BITMAP::bitmap_lock    |  732428925 |   168888242736 |            144 |            230 |     1244470326 |
|        12 | wait/synch/rwlock/maria/MARIA_KEYDEF::root_lock          |  246828873 |    65195270814 |            162 |            264 |       10019208 |
|        12 | wait/synch/mutex/maria/MARIA_SHARE::intern_lock          |    3836359 |     4981866294 |            150 |           1298 |     2181619680 |
|        12 | wait/synch/mutex/maria/translog_d::sent_to_disk          |     732420 |      506173524 |            150 |            691 |        1163784 |
|        12 | wait/synch/rwlock/maria/translog_d::open_files           |     488660 |      432319110 |            168 |            884 |         473808 |
|        12 | wait/io/file/maria/Control                               |         92 |      429294504 |          12552 |        4666244 |       31523556 |
|        12 | wait/synch/mutex/maria/MARIA_SHARE::key_del_lock         |    1658211 |      428136198 |            150 |            258 |        1077558 |
|        12 | wait/synch/mutex/mysys/LOCK_alarm                        |     834892 |      311987064 |            150 |            373 |         479982 |
|        12 | wait/synch/mutex/sql/LOCK_open                           |     732430 |      307217022 |            150 |            419 |         475176 |
|        12 | wait/synch/mutex/sql/LOCK_mdl                            |     732432 |      299536968 |            162 |            408 |         147180 |
|        12 | wait/synch/mutex/sql/LOCK_thread_count                   |     976577 |      230754018 |            144 |            236 |         375186 |
|        12 | wait/synch/mutex/maria/translog_d::log_flush             |     563195 |      226177674 |            138 |            401 |         177606 |
|        12 | wait/synch/mutex/sql/LOCK_global_read_lock               |     488287 |      197541498 |            180 |            404 |         476472 |
|        12 | wait/synch/mutex/maria/TRN::state_lock                   |     732430 |      146788518 |            150 |            200 |         154188 |
|        12 | wait/synch/rwlock/sql/LOCK_grant                         |     244146 |      137635878 |            180 |            563 |        2143872 |
|        12 | wait/synch/mutex/sql/LOCK_plugin                         |     488290 |      136693146 |            150 |            279 |          56688 |
|        12 | wait/synch/mutex/maria/translog_d::dirty_buffer_mask     |     488404 |      110289804 |            162 |            225 |         135702 |
|        12 | wait/synch/mutex/maria/LOCK_trn_list                     |     488288 |      106188246 |            156 |            217 |         474672 |
|        12 | wait/synch/mutex/mysys/THD_LOCK_mutex                    |     488287 |       97045002 |            180 |            198 |          65340 |
|        12 | wait/synch/cond/maria/translog_b::waiting_filling_buffer |          5 |       20083674 |        1284888 |        4016734 |        6963702 |
|        12 | wait/synch/mutex/mysys/THR_LOCK_open                     |         92 |         114222 |            414 |           1241 |           2622 |
|        12 | wait/io/file/maria/KeyFile                               |          3 |          20400 |           4374 |           6800 |           9606 |
|        12 | wait/synch/rwlock/sql/LOCK_system_variables_hash         |          3 |           1752 |            288 |            584 |            750 |
|        12 | wait/synch/mutex/mysys/THR_LOCK_threads                  |          1 |            852 |            852 |            852 |            852 |
|        12 | wait/synch/mutex/sql/LOCK_global_system_variables        |          2 |            498 |            234 |            249 |            264 |
|        12 | wait/synch/mutex/sql/LOCK_ha_data                        |          1 |            414 |            414 |            414 |            414 |
|        12 | wait/synch/mutex/sql/hash_filo::lock                     |          1 |            228 |            228 |            228 |            228 |
|        12 | wait/synch/mutex/sql/LOCK_connection_count               |          1 |            204 |            204 |            204 |            204 |
+-----------+----------------------------------------------------------+------------+----------------+----------------+----------------+----------------+
33 rows in set (0.00 sec)

Bonus points to those that spot the obvious “bug” in the above! :)

There is so much more to this worklog, there’s a number more tables, there’s a lot more information, and a lot more ways to look at it. The future of monitoring MySQL just got much brighter.

Stay tuned! I’m sure Peter is itching to spread more information!

Speaking at the MySQL Conference 2009

A little late to post (I’ve known a while), but I thought I’d plug my talk for any interested readers out there, that are going to the conference, and use MEM!

I’m talking about Extending MySQL Enterprise Monitor with Custom Advisors, Graphs and Data Collections.

I’ve had the pleasure of working with the team that writes the MEM software (the “Enterprise Tools” team, internally and lovingly known as the “Merlin Team“, the codename that has survived various renames of the product!) for a little over 3 years now. I can’t say I was there at it’s conception, but I started working with them before the initial release of the product, and have watched (and I like to think helped shape) the product very closely whilst being the “Support Coordinator” for the Support Team for MEM. It’s a great product already, but we have many ideas, it’s going to be an awesome product of the future.

Along the way I’ve helped to write many of the graphs and rules that are released for the MySQL Enterprise Monitor within the default Advisor bundles (along with Andy Bang, one of the original team with the concept) and hope to give MEM users some insights in to how they can extend MEM to suit their own needs.

For example, many users have asked for us to add disk space monitoring – we’re working towards making it more seamless for the next releases (2.0 has taken an interim step for this) – but little know that you can already extend the Monitor to do this within the new 2.0 release:

MEM Disk Monitoring

MEM Disk Monitoring

Come to the talk to find out how – and more, like collecting your own data points (from various sources), graphing them and/or alerting on them! :)

Maria trundles the tokutek challenge

So I saw the tokutek challenge, and wondered to myself how Maria would get along with it. I duly downloaded a 6.0 tree, and the iiBench code, tinkered with it to make it actually build, and fired things up.

I watched it closely, for about a day, then got bored and forgot about it. I remembered today that I should take a look!

CPU Usage (Quad Core)

CPU Usage (Quad Core)

Average rows per second inserted

Average rows per second inserted

Load Averages

Load Averages

You can see, in just over a day the IO load became too heavy to process efficiently.

MySQL was built like this:

./configure –prefix=/usr/local/mysql –localstatedir=/data0/mysqldata \
–without-query-cache –with-extra-charsets=complex –with-pic –with-fast-mutexes \
–enable-thread-safe-client –with-readline –with-big-tables –with-zlib-dir=bundled \
–with-plugin-maria –with-plugin-partition –with-plugin-perfschema \
–without-innodb –without-falcon –without-ndbcluster –without-archive-storage-engine \
–without-example-storage-engine –without-federated-storage-engine –without-backhole-storage-engine

I tinkered with Maria right from the start though, I wanted to see what a longer checkpoint interval would give, so increased it to every 5 minutes – obviously this doesn’t seem great. :) I also wanted to use the same page size as InnoDB out of morbid curiosity. Here’s the my.cnf:

[mysqld]
basedir = /usr/local/mysql
datadir = /data0/mysqldata
log-error = /usr/local/mysql/logs/error.log

maria-block-size = 16384
maria_log_dir_path = /usr/local/mysql/logs
maria_page_checksum = 0
maria_pagecache_buffer_size = 12G
maria-checkpoint-interval = 300

The tables were defined as ENGINE = Maria – i.e TRANSACTIONAL = 1 (with crash recovery) by default.

I added a new custom graph for MEM, to track how the Maria Page Cache gets utilized:

Maria Page Cache Usage

Maria Page Cache Usage

I’ll be making a couple more for Maria as well – including the easy read and write physcial/logical requests from SHOW GLOBAL STATUS (to be released with MEM once Maria is ready, let me know if you want the custom graphs before hand).

The server is RHEL5, Quad Xeon, with 16G RAM, and a 4 disk 10krpm RAID 10 array for the /data0 mountpoint (although using ext3, along with the noop scheduler). Taking a look at iostat when I came back to it, it’s clear that this was my barrier (well, the io wait in the CPU graph is a pretty good indicator as well eh!):

avg-cpu: %user %nice %system %iowait %steal %idle
1.25 0.00 0.00 23.75 0.00 75.00

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
cciss/c0d0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
cciss/c0d1 0.00 7.00 9.00 813.00 288.00 6592.00 8.37 142.09 177.35 1.22 100.10

avg-cpu: %user %nice %system %iowait %steal %idle
4.00 0.00 0.25 20.75 0.00 75.00

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
cciss/c0d0 0.00 125.00 0.00 8.00 0.00 1064.00 133.00 0.05 6.25 4.38 3.50
cciss/c0d1 0.00 4.00 7.00 869.00 224.00 7008.00 8.26 142.00 163.25 1.14 100.10

avg-cpu: %user %nice %system %iowait %steal %idle
0.25 0.00 0.25 24.50 0.00 75.00

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
cciss/c0d0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
cciss/c0d1 0.00 0.00 8.00 880.00 256.00 7040.00 8.22 143.06 162.75 1.13 100.10

Maria does not make use of bulk_insert_buffer_size, unfortunately, when TRANSACTIONAL = 1. It does when TRANSACTIONAL = 0 however. It also doesn’t use something like InnoDB’s Insert Buffer, so it’s clear that there is probably some way to go when it comes to bulk inserts within Maria for the TRANSACTIONAL mode.

Maria does support concurrent inserts with TRANSACTIONAL = 1, however this is disabled when the table has an AUTO_INCREMENT, (or FULLTEXT/GIS indexes) – so that makes this benchmark difficult from that respect too.

I’d like to restart this benchmark in a few ways:

TRANSACTIONAL = 0, single threaded, default checkpoint interval and block size, default auto_inc

The IO overhead for the log files (on cciss/c0d0 above) was not huge, so it will be interesting to see how this affects things (I’ll report back). This should show how just the new page cache works out as well.

TRANSACTIONAL = 1, 4 threads, default checkpoint interval and block size, default auto_inc

There’s only 4 CPUs on the system, so this will certainly max those out – but I’d like to see the performance differences with filling the cache as well, with concurrent INSERT still disabled.

TRANSACTIONAL = 1, 4 threads, default checkpoint interval and block size, synthetic auto_inc

This would really show how quickly concurrent inserts could start filling the cache concurrently. Sadly I think once the cache is full we will continue to see issues at least for now though.

Maybe I’ll get to those in my copious free time, but thought I’d at least post some initial findings on a “benchmark gone wrong” for amusement factor. :)