PERFORMANCE_SCHEMA hits Prime Time!

January 14th, 2010

I’ve talked about PERFORMANCE_SCHEMA in the past - almost a year ago now.

Back then the feature was just coming together in to something useable, and was on the cusp of moving towards code review. It entered code review, and went around, and around for 9 months, whilst various refinements were made.

Never the less, Marc Alff persevered (much respect!), and yesterday pushed his final merge in to the mysql-next-mr bzr tree. PERFORMANCE_SCHEMA is now awaiting the next milestone release, and will be a part of the next GA release of MySQL!

We have the initial documentation ready:

http://dev.mysql.com/doc/performance-schema/en/index.html

This first round adds the infrastructure to take monitoring of the MySQL Server to the next level, initially adding in instrumentation for sync points (mutexes, rw locks, etc.) and file IO, in the SQL layer, and most of the default storage engines (all those controlled by MySQL/Sun).

Here’s a shout out to the other storage engine developers - we’d love you to start looking at instrumenting your own engines as well. Ask away on the internals@ list - I’m sure Marc will be more than willing to help.

And now that we have the above in - what’s next? It’s a good question! Here’s a list of the major outstanding worklogs:

WL#4674 PERFORMANCE_SCHEMA Setup For Actors
WL#4895 PERFORMANCE_SCHEMA Instrumenting Table IO
WL#4896 PERFORMANCE_SCHEMA Instrumenting Net IO
WL#4878 PERFORMANCE_SCHEMA Trace

Personally, I’m voting for WL#4895 next (well, I’d like to see InnoDB instrumentation too!), but you can vote for what ever you are most interested in via the votes in Worklog as well.

Thanks for all the hard work Marc!

Mark Leith General ,

Scientists say dolphins should be treated as ‘non-human persons’

January 4th, 2010

Being an employee of MySQL (past and present), every time I see something related to dolphins it catches my eye - and I just came across this:

Scientists say dolphins should be treated as ‘non-human persons’

The treatment of dolphins has long been an issue for me, I utterly despise some of the things I have seen, and read about. MySQL’s dolphin - Sakila - has always been seen as a symbol of freedom. This is marred by reality, hopefully this will change:

“The researchers argue that their work shows it is morally unacceptable to keep such intelligent animals in amusement parks or to kill them for food or by accident when fishing. Some 300,000 whales, dolphins and porpoises die in this way each year.”

Some other things in the article also ring true for me, in other ways. :)

“The studies show how dolphins have distinct personalities, a strong sense of self and can think about the future.”

“It has also become clear that they are “cultural” animals, meaning that new types of behaviour can quickly be picked up by one dolphin from another.”

“Other research has shown dolphins can solve difficult problems, while those living in the wild co-operate in ways that imply complex social structures and a high level of emotional sophistication.”

FWIW, I’ve donated to both http://www.orcaweb.org.uk/ and http://www.wdcs.org/ in the past, they are both wonderfully worthy causes.

Mark Leith General ,

Grouping by Arbitrary Time Ranges (Graphing What You Can See)

October 30th, 2009

First, the back story. One of the MEM developers asked me today about an interesting problem:

We have graphs in MEM that can be generated against an arbitrary time range in the UI - even for a year or more if the data is not purged. Currently MEM does not do any kind of rolling up of the data (in an RRD style), and pulls graph data from each agent/instance on a 1 minute interval. So if you wanted to pull, for instance, the last 3 months worth of data in to a graph - the server back end basically goes back to the database and requests all of the rows - all ~43,829 of them, oh, and that’s for each series - and then calculate deltas on the server side if need be (we store raw values), and stuffs the data in to a graphing library to draw the graph.

Further, graphs are only of a limited (but adjustable) size on the screen - I run MEM with a 900 pixel wide graph personally - so there’s only ~900 points (maybe a few less, with the axis etc. being shown) that can be filled with data.

Trying to fill 900 points with ~43,000 points is an exercise in futility. Damn those pixels. Why doesn’t everybody have a cinema in their ops centers? Filling up your memory with all that data on the app server side is also another issue.

So we’ve been looking at various ways of a) rolling this data in storage, and b) only selecting what can be shown on the screen if possible. The MEM dev hit me up about the latter today, and basically asked - “Given a time range, and a number of pixels, how we can we group a set of rows together to get an aggregated result that only returns the same number of rows as we have pixels?”

Math to the rescue!

Luckily we store the timestamps for the data we collect in a BIGINT column - we store a unix style timestamp with millisecond resolution times. If you’re not doing this, then you should convert your DATETIME / TIMESTAMP etc. to an integer (with UNIX_TIMESTAMP) first.

So I started off with the three known constants:

SET @start_time = 1254325343000; // About a month ago
SET @end_time = 1256908959000; // About now
SET @pixels = 900;

Now, to get the interval that you need to group, you have to first get the full integer range you are dealing with, and then divide it by the number of pixels:

SET @intervals = ((@end_time - @start_time) / @pixels);

Now that we know the interval size that we are going to GROUP together, how do we group it? Math to the rescue again - to generate the time slices you round the whole timestamp number to the nearest interval, and then convert it back again, basically:

FLOOR( (timestamp_col) /  @intervals) *  @intervals

To generate the range, I used a little CONCAT magic (and note that you add one before converting back for the end of the range), and then GROUP BY the output, here’s an example based on the MEM data set - which stores it’s timestamp in “end_time”:

SELECT instance_attribute_id, 
       SUM(value) sum_value,
       MIN(value) min_value,
       MAX(value) max_value,
       AVG(value) avg_value,
       COUNT(value) agg_points,
       CONCAT( ROUND( FLOOR( (end_time) /  @intervals) *  @intervals), ' - ', 
               ROUND( (FLOOR( (end_time) /  @intervals ) + 1) *  @intervals)) AS time_range
  FROM dc_ng_long_now 
 WHERE instance_attribute_id IN (73, 76, 77, 79)  
   AND end_time BETWEEN @start_time AND @end_time
 GROUP BY instance_attribute_id, time_range

The above is what could be used to generate the CPU Usage graph (the instance ID’s are each of the series) in the MEM UI, for a month of data. Some sample output:

+-----------------------+----------------+--------------+--------------+-------------------+------------+-------------------------------+
| instance_attribute_id | sum_value      | min_value    | max_value    | avg_value         | agg_points | time_range                    |
+-----------------------+----------------+--------------+--------------+-------------------+------------+-------------------------------+
|                    73 |   884176993340 | 442088260860 | 442088732480 | 442088496670.0000 |          2 | 1254322602524 - 1254325473209 | 
|                    73 | 21220811440010 | 442089201580 | 442111284920 | 442100238333.5417 |         48 | 1254325473209 - 1254328343893 | 
|                    73 | 21221895300670 | 442111747690 | 442133893930 | 442122818763.9583 |         48 | 1254328343893 - 1254331214578 | 
|                    73 | 21222981255310 | 442134365100 | 442156523250 | 442145442818.9583 |         48 | 1254331214578 - 1254334085262 | 
|                    73 | 20781887896940 | 442156993950 | 442178660650 | 442167827594.4681 |         47 | 1254334085262 - 1254336955947 | 
|                    73 | 21225129132600 | 442179133270 | 442201248030 | 442190190262.5000 |         48 | 1254336955947 - 1254339826631 | 
|                    73 | 21226213026150 | 442201720960 | 442223827720 | 442212771378.1250 |         48 | 1254339826631 - 1254342697316 | 
|                    73 | 21227293251850 | 442224296570 | 442246191720 | 442235276080.2083 |         48 | 1254342697316 - 1254345568000 | 
|                    73 | 21228371431870 | 442246663300 | 442268791290 | 442257738163.9583 |         48 | 1254345568000 - 1254348438684 | 
|                    73 | 21229455531730 | 442269262370 | 442291404770 | 442280323577.7083 |         48 | 1254348438684 - 1254351309369 | 
|                    73 | 20788226878750 | 442291875550 | 442313515300 | 442302699547.8723 |         47 | 1254351309369 - 1254354180053 | 
|                    73 | 21231576621020 | 442313990190 | 442333971060 | 442324512937.9167 |         48 | 1254354180053 - 1254357050738 | 
|                    73 | 21232474653460 | 442334346060 | 442352121880 | 442343221947.0833 |         48 | 1254357050738 - 1254359921422 | 
|                    73 | 21233360433480 | 442352500510 | 442371671120 | 442361675697.5000 |         48 | 1254359921422 - 1254362792107 | 
|                    73 | 21234392588940 | 442372144460 | 442394227610 | 442383178936.2500 |         48 | 1254362792107 - 1254365662791 | 
|                    73 | 21235472883050 | 442394698990 | 442416721710 | 442405685063.5417 |         48 | 1254365662791 - 1254368533476 | 
|                    73 | 21236557207050 | 442417196420 | 442439347780 | 442428275146.8750 |         48 | 1254368533476 - 1254371404160 | 
|                    73 | 20795177199970 | 442439809050 | 442461261420 | 442450578722.7660 |         47 | 1254371404160 - 1254374274844 | 
|                    73 | 21238693926960 | 442461733210 | 442483851610 | 442472790145.0000 |         48 | 1254374274844 - 1254377145529 | 
|                    73 | 21239778454900 | 442484323030 | 442506442450 | 442495384477.0833 |         48 | 1254377145529 - 1254380016213 |

We can see how many rows have actually been aggregated in to the interval too (scroll to the right) - so that we can decide if there were enough intervals during the first row to be a good enough average (sometimes the first and last intervals may not be, and should perhaps be discarded), and the the interval that each row is computed for, i.e “1254325473209 - 1254328343893″

And a little verification:

 
mysql> SELECT count(*) 
    ->   FROM dc_ng_long_now 
    ->  WHERE instance_attribute_id IN (73, 76, 77, 79) 
    ->    AND end_time BETWEEN @start_time AND @end_time;
+----------+
| count(*) |
+----------+
|   172240 | 
+----------+
1 row IN SET (0.12 sec)
 
mysql> SELECT count(*) FROM(
    -> SELECT instance_attribute_id, 
    ->        MIN(value),
    ->        MAX(value),
    ->        AVG(value),
    ->        CONCAT( ROUND(FLOOR( end_time /  @intervals) *  @intervals), ' - ', 
    ->                ROUND(( FLOOR( end_time /  @intervals ) + 1 ) *  @intervals)) AS time_range
    ->   FROM dc_ng_long_now 
    ->  WHERE instance_attribute_id IN (73, 76, 77, 79) 
    ->    AND end_time BETWEEN @start_time AND @end_time
    ->  GROUP BY instance_attribute_id, time_range
    -> ) s1;
+----------+
| count(*) |
+----------+
|     3604 | 
+----------+
1 row IN SET (1.01 sec)

Note, I’m getting 4 sets of 901 rows, in the above case I’m getting roughly a 48 to 1 compression ratio for a month of data - going from 172,240 rows to 3604!.

For MEM, there’s still work to do, but we’re on a roll now! Hopefully graph performance will be greatly improved in coming versions! :)

Mark Leith MySQL , ,

MySQL University Session - Customizing MySQL Enterprise Monitor

September 9th, 2009

Just a quick note to let the masses know that I will be hosting a MySQL University session tomorrow, based on the talk that I gave at the MySQL UC in April - Customizing MySQL Enterprise Monitor.

It will be at 14:00 UTC - so if you are at all interested in MEM, and want to know how to bend it towards your needs, then come along! I’ll see you there.

EDIT: OOOPS, it’s 13:00 UTC

Mark Leith General

SHOW RELAYLOG EVENTS

June 26th, 2009

I reported a bug about SHOW BINLOG EVENTS not working with relay logs a couple of years ago - Bug #28777.

It’s now been fixed in MySQL 5.4, by adding a new SHOW statement - SHOW RELAYLOG EVENTS.

The replication team are really hammering through things at the moment - Kudos!

Mark Leith MySQL, MySQL 5.4 , ,

Past Presentations Now Online

June 16th, 2009

I uploaded all of my past presentations to Slideshare recently, and realized that I hadn’t actually posted some of these on my blog in the past as well.

So I’ve created a new Presentations Page that has all of these together now.

It’s kind of funny to see the “MySQL for Oracle DBAs” presentation again - a lot has changed since 2006!

In any case, enjoy if you haven’t seen them - give them a look over if interested, and feel free to post comments or questions on the page!

Mark Leith MySQL , , , , ,

MySQL Support Blogs

May 29th, 2009

As planetmysql moved towards “team blogs” and “individual authors”, we in MySQL Support wanted to show our team affiliation as well.

Dups took this on and created a new MySQL Support Blog on dev.mysql.com - thanks Dups!

picture-16

This aggregates all of our individual blogs - so if you click the blog title in planetmysql, you will be taken to our own blogs. Clicking our names takes you to our aggregated blog:

planetmysql blog snippet

Mark Leith MySQL ,

Goodbye MySQL 6.0!

May 22nd, 2009

The announcement for 6.0.11 went out not along ago. 

Within it:

6.0.11 will be the last release of 6.0. After this we will be
transitioning into a New Release Model for the MySQL Server

http://forge.mysql.com/wiki/Development_Cycle

The goal of this transition is to enable more frequent and timely
releases of the MySQL Server.

The planned milestone for September 2009 will probably include most of
the features from 6.0. More detail on which, will be communicated
before end of June. Features not in this milestone (e.g. Online Backup,
Falcon, Connection thread pooling, and 4 Byte UTF Support) are planned
for future milestones when they stabilize.

We’re still ironing out the details internally, but please do read up on the above wiki about our current plans for the development schedule!

We’ve still got quite a road ahead to get things settled, but to my mind, we’re on the right path.

Thoughts?

Mark Leith MySQL

InnoDB Monitor Docs Extended

May 14th, 2009

After I blogged about the InnoDB Table and Tablespace monitors, the docs guys got in touch with me, and asked if I’d mind the information being integrated in to the official documentation. “Of course not!” I said.

So Paul DuBois picked up the task, and set about integrating what I had in the docs, as well as sending the InnoDB developers and myself some extra questions for more detail, and adding that in too. 

Paul is finished with this for now - and now there are two new sections within the “SHOW ENGINE INNODB STATUS and the InnoDB Monitors” page of the manual:

Thanks for the work Paul!

Mark Leith MySQL , ,

innodb_stats_on_metadata / innodb_adaptive_hash_index

May 13th, 2009

It occurred to me at the MySQL UC that I hadn’t blogged about either innodb_status_on_metadata or innodb_adaptive_hash_index - Morgan Tocker convinced me to do a lightning talk at the end of the Percona conference about them - so I thought I’d do a quick write up on them here as well.
 
First - what’s the point in them? Being the guy that wrote the originating patches for both of them, I have a little background:

innodb_stats_on_metadata

InnoDB is fairly light weight when it gathers table and index statistics. In the default compiled in InnoDB for most users it will do 8 random(ish) dives in to the index and samples the page to try to ascertain the cardinality (selectivity) of the index for use by the optimizer. This being the case, not only is this done when you run ANALYZE TABLE on a table, it is also done during various other points:

The first is easy to digest.

The second and third both occur for each row change within InnoDB. row_update_statistics_if_needed() is called from row_insert_for_mysql()row_update_cascade_for_mysql(), and row_update_for_mysql().

The last in the list is interesting. In order to get an accurate(ish) count of the rows in the table, or the cardinality results etc. for the metadata commands, InnoDB runs the same procedure as an ANALYZE TABLE for all of the tables, because it doesn’t maintain accurate statistics in a table or index header somewhere like MyISAM (being MVCC getting an accurate row count is far more difficult for example).

This is great, if you run these commands periodically to check on their output yourself. Once you start trying to read these tables very often in a monitoring solution of any kind though, you will likely hit two distinct problems:

  1. God Damn that INFORMATION_SCHEMA access is SLLLOOOWWWWW
  2. If you are monitoring your queries, and their EXPLAIN plans, you may start to see “plan instability” - the optimizer randomly choosing different access paths for the queries

Behind door #1 there’s a couple of issues - we traverse the directories and table files in the  data directory for a lot of the information used in the output, as well as the ANALYZE that is done for InnoDB. We’ve done work in Worklog #3732 to optimize certain accesses to these INFORMATION_SCHEMA tables, but this didn’t solve the statistics issue.

The second issue is a little more subtle. You may have benchmarked your application statements in Dev and QA environments - but data sets change when they are in production. If table and index statistics are constantly being recalculated, there is a high probability that the optimizer will in fact choose different access paths for certain queries and tables where the rows that need to be scanned could be satisfied by more than one index.

Unless you are closely monitoring individual statements for their run time, access paths (EXPLAIN) etc. this is far more difficult to catch (often you will be seeing the “good” EXPLAIN plan a lot, and the “bad” one only surfaces every now and then too - so interactive debugging is made harder as well).

So, at the request of a pretty large SaaS customer, who was having both problems, innodb_stats_on_metadata was born - originally in Bug #26598. If you see any of the situations above, give it a try, and see how it works for you, as well.

Originally it was created as a non-dynamic variable, you could only set it on the command line or in an option file. It was altered by the InnoDB guys to be dynamic, and settable by SUPER privileged users. It’s documented in the InnoDB plugin documentation under the “Changes for Flexibility, Ease of Use and Reliability” chapter.

I’m on the fence about allowing this dynamically, as it still allows tools to change the setting as well (which often have the SUPER privilege to get at things like SHOW INNODB STATUS), when the DBA may have wanted this explicitly not allowed for all users.

Be good citizens tool vendors, and respect this setting please.

innodb_adaptive_hash_index

InnoDB has a really unique feature in the database world called “Adaptive Hash Indexes“. It monitors which rows are being accessed via index searches, and dynamically decides to cache those that are used the most often, by taking the BTREE index for the table the rows are from and converting it to a HASH index in memory, for quick row lookups.  Historically, this has performed really well (and still does today in most workloads).

Prior to the fix for the infamous Bug #15815 (the true start of our multi-core scalability jaunt, imho), it just hummed along with most people not noticing it around. Once we had the fix though, we started to see mutex contention in other places, in certain cases leading to InnoDB getting locked up, and stopping the server due to long semaphore waits - this happened within the code for adaptive hash indexes as well, and was reported as Bug #20358.

This has hit a pretty large number of users - many Support customers as well, and has had many engineer hours in trying to reproduce it reliably (without result, the bug is still “Open”, not “Verified”). As this started to hit a number of customers, I wrote a pretty simple patch for innodb_adaptive_hash_index to allow you to switch off the use of adaptive hash indexes altogether within InnoDB.

Now, for some this comes as a performance trade off - if you hit the above bug, it does allow you to work around it, but your application may benefit from Adaptive Hash Indexes too. If you favor reliability over performance, then use it - it fixes the problem (in a somewhat hackish way).

Yasufumi Kinoshita did a lot of great work, in Bug #29560 and Bug #26442, and there have been other issues found and fixed by the InnoDB team - Bug #39483 (closed at the time of writing) - and Mark over at Google - Bug #32149 (still verified at the time of writing). We hardly ever see Bug#20358 any more though, unless it’s on older versions.

This variable also gets a good mention in the InnoDB plugin documentation, under the ”Changes for Flexibility, Ease of Use and Reliability” again, and in it’s own section “Controlling Adaptive Hash Indexing“, because the variable has a nice side effect as well - it can also help to scale, specifically because of the kind of mutex contention that also lead to Bug #20358. 

The InnoDB manual mentions that you can monitor the contention via the SEMAPHORES section of SHOW INNODB STATUS by looking for “many threads waiting on an RW-latch created in btr0sea.c” - quite correctly.

You can also check out how much memory the hash index is using, and how well it’s being utilized, in the INSERT BUFFER AND ADAPTIVE HASH INDEX section:

-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 5, seg size 7,
16104 inserts, 16104 merged recs, 15379 merges
Hash table size 2212699, used cells 846306, node heap has 1574 buffer(s)
0.03 hash searches/s, 0.63 non-hash searches

The last two rows apply to the Adaptive Hash index. “Hash Table size” and “used cells” are the number of cells (array slots) assigned to the hash table (always a prime number), and the number of those which are used, respectively. “used cells” was removed as part of Bug #36941 because of it having perform issues by scanning the hash table to find the used cells - in 5.1.28 - so you may not see this on your server. 

“node heap has 1574 buffer(s)” is where you can find out how much memory the hash index is using - the number here is the number of pages that are allocated to the hash index (and a page is 16K in InnoDB by default). 

The last row gives the number of searches both in the adaptive hash index, as well as those not satisfied by it - which can give you a hit ratio for hash read efficiency. 

I’ve made a few graphs for MEM to track these, so they can be monitored over time to get a better picture of adaptive hash index usage, to give you a better tool to judge for yourself what the impact may be for disabling it (see below). I’ll push these up today, and after some testing on a number of other servers, you should hopefully see them in a new advisor update soon.

InnoDB Adaptive Hash Index graphs for MEM

InnoDB Adaptive Hash Index graphs for MEM

Mark Leith MySQL 5.1 ,