innodb_stats_on_metadata / innodb_adaptive_hash_index

It occurred to me at the MySQL UC that I hadn’t blogged about either innodb_stats_on_metadata or innodb_adaptive_hash_index - Morgan Tocker convinced me to do a lightning talk at the end of the Percona conference about them – so I thought I’d do a quick write up on them here as well.

First – what’s the point in them? Being the guy that wrote the originating patches for both of them, I have a little background:

innodb_stats_on_metadata

InnoDB is fairly light weight when it gathers table and index statistics. In the default compiled in InnoDB for most users it will do 8 random(ish) dives in to the index and samples the page to try to ascertain the cardinality (selectivity) of the index for use by the optimizer. This being the case, not only is this done when you run ANALYZE TABLE on a table, it is also done during various other points:

The first is easy to digest.

The second and third both occur for each row change within InnoDB. row_update_statistics_if_needed() is called from row_insert_for_mysql()row_update_cascade_for_mysql(), and row_update_for_mysql().

The last in the list is interesting. In order to get an accurate(ish) count of the rows in the table, or the cardinality results etc. for the metadata commands, InnoDB runs the same procedure as an ANALYZE TABLE for all of the tables, because it doesn’t maintain accurate statistics in a table or index header somewhere like MyISAM (being MVCC getting an accurate row count is far more difficult for example).

This is great, if you run these commands periodically to check on their output yourself. Once you start trying to read these tables very often in a monitoring solution of any kind though, you will likely hit two distinct problems:

  1. God Damn that INFORMATION_SCHEMA access is SLLLOOOWWWWW
  2. If you are monitoring your queries, and their EXPLAIN plans, you may start to see “plan instability” - the optimizer randomly choosing different access paths for the queries

Behind door #1 there’s a couple of issues – we traverse the directories and table files in the data directory for a lot of the information used in the output, as well as the ANALYZE that is done for InnoDB. We’ve done work in Worklog #3732 to optimize certain accesses to these INFORMATION_SCHEMA tables, but this didn’t solve the statistics issue.

The second issue is a little more subtle. You may have benchmarked your application statements in Dev and QA environments – but data sets change when they are in production. If table and index statistics are constantly being recalculated, there is a high probability that the optimizer will in fact choose different access paths for certain queries and tables where the rows that need to be scanned could be satisfied by more than one index.

Unless you are closely monitoring individual statements for their run time, access paths (EXPLAIN) etc. this is far more difficult to catch (often you will be seeing the “good” EXPLAIN plan a lot, and the “bad” one only surfaces every now and then too – so interactive debugging is made harder as well).

So, at the request of a pretty large SaaS customer, who was having both problems, innodb_stats_on_metadata was born – originally in Bug #26598. If you see any of the situations above, give it a try, and see how it works for you, as well.

Originally it was created as a non-dynamic variable, you could only set it on the command line or in an option file. It was altered by the InnoDB guys to be dynamic, and settable by SUPER privileged users. It’s documented in the InnoDB plugin documentation under the “Changes for Flexibility, Ease of Use and Reliability” chapter.

I’m on the fence about allowing this dynamically, as it still allows tools to change the setting as well (which often have the SUPER privilege to get at things like SHOW INNODB STATUS), when the DBA may have wanted this explicitly not allowed for all users.

Be good citizens tool vendors, and respect this setting please.

innodb_adaptive_hash_index

InnoDB has a really unique feature in the database world called “Adaptive Hash Indexes“. It monitors which rows are being accessed via index searches, and dynamically decides to cache those that are used the most often, by taking the BTREE index for the table the rows are from and converting it to a HASH index in memory, for quick row lookups. Historically, this has performed really well (and still does today in most workloads).

Prior to the fix for the infamous Bug #15815 (the true start of our multi-core scalability jaunt, imho), it just hummed along with most people not noticing it around. Once we had the fix though, we started to see mutex contention in other places, in certain cases leading to InnoDB getting locked up, and stopping the server due to long semaphore waits – this happened within the code for adaptive hash indexes as well, and was reported as Bug #20358.

This has hit a pretty large number of users – many Support customers as well, and has had many engineer hours in trying to reproduce it reliably (without result, the bug is still “Open”, not “Verified”). As this started to hit a number of customers, I wrote a pretty simple patch for innodb_adaptive_hash_index to allow you to switch off the use of adaptive hash indexes altogether within InnoDB.

Now, for some this comes as a performance trade off – if you hit the above bug, it does allow you to work around it, but your application may benefit from Adaptive Hash Indexes too. If you favor reliability over performance, then use it – it fixes the problem (in a somewhat hackish way).

Yasufumi Kinoshita did a lot of great work, in Bug #29560 and Bug #26442, and there have been other issues found and fixed by the InnoDB team – Bug #39483 (closed at the time of writing) – and Mark over at Google – Bug #32149 (still verified at the time of writing). We hardly ever see Bug#20358 any more though, unless it’s on older versions.

This variable also gets a good mention in the InnoDB plugin documentation, under the “Changes for Flexibility, Ease of Use and Reliability” again, and in it’s own section “Controlling Adaptive Hash Indexing“, because the variable has a nice side effect as well – it can also help to scale, specifically because of the kind of mutex contention that also lead to Bug #20358. 

The InnoDB manual mentions that you can monitor the contention via the SEMAPHORES section of SHOW INNODB STATUS by looking for “many threads waiting on an RW-latch created in btr0sea.c” – quite correctly.

You can also check out how much memory the hash index is using, and how well it’s being utilized, in the INSERT BUFFER AND ADAPTIVE HASH INDEX section:

-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 5, seg size 7,
16104 inserts, 16104 merged recs, 15379 merges
Hash table size 2212699, used cells 846306, node heap has 1574 buffer(s)
0.03 hash searches/s, 0.63 non-hash searches

The last two rows apply to the Adaptive Hash index. “Hash Table size” and “used cells” are the number of cells (array slots) assigned to the hash table (always a prime number), and the number of those which are used, respectively. “used cells” was removed as part of Bug #36941 because of it having perform issues by scanning the hash table to find the used cells – in 5.1.28 – so you may not see this on your server.

“node heap has 1574 buffer(s)” is where you can find out how much memory the hash index is using – the number here is the number of pages that are allocated to the hash index (and a page is 16K in InnoDB by default).

The last row gives the number of searches both in the adaptive hash index, as well as those not satisfied by it – which can give you a hit ratio for hash read efficiency.

I’ve made a few graphs for MEM to track these, so they can be monitored over time to get a better picture of adaptive hash index usage, to give you a better tool to judge for yourself what the impact may be for disabling it (see below). I’ll push these up today, and after some testing on a number of other servers, you should hopefully see them in a new advisor update soon.

InnoDB Adaptive Hash Index graphs for MEM

InnoDB Adaptive Hash Index graphs for MEM

5 comments

  1. Bug #20358 is really fixed by other bug fixes, so I’ve added my work in progress “here’s how you find the bug you’re really affected by” guide to it.

  2. Mark. I would like to add an interesting observation I have made while trying to work around an issue we have had which was caused by a mixture of innodb_stats_on_metadata and badly skewed NULL data in a table. (We now have this turned off, EXCEPT when we have some cardinality go south which needs fixing).

    My observation suggests that your list of triggers for statistics calculation is not 100% correct.

    What I have noticed is… when innodb_stats_on_meta is ON, show indexes, show table status and select from information_schema.tables / information_schema.statistics table results in almost identical cardinality for certain problem table we have.

    In our case, we have a table which has badly skewed NULL data on a key we use for joins and is waiting for this bugfix http://bugs.mysql.com/bug.php?id=30423

    use / ignore index is not an option in our application for a couple of reasons that I wont bother going into.

    Every now and then, the random nature of the index dive will land us in a bunch of non null, well distributed values and thus give the optimizer a nice cardinality to use.

    However… this “full randomness” is ONLY triggered when you do a flush table. We never get the “good” cardinality by executing “show indexes” etc.

    I wonder if InnoDB is saving the index positions it dove into except when table is re-opened?

    I guess this is something for someone to look into…

    Cheers

    //Steve

    • Hey Steve,

      What you describe is actually correct – when innodb_stats_on_metadata is enabled, executing the SHOW INDEXES (and related statements) do not update the statistics for the table at all. The only time that they are updated then is in the first three items in the list above. Executing a FLUSH TABLES forces the table to be closed and reopened, which leads back to the first item in the list.

      The next two in the list come from row_update_statistics_if_needed():

      /* Calculate new statistics if 1 / 16 of table has been modified
      since the last time a statistics batch was run, or if
      stat_modified_counter > 2 000 000 000 (to avoid wrap-around).
      We calculate statistics at most every 16th round, since we may have
      a counter table which is very small and updated very often. */

      if (counter > 2000000000
      || ((ib_int64_t)counter > 16 + table->stat_n_rows / 16)) {

      dict_update_statistics(table);
      }

      There is one final thing that also forces statistics generation which I did not mention, and that is the obvious ANALYZE TABLE command, which I would recommend you use on the problem table(s) individually, instead of flushing all tables.

      We’re working on Bug#30423 right now, so hopefully we should have that resolved for you soon as well.

Got something to say?