MySQL system status snapshots using the MySQL Enterprise Monitor Agent

MySQL Enterprise Monitor collects a huge amount of information about MySQL and the systems that it runs on. To do this, it employs an “Agent” to collect these statistics.

This can either sit locally to the database server, or on a remote host – perhaps even the same host as the Dashboard server if you decide you don’t want to distribute it to many boxes – and checks when it is local or remote, to decide whether to collect OS statistics or not. A single agent can monitor either a single or multiple instances from the same process.

Not all of the information that it can collect is fully exposed within the UI yet however (there are so many things to do lately!), so some people don’t realize the kind of data that we can gather on top of the default that is exposed. Another thing, that is even more non-obvious, is that the MEM Agent also collects all of the data for the various things that it has scheduled to run (either statements against MySQL, or calls to SIGAR for abstracted OS data, or data from MySQL Proxy for Query Analysis), even if it isn’t actually required to send all of that data back to the Dashboard server.

It keeps all of this as an internal “inventory”, in what we term the “item hash”. A few months ago, whilst trying to debug some memory issues, Jan Kneschke (of MySQL Proxy and lighttpd fame) implemented a way to dump this item hash in full, using YAML, to a file local to the agent – by sending a SIGUSR2 signal to the running process:

 
$ killall -USR2 mysql-monitor-agent

The agent then dumps it’s item hash structure to a file in the /tmp directory, with the date and time in the name, such as /tmp/mysql-monitor-agent-items.dump-20110414-124656.txt.

I’d been playing with this recently, and realized that this also allows you to maintain a local set of snapshots of performance and configuration of the entire system on a regular basis as well. Having the MEM UI is wonderful, but sometimes it’s handy to have something to {grep/sed/awk/your favorite} on the shell of the system you are trying to diagnose as well.

So a quick cron job later:

$ crontab -l
* * * * * killall -USR2 mysql-monitor-agent
$ ll /tmp/mysql-monitor-agent-items.dump*
-rw-rw-rw-  1 mark  wheel  88332 14 Apr 13:05 /tmp/mysql-monitor-agent-items.dump-20110414-130500.txt
-rw-rw-rw-  1 mark  wheel  88331 14 Apr 13:06 /tmp/mysql-monitor-agent-items.dump-20110414-130600.txt
-rw-rw-rw-  1 mark  wheel  88332 14 Apr 13:07 /tmp/mysql-monitor-agent-items.dump-20110414-130700.txt
-rw-rw-rw-  1 mark  wheel  88332 14 Apr 13:08 /tmp/mysql-monitor-agent-items.dump-20110414-130800.txt
...

Here’s an example of one of the above files, which is monitoring a stand alone 5.5.10 instance on my macbook.

The file starts off with some info on the agent process itself, followed by the output of various SQL statements that we run (those in the items-mysql-monitor.xml or custom.xml files), and then the main data collection items for the various in-built collectors. To point out some of the interesting parts, you can get:

  • All of the parsed variables from SHOW ENGINE INNODB STATUS
  •  "innodbstatus": 
        attributes:
          "innodb_bp_add_alloc": int
          "innodb_bp_created_per_sec": float
          "innodb_bp_db_pages": int
          "innodb_bp_dictionary_alloc": int
    ...
        instances:
          "5a41ac04-115d-448f-a0ca-144eeb5ac8b5": 
    ...
            values:
              "innodb_bp_add_alloc": 0
              "innodb_bp_created_per_sec": 0.000000
              "innodb_bp_db_pages": 191
              "innodb_bp_dictionary_alloc": 63928
    
  • SHOW GLOBAL STATUS
  •  
      "status": 
        attributes:
    ...
          "Bytes_received": int
          "Bytes_sent": int
    ...
        instances:
    ...
            values:
    ...
              "Bytes_received": 6073208
              "Bytes_sent": 70190554
    
  • SHOW GLOBAL VARIABLES
  •  
      "variables": 
        attributes:
          "auto_increment_increment": int
          "auto_increment_offset": int
    ...
        instances:
    ...
            values:
              "auto_increment_increment": 1
              "auto_increment_offset": 1
    
  • SHOW PROCESSLIST
  •  
      "processlist": 
    ...
        instances:
    ...
            values:
              "Command": "Query"
              "Host": "localhost:51682"
              "Id": 4
              "Info": "SHOW /*!40000 FULL */ PROCESSLIST"
              "State": undefined
              "Time": 0
              "User": "msandbox"
              "db": undefined
    ...
    
  • CPU details (the counters are CPU ticks), for each CPU in the system (counting from 0 upwards)
  •  
    "os": 
      "cpu": 
        attributes:
          "cpu_cache_size": int
          "cpu_idle": int
    ...
        instances:
          "ssh:{86:17:98:e8:aa:c5:55:11:f8:a4:76:07:78:fb:5a:1e}.0": 
    ...
            values:
              "cpu_cache_size": 6144
              "cpu_idle": 445505030
    ...
          "ssh:{86:17:98:e8:aa:c5:55:11:f8:a4:76:07:78:fb:5a:1e}.1": 
    ...
            values:
              "cpu_cache_size": 6144
              "cpu_idle": 442250880
    ...
    
  • Disk IO information
  •  
      "disk": 
        attributes:
          "disk_bytes_read": int
          "disk_bytes_written": int
    ...
        instances:
          "ssh:{86:17:98:e8:aa:c5:55:11:f8:a4:76:07:78:fb:5a:1e}./dev/disk0s2": 
    ...
            values:
              "disk_bytes_read": 43988611584
              "disk_bytes_written": 101020259328
    ...
          "ssh:{86:17:98:e8:aa:c5:55:11:f8:a4:76:07:78:fb:5a:1e}./dev/disk2s3": 
    ...
            values:
              "disk_bytes_read": 12241350656
              "disk_bytes_written": 18404140032
    

    Without even showing the output you can get from SHOW SLAVE STATUS, SHOW MASTER STATUS, SHOW MASTER LOGS, mountpoint space usage information, memory usage, network interface information, load averages, OS details, all the info from the various SQL statement results, Query Analysis data (if enabled) – all in a single snapshot of data.

    So now if you’re interested how something has been running over time, and you can’t get to a MEM UI for some reason – or want to do some local analysis using your own scripts:

     
    $ grep -ih com_show_variables /tmp/mysql-monitor-agent-items* | grep -v int
              "Com_show_variables": 2781
              "Com_show_variables": 2782
              "Com_show_variables": 2783
              "Com_show_variables": 2784
              "Com_show_variables": 2785
    ...
    

    Or if somebody tells you some hiccup happened at 12:46 today – you can go and take a look at this file as well!

    Most of the data is collected in one minute intervals – especially all of the OS data, and the output of SHOW [VARIABLES | STATUS | INNODB STATUS | MASTER STATUS | SLAVE STATUS]. Others like some of the SQL statements are run on different intervals (some up to 6 hours or more apart), so these may not be super up to date for the current snapshot.

    SHOW PROCESSLIST by default only runs once on startup and is not scheduled for anything further by the MEM UI yet, so that will not auto-update. You can add a rule that uses one of the items from it’s output to force this however, such as mysql:processlist:time, using a dummy “instance” for the variable such as 0 (the instance in the processlist case is the connection id from SHOW PROCESSLIST), so that no instances are ever pulled back to MEM, wasting space there. Schedule that dummy rule for once a minute as well, and you should get the full output of SHOW PROCESSLIST in your item dump as well.

    4 comments

    1. RE: “so some people donâ??t realize the kind of data that we can gather on top of the default that is exposed.”

      Isn’t this a license violation? I did find a way to gather additional data AND display additional graphs, but I scrapped it once I interpreted the license as prohibiting this.

    Got something to say?