Grouping by Arbitrary Time Ranges (Graphing What You Can See)

First, the back story. One of the MEM developers asked me today about an interesting problem:

We have graphs in MEM that can be generated against an arbitrary time range in the UI – even for a year or more if the data is not purged. Currently MEM does not do any kind of rolling up of the data (in an RRD style), and pulls graph data from each agent/instance on a 1 minute interval. So if you wanted to pull, for instance, the last month’s worth of data into a graph – the server back end basically goes back to the database and requests all of the rows – all ~43,829 of them, oh, and that’s for each series – then calculates deltas on the server side if need be (we store raw values), and stuffs the data into a graphing library to draw the graph.

Further, graphs are only of a limited (but adjustable) size on the screen – I run MEM with a 900 pixel wide graph personally – so there are only ~900 points (maybe a few fewer, with the axes etc. being shown) that can be filled with data.

Trying to fill 900 points with ~43,000 points is an exercise in futility. Damn those pixels. Why doesn’t everybody have a cinema in their ops centers? Filling up your memory with all that data on the app server side is another issue entirely.

So we’ve been looking at various ways of a) rolling this data up in storage, and b) only selecting what can be shown on the screen if possible. The MEM dev hit me up about the latter today, and basically asked – “Given a time range, and a number of pixels, how can we group a set of rows together to get an aggregated result that only returns the same number of rows as we have pixels?”

Math to the rescue!

Luckily we store the timestamps for the data we collect in a BIGINT column – a Unix-style timestamp with millisecond resolution. If you’re not doing this, then you should convert your DATETIME / TIMESTAMP etc. to an integer (with UNIX_TIMESTAMP) first.
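
If you do need that conversion first, something along these lines would do it (the table and column names here are just placeholders, and note that UNIX_TIMESTAMP only has second resolution – multiplying by 1000 just scales it to the same millisecond-style units):

SELECT UNIX_TIMESTAMP(ts_col) * 1000 AS ts_ms -- still second resolution, just in ms units
  FROM my_table;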

So I started off with the three known constants:

SET @start_time = 1254325343000; -- About a month ago
SET @end_time = 1256908959000;   -- About now
SET @pixels = 900;

Now, to get the interval size that you need to group by, you first get the full integer range you are dealing with, and then divide it by the number of pixels:

SET @intervals = ((@end_time - @start_time) / @pixels);
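
With the constants above, that works out to (1256908959000 - 1254325343000) / 900, or roughly 2,870,684 milliseconds per group – a little under 48 minutes, so about 48 of the 1 minute samples end up behind each plotted point.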

Now that we know the interval size that we are going to GROUP by, how do we group it? Math to the rescue again – to generate the time slices you divide the timestamp by the interval size, FLOOR the result, and then multiply it back up again, which snaps each timestamp down to the start of its interval:

FLOOR( (timestamp_col) /  @intervals) *  @intervals
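
As a toy illustration with made-up numbers (nothing to do with the MEM data set): with an interval size of 1000 ms, a timestamp of 2345 snaps down to the bucket starting at 2000:

SELECT FLOOR(2345 / 1000) * 1000 AS bucket_start,     -- 2000
       (FLOOR(2345 / 1000) + 1) * 1000 AS bucket_end; -- 3000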

To generate the range label, I used a little CONCAT magic (and note that you add one before converting back for the end of the range), and then GROUP BY the output. Here’s an example based on the MEM data set – which stores its timestamp in “end_time”:

SELECT instance_attribute_id, 
       SUM(value) sum_value,
       MIN(value) min_value,
       MAX(value) max_value,
       AVG(value) avg_value,
       COUNT(value) agg_points,
       CONCAT( ROUND( FLOOR( (end_time) /  @intervals) *  @intervals), ' - ', 
               ROUND( (FLOOR( (end_time) /  @intervals ) + 1) *  @intervals)) AS time_range
  FROM dc_ng_long_now 
 WHERE instance_attribute_id IN (73, 76, 77, 79)  
   AND end_time BETWEEN @start_time AND @end_time
 GROUP BY instance_attribute_id, time_range

The above is what could be used to generate the CPU Usage graph (the instance IDs are each of the series) in the MEM UI, for a month of data. Some sample output:

+-----------------------+----------------+--------------+--------------+-------------------+------------+-------------------------------+
| instance_attribute_id | sum_value      | min_value    | max_value    | avg_value         | agg_points | time_range                    |
+-----------------------+----------------+--------------+--------------+-------------------+------------+-------------------------------+
|                    73 |   884176993340 | 442088260860 | 442088732480 | 442088496670.0000 |          2 | 1254322602524 - 1254325473209 | 
|                    73 | 21220811440010 | 442089201580 | 442111284920 | 442100238333.5417 |         48 | 1254325473209 - 1254328343893 | 
|                    73 | 21221895300670 | 442111747690 | 442133893930 | 442122818763.9583 |         48 | 1254328343893 - 1254331214578 | 
|                    73 | 21222981255310 | 442134365100 | 442156523250 | 442145442818.9583 |         48 | 1254331214578 - 1254334085262 | 
|                    73 | 20781887896940 | 442156993950 | 442178660650 | 442167827594.4681 |         47 | 1254334085262 - 1254336955947 | 
|                    73 | 21225129132600 | 442179133270 | 442201248030 | 442190190262.5000 |         48 | 1254336955947 - 1254339826631 | 
|                    73 | 21226213026150 | 442201720960 | 442223827720 | 442212771378.1250 |         48 | 1254339826631 - 1254342697316 | 
|                    73 | 21227293251850 | 442224296570 | 442246191720 | 442235276080.2083 |         48 | 1254342697316 - 1254345568000 | 
|                    73 | 21228371431870 | 442246663300 | 442268791290 | 442257738163.9583 |         48 | 1254345568000 - 1254348438684 | 
|                    73 | 21229455531730 | 442269262370 | 442291404770 | 442280323577.7083 |         48 | 1254348438684 - 1254351309369 | 
|                    73 | 20788226878750 | 442291875550 | 442313515300 | 442302699547.8723 |         47 | 1254351309369 - 1254354180053 | 
|                    73 | 21231576621020 | 442313990190 | 442333971060 | 442324512937.9167 |         48 | 1254354180053 - 1254357050738 | 
|                    73 | 21232474653460 | 442334346060 | 442352121880 | 442343221947.0833 |         48 | 1254357050738 - 1254359921422 | 
|                    73 | 21233360433480 | 442352500510 | 442371671120 | 442361675697.5000 |         48 | 1254359921422 - 1254362792107 | 
|                    73 | 21234392588940 | 442372144460 | 442394227610 | 442383178936.2500 |         48 | 1254362792107 - 1254365662791 | 
|                    73 | 21235472883050 | 442394698990 | 442416721710 | 442405685063.5417 |         48 | 1254365662791 - 1254368533476 | 
|                    73 | 21236557207050 | 442417196420 | 442439347780 | 442428275146.8750 |         48 | 1254368533476 - 1254371404160 | 
|                    73 | 20795177199970 | 442439809050 | 442461261420 | 442450578722.7660 |         47 | 1254371404160 - 1254374274844 | 
|                    73 | 21238693926960 | 442461733210 | 442483851610 | 442472790145.0000 |         48 | 1254374274844 - 1254377145529 | 
|                    73 | 21239778454900 | 442484323030 | 442506442450 | 442495384477.0833 |         48 | 1254377145529 - 1254380016213 | 

We can see how many rows have actually been aggregated into each interval too (the agg_points column, over on the right) – so that we can decide whether the first and last rows have enough points in them to give a good enough average (the partial intervals at either edge sometimes don’t, and should perhaps be discarded) – and the interval that each row is computed for, e.g. “1254325473209 – 1254328343893”.
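
If you did want to drop those partial edge buckets automatically, one option (just a sketch, not something MEM actually does) is to filter on the aggregated point count with a HAVING clause:

SELECT instance_attribute_id,
       AVG(value) avg_value,
       COUNT(value) agg_points,
       CONCAT( ROUND( FLOOR( end_time / @intervals) * @intervals), ' - ',
               ROUND( (FLOOR( end_time / @intervals ) + 1) * @intervals)) AS time_range
  FROM dc_ng_long_now
 WHERE instance_attribute_id IN (73, 76, 77, 79)
   AND end_time BETWEEN @start_time AND @end_time
 GROUP BY instance_attribute_id, time_range
HAVING agg_points >= 45; -- rough cut-off for ~48 one-minute samples per bucket, tune to taste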

And a little verification:


mysql> select count(*) 
    ->   from dc_ng_long_now 
    ->  where instance_attribute_id in (73, 76, 77, 79) 
    ->    and end_time between @start_time and @end_time;
+----------+
| count(*) |
+----------+
|   172240 | 
+----------+
1 row in set (0.12 sec)

mysql> select count(*) from(
    -> select instance_attribute_id, 
    ->        MIN(value),
    ->        MAX(value),
    ->        AVG(value),
    ->        CONCAT( ROUND(FLOOR( end_time /  @intervals) *  @intervals), ' - ', 
    ->                ROUND(( FLOOR( end_time /  @intervals ) + 1 ) *  @intervals)) AS time_range
    ->   from dc_ng_long_now 
    ->  where instance_attribute_id in (73, 76, 77, 79) 
    ->    and end_time between @start_time and @end_time
    ->  group by instance_attribute_id, time_range
    -> ) s1;
+----------+
| count(*) |
+----------+
|     3604 | 
+----------+
1 row in set (1.01 sec)

Note that I’m getting 4 sets of 901 rows – so in the above case that’s roughly a 48 to 1 compression ratio for a month of data, going from 172,240 rows down to 3,604!
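
And to double check the “4 sets of 901” part, here’s a quick per-series count along the same lines (again, just a sketch against the same data set):

SELECT instance_attribute_id, COUNT(*) AS buckets
  FROM (
        SELECT instance_attribute_id,
               CONCAT( ROUND( FLOOR( end_time / @intervals) * @intervals), ' - ',
                       ROUND( (FLOOR( end_time / @intervals ) + 1) * @intervals)) AS time_range
          FROM dc_ng_long_now
         WHERE instance_attribute_id IN (73, 76, 77, 79)
           AND end_time BETWEEN @start_time AND @end_time
         GROUP BY instance_attribute_id, time_range
       ) s1
 GROUP BY instance_attribute_id;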

For MEM, there’s still work to do, but we’re on a roll now! Hopefully graph performance will be greatly improved in coming versions! :)