Oracle Buys TimesTen

http://www.oracle.com/timesten/index.html

This seems like an interesting move from Oracle. TimesTen’s Cache is an in-memory, cache-style database that works very much like MySQL Cluster – with full read/write functionality – along with the ability to “divert” to a backend (Oracle) database when the data needed is not memory resident.

It will be interesting over the next few months to see what Oracle does with this. On the surface it’s quite obviously an attempt to get their foot in the door with the likes of telcos and financial services companies for their “frontline” systems – an area where Oracle is rather lacking, due to the cumbersome nature of the beast when it comes to “real time” data.

So you may have read this through the MySQL category on my blog and wondered, “How the hell does this relate to MySQL?” As any seasoned MySQL user would know, MySQL Cluster has been around for a while now, and whilst it’s great for “smaller” apps – those that require very fast “real time” access but don’t necessarily store huge amounts of data – many have noted the serious lack of any disk-based “second stage” storage for older data that may not be used very often. MySQL Cluster is already geared towards the very areas Oracle is now trying to reach through acquisitions; where it has fallen down is on the VLDB back-end requirements of some of these systems.

Until 5.1, it seems. Today I saw a commit on the internals email list from Pekka Nousiainen that refers to “Pgman” – which appears to handle buffer pool caching and flushing to/from disk-based storage for the NDB storage engine. I’ll spare you all the code, but this comment seems to sum it up fairly well for now:

+ * PGMAN
+ *
+ * PAGE ENTRIES AND REQUESTS
+ *
+ * Central structure is "page entry". It corresponds to a disk page
+ * identified by file and page number (file_no, page_no).
+ *
+ * A page entry is created by first request for the disk page.
+ * Subsequent requests are queued under the same page entry.
+ *
+ * A new entry must wait to be "bound" to an available buffer page
+ * (called "real page" here). If the disk page is not "new" (empty),
+ * the entry must also wait to be "mapped" via "pagein" from disk.
+ *
+ * Entries are released on demand when page requests arrive for unknown
+ * pages. Release candidates are entries which point to a disk page
+ * which is "clean and not used". They are ordered by some variant of
+ * least recently used (LRU).
+ *
+ * A background clean-up process makes "dirty" pages clean via "pageout"
+ * to disk. UNDO log entries are first flushed up to the maximum log
+ * sequence number (LSN) of the page. This is called write ahead
+ * logging (WAL). The clean-up process prefers "dirty and not used"
+ * pages and lower LSN values.
+ *
+ * A local check point (LCP) performs complete pageout of dirty pages
+ * since given LSN. It needs a stable entry list to iterate over.
+ *
+ * Page entries are put on ordered lists accordingly:
+ *
+ * The "main" list contains all entries. It is ordered by its unique
+ * key (file_no, page_no). It is used to look up page entries. LCP
+ * iterates over it to cover all candidate pages.
+ *
+ * Each entry also belongs to exactly one "sublist". These drive page
+ * processing. The sublist is determined by page "state". The state
+ * contains page properties such as discussed above.

Great!

4 thoughts on “Oracle Buys TimesTen”

  1. I agree this Oracle buy is a very important event – it moves Oracle towards being more of a “real-time” database, which is an important new market. I am trying to understand how the same could be done for MySQL.

    In our project we use MySQL to store data from sensors. When our users look at the data, they mostly look at the last 10 seconds, not the whole data set that has been acquired, but they want to do it very quickly. Later they also want to analyze all of the data that was acquired, but by then speed is no longer important.

    My idea was that MySQL could have a limited-length in-memory table where the data are stored as they arrive. The “last 10 seconds” queries could be run against that short table, which acts like a “window” onto the latest data. However, once the table is full (it has a limited size), old data would be moved into a usual MySQL table in first in, first out order.

    Here the “fast” table works like a stack where data are immediately available. The “usual” or “slow” tables would be like the elements of the stack which were pushed down. Here is the picture:

    "fast" table               "slow" table
    with limited               with unlimited
    size (in memory)           size (on disk)

    +------------+             +------------+
    |   latest   |    FIFO     |   older    |
    |    data    | ----------> |    data    |
    +------------+             +------------+


    This type of database could be used successfully as a “sensor” database, where you have to run constant queries over some “window” of the data, and you have to do it quickly.
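
    A minimal sketch of this two-table idea in plain SQL – all table and column names here are hypothetical, and it assumes a MEMORY (in-memory) “fast” table with a MyISAM “slow” table behind it:

    -- "fast" window table, held entirely in memory (hypothetical names)
    CREATE TABLE sensor_window (
        ts      DATETIME NOT NULL,
        reading DOUBLE   NOT NULL,
        INDEX (ts)
    ) ENGINE = MEMORY;

    -- "slow" history table, unlimited size, on disk
    CREATE TABLE sensor_history (
        ts      DATETIME NOT NULL,
        reading DOUBLE   NOT NULL
    ) ENGINE = MyISAM;

    -- periodic job: push rows older than the 10 second window down to disk
    -- (a real version would need to guard against rows arriving between
    -- the INSERT and the DELETE)
    INSERT INTO sensor_history
        SELECT ts, reading FROM sensor_window
        WHERE ts < NOW() - INTERVAL 10 SECOND;
    DELETE FROM sensor_window
        WHERE ts < NOW() - INTERVAL 10 SECOND;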

    Sensor databases are a new type of database, and the area is growing quickly. Michael Stonebraker recently started up a company, StreamBase Systems, which does exactly that.

    I wonder how difficult it would be to do similar things with MySQL? Do you have to write a new engine?

    Regards,

    Jacob Nikom

  2. Hi Jacob,

    Interesting, I’ve never actually come across “sensor databases” in the past.

    What you describe above, however, lit a little light in the back of my head when you talked about “windows” – namely partitioning. The likes of Oracle have had this in place for quite some time (since 8i, I believe), and MySQL has it in development right now, slated for 5.1.

    I’m not sure if you’re familiar with partitioning, so to give you a quick idea: it’s a way of breaking a table up into “windows” / “partitions” of data. The most common form is “RANGE” partitioning on dates – for example, creating partitions for quarterly financial periods, with this year’s financial details in 4 quarter partitions and the previous 3 years’ worth of data partitioned by year, etc.

    One of the interesting things I have noted with the MySQL implementation of partitioning is that it seems they are going to attempt to allow different storage engines for particular partitions. Here is the initial syntax description for partitioned tables, taken from http://forums.mysql.com/read.php?106,20743:

    CREATE TABLE …. (column-list ..)
    PARTITION BY
    KEY '(' ( column-list ) ')'
    | RANGE '(' ( expr ) ')'
    | LIST '(' ( expr ) ')'
    | HASH '(' ( expr ) ')'
    [PARTITIONS (num) ]
    [SUBPARTITION BY
    KEY '(' ( column-list ) ')'
    | HASH '(' ( expr ) ')'
    [SUBPARTITIONS (num) ]
    ]
    [ '('
    ( PARTITION logical-name
    [ VALUES LESS THAN '(' ( expr | MAX_VALUE ) ')' ]
    [ VALUES IN '(' (expr)+ ')' ]
    [ TABLESPACE tablespace-name ]
    [ [ STORAGE ] ENGINE [ '=' ] storage-engine-name ]
    [ NODEGROUP nodegroup-id ]
    [ '('
    ( SUBPARTITION logical-name
    [ TABLESPACE tablespace-name ]
    [ [ STORAGE ] ENGINE [ '=' ] storage-engine-name ]
    [ NODEGROUP nodegroup-id ]
    )+
    ')'
    )+
    ')'
    ]

    Now, with this in mind, it seems like in the future (of course, it’s not going to be out for a while) you may be able to create a “current” (MAX_VALUE) partition that uses the HEAP storage engine, whilst the “history” data can be stored in MyISAM for example.
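
    Purely as a sketch of how that might look once 5.1 arrives – the draft syntax above may well change, and the table, column and partition names here are made up – you could imagine something like:

    CREATE TABLE sensor_data (
        ts      DATETIME NOT NULL,
        reading DOUBLE   NOT NULL
    )
    PARTITION BY RANGE ( TO_DAYS(ts) ) (
        PARTITION p_2005_q1 VALUES LESS THAN ( TO_DAYS('2005-04-01') ) ENGINE = MyISAM,
        PARTITION p_2005_q2 VALUES LESS THAN ( TO_DAYS('2005-07-01') ) ENGINE = MyISAM,
        PARTITION p_current VALUES LESS THAN ( MAX_VALUE ) ENGINE = HEAP
    );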

    You can MERGE partitions as well – so it will be interesting to see how MySQL’s implementation handles adding a new MAX_VALUE partition, as well as merging the old MAX_VALUE partition down into the lower range partitions.

    In the Oracle world it’s quite common to see “rolling window” partitions, where automatic scripts or internal jobs run to add a new partition to handle the next “range” and retire the oldest one, as in the sketch below.
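
    In Oracle terms, one “roll” of the window looks roughly like this (table and partition names are purely illustrative):

    -- bring in the next quarter
    ALTER TABLE sales ADD PARTITION p_2005_q3
        VALUES LESS THAN ( TO_DATE('2005-10-01', 'YYYY-MM-DD') );

    -- retire the oldest quarter
    ALTER TABLE sales DROP PARTITION p_2001_q3;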

    I had a chat with one of the guys over at MySQL about sensor databases, and he mentioned that one of the main issues with them is getting the data into the database as fast as possible – so things like indexes on the table are usually not used unless absolutely necessary. With partitions there are two basic types of indexes: “local” (to the partition) and “global” (covering the entire table) – so you should be able to create local indexes on the old data whilst leaving the HEAP partition without one.
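
    Using Oracle’s syntax again for illustration – the MySQL equivalent hadn’t been nailed down at this point – a local index looks like this:

    -- one index tree per partition, rather than one over the whole table
    CREATE INDEX sales_ts_idx ON sales (ts) LOCAL;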

    This is all entirely theoretical at the moment from the MySQL side; however, 5.0 will probably be out in the not too distant future, so the 5.1 alpha preview releases will probably be along shortly after that. If you take a look at the Partitioning forum you can see that Mikael already has a fair bit of the partitioning functionality in place, so it shouldn’t be too long until we can start taking a look at how it fits. Hopefully the overhead of altering the table to add an extra partition doesn’t fall into the old “rebuild the entire table” trap – which would make it entirely unsuitable for a high-volume streaming “sensor database”.

    One other thing to note is that Cluster may also be a valid option, with the NDB storage engine already being an in-memory engine. Cluster with replication would probably be even better – with a periodic delete from the cluster – as long as replication could keep up. Again, cluster and replication working together is apparently also being talked about for MySQL 5.1.

    I’m not sure whether doing this would require writing your own storage engine specifically – it would certainly be interesting to see a storage engine that behaved like this, though!

    Thanks for the great comment, it prompted me to learn a little about sensor databases!

    Mark

  3. Another way to describe what this sensor database does is a queue. A queue has random access; it can be filled very fast, with one job that marks rows for deletion/writing to disk and another job that consumes the marked rows, as sketched below.
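
    A rough sketch of that mark-and-consume pattern in MySQL SQL – all names are hypothetical, with a MEMORY table for capture and a disk table for the archive:

    CREATE TABLE sensor_queue (
        ts        DATETIME NOT NULL,
        reading   DOUBLE   NOT NULL,
        processed TINYINT  NOT NULL DEFAULT 0
    ) ENGINE = MEMORY;

    CREATE TABLE sensor_archive (
        ts      DATETIME NOT NULL,
        reading DOUBLE   NOT NULL
    ) ENGINE = MyISAM;

    -- job 1: mark rows that have fallen out of the query window
    UPDATE sensor_queue SET processed = 1
     WHERE ts < NOW() - INTERVAL 10 SECOND;

    -- job 2: consume the marked rows - archive to disk, then reclaim memory
    INSERT INTO sensor_archive
        SELECT ts, reading FROM sensor_queue WHERE processed = 1;
    DELETE FROM sensor_queue WHERE processed = 1;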

    The constraint on the sensor database would seem to be the size of the buffer available in memory. Given enough memory to capture the volume of data coming from the sensors, it should run adequately fast.

    Another Oracle technology that may provide a useful pattern is RAC, which moves blocks of memory between database nodes in a cluster. It’s a hardware/software solution for high availability, but it may be interesting for sensor capture.
