I mentioned “fuzzy foreign key” matches in my post the other day, and thought that I’d give it a crack tonight. The aim of the fuzzy match is to try and find relationships when a) They are InnoDB and haven’t been created or b) Foreign keys are not avilable to the storage engine that is being used.

Most DBA’s will know their schemas pretty well, this however should be pretty handy for consultants or anyone starting fresh at a new job / on a new database etc.

It should be noted that this will only “fuzzy” match against a schema that follows strict naming conventions – Where the tablename and “id” is used for the primary key column, and the same name is used for the foreign key in the child table. For instance, given the following example tables:

mysql> desc company; +--------------+--------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +--------------+--------------+------+-----+---------+----------------+ | company_id | int(11) | NO | PRI | NULL | auto_increment | | company_name | varchar(100) | YES | MUL | NULL | | +--------------+--------------+------+-----+---------+----------------+ mysql> desc contacts; +-------------+-------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +-------------+-------------+------+-----+---------+----------------+ | contact_id | int(11) | NO | PRI | NULL | auto_increment | | name | varchar(50) | YES | | NULL | | | company_id | int(11) | YES | MUL | NULL | | | direct_line | varchar(20) | YES | | NULL | | +-------------+-------------+------+-----+---------+----------------+

Here’s what I came up with:

SELECT t1.table_name as parent_table,
t1.column_name as primary_key,
t2.table_name as child_table,
t2.column_name as foreign_key
FROM information_schema.columns t1
JOIN information_schema.columns t2 USING (column_name,data_type,table_schema)
WHERE t1.table_name != t2.table_name
AND t1.column_key = 'PRI'
AND t1.table_schema NOT IN ('information_schema','mysql')
ORDER BY t1.table_name

| parent_table | primary_key | child_table | foreign_key |
| company | company_id | contacts | company_id |


You can change the table_schema WHERE clause as required, or swap it out for “AND t1.table_name LIKE ‘wp%'” fpr example..
This makes use of the INFORMATION_SCHEMA.COLUMNS view.

Of course, there’s no guarentee this will work in all cases. People seem to have an in-built urge to use some of the most outrageously stupid names for their objects in so many cases…

Edit 21-06-05 Removed extra comma in second query (left from my own testing where I excluded extra db names) – Thanks inviso


Arjen Lentz (“Community Relations Manager” for MySQL AB) posted “The MySQL 5.0 Beta Challenge” on his blog today – so in the spirit of community I decided to give it a crack, by putting together some posts on the new features of MySQL 5.0 up here to my humble blog. Who knows – I may win a mug! Then I can have a list of ALL the people to bug over at MySQL AB!

I’m a DBA at heart, so whilst I’ll take a look at the new features that are available for developers, I’m going to start from the DBA tack, and take a look at INFORMATION_SCHEMA.

So what is the INFORMATION_SCHEMA? Those that have used other RDBMS’ in the past will recognize it as MySQL’s “Data Dictionary”. An RDBMS Data Dictionary is a collection of tables or views (that appear as tables), that utilize the database metadata to give easy access to information about the objects (tables, indexes, procedures, functions, views etc.) within the database, as well as information on things such security/privileges, performance statistics, auditing trails etc.

On installing MySQL 5.0, and issuing “SHOW DATABASES;” you will immediately see “information_schema” listed. The INFORMATION_SCHEMA is implemented to look like a database in it’s own right. However, it doesn’t conform with the conventional database privilege system. You can “USE” it, but you can’t run any type of DML against it. GRANT does seem to work against it, but attempting to run a DELETE results in an error.

mysql> use information_schema;
Database changed
mysql> show grants;
| Grants for mark@localhost                                            |
| GRANT USAGE ON *.* TO 'mark'@'localhost'                             |
| GRANT SELECT, DELETE ON `information_schema`.* TO 'mark'@'localhost' |
2 rows in set (0.00 sec)

mysql> delete from information_schema.tables;
ERROR 1288 (HY000): The target table tables of the DELETE is not updatable

“What good are they to me?”

Well, one question I’ve seen a lot around IRC is “Can I use the output of a SHOW command in a statement, such as a subquery?”. This mostly comes down to something to do with SHOW TABLES – getting databases sizes, table sizes etc. SHOW TABLES still works in 5.0, as do all other SHOW commands, however, INFORMATION_SCHEMA.TABLES now also parses the exact same output. Think of it of as view (more on these later) against the SHOW TABLES command. The great thing about this is we can now use conventional SQL to use these values – we no longer have to write external scripts to do what SUM() can do, for example. So here’s one for those guys on IRC that want to find out how big their databases are from SQL:

SELECT s.schema_name,
       CONCAT(IFNULL(ROUND((SUM(t.data_length)+SUM(t.index_length))/1024/1024,2),0.00),'Mb') total_size,
       CONCAT(IFNULL(ROUND(((SUM(t.data_length)+SUM(t.index_length))-SUM(t.data_free))/1024/1024,2),0.00),'Mb') data_used,
       CONCAT(IFNULL(ROUND(SUM(data_free)/1024/1024,2),0.00),'Mb') data_free,
       IFNULL(ROUND((((SUM(t.data_length)+SUM(t.index_length))-SUM(t.data_free))/((SUM(t.data_length)+SUM(t.index_length)))*100),2),0) pct_used,
       COUNT(table_name) total_tables
  FROM information_schema.schemata s
LEFT JOIN information_schema.tables t ON s.schema_name = t.table_schema
 WHERE s.schema_name != 'information_schema'
GROUP BY s.schema_name

*****************1. row **************
 schema_name: test
  total_size: 0.06Mb
   data_used: 0.06Mb
   data_free: 0.00Mb
    pct_used: 100.00
total_tables: 2
*****************2. row **************
 schema_name: flightstats
  total_size: 2950.14Mb
   data_used: 2949.71Mb
   data_free: 0.43Mb
    pct_used: 99.99
total_tables: 81

So now we can get the information we want, in the format we want. We can also JOIN the INFORMATION_SCHEMA tables to create better reports – such as above, if there are no actual tables in a database, they won’t be shown within INFORMATION_SCHEMA.TABLES, so to get a full report on all databases, we LEFT JOIN from INFORMATION_SCHEMA.SCHEMATA.

This also leads us in to getting helpful information on any object optimizations easily. Perhaps you have performed an archive of a lot of data, and want to reclaim freespace. You want to find all tables with > 40% of the datafile free, so that you can perform an OPTIMIZE TABLE against them:

SELECT table_name,
       ROUND(((t.data_length+t.index_length)-t.data_free)/(t.data_length+t.index_length)*100) pct_used
  FROM information_schema.tables t
 WHERE table_schema = 'flightstats'
HAVING pct_used < 60
ORDER BY table_name;</blockquote>

Let’s take this one step further – now we can actually use MySQL to generate the scripts that we’ve had to use an external scripting language to achieve. Let’s focus on another question often asked on IRC – “How do I drop all my tables, without dropping my database?” – previous to MySQL 5.0 the answer is “get table_name from the output of SHOW TABLES, and loop through this with your scripting language of choice, running a DROP TABLE command for each one”.

Now let’s use the INFORMATION_SCHEMA, and SQL to do the job for us:

SELECT CONCAT('DROP TABLE ',table_name,';')
  INTO OUTFILE '/sql/drop_tables.sql'
  FROM information_schema.tables
 WHERE table_schema = 'test';

Now all we have to do is “SOURCE /sql/drop_tables.sql;” and all of the tables in the test database will be gone, and we didn’t have to leave the mysql command line interface! Don’t try this on your production schemas kids!

Or to go back to our OPTIMIZE TABLE:

  INTO OUTFILE '/sql/optimize_tables.sql'
  FROM information_schema.tables
 WHERE table_schema = 'flightstats'
AND ROUND(((data_length+index_length)-data_free)/(data_length+index_length)*100) < 60;<

This makes life a lot easier! There are a number of other INFORMATION_SCHEMA tables, but I won’t go in to all of those now. We can use these to get information on grants, build grant scripts, get information about indexes, columns, foreign keys etc. For instance – it’s probably possible now to perform “fuzzy” foreign key matches – searching for columns that are named the same within different tables, that may not necessarily be the InnoDB engine. As I put together more scripts in readiness for 5.0, I’ll post them here, and to the other site that I run – (which I haven’t really started filling yet!! But it’s wiki! ;))

It should be noted that all users have access to the INFORMATION_SCHEMA. They will however only be able to see the objects for which they have some privileges on.

Oracle Buys Times Ten

This seems like an interesting move from Oracle. Times Ten’s Cache is an in-memory cache type database that works very much like MySQL Cluster – with read/write functionality – along with the ability to “divert” to a backend (Oracle) database when the data needed is not memory resident.

It will be interesting over the next few months to see what Oracle does with this. On the surface it’s quite obvious that it’s an attempt to get their foot in the door with the likes of telcos and financial services companies, for their “frontline” systems, where they are rather lacking due to the cumbersome nature of the beast for “real time” data.

So you may have read this through the MySQL category on my blog, and wondered “How the hell does this relate to MySQL”? As any seasoned MySQL user would know, MySQL Cluster has been around for a while now, and whilst it’s great for “smaller” apps – that require very fast “real time” access, but don’t necessarily store huge amounts of data – many have noted the serious lack of any disk based “second stage” storage for older data that may not be used very often. MySQL Cluster is already geared towards the areas where Oracle is trying to get to right now through acquisitions. MySQL Cluster however has fallen down on the VLDB back end requirements of some of these systems.

Until 5.1 it seems. Today I saw a commit to the internals email list from Pekka Nousiainen that refers to “Pgman” – which seems to handle buffer pool caching and flushing from/to disk based storage – for the NDB storage engine. I’ll save you all the code, but this comment seems to sum it up fairly well for now:

+ *
+ *
+ * Central structure is “page entry”. It corresponds to a disk page
+ * identified by file and page number (file_no, page_no).
+ *
+ * A page entry is created by first request for the disk page.
+ * Subsequent requests are queued under the same page entry.
+ *
+ * A new entry must wait to be “bound” to an available buffer page
+ * (called “real page” here). If the disk page is not “new” (empty),
+ * the entry must also wait to be “mapped” via “pagein” from disk.
+ *
+ * Entries are released on demand when page requests arrive for unknown
+ * pages. Release candidates are entries which point to a disk page
+ * which is “clean and not used”. They are ordered by some variant of
+ * least recently used (LRU).
+ *
+ * A background clean-up process makes “dirty” pages clean via “pageout”
+ * to disk. UNDO log entries are first flushed up to the maximum log
+ * sequence number (LSN) of the page. This is called write ahead
+ * logging (WAL). The clean-up process prefers “dirty and not used”
+ * pages and lower LSN values.
+ *
+ * A local check point (LCP) performs complete pageout of dirty pages
+ * since given LSN. It needs a stable entry list to iterate over.
+ *
+ * Page entries are put on ordered lists accordingly:
+ *
+ * The “main” list contains all entries. It is ordered by its unique
+ * key (file_no, page_no). It is used to look up page entries. LCP
+ * iterates over it to cover all candidate pages.
+ *
+ * Each entry also belongs to exactly one “sublist”. These drive page
+ * processing. The sublist is determined by page “state”. The state
+ * contains page properties such as discussed above.


Open Source DB Comparisons

I came across the following document posted to the mysql mailing list today. It’s a great feature comparison between MySQL, PostgreSQL, Firebird, MaxDB and Ingres.

You can download the paper here.

Whilst this may only list supported features as of MySQL 4.1.10 – once you know the new features available in MySQL 5.0, it’s easy to see there really isn’t very much holding MySQL back when compared to other open source offerings.

This is certainly one of those papers that should be handed to all of those PostgreSQL “fanboys” that constantly spout about MySQL *not* supporting ACID transactions, transaction isolation, row level locking and foreign keys etc. It even shows that MySQL supports *more* transaction isolation levels than PostgreSQL.

A very interesting read for anybody currently trying to decide which open source offering to go with!