Friday, December 15, 2006

Implementation of Foreign Keys for the PBXT storage engine

I have just committed the initial implementation of foreign keys for the PrimeBase XT storage engine, whew! The implementation is similar to that of InnoDB and supports the RESTRICT, CASCADE and SET NULL options.

Currently the PBXT implementation requires that the data types of the referenced columns are identical. InnoDB allows references between CHAR and VARCHAR and strings of different length. PBXT also requires that the index used by foreign keys have the exact number of components. InnoDB can use indexes that have more components than the number of columns specified in the foreign key declaration.

On the other hand, PBXT supports all SQL syntax that can be used to create foreign keys. This includes the following syntax which is not supported by InnoDB:

CREATE TABLE t2 (
id INT PRIMARY KEY,
fk int REFERENCES t1(id)
) ENGINE=pbxt;

Maybe you are wondering why one engine supports a certain syntax, and another does not. Well, this was one thing that surprised me as well: MySQL parses but ignores the foreign key definitions. In fact, when you execute SHOW CREATE TABLE, MySQL calls the storage engine for the text of the foreign key definitions. So one of the biggest jobs has been to parse the SQL statements and extract the foreign key information.

Unfortunately, its not just done with CREATE TABLE. The engine also has to support ALTER TABLE, and all other statements that modify a table (e.g. CREATE/DROP INDEX) have to be parsed to determine which table is being altered. The reason for this is, when MySQL alters a table it creates a new table and copies the data from the old to the new table. So when the new table is created the engine has to identify the source table and copy over the foreign key definitions or they will be lost!

One thing to watch out for when parsing the SQL statements is that column and table names have to be converted to UTF-8. The conversion depends on the charset of the text which in turn depends on the current charset of the database connection. Table names have to be further converted into file names which use a special encoding schema to avoid non-US ASCII characters. All this requires calling certain character conversion routines in MySQL.

The other major problem that needed to be solve involved the MySQL st_table structure. This structure contains all the information about a table, including: column definitions, data types, character sets and index descriptions. This is actually the data from the .frm file of the table. Since the engine has access to this structure when a table is opened by MySQL, there is normally no need for the engine to store this information itself.

However, foreign keys force the engine to access tables that are not necessarily open by MySQL. For example if MySQL does an INSERT on table t2 (see definition above), then MySQL will open t2 before it performs the operation. But the engine is required by the foreign key definition to check column id on table t1. This table may not be open by MySQL (depending on the table cache). And without the st_table structure accessing the table is not possible, unless the engine stores all the information itself.

It took a little while to figure out, but my solution was to open the MySQL table myself. This means that PBXT calls routines in MySQL that load the information from the .frm file as needed.

Of course the engine will have a bit of a problem if the definition of any of the functions change. Since the engine is linked dynamically to MySQL this could occur without notice. So I think there is a general need to make these routines part of an official engine callback API.

Next on my list is CHECK and REPAIR table but before I get on to that, I will be on holiday in South Africa until the 12th of January! :)

If you would like to test PBXT foreign keys, checkout the trunk of the subversion tree directly as follows:

$ svn co https://pbxt.svn.sourceforge.net/svnroot/pbxt/trunk/pbxt pbxt

Then, assuming you have your MySQL tree in the directory /home/build/mysql-5.1.12 you can build PBXT as follows:

$ cd pbxt
$ ./configure --with-mysql=/home/build/mysql-5.1.12 --libdir=/home/build/mysql-5.1.12/mysql-test/lib/mysql
$ make install


This will install the PBXT plug-in (libpbxt.so) in the directory required by mysql-test-run. If you built MySQL as a debug version then remember to include --with-debug=full when configuring PBXT.

Instructions about how to test PBXT with mysql-test-run are given in the README file in the mysql-test-update directory. I have modified the InnoDB test scripts that test foreign keys to work with PBXT.

Tuesday, November 28, 2006

PBXT: Experience with testing Storage Engines

Many users may be unaware of the fact that there is a great tool for automatically testing MySQL called mysql-test-run. The most recent version of this program is written in perl and resides in the mysql-test directory of the MySQL source code tree.

Tests are written as scripts (.test files) which contains SQL statements and a number of meta-commands. Each .test file has a corresponding .result file which is used to verify the test output. A text-based "diff" is used to confirm that the output generated by the test is identical to the expected result.

To run a test mysql-test-run automatically sets up a clean environment and starts the MySQL server. It then executes the test scripts specified (by default all tests are executed) and verifies the output.

Most of the tests shipped with MySQL are intended to run with the MyISAM storage engine. There are also tests specifically created for other MySQL engines like NDB and InnoDB.

But what about testing 3rd party engines like PBXT? Firstly, its easy enough to start the test, just execute the following:

$ ./mysql-test-run --mysqld=--default-storage-engine=pbxt

This tells mysql-test-run to start mysqld with the option that sets PBXT to the default engine. So all CREATE TABLE statements that do not explicitly specify an engine now use PBXT. After this the real work starts:

There are over 630 tests in MySQL 5.1.12 and currently PBXT requires 223 individual changes to the test scripts for them to run. Of the 630 test results, 187 need to be changed. This is a lot of work given that verifying a difference is not a bug can be quite time consuming.

Meanwhile many differences are trivial, so ideally they would not appear at all. Lets have a look at some of the most common test differences:

SHOW CREATE TABLE:

The SHOW CREATE TABLE statement output depends on the default engine. So CREATE TABLE (...) ENGINE=MyISAM must be changed to CREATE TABLE (...) ENGINE=PBXT in over 360 instances.

The correct way to solve this problem would be to use the mysql-test-run command --replace-regex to transform the output to something like ENGINE=?. But, of course, this change would need to be checked into the MySQL source code tree to be of long term help.

MERGE, FULLTEXT or GEOMETRY:

All these features are only supported by MyISAM tables. When using these features the test should explicitly specify ENGINE=MyISAM. An even better solution would be if MySQL could automatically pick an engine that supported the features used in the CREATE TABLE. In this case the default engine will take preference.

ENGINE=InnoDB:

In order to use the InnoDB tests to test some other transactional engine, all the CREATE TABLE (...) ENGINE=InnoDB statements must be changed. Ideally MySQL would support syntax of the form: ENGINE=TRANSACTIONAL, which would automatically pick a transactional engine from those available. If the default engine is transactional then that would obviously be used.

ORDER BY Required:

By far the biggest problem when testing PBXT is that the "natural" SELECT order often differs from that of MyISAM. This means that a SELECT without an ORDER BY returns the correct rows, but in a different order to MyISAM. This is due to the way PBXT manages free space.

Making this problem even worse is the fact that the order can differ from one run to the next because PBXT manages the free space asynchronously. The only thing that helps is to add an ORDER BY clause to the SELECT, but I am wondering if the problem cannot be solved in general by mysql-test-run. For example this could be done by a more intelligent diff function.

EXPLAIN Differences:

Each engine has its own way of calculating the heuristics used by the optimizer. This causes many differences in the result files due to slightly different EXPLAIN output. On the other hand there is a need to verify that queries are being optimized correctly. I'm clueless on this one at the moment, so suggestions are welcome.

I think other engine developers will agree, its a major task so any help in testing will be greatly appreciated!

Monday, November 06, 2006

The PBXT pluggable storage engine and the MySQL 5.1.12 Beta release

OK, it took a while, but it has been worth the wait. At last I can now guarantee that when you download the PBXT source code, it will compile! But seriously, with the released of MySQL 5.1.12 Beta we now have an excellent platform for testing the pluggable storage engine API.

These are interesting times because we are seeing the future of storage engines in MySQL. The PBXT 0.9.73 release demonstrates the ease with which externally built storage engines can be combined with MySQL in the future. And, as I have mentioned before, PrimeBase XT is the first engine to take full advantage of this new feature in MySQL.

And while I am singing my praises let me remind you of 2 other reasons why you should try out PBXT: the engine achieves high concurrency using a pure MVCC implementation (MVCC stands for multi-generational concurrency control, if you want to know why, ask Jim! ;) and great performance with a write-once update strategy.

Download and further details can be found at http://www.primebase.com/xt as usual. The PBXT project is now hosted by SourceForge.net so for SVN access, bug reports and tracking, please go to: http://sourceforge.net/projects/pbxt.

To help you get started I have included 6 steps to build and test PBXT with MySQL 5.1.12 below.

Building and testing PBXT for MySQL 5.1.12

This description assumes the following:
  • The user's name is myname.

  • The home directory of myname is: /home/myname

  • myname is using a working directory called: /home/myname/mysql-dev

1. Download the MySQL 5.1.12 source code tree, and unpack it in the /home/myname/mysql-dev directory.

2. Build MySQL as follows:

$ cd /home/myname/mysql-dev/mysql-5.1.12-beta
$ ./configure --prefix=/home/myname/mysql-dev/test --with-debug=full --with-mysqld-user=myname --with-extra-charsets=complex --with-plugins=max-no-ndb --with-embedded-server --enable-thread-safe-client --with-big-tables
$ make


3. Download the source code of PBXT 0.9.73 (or later), and unpack it in the /home/myname/mysql-dev directory.

4. Build and install PBXT as follows:

$ cd /home/myname/mysql-dev/pbxt-0.9.73-beta
$ ./configure --with-mysql=/home/myname/mysql-dev/mysql-5.1.12-beta --libdir=/home/myname/mysql-dev/mysql-5.1.12-beta/mysql-test/lib/mysql --with-debug=full
$ make install


5. Install the PBXT version of the mysql-test-run scripts:

$ cd mysql-test-update
$ ./update_mysql_test.sh /home/myname/mysql-dev/mysql-5.1.12-beta


6. Run the MySQL test scripts:

$ cd /home/myname/mysql-dev/mysql-5.1.12-beta/mysql-test
$ ./mysql-test-run --mysqld=--default-storage-engine=pbxt --force


NOTES:
  • Almost all the tests should run through without error (I am still working on 2 or 3 tests that fail).

  • Step 4 installs the PBXT shared library, libpbxt.so, in the mysql-test/lib/mysql directory where it is required by mysqld, which it is started by mysql-test-run.

  • Step 5 modifies the file mysql-test/lib/init_db.sql, adding the statement: INSERT INTO plugin VALUES ('pbxt', 'libpbxt.so');, which will cause mysqld to load the plug-in when started.

  • Both MYSQL and PBXT must be compiled with the same value for the --with-debug switch (on/off/full). If not, the plug-in will not load.

Normally you would install a plug-in using the command: INSTALL PLUGIN pbxt SONAME 'libpbxt.so';. However, unless you add this statement to every test script, this will not work when using mysql-test-run because the server environment is re-installed every time mysql-test-run starts.

By the way, the new perl based mysql-test-run is much faster on startup than the old UNIX shell based program. Makes testing much more fun ... thanks MySQL :)

Wednesday, October 25, 2006

PBXT now hosted by SourceForge.net!

I would have liked to have done this on MySQLForge but, as you know, this is not yet possible. However, if MySQL were to provide even part of their, obviously, excellent development management system for external projects I would be one of the first on board.

In the meantime, PrimeBase XT source code and development will be managed on SourceForge.net: http://sourceforge.net/projects/pbxt.

For new projects it really makes sense to use Subversion instead of CVS, so to get the latest version of the source code you enter the following:

svn checkout https://svn.sourceforge.net/svnroot/pbxt/trunk/pbxt pbxt

If you are a Mac OS X user, or your UNIX distribution does not include the Subversion client, you can download it from Tigris.org. I recommend Mac OS X users install the version from a .dmg since the Fink version is not yet available in binary form (with SSL support).

[ By the way, this may be of interest to developers using Mac OS X: the only way I have managed to compile the latest MySQL 5.1 sources is by using Fink to download and install the latest versions of automake, autoconf, make, bison and libtool! ]

Following the Subversion convention, the current Beta release is available as a snapshot in the 'tags' directory:

https://svn.sourceforge.net/svnroot/pbxt/tags/pbxt-0.9.72-beta

And, of course, bug reporting and tracking will also be handled by SF.net, so from now on please report all bugs at Projects » PrimeBase XT » Tracker » Bugs, and feature requests at Projects » PrimeBase XT » Tracker » Feature Requests.

Tuesday, October 24, 2006

PBXT mentioned on SlashDot by CEO Mårten Mickos

Following up on what Stewart and Colin reported: this definitely is the first time PBXT has been mentioned on SlashDot: MySQL CEO Mårten Mickos Answers Your Questions.

Mårten, speaking of the the innovative power of MySQL AB, says: "But more massive is the innovation that happens in the MySQL ecosystem", and lists PBXT amongst a lot of other great work.

So I am very proud that PBXT was mentioned in this context!

Friday, September 29, 2006

New PBXT Beta release includes a pluggable storage engine for MySQL 5.1

I have just released the first Beta version of PrimeBase XT. Besides MySQL 4.1.21 support it is also available as a pluggable storage engine for MySQL 5.1. As far as I know, PBXT is the first full featured engine to be released in this form.

The engine can be downloaded (at http://www.primebase.com/xt) and built separately from MySQL. Currently, a reference to a MySQL 5.1 source tree is required to build the plug-in. However, in the future things will be even easier. Lenz Grimmer has told me that the plan is to include the required headers in the mysql-devel package. Then it will only be necessary to install the mysql-devel package in order to build the engine.

Still further in the future, once 5.1 is released, I will be able to make the PBXT engine available as a binary download for a number of platforms.

Once you have built the PBXT engine, you can install it by copying the binary, libpbxt.so, to the MySQL plug-in directory. Then enter the following command:

mysql> INSTALL PLUGIN pbxt SONAME 'libpbxt.so';

After this, you can create tables that use the PBXT engine by setting ENGINE=PBXT in the table declaration as usual. It's that easy!

I think this makes PrimeBase XT a great demo of the ease and power of the new pluggable storage engine API for MySQL 5.1.

Wednesday, September 06, 2006

PBXT Presentation at the Hamburg MySQL September Meetup

My thanks to Lenz Grimmer for the opportunity to present PrimeBase XT at the Hamburg MySQL September Meetup last Monday night. The meeting took place in a well equipped conference room belonging to the Chinese restaurant "Ni Hao" in Wandsbek. With over 20 attendees the venue was filled to capacity.

I must admit, I was a bit surprised (although I probably should not have been) by the highly technical nature of the questions and comments during the presentation and afterwards. As a result we had great time and spent quite a while discussing the various algorithms and implementation strategies used by XT.

I have posted the presentation as PDF on the PrimeBase XT home page. Among the topics of the presentation were: our motivation, a brief history of XT, future plans, design aspects and the results of some preliminary performance testing.

Of particular interest to many was the garbage collection mechanism. There was some concern as to just how much load the garbage collector would place on the server. When I presented a graphic showing the data throughput rate vs the record size written, someone asked why there was no dropoff in the performance when the garbage collector kicks in.

Indeed, performance is not affected adversely by the garbage collector. This is mainly due to the fact that the garbage collector is a low-priority thread, so it is immediately descheduled when it would affect foreground activity. This is not a problem as long as the system has enough disk space to be able to delay garbage collection until the peak access times are over. This can be contrasted with a conventional RAM garbage collector which is forced to run due to the fact that it manages a relatively scarce resource.

But besides this, the XT garbage collector generally does not require much CPU time. For example, it does not need to search for garbage, this is marked as such by the "sweeper" thread (see my white paper). I plan to discuss this topic in more details in a future blog, so stay tuned...

Monday, August 07, 2006

PBXT on schedule for Beta in September

Its a dirty job, but somebody's got to do it! I am talking about testing and debugging, of course, which is what I have been doing for the last month. But at least I have the satisfaction of knowing that XT is a lot more stable.

Over 95% of the "mysql-test-run" scripts now run through correctly with PBXT as the default engine. I have documented the changes made to the test scripts here: pbxt-test-run-changes.txt. This file also explains the major differences between MyISAM and PBXT, and lists all features are not yet implemented.

If you have compiled XT and would like to run the tests, then enter the following commands:

cd mysql-test
./mysql-test-run --force --mysqld=--default-storage-engine=pbxt


I have also compiled and tested XT on a multi-processor, 64-bit machine. I used sysbench 0.4.7 for multi-threaded, stability and performance testing. If you have downloaded and build sysbench, then you can measure the speed of XT using the following commands:

To create a test table containing 1000000 rows:

sysbench --test=oltp --oltp-table-size=1000000 --mysql-table-engine=pbxt --mysql-engine-trx=yes --mysql-user=root --mysql-socket=/tmp/mysql.sock prepare

Then, to perform a test with READ-ONLY queries:

sysbench --test=oltp --oltp-table-size=1000000 --init-rng=on --mysql-engine-trx=yes --oltp-table-name=sbtest --oltp-test-mode=complex --oltp-point-selects=1 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-skip-trx=on --oltp-read-only=on --max-requests=0 --num-threads=4 --oltp-dist-type=special --init-rng=on --mysql-db=sbtest --max-time=60 --mysql-user=root --mysql-socket=/tmp/mysql.sock run

But more interesting is a test with update transactions, as follows:

sysbench --test=oltp --oltp-table-size=1000000 --oltp-read-only=off --oltp-test-mode=complex --num-threads=2 --mysql-engine-trx=yes --init-rng=on --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --mysql-user=root --mysql-socket=/tmp/mysql.sock --oltp-table-name=sbtest --oltp-dist-type=special --max-requests=50000 run

In this test, each transaction performs 3 UPDATEs, a DELETE and an INSERT. According to my tests, SELECT speed is similar to InnoDB, but the update queries are significantly faster. It would be great if someone could confirm these results.

Judging by the stability of the current version, it looks like I will be able to achieve my goal of a Beta release next month. As I mentioned before, the only feature still to be completed for the Beta version is a pluggable storage engine for MySQL 5.1.

Tuesday, July 04, 2006

New PBXT Release 0.9.5: Smaller, Faster, Better...

The new release of PBXT is, in fact, smaller than the previous version. This is due to a major re-structuring of the code. The result is a more compact and direct implementation. With less code to execute and some major changes this version is also a lot faster.

The most significant change is that all files (except the transaction logs) are now associated with a particular table. All table related files begin with the name of the table, and can be easily identified. If a database is not being accessed you can copy a table by copying the associated files, or drop a table by deleting the files. However, the main reason for this change was to improve the speed of sequentially reading and writing.

Records are now divided into a fixed and a variable length component. The fixed length component is stored together with the handle in the record data file. The variable component is referenced by the handle and is stored in one of a number of data log files associated with the table.

This new format reduces the number of disk accesses needed to read, create and update a row significantly. Furthers details of changes in this version can be found in the release notes.

In this release I have also resolved almost all of the issues on my to-do list. This includes the items regarding auto-increment, caching, unique key checking and all the bugs reported so far.

The only feature still due for the Beta release in September is the 5.1 implementation which I will be working on next.

I would much appreciate any feedback, bug reports and performance test results!

Thursday, June 08, 2006

PBXT featured on the MySQL home page

Last week I spoke to Lenz Grimmer, MySQL Community Relations Manager for Europe, ME and Africa, about the Users Conference, PBXT and the Community.

The interview has been featured under Articles on the MySQL home page.

For those of you that missed it, here is the link to the interview again:

Interview with Paul McCullagh, developer of the PrimeBase XT Storage Engine.

Thursday, April 27, 2006

PBXT mentioned by CEO Mårten Mickos at the MySQL Users Conference Keynote

Ok, so here is a story:

I was standing with Giuseppe Maxia waiting to go into this mornings keynote when a smartly dressed man approached Giuseppe to congratulate him. The reason was that Giuseppe had won an award the day before for service to the MySQL community. So Giuseppe introduced the guy as Marten (the rest of the name I don't get, and he was not wearing a name tag).

Marten seemed to recognize my name and asked if I was a frequent visitor to the conferences. I said, "no, I am the PBXT guy". Thats when he realized why he had heard my name.

So then we chatted a bit, and he asked me how difficult it had been to write the engine, how long it had taken me and what my engine can do well. I said I was more concerned about what it could not do so well then what it could do well, but then I told him what was good in my performance tests, etc.

A little later Giuseppe and I moved into the hall for the keynote. The CEO was introduced and I started thinking, hey, isn't this the guy I just spoke to.

Well, I soon realized it was Mårten Mickos the CEO of MySQL whom I had just spoken to because he included a few details of our conversation in his keynote!

When discussing innovation in the MySQL community, this is what he said (at the risk of being immodest):

"I just spoke to the guy that programmed the PBXT engine: He developed the engine in the 6 months. I thought we had hired all the best database guys. Well there are still a lot of very clever people out there in the community."

(The last sentence is very true, of course).

And later he mentioned that PBXT is particularly good at handling variable length records.

Then on the slide which listed the storage engines: Falcon (by MySQL themselves), Solid (new partner) and InnoDB (new partner Oracle), he mentioned PBXT again, even though it was not on the slide.

So this is very encouraging because, with Falcon, Solid and InnoDB around, in any other conventional company PBXT would be sidelined. I know this from experience. This is because Falcon, Solid and InnoDB are built on existing code developed by well established companies.

So why are things different with MySQL. Because the community is extremely important to MySQL! I mean, just think about it: MySQL was build by the community!

Tuesday, April 04, 2006

PBXT and the Community

I would just like to thank all those who have downloaded and tested PBXT for their help and suggestions. I have updated my to-do list to include all the issues raised so far.

Although my own performance tests indicate that PBXT has great potential, the more general MySQL performance tests show that the implementation is still in its early days.

Of course, there is still a lot to be done. Everybody knows there is a big difference between a running program and a production system. It is the most challenging part of writing a program. But also the most rewarding because it will eventually see the program being put to useful work.

Some people at MySQL have expressed surprise that an effort to develop a new engine has come out of the community. Personally I think I was more surprised that I am already part of the community!

I guess I am used to the turf wars that pervade most other large companies, so I really was not expecting the warm welcome I have received from everybody.

Thanks for making PBXT part of the community!

Wednesday, March 29, 2006

PBXT: May the source be with you!

Now that I have PBXT running under both Mac OS X (Power PC) and Linux, I have posted the source code. Here is the download URL:

http://www.primebase.com/xt/download/mysql-4.1.16-pbxt-0.9.1.tar.gz

As the name suggests, the engine has been integrated into MySQL 4.1.16 (the nightly build of November 4, 2005). I have labeled this version Alpha since it is almost feature complete, but please check my to-do list before you build and run it:

http://www.primebase.com/xt/download/pbxt-to-do.txt

Here are some brief instructions for building MySQL with PBXT:

http://www.primebase.com/xt/download/how-to-build.txt

You can send any questions, suggestions or bug reports to me directly: paul.mccullagh(a)primebase.com.

My initial tests show that the performance of PBXT under Linux/Intel is very promising, to put it modestly. More about this later.

Tuesday, March 21, 2006

Concerning: PBXT, MySQL and Mountains

Is PrimeBase XT (PBXT) just another transactional engine for MySQL? I think (hope) not! But, even if it was, it would still be justified considering the current lack of an independent (i.e. not owned by a monopolist) transactional engine. But I started developing PBXT at the beginning of last year for a much more basic reason, after all I had no idea what Oracle was about to do.

I started programming PBXT simply because it was there!

But wait, you say, "we climb mountains because they are there, we don't program software because it is there", because it isn't (before we start). Well I think many programmers can relate to that statement anyway. Programming is much like climbing a mountain. You start out with great enthusiasm, wanting to do things better and quicker than before. But as you get higher, the air gets thinner. You struggle on and as you approach the top it gets tougher and tougher (we all know how long it takes to finish that last 10%)!

But when you reach the peak, the summit, there is nothing like it! You have conquered the beast of chaos and doubt, the air is fresh and you can see for miles. Inspiring, isn't it? Well I think so.

OK, so programming is like climbing a mountain, but how can you climb a mountain that isn't there? Well, it is there, actually, isn't it? Right there, in your head.

And so it was with PBXT. After quite a few years of programming databases I had some ideas (questions actually), for example: what would it be like if the database wrote only sequentially? Must a database always write twice? Can we eliminate read locks? Is it possible to commit or rollback without doing any work? Is it possible to startup without recovery?

Using some known techniques and combining them with a few new ideas, I have found some answers to these questions. You can read about it in my White Paper at http://www.primebase.com/xt.

So have I reached the top? Of course not, nobody ever reaches the top! Seriously though, I have a few things on my to-do list, but mostly PBXT is untested. So I am thankful that testing is the first area in which I have been offered help (thanks Ronald) and also from MySQL themselves (thanks Brian). Any more help would be welcome and we'll turn this into an expedition.

I plan to keep the address above posted with the latest information, versions, etc. and we'll just take it from there, one step at a time...