PrimeBase XT

Tuesday, June 05, 2007

"GET"ing a BLOB from the database with the BLOB Streaming Engine

Current plans call for the use of the HTTP protocol to upload and retrieve BLOBs to and from the database. The BLOB Streaming Engine makes this possible by integrating a lightweight HTTP server directly into the MySQL server.

I am currently working on an alpha implementation of this, and would like to give a short demonstration of what is possible using this system.

We start by creating a table using any streaming enabled storage engine (a streaming enable storage engine, is an engine that supports the server-side streaming API):

use test;
CREATE TABLE notes_tab (
  n_id        INTEGER PRIMARY KEY,
  n_text      BLOB
) ENGINE=pbxt;
INSERT notes_tab VALUES (1, "This is a BLOB streaming test!");

Now assuming the MySQL server is on the localhost, and the BLOB streaming engine has been set to port 8080, you can open your browser, and enter this URL:

http://localhost:8080/test/notes_tab/n_text/n_id=1

With the following result:

So without even doing a SELECT, you can GET a BLOB directly out of the database!

Note that there is no need for the BLOB in the database to be explicitly "streamable" for this to work.

Friday, May 18, 2007

The Scalable BLOB Streaming discussion so far

Having discussed BLOB streaming for MySQL with a number of people I have gathered quite a bit of input on the subject.

So here are some details of the issues raised so far:

Server-side API

One of the first things that Brian Aker pointed out to me was that the server-side API must make it possible to use the sendfile() system call. The call does direct disk to network transfer and can speed up delivery of a BLOB stream significantly.

The server-side API must also include a mechanism to inform the stream enabled engine that an upload is complete. Assuming the streaming protocol used is HTTP, there are 2 ways of determining the end of a download: either the connection is closed when download is complete, or the length of the BLOB data is specified in a HTTP header.

ODBC Issues

Monty was concerned about compatibility with the MySQL ODBC driver. The ODBC function SQLPutData() is used to send BLOB data, and SQLGetData() is used to retrieve BLOB data. Both functions transfer data in chunks. The implementation of these functions would have to be made aware of streamable BLOB values.

This can probably be done in a way that is transparent to the user. It may be necessary to designate certain database types as streaming BLOBs. One suggestion is to add 2 new types to MySQL. LONGBIN and LONGCHAR. These keywords are not yet used by MySQL.

However, from the OBDC side the data types LONGVARBINARY and LONGVARCHAR are already mapped to the MySQL types BLOB and TEXT respectively. But this simply means that it would be transparent to an ODBC application whether a BLOB is being streamed, or retrieved using the standard client/server connection.

Upload Before or After INSERT

There is some debate about how exactly a BLOB should be inserted. The 2 main possibilities are:

1) Upload the BLOB. The upload returns a handle (or key). Perform the INSERT setting the BLOB column to the key value.

2) Perform an INSERT, but specify in a CREATE_BLOB() function the size of the BLOB that is to be inserted into the column. The caller then SELECTs back the inserted data. In the BLOB column the caller finds a handle. The handle can then be used to upload the data.

For method (1) it was also suggested that the client application specify a name, instead of using a handle returned by the server. The name could then be used as input to a "create_blob" function which is specified in the INSERT statement. The advantage of this is that the text of the insert statement is readable.

One advantage of method (1), as pointed out by Monty, is that the client does not have to wait for the upload to complete before doing the INSERT. The server could wait for the upload to complete. This would require a client side generated identifier for each BLOB uploaded.

Note that method (2) requires use of a transaction to prevent the row with the BLOB from being selected before the BLOB data has been uploaded.

Storage of BLOBs

Stewart Smith suggested that the BLOBs be stored by the streaming engine itself. This is something to be considered, and probably will be necessary in most scale-out scenarios. Otherwise the current plan is that each engine that is streaming BLOB enabled (i.e. that supports the server-side streaming API) will store the BLOBs themselves.

Scaling Writes

Also mentioned in regard to scaling is the fact that read scale-out is usually more important than write. However, there are web-sites that have a heavy write load. To be investigated would be how NBD could used in combination with the streaming engine to scale-out uploading of BLOBs.

mysqldump & mysql

As pointed out by Monty, the output of mysqldump is a readable stream. So any changes to mysqldump in order to support streamable BLOBs should ensure that this continues to be the case. This is related to the question of how to upload BLOBs using the mysql client. One idea would be to provide a command in mysql to upload a file or a block of binary data or text.

Security

In some cases it may be necessary to check whether a client is authorized to download or upload a BLOB. The most straight forward method would be for the server to issue authorization tokens which are submitted with an upload or download. These tokens expire after a certain amount of time, or when the associated server connection is closed. However, if the streaming protocol is to be HTTP then we need to investigate the possibility of using standard HTTP authentication when retrieving and sending BLOBs.

In-place Updates

In-place updates of BLOBs is allowed by some SQL servers (for example by using functions UPDATETEXT and WRITETEXT supported by MS SQL Server). In this case the streaming enabled engine would be responsible for locking or using some other method to ensure that concurrent reads of the BLOB remain consistent.

PHP Upload

Georg Richter, who is responsible for the PHP Native Driver at MySQL, noted that using BLOB streaming, data could be uploaded from the Web directly into the database. The PHP extension to the standard MySQL client software should include a function to make this possible.

ALTER TABLE

Georg also pointed out that it should be possible to convert BLOBs currently stored in MySQL tables into streamable BLOBs using ALTER TABLE. This would enable developers to quickly try out streaming BLOBs and test the performance characteristics, in combination with their applications.

These are the main issues raised so far. Any further ideas, comments and questions are welcome! :)

Tuesday, May 01, 2007

PBXT and the MySQL Conference & Expo 2007

The conference is over, and it was a great week but pretty exhausting! From what I saw, and from what I have heard the sessions were of a very high standard this year. I have come back with a number of new ideas and quite a few things I would like to implement in PBXT. One of these is new online backup API for MySQL which will enable you to make a snapshot backup of data stored by all engines, and do point in time recovery.

For me the week started off with the BoF on scalable BLOB streaming on Tuesday evening. The BoF was well attended and there was significant interest in the topic. I will be reporting on some of the issues discussed soon.

On Wednesday morning I presented: PrimeBase XT - Design and Implementation of a Transactional Storage Engine. I was pleased with the number of questions, interest and feedback I received.

When I mentioned to Mårten Mickos that someone had said it was the best session they had heard so far at the conference, Mårten told me that they pay guys to say this to the speakers so that they feel good! So I must thank MySQL for this very thoughtful gesture - joke, of course ;)

Thanks also to Sheeri Kritzer for video recording the talk so I hope to get a link to that soon. In the meantime I have posted my slides to the PBXT home page:

http://www.primebase.com/xt/download/pbxt_mysql_uc_2007.pdf

After that I also took part in 2 lightning rounds sessions: Top MySQL Community Contributors and State of the Partner Engines. Both sessions were interesting in their diversity. One of the top contributors, Ask Bjørn Hansen, described how meeting his goal of filing a bug a week was a lot easier in the early days of MySQL. However, more recently he has been helped by the current state of the GUI tools!

During the Partner Engines session, it came as a surprise to some people in the audience that not all engines are for free. Actually there is a wide spectrum from GPL over partially free to highly priced. In fact, one of the developers of a data-warehousing engine found the question as to whether it may be free quite amusing. Solid has not decided to what extent it high-availability offering will be commercial, and the Amazon S3 engine is free, but the service behind it not. So that's something to watch out for in general.

I took the opportunity during the Partner Engines session to mention our plans for the future of PBXT which involve building a scalable BLOB streaming infrastructure for MySQL. This is relevant to a number engine developers as we will be providing an open system which will enable all engines to stream BLOBs directly in and out of the server. So please check out blobstreaming.org for more details.

Thursday, April 19, 2007

What makes the design of PBXT similar to MyISAM?

PBXT is a transactional storage engine, but what does the design have in common with MyISAM?

I'll be answering this and other questions during my session at the MySQL Users Conference next week:

PrimeBase XT: Design and Implementation of a Transactional Storage Engine
Date: Wednesday, April 25
Time: 10:45am - 11:45am
Location: Ballroom F

So be sure to check it out! :)

Monday, April 09, 2007

PBXT 0.9.86 the MySQL Conference 2007 release!

I have just released PrimeBase XT 0.9.86 which will be my last release before the MySQL User Conference this year. The most significant change in this version is the reduction of the number of data logs used per table. This, and a number of other modifications, makes PBXT fit to handle databases with 1000's of tables.

If you would like to learn more about the development and design of PBXT then join me for my session at the MySQL User Conference in Santa Clara, on Wed, April 25, 10:45am - 11:45am, Ballroom F:

PrimeBase XT: Design and Implementation of a Transactional Storage Engine

Looking ahead, I would also like to invite all who are interested to the "Scalable BLOB Streaming Infrastructure" BoF, which is scheduled for Tuesday, April 24, 8:30pm - 9:30pm.

This will be an informal discussion of our plans for the implementation of scalable BLOB streaming in MySQL, and how this all fits together with PBXT and other engines.

We would like as much input as possible at this early stage, so I hope you can make it!

I will also be taking part in a number of lightning rounds so check out this page for a complete list of my sessions:

I'm looking forward to seeing you all there!

Friday, March 23, 2007

PBXT and a scalable BLOB streaming infrastructure

As Kaj Arnö already mentioned in his blog, we have a vision for MySQL and it is called a scalable BLOB streaming infrastructure. Our plan is to build this into and around the MySQL architecture with the help of MySQL and the community.

It is a "big picture" idea and in this way a response to Robin Schumacher's question: The MySQL Vision - What do you see?. But at the same time it is very relevant and practical in the context of the Web 2.0 world.

The design of the system includes a client-side library which extends the existing MySQL client API, a stream based communications protocol and a scalable back-end which is (at least partially) linked into the MySQL server. In short, we want to make it possible to put BLOBs of any size in the database.

The system will be open, and available to all engines running under MySQL. Of course, I believe PBXT has the ideal design and performance characteristics to support this new infrastructure.

Many say that their programs run fine with the BLOBs stored in the filing system. And this is true, but what about transactional consistency, backup and replication? And what about scale-out for the storage and delivery of this data? Simply put, we know these are significant issues from our customers in the print and web publishing business that need this functionality.

And, when you think about it, media streaming has a very promising future. I believe it's becoming more important for an increasing number of developers. So this is more about opening up new opportunities and extending the reach of MySQL into new areas.

We have a pretty good idea of the basic design of the system, but this is just the starting point. The design must reflect the requirements of the users and developers, so our first step is to get as many as possible in the MySQL community together who have an interest in this technology.

We want to know what features and characteristics of such a system are important to you. What are your requirements of such a system?

To this end I am arranging a BoF on scalable BLOB streaming at the MySQL user conference next month. This will be a great opportunity to get together and discuss the subject with all who are interested, so I hope you will join me.

Of course you are welcome to contact me any time in this regard by e-mail at paul-dot-mccullagh-at-primebase-dot-com.

Monday, March 19, 2007

PBXT does Windows!

On Friday I posted PBXT version 0.9.85 which concludes my initial work on porting the engine to Windows NT/XP.

I have built a MySQL 5.1 executable (mysqld-nt.exe) to make it easy to try it out. The binary package can be downloaded from http://www.primebase.com/xt. Instructions on how to install are given in the README file.

If you would like to build it yourself, then download the source code from SourceForge.net, and follow the instruction in the windows-readme.txt file.

Unfortunately MySQL 5.1 does not support runtime loading of storage engines, as done on Linux and Mac OS X. Last I heard, this feature will also be a bit slow in coming, and is scheduled for 5.2. So if there is anybody else out there who would like to see this feature sooner, let your voices be heard!

On the whole the port to Windows was fairly straightforward. I miss atomic "seek-and-read/write" functions like pread and pwrite under Windows. But the only real source of problems was the pthread stuff. PBXT uses a bit more of the pthread library than the MySQL server, so I had to add implementations for several functions.

I would like to just link one of the available pthread libraries for Windows (such as pthreads-win32) but MySQL already implements some of the functions, which could cause link errors. In general, I think, MySQL should consider using a LGPL library such as pthreads-win32.

But this is a minor point, PBXT for Windows is running anyway...

Thursday, February 08, 2007

The first binary distribution of PBXT!

It's quite an effort to compile and test 3 versions of MySQL on 4 different machines, but that is what I have done for the first binary distribution of the PrimeBase XT pluggable storage engine.

Of course, its worth the effort because this is the "holy grail" of the pluggable storage engine strategy. Namely, binary plug-ins that work with the binary distributions prepared by MySQL and others in the community.

And by creating an installer script I have made it easier than ever to install the plug-in. So after you have downloaded the binary package from http://www.primebase.com/xt, all you have to do is:

1. Start your MySQL server.

2. Enter: sudo ./install

The installer will automatically determine the version of the server running and install the corresponding plug-in.

The package includes plug-ins for MySQL 5.1.14, 5.1.15 and 5.2.0 on Linux (32/64-bit) and Mac OS X (x86/PowerPC). Nevertheless you may wonder why the download is 22MB in size.

The reason is I have had to include patched versions of mysqld with the plug-ins for MySQL 5.1.14 and 5.2.0. The patch fixes a bug that causes MySQL to crash when restarted after a plug-in has been installed. The installer generates an uninstall script which restores the unpatched mysqld and removes the plug-in.

The bug is fixed in MySQL 5.1.15, so in the future the package should be much smaller, even when it includes more platforms.

Wednesday, January 31, 2007

PBXT 0.9.8 Beta with Referential Integrity released!

I have just released version 0.9.8 of the PrimeBase XT storage engine for MySQL 5.1. The major feature of this new version is foreign key support. As far as I know, this makes PBXT the first 3rd party storage engine to fully implement referential integrity, and the second after InnoDB.

With this version PBXT is practically feature complete. Only CHECK TABLE remains to be implemented for the first GA release (see pbxt-to-do.txt for details). So my plan to complete the PBXT GA release on time for MySQL 5.1 GA looks quite doable (especially now that I have heard MySQL 5.1 GA has been pushed to 3rd quarter 2007).

The latest package can be downloaded from www.primebase.com/xt or sourceforge.net/projects/pbxt. I have updated the instructions on how to build and install PBXT as a pluggable storage engine: how-to-build.txt. This text also explains how to download and build MySQL 5.1.14.

Although building everything from source code is relatively straight forward I plan to make binary versions of the plug-in available for MySQL 5.1.14 shortly. So a quick test drive of PBXT is about to become easier than ever...

Tuesday, January 23, 2007

PBXT Session at the 2007 MySQL Conference & Expo

Thanks to MySQL, and in particular Jay, for giving me the opportunity to talk about PBXT at the up-coming MySQL Conference in April. I will be presenting a session with the title: PrimeBase XT: Design and implementation of a transactional storage engine.

Topics include the design of XT and how it differs from conventional implementations, experience with implementing storage engines, surprises and gotchas, and performance. So I think this talk will be of interest to both end-users and other developers of storage engines.

See you there!

Friday, December 15, 2006

Implementation of Foreign Keys for the PBXT storage engine

I have just committed the initial implementation of foreign keys for the PrimeBase XT storage engine, whew! The implementation is similar to that of InnoDB and supports the RESTRICT, CASCADE and SET NULL options.

Currently the PBXT implementation requires that the data types of the referenced columns are identical. InnoDB allows references between CHAR and VARCHAR and strings of different length. PBXT also requires that the index used by foreign keys have the exact number of components. InnoDB can use indexes that have more components than the number of columns specified in the foreign key declaration.

On the other hand, PBXT supports all SQL syntax that can be used to create foreign keys. This includes the following syntax which is not supported by InnoDB:

CREATE TABLE t2 (
id INT PRIMARY KEY,
fk int REFERENCES t1(id)
) ENGINE=pbxt;

Maybe you are wondering why one engine supports a certain syntax, and another does not. Well, this was one thing that surprised me as well: MySQL parses but ignores the foreign key definitions. In fact, when you execute SHOW CREATE TABLE, MySQL calls the storage engine for the text of the foreign key definitions. So one of the biggest jobs has been to parse the SQL statements and extract the foreign key information.

Unfortunately, its not just done with CREATE TABLE. The engine also has to support ALTER TABLE, and all other statements that modify a table (e.g. CREATE/DROP INDEX) have to be parsed to determine which table is being altered. The reason for this is, when MySQL alters a table it creates a new table and copies the data from the old to the new table. So when the new table is created the engine has to identify the source table and copy over the foreign key definitions or they will be lost!

One thing to watch out for when parsing the SQL statements is that column and table names have to be converted to UTF-8. The conversion depends on the charset of the text which in turn depends on the current charset of the database connection. Table names have to be further converted into file names which use a special encoding schema to avoid non-US ASCII characters. All this requires calling certain character conversion routines in MySQL.

The other major problem that needed to be solve involved the MySQL st_table structure. This structure contains all the information about a table, including: column definitions, data types, character sets and index descriptions. This is actually the data from the .frm file of the table. Since the engine has access to this structure when a table is opened by MySQL, there is normally no need for the engine to store this information itself.

However, foreign keys force the engine to access tables that are not necessarily open by MySQL. For example if MySQL does an INSERT on table t2 (see definition above), then MySQL will open t2 before it performs the operation. But the engine is required by the foreign key definition to check column id on table t1. This table may not be open by MySQL (depending on the table cache). And without the st_table structure accessing the table is not possible, unless the engine stores all the information itself.

It took a little while to figure out, but my solution was to open the MySQL table myself. This means that PBXT calls routines in MySQL that load the information from the .frm file as needed.

Of course the engine will have a bit of a problem if the definition of any of the functions change. Since the engine is linked dynamically to MySQL this could occur without notice. So I think there is a general need to make these routines part of an official engine callback API.

Next on my list is CHECK and REPAIR table but before I get on to that, I will be on holiday in South Africa until the 12th of January! :)

If you would like to test PBXT foreign keys, checkout the trunk of the subversion tree directly as follows:

$ svn co https://pbxt.svn.sourceforge.net/svnroot/pbxt/trunk/pbxt pbxt

Then, assuming you have your MySQL tree in the directory /home/build/mysql-5.1.12 you can build PBXT as follows:

$ cd pbxt $ ./configure --with-mysql=/home/build/mysql-5.1.12 --libdir=/home/build/mysql-5.1.12/mysql-test/lib/mysql $ make install

This will install the PBXT plug-in (libpbxt.so) in the directory required by mysql-test-run. If you built MySQL as a debug version then remember to include --with-debug=full when configuring PBXT.

Instructions about how to test PBXT with mysql-test-run are given in the README file in the mysql-test-update directory. I have modified the InnoDB test scripts that test foreign keys to work with PBXT.

Tuesday, November 28, 2006

PBXT: Experience with testing Storage Engines

Many users may be unaware of the fact that there is a great tool for automatically testing MySQL called mysql-test-run. The most recent version of this program is written in perl and resides in the mysql-test directory of the MySQL source code tree.

Tests are written as scripts (.test files) which contains SQL statements and a number of meta-commands. Each .test file has a corresponding .result file which is used to verify the test output. A text-based "diff" is used to confirm that the output generated by the test is identical to the expected result.

To run a test mysql-test-run automatically sets up a clean environment and starts the MySQL server. It then executes the test scripts specified (by default all tests are executed) and verifies the output.

Most of the tests shipped with MySQL are intended to run with the MyISAM storage engine. There are also tests specifically created for other MySQL engines like NDB and InnoDB.

But what about testing 3rd party engines like PBXT? Firstly, its easy enough to start the test, just execute the following:

$ ./mysql-test-run --mysqld=--default-storage-engine=pbxt

This tells mysql-test-run to start mysqld with the option that sets PBXT to the default engine. So all CREATE TABLE statements that do not explicitly specify an engine now use PBXT. After this the real work starts:

There are over 630 tests in MySQL 5.1.12 and currently PBXT requires 223 individual changes to the test scripts for them to run. Of the 630 test results, 187 need to be changed. This is a lot of work given that verifying a difference is not a bug can be quite time consuming.

Meanwhile many differences are trivial, so ideally they would not appear at all. Lets have a look at some of the most common test differences:

SHOW CREATE TABLE:

The SHOW CREATE TABLE statement output depends on the default engine. So CREATE TABLE (...) ENGINE=MyISAM must be changed to CREATE TABLE (...) ENGINE=PBXT in over 360 instances.

The correct way to solve this problem would be to use the mysql-test-run command --replace-regex to transform the output to something like ENGINE=?. But, of course, this change would need to be checked into the MySQL source code tree to be of long term help.

MERGE, FULLTEXT or GEOMETRY:

All these features are only supported by MyISAM tables. When using these features the test should explicitly specify ENGINE=MyISAM. An even better solution would be if MySQL could automatically pick an engine that supported the features used in the CREATE TABLE. In this case the default engine will take preference.

ENGINE=InnoDB:

In order to use the InnoDB tests to test some other transactional engine, all the CREATE TABLE (...) ENGINE=InnoDB statements must be changed. Ideally MySQL would support syntax of the form: ENGINE=TRANSACTIONAL, which would automatically pick a transactional engine from those available. If the default engine is transactional then that would obviously be used.

ORDER BY Required:

By far the biggest problem when testing PBXT is that the "natural" SELECT order often differs from that of MyISAM. This means that a SELECT without an ORDER BY returns the correct rows, but in a different order to MyISAM. This is due to the way PBXT manages free space.

Making this problem even worse is the fact that the order can differ from one run to the next because PBXT manages the free space asynchronously. The only thing that helps is to add an ORDER BY clause to the SELECT, but I am wondering if the problem cannot be solved in general by mysql-test-run. For example this could be done by a more intelligent diff function.

EXPLAIN Differences:

Each engine has its own way of calculating the heuristics used by the optimizer. This causes many differences in the result files due to slightly different EXPLAIN output. On the other hand there is a need to verify that queries are being optimized correctly. I'm clueless on this one at the moment, so suggestions are welcome.

I think other engine developers will agree, its a major task so any help in testing will be greatly appreciated!

Monday, November 06, 2006

The PBXT pluggable storage engine and the MySQL 5.1.12 Beta release

OK, it took a while, but it has been worth the wait. At last I can now guarantee that when you download the PBXT source code, it will compile! But seriously, with the released of MySQL 5.1.12 Beta we now have an excellent platform for testing the pluggable storage engine API.

These are interesting times because we are seeing the future of storage engines in MySQL. The PBXT 0.9.73 release demonstrates the ease with which externally built storage engines can be combined with MySQL in the future. And, as I have mentioned before, PrimeBase XT is the first engine to take full advantage of this new feature in MySQL.

And while I am singing my praises let me remind you of 2 other reasons why you should try out PBXT: the engine achieves high concurrency using a pure MVCC implementation (MVCC stands for multi-generational concurrency control, if you want to know why, ask Jim! ;) and great performance with a write-once update strategy.

Download and further details can be found at http://www.primebase.com/xt as usual. The PBXT project is now hosted by SourceForge.net so for SVN access, bug reports and tracking, please go to: http://sourceforge.net/projects/pbxt.

To help you get started I have included 6 steps to build and test PBXT with MySQL 5.1.12 below.

Building and testing PBXT for MySQL 5.1.12

This description assumes the following:

The user's name is myname.

The home directory of myname is: /home/myname

myname is using a working directory called: /home/myname/mysql-dev

1. Download the MySQL 5.1.12 source code tree, and unpack it in the /home/myname/mysql-dev directory.

2. Build MySQL as follows:

$ cd /home/myname/mysql-dev/mysql-5.1.12-beta $ ./configure --prefix=/home/myname/mysql-dev/test --with-debug=full --with-mysqld-user=myname --with-extra-charsets=complex --with-plugins=max-no-ndb --with-embedded-server --enable-thread-safe-client --with-big-tables $ make

3. Download the source code of PBXT 0.9.73 (or later), and unpack it in the /home/myname/mysql-dev directory.

4. Build and install PBXT as follows:

$ cd /home/myname/mysql-dev/pbxt-0.9.73-beta $ ./configure --with-mysql=/home/myname/mysql-dev/mysql-5.1.12-beta --libdir=/home/myname/mysql-dev/mysql-5.1.12-beta/mysql-test/lib/mysql --with-debug=full $ make install

5. Install the PBXT version of the mysql-test-run scripts:

$ cd mysql-test-update $ ./update_mysql_test.sh /home/myname/mysql-dev/mysql-5.1.12-beta

6. Run the MySQL test scripts:

$ cd /home/myname/mysql-dev/mysql-5.1.12-beta/mysql-test $ ./mysql-test-run --mysqld=--default-storage-engine=pbxt --force

NOTES:

Almost all the tests should run through without error (I am still working on 2 or 3 tests that fail).

Step 4 installs the PBXT shared library, libpbxt.so, in the mysql-test/lib/mysql directory where it is required by mysqld, which it is started by mysql-test-run.

Step 5 modifies the file mysql-test/lib/init_db.sql, adding the statement: INSERT INTO plugin VALUES ('pbxt', 'libpbxt.so');, which will cause mysqld to load the plug-in when started.

Both MYSQL and PBXT must be compiled with the same value for the --with-debug switch (on/off/full). If not, the plug-in will not load.

Normally you would install a plug-in using the command: INSTALL PLUGIN pbxt SONAME 'libpbxt.so';. However, unless you add this statement to every test script, this will not work when using mysql-test-run because the server environment is re-installed every time mysql-test-run starts.

By the way, the new perl based mysql-test-run is much faster on startup than the old UNIX shell based program. Makes testing much more fun ... thanks MySQL :)

Wednesday, October 25, 2006

PBXT now hosted by SourceForge.net!

I would have liked to have done this on MySQLForge but, as you know, this is not yet possible. However, if MySQL were to provide even part of their, obviously, excellent development management system for external projects I would be one of the first on board.

In the meantime, PrimeBase XT source code and development will be managed on SourceForge.net: http://sourceforge.net/projects/pbxt.

For new projects it really makes sense to use Subversion instead of CVS, so to get the latest version of the source code you enter the following:

svn checkout https://svn.sourceforge.net/svnroot/pbxt/trunk/pbxt pbxt

If you are a Mac OS X user, or your UNIX distribution does not include the Subversion client, you can download it from Tigris.org. I recommend Mac OS X users install the version from a .dmg since the Fink version is not yet available in binary form (with SSL support).

[ By the way, this may be of interest to developers using Mac OS X: the only way I have managed to compile the latest MySQL 5.1 sources is by using Fink to download and install the latest versions of automake, autoconf, make, bison and libtool! ]

Following the Subversion convention, the current Beta release is available as a snapshot in the 'tags' directory:

https://svn.sourceforge.net/svnroot/pbxt/tags/pbxt-0.9.72-beta

And, of course, bug reporting and tracking will also be handled by SF.net, so from now on please report all bugs at Projects » PrimeBase XT » Tracker » Bugs, and feature requests at Projects » PrimeBase XT » Tracker » Feature Requests.

Tuesday, October 24, 2006

PBXT mentioned on SlashDot by CEO Mårten Mickos

Following up on what Stewart and Colin reported: this definitely is the first time PBXT has been mentioned on SlashDot: MySQL CEO Mårten Mickos Answers Your Questions.

Mårten, speaking of the the innovative power of MySQL AB, says: "But more massive is the innovation that happens in the MySQL ecosystem", and lists PBXT amongst a lot of other great work.

So I am very proud that PBXT was mentioned in this context!

Friday, September 29, 2006

New PBXT Beta release includes a pluggable storage engine for MySQL 5.1

I have just released the first Beta version of PrimeBase XT. Besides MySQL 4.1.21 support it is also available as a pluggable storage engine for MySQL 5.1. As far as I know, PBXT is the first full featured engine to be released in this form.

The engine can be downloaded (at http://www.primebase.com/xt) and built separately from MySQL. Currently, a reference to a MySQL 5.1 source tree is required to build the plug-in. However, in the future things will be even easier. Lenz Grimmer has told me that the plan is to include the required headers in the mysql-devel package. Then it will only be necessary to install the mysql-devel package in order to build the engine.

Still further in the future, once 5.1 is released, I will be able to make the PBXT engine available as a binary download for a number of platforms.

Once you have built the PBXT engine, you can install it by copying the binary, libpbxt.so, to the MySQL plug-in directory. Then enter the following command:

mysql> INSTALL PLUGIN pbxt SONAME 'libpbxt.so';

After this, you can create tables that use the PBXT engine by setting ENGINE=PBXT in the table declaration as usual. It's that easy!

I think this makes PrimeBase XT a great demo of the ease and power of the new pluggable storage engine API for MySQL 5.1.

Wednesday, September 06, 2006

PBXT Presentation at the Hamburg MySQL September Meetup

My thanks to Lenz Grimmer for the opportunity to present PrimeBase XT at the Hamburg MySQL September Meetup last Monday night. The meeting took place in a well equipped conference room belonging to the Chinese restaurant "Ni Hao" in Wandsbek. With over 20 attendees the venue was filled to capacity.

I must admit, I was a bit surprised (although I probably should not have been) by the highly technical nature of the questions and comments during the presentation and afterwards. As a result we had great time and spent quite a while discussing the various algorithms and implementation strategies used by XT.

I have posted the presentation as PDF on the PrimeBase XT home page. Among the topics of the presentation were: our motivation, a brief history of XT, future plans, design aspects and the results of some preliminary performance testing.

Of particular interest to many was the garbage collection mechanism. There was some concern as to just how much load the garbage collector would place on the server. When I presented a graphic showing the data throughput rate vs the record size written, someone asked why there was no dropoff in the performance when the garbage collector kicks in.

Indeed, performance is not affected adversely by the garbage collector. This is mainly due to the fact that the garbage collector is a low-priority thread, so it is immediately descheduled when it would affect foreground activity. This is not a problem as long as the system has enough disk space to be able to delay garbage collection until the peak access times are over. This can be contrasted with a conventional RAM garbage collector which is forced to run due to the fact that it manages a relatively scarce resource.

But besides this, the XT garbage collector generally does not require much CPU time. For example, it does not need to search for garbage, this is marked as such by the "sweeper" thread (see my white paper). I plan to discuss this topic in more details in a future blog, so stay tuned...

Monday, August 07, 2006

PBXT on schedule for Beta in September

Its a dirty job, but somebody's got to do it! I am talking about testing and debugging, of course, which is what I have been doing for the last month. But at least I have the satisfaction of knowing that XT is a lot more stable.

Over 95% of the "mysql-test-run" scripts now run through correctly with PBXT as the default engine. I have documented the changes made to the test scripts here: pbxt-test-run-changes.txt. This file also explains the major differences between MyISAM and PBXT, and lists all features are not yet implemented.

If you have compiled XT and would like to run the tests, then enter the following commands:

cd mysql-test ./mysql-test-run --force --mysqld=--default-storage-engine=pbxt

I have also compiled and tested XT on a multi-processor, 64-bit machine. I used sysbench 0.4.7 for multi-threaded, stability and performance testing. If you have downloaded and build sysbench, then you can measure the speed of XT using the following commands:

To create a test table containing 1000000 rows:

sysbench --test=oltp --oltp-table-size=1000000 --mysql-table-engine=pbxt --mysql-engine-trx=yes --mysql-user=root --mysql-socket=/tmp/mysql.sock prepare

Then, to perform a test with READ-ONLY queries:

sysbench --test=oltp --oltp-table-size=1000000 --init-rng=on --mysql-engine-trx=yes --oltp-table-name=sbtest --oltp-test-mode=complex --oltp-point-selects=1 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-skip-trx=on --oltp-read-only=on --max-requests=0 --num-threads=4 --oltp-dist-type=special --init-rng=on --mysql-db=sbtest --max-time=60 --mysql-user=root --mysql-socket=/tmp/mysql.sock run

But more interesting is a test with update transactions, as follows:

sysbench --test=oltp --oltp-table-size=1000000 --oltp-read-only=off --oltp-test-mode=complex --num-threads=2 --mysql-engine-trx=yes --init-rng=on --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --mysql-user=root --mysql-socket=/tmp/mysql.sock --oltp-table-name=sbtest --oltp-dist-type=special --max-requests=50000 run

In this test, each transaction performs 3 UPDATEs, a DELETE and an INSERT. According to my tests, SELECT speed is similar to InnoDB, but the update queries are significantly faster. It would be great if someone could confirm these results.

Judging by the stability of the current version, it looks like I will be able to achieve my goal of a Beta release next month. As I mentioned before, the only feature still to be completed for the Beta version is a pluggable storage engine for MySQL 5.1.

Tuesday, July 04, 2006

New PBXT Release 0.9.5: Smaller, Faster, Better...

The new release of PBXT is, in fact, smaller than the previous version. This is due to a major re-structuring of the code. The result is a more compact and direct implementation. With less code to execute and some major changes this version is also a lot faster.

The most significant change is that all files (except the transaction logs) are now associated with a particular table. All table related files begin with the name of the table, and can be easily identified. If a database is not being accessed you can copy a table by copying the associated files, or drop a table by deleting the files. However, the main reason for this change was to improve the speed of sequentially reading and writing.

Records are now divided into a fixed and a variable length component. The fixed length component is stored together with the handle in the record data file. The variable component is referenced by the handle and is stored in one of a number of data log files associated with the table.

This new format reduces the number of disk accesses needed to read, create and update a row significantly. Furthers details of changes in this version can be found in the release notes.

In this release I have also resolved almost all of the issues on my to-do list. This includes the items regarding auto-increment, caching, unique key checking and all the bugs reported so far.

The only feature still due for the Beta release in September is the 5.1 implementation which I will be working on next.

I would much appreciate any feedback, bug reports and performance test results!

Thursday, June 08, 2006

PBXT featured on the MySQL home page

Last week I spoke to Lenz Grimmer, MySQL Community Relations Manager for Europe, ME and Africa, about the Users Conference, PBXT and the Community.

The interview has been featured under Articles on the MySQL home page.

For those of you that missed it, here is the link to the interview again:

Interview with Paul McCullagh, developer of the PrimeBase XT Storage Engine.