Friday, June 11, 2010

An Overview of PBXT Versions

If you follow PBXT development you may have noticed a number of different versions of the engine have been mentioned in various talks and blogs.

There is actually a consistent strategy behind all this, which I would like to explain here.

PBXT 1.0 - Current: 1.0.11-3 Pre-GA

Launchpad: lp:pbxt

This is the current PBXT production release. It is stable in all tests and environments in which it is currently in use.

The 1.0.11 version of the engine is available in MariaDB 5.1.47.

PBXT 1.1 - Stability: RC

Launchpad: lp:pbxt/1.1

PBXT 1.1 implements memory resident (MR) tables. These tables can be used for fast, concurrent access to non-persistent data.

1.1 also adds parallel checkpointing. To do this, PBXT starts multiple threads to flush several tables at once during a checkpoint.

This version is feature complete. Unless someone is interested in using MR tables in production, my plan is to leave 1.1 at the RC level and concentrate development on PBXT 1.5 and 2.0.

PBXT 1.1 is part of Drizzle.

PBXT 1.5 - Current: 1.5.01 Beta

Launchpad: lp:pbxt/1.5

PBXT 1.5 changes how the data logs are written, which makes the engine much faster, depending on the database schema.

Previously each user thread wrote its own data log. In version 1.5 the data logs are written the same way the transaction log is written. This means that group commit is implemented for the data logs.

I have also added a data log cache which can be help significantly if your data has hot spots.

The log-based architecture of PBXT makes it possible to write Terabytes of data without degrading performance. But, as the amount of data increases, garbage collection and random read speed can become a problem. I am currently focusing on solving these problems in 1.5.

PBXT 2.0 - Stability: Alpha

Launchpad: lp:pbxt/2.0

The major feature in PBXT 2.0 is engine level replication (ELR). This is an extremely efficient form of replication, while being fully transactional and reliable.

ELR works by transferring changes directly from the the PBXT transaction and data logs to the PBXT engine on the slave. This means the binary log does not need to be written or flushed, which can greatly increase the speed of the master server (up to 10x in some tests).

Currently the replication does not handle database schema changes, but it works and is ready for testing.

Setting Priorities

PBXT is a free, open source project which is largely funded by a big name database company.

Nevertheless, I am not bound as to how I set priorities, which means I usually focus on what is important to those using and testing the engine.

Now that you have an overview of what's happening in the PBXT world, let me know if you have a problem that PBXT might fix. I'd be happy to hear from you... :)