Friday, June 11, 2010

An Overview of PBXT Versions

If you follow PBXT development, you may have noticed that a number of different versions of the engine have been mentioned in various talks and blogs.

There is actually a consistent strategy behind all this, which I would like to explain here.

PBXT 1.0 - Current: 1.0.11-3 Pre-GA

Launchpad: lp:pbxt

This is the current PBXT production release. It is stable in all tests and environments in which it is currently in use.

The 1.0.11 version of the engine is available in MariaDB 5.1.47.

PBXT 1.1 - Stability: RC

Launchpad: lp:pbxt/1.1

PBXT 1.1 implements memory resident (MR) tables. These tables can be used for fast, concurrent access to non-persistent data.

1.1 also adds parallel checkpointing. To do this, PBXT starts multiple threads to flush several tables at once during a checkpoint.
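To illustrate the idea of parallel checkpointing, here is a minimal sketch in Python. It is not PBXT code: the names `flush_table` and `parallel_checkpoint`, the thread count, and the representation of a table as a list of dirty pages are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def flush_table(name, dirty_pages):
    """Hypothetical stand-in for writing one table's dirty pages to disk."""
    flushed = len(dirty_pages)
    dirty_pages.clear()  # after the flush the table has no dirty pages left
    return name, flushed


def parallel_checkpoint(tables, num_threads=4):
    """Flush all tables using a pool of threads; return pages flushed per table."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(flush_table, name, pages)
                   for name, pages in tables.items()]
        # The checkpoint is complete only once every table has been flushed.
        return dict(f.result() for f in futures)


tables = {"t1": [1, 2, 3], "t2": [4], "t3": []}
result = parallel_checkpoint(tables)
```

The checkpoint only returns after all flushes have finished, so several tables are written at once but the checkpoint as a whole still completes as a unit.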

This version is feature complete. Unless someone is interested in using MR tables in production, my plan is to leave 1.1 at the RC level and concentrate development on PBXT 1.5 and 2.0.

PBXT 1.1 is part of Drizzle.

PBXT 1.5 - Current: 1.5.01 Beta

Launchpad: lp:pbxt/1.5

PBXT 1.5 changes how the data logs are written, which can make the engine much faster for schemas with variable-length rows (fixed-length rows do not use the data logs).

Previously each user thread wrote its own data log. In version 1.5 the data logs are written the same way the transaction log is written. This means that group commit is implemented for the data logs.
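The principle behind group commit can be sketched roughly as follows. This is a simplified Python model, not the PBXT implementation: the class `GroupCommitLog` and its fields are hypothetical, and a real engine would write to a file and sync it where this sketch just moves records between lists.

```python
import threading


class GroupCommitLog:
    """Hypothetical shared log: many threads append, one flush covers them all."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = []    # records appended but not yet flushed
        self.durable = []     # stand-in for records safely on disk
        self.flush_count = 0  # number of (expensive) flush operations

    def append(self, record):
        with self._lock:
            self._pending.append(record)

    def flush(self):
        """Write every pending record with a single flush; return the batch size."""
        with self._lock:
            batch, self._pending = self._pending, []
            if batch:
                self.durable.extend(batch)
                self.flush_count += 1
            return len(batch)


log = GroupCommitLog()
for thread_id in range(3):
    log.append(("thread-%d" % thread_id, "row data"))
batch_size = log.flush()
```

Three writers share a single flush here. With one log per thread, each writer would have needed a flush of its own.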

I have also added a data log cache, which can help significantly if your data has hot spots.
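Why a cache helps with hot spots can be shown with a small sketch. The eviction policy (LRU), the class name `LogBlockCache` and the `read_block` callback are assumptions for the example. They do not describe how the PBXT cache is actually built.

```python
from collections import OrderedDict


class LogBlockCache:
    """Hypothetical LRU cache for blocks read from the data logs."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block  # called on a miss to fetch the block
        self.blocks = OrderedDict()   # least recently used block comes first
        self.hits = 0
        self.misses = 0

    def get(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return self.blocks[block_id]
        self.misses += 1
        data = self.read_block(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict the least recently used
        return data


cache = LogBlockCache(capacity=2, read_block=lambda block_id: "block-%d" % block_id)
for block_id in [1, 1, 1, 2, 1, 3, 1]:  # block 1 is the hot spot
    cache.get(block_id)
```

In this access pattern the hot block stays in the cache the whole time, so only the first read of each block has to go to the log.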

The log-based architecture of PBXT makes it possible to write terabytes of data without degrading write performance. But as the amount of data increases, garbage collection and random read speed can become a problem. I am currently focusing on solving these problems in 1.5.

PBXT 2.0 - Stability: Alpha

Launchpad: lp:pbxt/2.0

The major feature in PBXT 2.0 is engine level replication (ELR). This is an extremely efficient form of replication that remains fully transactional and reliable.

ELR works by transferring changes directly from the PBXT transaction and data logs to the PBXT engine on the slave. This means the binary log does not need to be written or flushed, which can greatly increase the speed of the master server (up to 10x in some tests).
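Conceptually, replicating straight from the engine's logs looks something like the sketch below. It is a toy model only: the function `ship_log`, the `(transaction id, key, value)` record layout and the use of a dictionary as the slave's table are all invented for illustration and say nothing about the actual ELR protocol.

```python
def ship_log(master_log, slave_table, slave_position):
    """Apply the committed master log records the slave has not yet seen.

    Returns the new log position of the slave.
    """
    for txn_id, key, value in master_log[slave_position:]:
        slave_table[key] = value  # replay the change directly on the slave
    return len(master_log)


# Hypothetical committed changes as they appear in the master's log.
master_log = [(1, "a", 1), (2, "b", 2), (3, "a", 3)]
slave_table = {}

position = ship_log(master_log, slave_table, 0)

# Later changes are picked up from the position the slave has reached.
master_log.append((4, "c", 4))
position = ship_log(master_log, slave_table, position)
```

The point of the sketch is that the slave reads from the same log the master has to write anyway, so no second log is needed for replication.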

Currently the replication does not handle database schema changes, but it works and is ready for testing.

Setting Priorities

PBXT is a free, open source project which is largely funded by a big name database company.

Nevertheless, I am not bound as to how I set priorities, which means I usually focus on what is important to those using and testing the engine.

Now that you have an overview of what's happening in the PBXT world, let me know if you have a problem that PBXT might fix. I'd be happy to hear from you... :)

9 comments:

Mark Robson said...

Are all these versions of the engines compatible with main-line MySQL versions as well as Drizzle and MariaDB?

Paul McCullagh said...

Hi Mark,

All versions are compatible with the most recent versions of MySQL 5.1 and MariaDB.

Drizzle is a special case. It requires significant changes to the engine interface, so currently only version 1.1 is available for Drizzle.

PBXT 1.0 is shipped as part of the MariaDB tree, so you would have to disable or compile MariaDB without PBXT in order to load another version of PBXT as a plug-in.

This is identical to the way the InnoDB plug-in is run by MySQL.

If you want to try various versions of PBXT then I recommend you build the latest version of MySQL, and build PBXT outside of the tree.

Then you can very quickly switch between PBXT versions. How to do this is described here.

Mark Callaghan said...

Which version can co-exist with InnoDB in MySQL 5.1? bug 47134 is getting in my way.

More people need to express interest in getting a fix for it.

Paul McCullagh said...

I agree Mark!

It may even be easy to fix, I haven't had a look yet.

If we have a patch we could get it into MariaDB very quickly.

Andy said...

Hi Paul,

My vote is for data compression.

Making the entire data set fit in memory (or even on SSD) often provides the biggest boost to performance. Data compression would be very helpful for that.

Andy said...

Paul,

You mentioned that PBXT 1.5 is much faster depending on the schema.

What type of schema would make it fast, and what type of schema would not?

Also how much speedup do you expect?

Paul McCullagh said...

Hi Andy,

Data compression could be done fairly easily on the data logs, so this would be the first thing to try.

1.5 is faster if your database schema contains tables with variable length rows (fixed length rows do not use the data logs).

If there is heavy UPDATE activity on these tables, then my tests show that 1.5 can be up to 5 times faster than 1.1.

NMMM.NU said...

Hi,
Where can I report bugs? I have one - http://nmmmpic.blogspot.com/2010/08/strange-pbxt-issue.html and I can provide the db binary files.

Paul McCullagh said...

Hi,

Please report bugs at https://bugs.launchpad.net/pbxt. Thanks!