Friday, July 16, 2010

PBXT 1.5.02 Beta adds 2nd Level Cache

As many probably already know, PBXT is the first MySQL Storage Engine to use a log-based architecture. Log-based means that data that would normally first be written to the transaction log, and then to the database tables, is just written to the log, and the log becomes part of the database.

This result is that data is only written once, and is always written sequentially. The advantage when writing is obvious, but there is a down side (as always). The data is written to the disk in write order, which is seldom the order in which the data is retrieved. So this results in a lot of random reads to the disk when accessing the data later.

Placing the data logs on a Solid State Drive would solve this problem, because SSDs have no seek time. But the problem with this solution is that SSDs are still way to expense to base all your storage needs on such hardware.

The solution: an SSD-based 2nd Level Cache.

Using an SSD-based 2nd Level Cache you can store the most commonly accessed parts of your database on SSD for a reasonable price. For example, if you have a Terabyte database, you can cache about 15% (160 GB) of it on SSD for around $400. This can significantly affect the performance of your system.

With this thought in mind, I have just released PBXT 1.5.02 Beta, which implements a 2nd level cache for the data logs. How this works is illustrated below.

Data written to the data log is also written to the, main memory based, Data Log Cache. Once the Data Log Cache is full, pages need to be freed up when new data arrives. Pages that are freed from the Data Log Cache are written to the 2nd Level Cache.

Now, when the Data Log records are read, PBXT will read the corresponding page from the Data Log Cache. If the page is not already in the cache, it will first check to see if the page is in the 2nd Level Cache, before reading from the Data Log itself.

PBXT 1.5 is available for download from primebase.org, or you can check out lp:pbxt/1.5 from Launchpad using bazaar. The documentation has also been updated for 1.5.

Using the 2nd level cache is easy. It is controlled by 3 system variables:
  • pbxt_dlog_lev2_cache_file - the name and path of the file in which the data is stored.
  • pbxt_dlog_lev2_cache_size - the size of the 2nd level cache.
  • pbxt_dlog_lev2_cache_enabled - set to 1 to enable the 2nd level cache.
It also makes sense to set a higher value for the Data Log Cache, using the pbxt_data_log_cache_size variable, which has a default value of 16MB.

Of course it will be interesting to do some benchmarks on this implementation. But that will have to wait until after my holiday! I will be away until late August, but if you decide to test the new version, be sure to let me know.

2 comments:

Nils said...

Hey Paul, did you by any chance read this paper[1] about using USB thumb drives as log device? It shows pretty interesting results, unfortunately there doesn't seem to be any code available.

[1] http://www.pittsburgh.intel-research.net/~chensm/papers/FlashLogging-sigmod09.pdf

Mikael Ronstrom said...

Hi Paul,
Nice to see this idea implemented.
I had a similar idea for the NDB
storage engine. So will be very
interesting to see how much
performance can benefit from two
levels of caching here.