The result is that data is only written once, and is always written sequentially. The advantage when writing is obvious, but there is a downside (as always). The data is written to the disk in write order, which is seldom the order in which the data is retrieved. So this results in a lot of random reads from the disk when accessing the data later.
Placing the data logs on a Solid State Drive would solve this problem, because SSDs have no seek time. But the problem with this solution is that SSDs are still way too expensive to base all your storage needs on such hardware.
The solution: an SSD-based 2nd Level Cache.
Using an SSD-based 2nd Level Cache you can store the most commonly accessed parts of your database on SSD for a reasonable price. For example, if you have a Terabyte database, you can cache about 15% (160 GB) of it on SSD for around $400. This can significantly affect the performance of your system.
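As a quick sanity check on those numbers, here is the arithmetic as a small Python sketch. The sizes and price are the ones quoted above; treating 1 Terabyte as 1024 GB is my assumption, and the variable names are purely illustrative.

```python
# Figures from the example above; 1 TB taken as 1024 GB (an assumption).
DATABASE_GB = 1024
SSD_CACHE_GB = 160
SSD_PRICE_USD = 400

# Share of the database that fits in the SSD cache.
cached_fraction = SSD_CACHE_GB / DATABASE_GB

# What each cached gigabyte costs at the quoted price.
price_per_gb = SSD_PRICE_USD / SSD_CACHE_GB

print(f"cached: {cached_fraction:.1%}, SSD cost: ${price_per_gb:.2f}/GB")
```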
With this thought in mind, I have just released PBXT 1.5.02 Beta, which implements a 2nd level cache for the data logs. How this works is illustrated below.
Data written to the data log is also written to the main-memory-based Data Log Cache. Once the Data Log Cache is full, pages must be freed to make room for new data. Pages that are freed from the Data Log Cache are written to the 2nd Level Cache.
Now, when the Data Log records are read, PBXT will read the corresponding page from the Data Log Cache. If the page is not already in the cache, it will first check to see if the page is in the 2nd Level Cache, before reading from the Data Log itself.
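The write and read paths described above can be sketched roughly as follows. This is a toy Python model, not PBXT's actual code: the class and method names are hypothetical, plain dictionaries stand in for the SSD file and the on-disk data log, and the least-recently-used eviction policy is my assumption, since the post does not say which pages are freed first.

```python
from collections import OrderedDict


class TwoLevelLogCache:
    """Toy model of a main-memory cache backed by a 2nd level cache."""

    def __init__(self, l1_pages, l2_pages, data_log):
        self.l1 = OrderedDict()   # main-memory Data Log Cache (LRU is an assumption)
        self.l2 = OrderedDict()   # stand-in for the SSD-based 2nd Level Cache
        self.l1_pages = l1_pages  # capacity of each level, in pages
        self.l2_pages = l2_pages
        self.data_log = data_log  # dict page_id -> bytes, stand-in for the disk log
        self.stats = {"l1": 0, "l2": 0, "log": 0}

    def _put_l2(self, page_id, page):
        self.l2[page_id] = page
        self.l2.move_to_end(page_id)
        while len(self.l2) > self.l2_pages:
            self.l2.popitem(last=False)  # the 2nd level simply drops its oldest page

    def _put_l1(self, page_id, page):
        self.l1[page_id] = page
        self.l1.move_to_end(page_id)
        while len(self.l1) > self.l1_pages:
            # Pages freed from the main-memory cache go to the 2nd level cache.
            old_id, old_page = self.l1.popitem(last=False)
            self._put_l2(old_id, old_page)

    def write(self, page_id, page):
        # Data written to the log is also written to the main-memory cache.
        self.data_log[page_id] = page
        self.l2.pop(page_id, None)  # discard any stale 2nd level copy
        self._put_l1(page_id, page)

    def read(self, page_id):
        if page_id in self.l1:            # 1. main-memory cache
            self.stats["l1"] += 1
            self.l1.move_to_end(page_id)
            return self.l1[page_id]
        if page_id in self.l2:            # 2. 2nd level cache
            self.stats["l2"] += 1
            page = self.l2[page_id]
        else:                             # 3. the data log itself
            self.stats["log"] += 1
            page = self.data_log[page_id]
        self._put_l1(page_id, page)
        return page


cache = TwoLevelLogCache(l1_pages=2, l2_pages=4, data_log={})
for i in range(4):
    cache.write(i, f"page-{i}".encode())
cache.read(3)  # still in the main-memory cache
cache.read(0)  # was pushed out to the 2nd level cache
print(cache.stats)
```

In this model a read only reaches the data log when both cache levels miss, which is the case the 2nd level cache is meant to make rare.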
PBXT 1.5 is available for download from primebase.org, or you can check out lp:pbxt/1.5 from Launchpad using Bazaar. The documentation has also been updated for 1.5.
Using the 2nd level cache is easy. It is controlled by 3 system variables:
- pbxt_dlog_lev2_cache_file - the name and path of the file in which the data is stored.
- pbxt_dlog_lev2_cache_size - the size of the 2nd level cache.
- pbxt_dlog_lev2_cache_enabled - set to 1 to enable the 2nd level cache.
Of course it will be interesting to do some benchmarks on this implementation. But that will have to wait until after my holiday! I will be away until late August, but if you decide to test the new version, be sure to let me know.
2 comments:
Hey Paul, did you by any chance read this paper[1] about using USB thumb drives as log device? It shows pretty interesting results, unfortunately there doesn't seem to be any code available.
[1] http://www.pittsburgh.intel-research.net/~chensm/papers/FlashLogging-sigmod09.pdf
Hi Paul,
Nice to see this idea implemented. I had a similar idea for the NDB storage engine. So it will be very interesting to see how much performance can benefit from two levels of caching here.