Wednesday, March 25, 2009

Solving the PBXT DBT2 Scaling Problem

One little bit of wisdom I would like to pass on:

If a program runs fast with 20 threads, that does not mean it will run fast with 50. And if it runs fast with 50, it does not mean that it will run fast with 100, and if it runs fast with 100 ... don't bet on it running fast with 200 :)

In my last blog I discussed some improvement to the performance of PBXT running the DBT2 benchmark. Despite the overall significant increase in performance I noted a drop off at 32 threads that indicated a scaling problem. For the last couple of weeks I have been working on this problem and I have managed to fix it:

As before, this test was done using MySQL 5.1.30 on an 8 core, 64-bit, Linux machine with an SSD drive and a 5 warehouse DBT2 database. The test is memory bound and does not test the affects of checkpointing.

PBXT Baseline is the code revision indicated as PBXT 1.0.08 in my last blog. PBXT 1.0.07 is the current PBXT GA release version. PBXT 1.0.08 is the latest revision of the PBXT trunk. The baseline graph shows the extent of the scaling problem of the last version.

The latest version is over 20 times faster than PBXT 1.0.07 and 140% faster than the previous version. But most important is the fact that performance remains almost constant as the number of threads increases.

My thanks also to InnoDB which, looking at it positively, offers an excellent measure of how well you are doing! :) It looks like PBXT now actually scales better than InnoDB for this type of test.

So what has changed?

Basically I have made 2 changes, one major and one smaller but significant change. The first change, which got PBXT running faster with 50 threads has to do with conflict handling.

As I mentioned before DBT2 causes a lot of row level conflicts. This is especially the case as the number threads increase. In fact, at any given time during the test with 100 threads (performance results above), 80 of the threads are waiting for row locks. (Of the remaining 20, 4 are waiting for network I/O, and the rest are doing the actual work!)

The result is, if the handling of these conflicts is not optimal the engine looses a lot of time. Which you can clearly see from both the baseline and 1.0.07 results reported above.

To fix this I completely re-wrote the row-level conflict handling. Code paths are now much shorter for row-lock/update detection and handling and threads are now notified directly when they can continue.

The other change I made involved the opening and closing of tables when the MySQL open table cache is too small. This is something that really killed performance starting at about 100 threads. PBXT was doing quite a bit of unnecessary stuff on open and close table, which was fairly easy to move out.

So now that the scaling is good up to 200 threads, should I assume that performance will also be good for 400 threads? Of course it is! Well, at least until I test it... :)


Mark Callaghan said...

What version of InnoDB have you compared to: 5.1 no plugin, 1.0.2 plugin, 1.0.3 plugin?

Kevin Burton said...

That's interesting.... my benchmarks showed that you guys were like 5-15% slower than InnoDB....

Though I suspect that NOW you guys will blow them away with SSD.

I would try to grab an Intel SSD and play with it :)

Paul McCullagh said...

Hi Mark,

The test was done with the built-in version of InnoDB (no plugin).

arjenAU said...

Good show, Paul!
While I don't believe in gearing code just for faster benchmark execution... from your description I think you've identified and resolved genuine bottlenecks in the code that will benefit general use also. Excellent!

Mark Callaghan said...

Are your latest changes in a launchpad branch?

Paul McCullagh said...

Yes, the latest changes/optimizations are in the trunk (lp:pbxt).

Mikiya Okuno said...

Hi Paul,

Could you please tell me the configuration (my.cnf) used upon this benchmark? Or do you have any recommendation? I appreciate your advice!