Over the last few weeks I have been doing some work on improving the concurrency performance of
PBXT. The last Alpha version (1.0.03) has quite a few problems in this area.
Most of the problems have been with r/w lock and mutex contention but, I soon discovered that MySQL has some serious problems of it's own. In fact, I had to remove some of the bottlenecks in MySQL in order to continue the optimization of PBXT.
The result for simple SELECT performance is shown in the graph below.

Here you can see that the gain is over 60% for 32 or more concurrent threads. Both results show the performance with the newly optimized version of PBXT. The test is running on a 2.16 MHz dual core processor, so I expect an even greater improvement on 4 or 8 cores. The query I ran for this test is of the form
SELECT * FROM table WHERE ID = ?.
So what did it do to achieve this? Well first of all, as you will see below, I cheated in some cases. I commented out or avoided some locks that were a bit too complicated to solve properly right now. But in other cases, I used solutions that can actually be taken over, as-is, by MySQL. In particular, the use of spinlocks.
All-in-all though, my intension here is just
to demonstration the potential for concurrency optimization in MySQL.
Optimization 1: LOCK_plugin in plugin_foreach_with_mask()The
LOCK_plugin mutex in
plugin_foreach_with_mask() is the first bottleneck you hit in just about any query. In my tests with 32 threads it takes over 60% of the overall execution time.
In order to get further with my own optimizations, I
commented out the
pthread_mutex_lock() and
pthread_mutex_lock() calls in this function, knowing that the lock is only really needed if plug-ins are installed or uninstalled. However, later I needed to find a better solution (see below).
Optimization 2: LOCK_grant in check_grant()After removing the above bottleneck I hit a wall in
check_grant().
pthread_rwlock_rdlock() was taking 50%, and
pthread_rwlock_unlock() was taking 45.6% CPU time! Once again I commented out the calls
rw_rdlock(&LOCK_grant) and
rw_unlock(&LOCK_grant) in
check_grant() to get around the problem.
In order to really eliminate this lock, MySQL needs to switch to a different type of read/write lock. 99.9% of the time only a read lock is required because a write lock is only required when loading and changing privileges.
For similar purposes, in PBXT, I have invented a special type of read/write lock that requires almost zero time to gain a read lock ... hmmmm ;)
Optimization 3: Mutex in LOCK and UNLOCK tablesI then discovered that 51.7% of the time was taken in
pthread_mutex_lock() called from
thr_lock() called from
open_and_lock_tables().
And, 44.5% of the time was taken in
thread_mutex_lock() called from
thr_unlock() called from
mysql_unlock_tables().
Now this is a tough nut. The locks used here are used all over the place, but I think they can be replaced with a spinlock to good effect (see below). I did not try this though. Instead I used LOCK TABLES in my test code, to avoid the calls to LOCK and UNLOCK tables for every query.
Optimization 4: LOCK_plugin in plugin_unlock_list()Once again the
LOCK_plugin is the bottleneck, this time taking 94.7% of the CPU time in
plugin_unlock_list(). This time I did a bit of work. Instead of commenting it out, I replaced
LOCK_plugin with a spinlock (I copied and adapted the PBXT engine implementation for the server).
This worked to remove the bottleneck because
LOCK_plugin is normally only held for a very short time. However, when a plugin is installed or unstalled this lock will be a killer and some more work probably needs to be done here.
Optimization 5: pthread_setschedparam()I was a bit shocked to find
pthread_setschedparam() was now taking 17% of the CPU time required to execute the
SELECT. This call can be easily avoided by first checking to see if the schedule parameter needs to be changed at all. For the moment, I commented the call out.
Of course, the more optimized the code is, the worse such a call becomes. After all other optimizations
pthread_setschedparam() CPU time increases to 52.6%!
Optimization 6: LOCK_thread_count in dispatch_command()The
LOCK_thread_count mutex in dispatch_command() is next in line with 96.1% of the execution time.
Changing this to a spinlock completely removes the bottleneck.
Optimization 7: LOCK_alarm in thr_end_alarm() and thr_alarm()my_net_read() calls
my_real_read() which calls the functions
thr_end_alarm() and
thr_alarm(). At this point in the optimization these 2 calls required 99.5% of the CPU time between them. Replacing
LOCK_alarm with a spinlock fixed this problem.
Conclusion:Without too much effort it is possible to make a huge improvement to the threading performance of MySQL. The fact that such bottlenecks have not yet been investigated may be due the fact that MySQL currently has no performance analysis team.
Following the last optimization, execution time was divided as follows:
25.8% of the time in
net_end_statement(), which hangs in
net_flush()32.8% of the time in
my_net_read()7.6% in
ha_pbxt::index_read(), this is the time spent in the engine
32.2% in
init_sql_alloc() which waits on the spinlock in
malloc()From this you can see that the optimization is almost optimal because the program is spending almost 60% of its time waiting on the network.
However, it is also clear where the next optimization would come from. Remove the call to
malloc() in
init_sql_alloc() which is called by
open_tables(). This could be done by reusing the block of memory required by the thread, from call to call.
Ultimately, the goal of optimizing for scale like this is to bring the code to the point that it is either network, CPU, or disk bound. Only then will the end-user really see an improvement in performance as the hardware is upgraded.
I think I have shown that it is worth putting some effort into such optimizations. Even more so as multi-core systems become more and more commonplace.