tag:blogger.com,1999:blog-243594212024-03-07T10:47:34.389+01:00PrimeBase XTPrimeBase XT (PBXT) is a transactional storage engine for MySQL which can be loaded dynamically by the pluggable storage engine API of MySQL 5.1. It has been designed for modern, web-based, high-concurrency environments. Full MVCC (multi-version concurrency control) support and a unique "write-once" strategy make PBXT particularly effective under heavy update loads.Unknownnoreply@blogger.comBlogger79125tag:blogger.com,1999:blog-24359421.post-14766845408277683192011-09-30T14:53:00.003+02:002011-09-30T19:16:27.323+02:00What's happened to the MySQL Storage Engine Vendor Advisory Board?As most of you know, the Engine Vendor Advisory Board was set up by Oracle under terms that Oracle specified for themselves when acquiring MySQL. I am referring to point number 8 on the <a href="http://www.dbms2.com/2009/12/14/oracle-mysql-storage-engine/">10 point list</a> that played a major role in quieting nervous bureaucrats in Europe.<br /><br /><a href="http://primebase.org/">PrimeBase Technologies</a> is a member of the Board and so far we have heard nothing of a meeting this year.<br /><br />Has the Board been quietly disbanded?<br /><br />If so, what does this mean for the other promises on Oracle's list...<br /><br /><span style="font-style: italic;">UPDATE: We are in contact with Oracle concerning this. I will keep you all posted. Hopefully it was just a communications error.</span>Unknownnoreply@blogger.com11tag:blogger.com,1999:blog-24359421.post-27718688518513619432011-04-12T17:58:00.000+02:002011-04-12T20:58:08.855+02:00PBXT "Secrets" at the MySQL ConferenceIn my presentation tomorrow at the MySQL Conference I plan to talk about some aspects of PBXT that I have never spoken about before. 
Here are the details of the presentation:<br /><br /><div style="text-align: center;"><span style="font-style: italic; font-weight: bold;">Update on the PBXT Storage Engine</span><br />10:50am Wednesday, 04/13/2011<br />Location: Ballroom D<br /><br /><div style="text-align: left;">Of course nothing about the engine is really a secret, if you are prepared to read the code. But who does that, right? I am pretty sure that not even developers of other engines have spent much time (if any) on that.<br /><br />But really, there are some gems tucked away in those X 1000 lines of code, and I plan to pick out a few tomorrow and show them to you. So don't miss it! :)<br /><br /></div></div>Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-24359421.post-52067032033806148982010-12-17T22:02:00.000+01:002010-12-17T10:03:18.860+01:00HandlerSocket: Why did our version not take off?There is quite a buzz about <a href="http://www.mysqlperformanceblog.com/2010/12/14/percona-server-now-both-sql-and-nosql/">HandlerSocket</a> built into the latest Percona Server. I agree with Henrik that this is a brilliant idea that is going to <a href="http://openlife.cc/blogs/2010/december/handlersocket-nosql-innodb-added-percona-server-ps-mysql-55-ga-out">go very far</a>!<br /><br />But I did the same thing 2.5 years ago with the BLOB Streaming Engine. 
In <a href="http://pbxt.blogspot.com/2007/06/first-release-of-blob-streaming-engine.html">this blog</a> I explain how you can retrieve data out of the database using the BLOB Streaming Engine and a simple URL of the form:<tt> http://mysql-host-name:8080/database/table/blob-column/condition</tt><br /><br />Where <tt>condition</tt> has the form: <tt>column1=value1&column2=value2&...</tt><br /><br />Now I have to ask myself the question: <span style="font-weight: bold;">why did we not manage to generate more enthusiasm for the idea?</span><br /><br />Many agree that we can learn more from failure than success, so here is my list of top reasons for this particular failure:<br /><ol><li><span style="font-weight: bold;">Every idea has its time. </span>In the last 2 years the awareness of NoSQL solutions has grown a lot, making RESTful and non-transactional storage and retrieval much better known and generally acceptable.<br /><br /></li><li><span style="font-weight: bold;">We had no platform on which to launch the idea</span>. Without a server distribution a plug-in does not have a chance of real exposure (this was not obvious back when we started making plug-ins). Percona Server and MariaDB now present such a platform. This is great for the whole community, so support them! :)<br /><br /></li><li><span style="font-weight: bold;">Our software had not been proven in production</span>. And this is one reason why building software based on an idea, instead of an actual project requirement, is quite likely to fail.<br /><br /></li><li><span style="font-weight: bold;">We did it with PBXT and not a mainstream storage engine</span> which everyone is already using. The really exciting thing about HandlerSocket is that you can use it to grab data in your existing database. This will allow it to spread like wildfire in a dry forest.<br /><br /></li><li>It is obvious to me that <span style="font-weight: bold;">we at PrimeBase have a marketing problem</span>! 
We have no clue how to get a message across to the public. It is really quite sad, and great technology like PBXT engine-level replication and BLOB streaming may die because of this. The following points also show a lack of marketing skills - so next time you see him, hug your marketing guy! ;)<br /><br /></li><li>By using the name BLOB Streaming Engine, <span style="font-weight: bold;">we did not make it clear that this works for all kinds of data</span>, not just BLOBs. (OK, and MyBS was a terrible name - PBMS not much better - but "HandlerSocket" will prove that it has nothing to do with the name!)<br /><br /></li><li><span style="font-weight: bold;">We did not show benchmarks</span>. For me it was obvious that retrieval would be significantly faster if it did not have to go through the SQL interface. Besides, as a developer I know you can easily manipulate benchmark results, so I am reluctant to present them (although I do) for my own software.<br /></li></ol>For me, as a developer, it is very important that my software gets used. This is why I can understand why there is open source, and why we give away software for free.<br /><br />But to developers it is not always obvious that giving it away for free does not automatically mean it will get used. So to my hacking compatriots: I hope this list will help you to do things better!<br /><br />P.S. Congratulations Oracle on the release of 5.5!Unknownnoreply@blogger.com10tag:blogger.com,1999:blog-24359421.post-59452221261673509562010-11-18T10:25:00.002+01:002010-11-18T10:42:20.431+01:00My Presentation at the DOAG 2010Yesterday I presented <span style="font-style: italic;">PBXT: A Transactional Storage Engine for MySQL</span> at the German Oracle User Group Conference (DOAG) in Nuremberg. A number of people asked for the slides, so here is the <a href="http://primebase.org/download/doag-pbxt-2010.pdf">link</a>.<br /><br />The talk was scheduled to be in English, but since I had a German-only audience I presented in German. 
There was quite a bit of interest, particularly in the Engine Level replication built into PBXT 2.0.<br /><br />As Ronny observed, this feature can be used effectively for many tasks, including online backup and maintaining a hot-standby. This all with the addition of a "small" feature:<br /><br />The Master could initially stream the entire database over to the Slave before actual replication begins. This would also make it extremely easy to set up replication.<br /><br />A brilliant idea, but a good 3 months work...Unknownnoreply@blogger.com6tag:blogger.com,1999:blog-24359421.post-86885317995867182010-07-16T15:48:00.000+02:002010-07-16T15:48:50.395+02:00PBXT 1.5.02 Beta adds 2nd Level CacheAs many probably already know, PBXT is the first MySQL Storage Engine to use a log-based architecture. Log-based means that data that would normally first be written to the transaction log, and then to the database tables, is just written to the log, and the log becomes part of the database.<br /><br />The result is that data is only written once, and is always written sequentially. The advantage when writing is obvious, but there is a downside (as always). The data is written to the disk in write order, which is seldom the order in which the data is retrieved. So this results in a lot of random reads to the disk when accessing the data later.<br /><br />Placing the data logs on a Solid State Drive would solve this problem, because SSDs have no seek time. But the problem with this solution is that SSDs are still way too expensive to base all your storage needs on such hardware.<br /><br />The solution: <span>an <span style="font-style: italic;">SSD-based 2nd Level Cache</span></span>.<br /><br />Using an SSD-based 2nd Level Cache you can store the most commonly accessed parts of your database on SSD for a reasonable price. For example, if you have a Terabyte database, you can cache about 15% (160 GB) of it on SSD for around $400. 
This can significantly improve the performance of your system.<br /><br />With this thought in mind, I have just released PBXT 1.5.02 Beta, which implements a 2nd level cache for the data logs. How this works is illustrated below.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.primebase.org/images/2nd-level-cache.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 521px; height: 351px;" src="http://www.primebase.org/images/2nd-level-cache.jpg" alt="" border="0" /></a>Data written to the data log is also written to the main-memory-based Data Log Cache. Once the Data Log Cache is full, pages need to be freed up when new data arrives. Pages that are freed from the Data Log Cache are written to the 2nd Level Cache.<br /><br />Now, when the Data Log records are read, PBXT will read the corresponding page from the Data Log Cache. If the page is not already in the cache, it will first check to see if the page is in the 2nd Level Cache, before reading from the Data Log itself.<br /><br />PBXT 1.5 is available for download from <a href="http://primebase.org/">primebase.org</a>, or you can check out <tt>lp:pbxt/1.5</tt> from <a href="https://launchpad.net/pbxt">Launchpad</a> using <a href="http://bazaar.canonical.com/">bazaar</a>. The <a href="http://primebase.org/documentation">documentation</a> has also been updated for 1.5.<br /><br />Using the 2nd level cache is easy. 
It is controlled by 3 system variables:<br /><ul><li><tt style="font-weight: bold;">pbxt_dlog_lev2_cache_file</tt> - the name and path of the file in which the data is stored.<br /></li><li><tt style="font-weight: bold;">pbxt_dlog_lev2_cache_size</tt> - the size of the 2nd level cache.</li><li><tt style="font-weight: bold;">pbxt_dlog_lev2_cache_enabled</tt> - set to 1 to enable the 2nd level cache.</li></ul>It also makes sense to set a higher value for the Data Log Cache, using the <tt style="font-weight: bold;">pbxt_data_log_cache_size</tt> variable, which has a default value of 16MB.<br /><br />Of course it will be interesting to do some benchmarks on this implementation. But that will have to wait until after my holiday! I will be away until late August, but if you decide to test the new version, be sure to let me know.Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-24359421.post-72440998499738537562010-07-02T11:45:00.004+02:002010-07-02T11:52:05.992+02:00MySQL Track at the DOAG 2010 ConferenceThe <a href="http://www.doag.org/en/konferenz/doag/2010/i_text">DOAG 2010 Conference + Exhibition</a> is to be held from the 16th to 18th November in Nuremberg this year. DOAG stands for "Deutsche ORACLE-Anwendergruppe", in English: the German Oracle User's Group.<br /><br />We will be adding a MySQL track to the conference this year, much like Ronald and Sheeri did for the <a href="http://www.odtugkaleidoscope.com/">ODTUG Kaleidoscope 2010</a>. Volker Oboda (of <a href="http://primebase.org/">PrimeBase Technologies</a>) is organizing the track and I will be helping to review the submissions. More information is available in German on <a href="http://oboda.com/">Volker's MySQL Blog</a>.<br /><br />So, if you are planning to be in the area, please consider <a href="https://mydoag.doag.org/termine/cfp_nachmeldung.php">submitting a talk</a>. The deadline for submissions was the 30 June, but has been extended until 10 July for the MySQL track. 
Talks in English are welcome!<br /><br />We are looking forward to playing an active part in the German speaking Oracle community. Just the size is something to wonder about. The DOAG Conference draws over 2500 participants, which is larger than the MySQL Conference (but maybe not for long!).Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-24359421.post-17499529699598041742010-06-11T11:04:00.001+02:002010-06-11T11:05:07.852+02:00An Overview of PBXT VersionsIf you follow PBXT development you may have noticed a number of different versions of the engine have been mentioned in various talks and blogs.<br /><br />There is actually a consistent strategy behind all this, which I would like to explain here.<br /><br /><span style="font-size:130%;"><span style="font-weight: bold;">PBXT 1.0 - Current: 1.0.11-3 Pre-GA</span></span><br /><br /><span style="font-weight: bold;">Launchpad: <a href="https://launchpad.net/pbxt/trunk">lp:pbxt</a></span><br /><br />This is the current PBXT production release. It is stable in all tests and environments in which it is currently in use.<br /><br />The 1.0.11 version of the engine is available in <a href="http://askmonty.org/wiki/MariaDB:Download">MariaDB 5.1.47</a>.<br /><br /><span style="font-size:130%;"><span style="font-weight: bold;">PBXT 1.1 - Stability</span></span><span style="font-size:130%;"><span style="font-weight: bold;">: RC</span></span><br /><br /><span style="font-weight: bold;">Launchpad: <a href="https://launchpad.net/pbxt/1.1">lp:pbxt/1.1</a></span><br /><br />PBXT 1.1 implements <span style="font-weight: bold;">memory resident (MR) tables</span>. These tables can be used for fast, concurrent access to non-persistent data.<br /><br />1.1 also adds <span style="font-weight: bold;">parallel checkpointing</span>. To do this, PBXT starts multiple threads to flush several tables at once during a checkpoint.<br /><br />This version is feature complete. 
Unless someone is interested in using MR tables in production, my plan is to leave 1.1 at the RC level and concentrate development on PBXT 1.5 and 2.0.<br /><br />PBXT 1.1 is part of <a href="https://launchpad.net/drizzle">Drizzle</a>.<br /><br /><span style="font-size:130%;"><span style="font-weight: bold;">PBXT 1.5 - </span></span><span style="font-size:130%;"><span style="font-weight: bold;">Current: 1.5.01 </span></span><span style="font-size:130%;"><span style="font-weight: bold;">Beta</span></span><br /><br /><span style="font-weight: bold;">Launchpad: <a href="https://launchpad.net/pbxt/1.5">lp:pbxt/1.5</a></span><br /><br />PBXT 1.5 changes how the data logs are written, which makes the engine much faster, depending on the database schema.<br /><br />Previously each user thread wrote its own data log. In version 1.5 the data logs are written the same way the transaction log is written. This means that <span style="font-weight: bold;">group commit is implemented for the data logs</span>.<br /><br />I have also added a <span style="font-weight: bold;">data log cache</span> which can help significantly if your data has hot spots.<br /><br />The log-based architecture of PBXT makes it possible to write Terabytes of data without degrading performance. But, as the amount of data increases, garbage collection and random read speed can become a problem. I am currently focusing on solving these problems in 1.5.<br /><br /><span style="font-size:130%;"><span style="font-weight: bold;">PBXT 2.0 - Stability</span></span><span style="font-size:130%;"><span style="font-weight: bold;">: Alpha</span></span><br /><br /><span style="font-weight: bold;">Launchpad: <a href="https://launchpad.net/pbxt/2.0">lp:pbxt/2.0</a></span><br /><br />The major feature in PBXT 2.0 is <span style="font-weight: bold;">engine level replication (ELR)</span>. 
This is an extremely efficient form of replication, while being fully transactional and reliable.<br /><br />ELR works by transferring changes directly from the PBXT transaction and data logs to the PBXT engine on the slave. This means the binary log does not need to be written or flushed, which can greatly increase the speed of the master server (up to 10x in some tests).<br /><br />Currently the replication does not handle database schema changes, but it works and is ready for testing.<br /><br /><span style="font-size:130%;"><span style="font-weight: bold;">Setting Priorities</span></span><br /><br />PBXT is a free, open source project which is largely funded by a <span style="font-style: italic;">big name</span> database company.<br /><br />Nevertheless, I am not bound as to how I set priorities, which means I usually focus on what is important to those using and testing the engine.<br /><br />Now that you have an overview of what's happening in the PBXT world, let me know if you have a problem that PBXT might fix. I'd be happy to hear from you... :)Unknownnoreply@blogger.com9tag:blogger.com,1999:blog-24359421.post-70294622792494092992010-05-12T17:00:00.000+02:002010-05-12T17:00:21.874+02:00PBXT 1.0.11 Pre-GA Released!I have just released PBXT 1.0.11, which I have titled "Pre-GA". Going by our internal tests, and all instances of PBXT in production and testing by the community, this is a GA version!<br /><br />However, although PBXT has 1000's of instances in production, it is not used in very diverse applications. So I am waiting for wider testing and usage before removing the "Pre" prefix.<br /><br />You can download the source code from <a href="http://primebase.org/download">primebase.org</a>, or pull it straight from <a href="https://launchpad.net/pbxt">Launchpad</a>. Here are instructions <a href="http://primebase.org/download/#qg_source">how to compile and build the engine</a> with MySQL. 
PBXT builds with <a href="http://primebase.org/download/mysql-5.1.46.tar.gz">MySQL 5.1.46</a> GA, and earlier 5.1 versions.<br /><br />If you don't want to compile it yourself, PBXT 1.0.11 will soon be available in the <a href="http://askmonty.org/wiki/MariaDB:Download">5.1.46 release of MariaDB</a>. And, for the more adventurous, PBXT 1.1 is included in <a href="https://launchpad.net/drizzle">Drizzle</a>.<br /><br />A complete list of all the changes in this version is in the <a href="http://primebase.org/download/ChangeLog">release notes</a>.<br /><br />If you are testing PBXT and have any questions send me an e-mail. I will be glad to help.<br /><br />And, oh yes. If you are looking for development or production support for MySQL/MariaDB and PBXT then please write to: support-at-primebase-dot-org.<br /><br />We are working together with <a href="http://www.percona.com/">Percona</a> and <a href="http://askmonty.org/wiki/Main_Page">Monty Program Ab</a> to provide the service level you require.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-24359421.post-54708813584658683442010-04-19T12:59:00.000+02:002010-04-19T22:00:00.985+02:00Stuck in the US of AAs far as I know, nobody who was at the <a href="http://en.oreilly.com/mysql2010">MySQL User Conference</a> and lives in Europe has made it back home yet!<br /><br />Please leave a comment on this blog as soon as you get home. I am interested to know...<br /><br />My flight was yesterday, so I have the worst prospects. I am booked on a flight for next week Wednesday (10 days delay)! No joke! 
:(Unknownnoreply@blogger.com9tag:blogger.com,1999:blog-24359421.post-50275932420046180072010-04-16T14:04:00.000+02:002010-04-16T23:05:31.581+02:00The other Oracle ACE DirectorWhile the choice of <a href="http://ronaldbradford.com/blog/my-acceptance-with-oracle-as-ace-director-2010-04-15">Ronald Bradford</a> and <a href="http://www.pythian.com/news/10643/meet-the-first-oracle-ace-director-in-mysql-sheeri-cabral">Sheeri Cabral</a> was natural for Oracle ACE Director, my own nomination was perhaps a bit of a surprise. Well, it was to me anyway.<br /><br />Those of you at the conference may have noticed that I had no (super-cool) ACE Director jacket when I was called up on the stage...<br /><br />Well that was because the jacket was too big, and I had already returned it to Lenz for it to be exchanged.<br /><br />Unfortunately I can't return the shoes because they are too big for me as well...Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-24359421.post-56705869140380920812010-04-14T14:27:00.002+02:002010-04-19T21:09:04.359+02:00Slides of the PBXT PresentationHere are the <a href="http://www.primebase.org/download/pbxt-uc-2010.pdf">slides</a> to <a href="http://en.oreilly.com/mysql2010/public/schedule/detail/13314">my talk</a> yesterday: <a href="http://www.primebase.org/download/pbxt-uc-2010.pdf">A Practical Guide to the PBXT Storage Engine</a>.<br /><br />For anyone who missed my talk, I think it is worth going through the slides, because they are fairly self-explanatory.<br /><br />If there are any questions, please post them as a comment to the blog. I will be glad to answer :)Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-24359421.post-6880471732094297282010-04-09T11:54:00.000+02:002010-04-09T11:54:20.196+02:00PBXT at the MySQL User Conference 2010At this year's User Conference I have some interesting results to present. But more than anything else, my talk will explain how you can really get the most out of the engine. 
The design of PBXT makes it flexible, but this provides a lot of options. What tools are available to help you make the right decisions? I will explain.<br /><br />Every design has trade-offs. How does this work out in practice for PBXT? And how can you take advantage of the strengths of the storage engine? I will explain in:<br /><br /><p style="text-align: center;"><span style="font-weight: bold; font-style: italic;font-size:130%;" >A Practical Guide to the PBXT Storage Engine</span><br />Paul McCullagh<br /><a href="http://en.oreilly.com/mysql2010/public/schedule/detail/13314">2:00pm - 3:00pm Tuesday, 04/13/2010</a><br />Ballroom E</p><p style="text-align: left;">Don't miss it! :)</p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-24359421.post-30681566686041088352010-03-17T17:02:00.000+01:002010-03-17T17:03:05.252+01:00PBXT Engine Level replication, works!I have been talking about this for a while, now at last I have found the time to get started! Below is a picture from my 2008 MySQL User Conference presentation. It illustrates how engine level replication works, and also shows how this can be ramped up to provide a multi-master HA setup.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://primebase.org/images/engine-replication-2.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 369px; height: 239px;" src="http://primebase.org/images/engine-replication-2.jpg" alt="" border="0" /></a><br />What I now have running is the first phase: asynchronous replication, in a master/slave configuration. The way it works is simple. 
For every slave in the configuration the master PBXT engine starts a thread which reads the transaction log, and transfers modifications to a thread which applies the changes to PBXT tables on the slave.<br /><br /><span style=";font-family:arial;font-size:130%;" ><span style="font-weight: bold;">Where to get it</span></span><br /><br />I have pushed the changes that do this trick to PBXT 2.0 on <a href="https://launchpad.net/pbxt">Launchpad</a>. The branch to try out is <a href="https://code.launchpad.net/%7Epbxt-core/pbxt/2.0">lp:pbxt/2.0</a>.<br /><br /><span style=";font-family:arial;font-size:130%;" ><span style="font-weight: bold;">Getting started</span></span><br /><br />Setup of the replication is dead easy. Assuming you already have a PBXT database, what you need to do is the following:<br /><br /><span style="font-weight: bold;">1. Copy the Master data: </span>Shut down the MySQL server and make a complete copy of the data directory.<br /><br /><span style="font-weight: bold;">2. Set up a Slave server: </span><span>Set up</span> a second MySQL server using the copy of the data directory.<br /><br /><span style="font-weight: bold;">3. Declare the Slave: </span>Create a text file called <tt>slaves</tt>, in the <tt>data/pbxt</tt> directory of the master server, with the following entry:<br /><pre>[slave]<br />name=slave-thread-name<br />host=host-name-of-slave<br />port=37656<br /></pre><tt>slave-thread-name</tt> is any name you like, and is used to identify the replication thread running on the master. <tt>host-name-of-slave</tt> is the host name or IP address of the slave MySQL server. <tt>37656</tt> is the default port used by the PBXT slave engine to receive replication changes.<br /><br /><span style="font-weight: bold;">4. Enable replication: </span>On the master server set <tt>pbxt_enable_replication=1</tt>, and on the slave server set <tt>pbxt_enable_replication=2</tt>. 
Also make sure that both servers have different server IDs (system parameter: <tt>server_id</tt>).<br /><br /><span style="font-weight: bold;">5. Start both servers: </span>Replication will begin immediately if the slave server is started before the master server, otherwise replication will begin after a minute (see below).<br /><span style=";font-family:arial;font-size:130%;" ><br /><span style="font-weight: bold;">How it works</span></span><br /><br />PBXT engine level replication, unlike MySQL replication, pushes changes to the slave. For every entry in the <tt>data/pbxt/slaves</tt> file, PBXT starts a thread (the <span style="font-style: italic;">supplier</span> thread). The thread connects to the slave on the given address, and pushes the changes to an <span style="font-style: italic;">applier</span> thread run by the PBXT engine on the slave side. If any error occurs, the supplier thread on the master will pause, and then try again in a minute.<br /><br />On connect the supplier thread requests the global transaction ID (GID) of the last transaction committed on the slave. The applier determines the GID of the last transaction by searching backwards through its own transaction logs.<br /><br />Replication is row-based, and fairly low level. Changes refer to the PBXT internal row and table IDs. The row data is transferred in the same format used to store the information on disk. This makes the replication extremely efficient. The supplier thread does not even have to read the log from disk if it is fairly up-to-date, because PBXT already caches the last changes to the transaction log for use by the writer and the sweeper threads.<br /><br />Probably the most important thing about this type of replication is that it (theoretically) has almost no effect on the "foreground" activity on the master machine. 
I am interested to find out if this really is the case.<br /><br /><span style=";font-family:arial;font-size:130%;" ><span style="font-weight: bold;">What's next?</span></span><br /><br />Replication of DDL changes is not implemented yet. So if you do <tt>ALTER TABLE</tt> or any other such operation, replication will stop, and has to be restarted by copying over the data directory to the slave again.<br /><br />After DDL changes the next step is to add synchronous replication, as illustrated above. This requires waiting for a commit from the slave before continuing. Latency in this case can be kept to a minimum by sending transactions to the slave before they have been committed on the master.<br /><br />I believe this would then provide the basis for an extremely simple (and efficient) HA solution based on MySQL.Unknownnoreply@blogger.com9tag:blogger.com,1999:blog-24359421.post-68088889128419729942010-02-26T16:49:00.000+01:002010-02-26T16:49:27.681+01:00Embedded PBXT is CoolMartin Scholl (<a href="http://twitter.com/zeit_geist">@zeit_geist</a>) has started a new project based on the PBXT storage engine: EPBXT - Embedded PBXT! In his first blog he describes how you can easily build the latest version: <a href="http://blog.erlang.de/building-embedded-pbxt-from-bzr?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+StillDontUnderstandAllThatIsTheCase+%28Still+don%27t+understand+all+that+is+the+case.%29">Building Embedded PBXT from bzr</a>.<br /><br />The interesting thing about this project is that it exposes the "raw" power of the engine. Some basic performance tests show this really is the case.<br /><br />At the lowest level, PBXT does not impose any format on the data stored in tables and indexes. When running as a MySQL storage engine it uses the MySQL native row and index formats. Theoretically it would be possible to expose this in an embedded API. The work Martin is doing goes in at this level. 
The wrapper around the engine determines the data types, data sizes, row and index format. Comparison operations for the data types are also supplied by the embedded code or user program.<br /><br />This flexibility will make it possible for an application to store its own data very efficiently. As Martin suggested, it would also be possible to use <a href="http://code.google.com/p/protobuf">Google's protobufs</a> for the row format. This would eliminate the need to use an ALTER TABLE for many types of changes to a table's definition!<br /><br />Of course, EPBXT is still a way from realizing this vision, and Martin has some very specific problems he wants to solve with the development. However, judging by his command of the code within such a short time, this is going to be a project to watch in the future!Unknownnoreply@blogger.com4tag:blogger.com,1999:blog-24359421.post-74919220188320342992010-02-08T12:34:00.000+01:002010-02-08T12:34:44.068+01:00Ken we will miss you!What does it take for someone fiercely loyal to a company to suddenly leave? Ken Jacobs, Oracle employee number 18, a man who sincerely loves the company, <a href="http://news.cnet.com/8301-13505_3-10448783-16.html">has resigned</a>! The only reason I can think of is an extreme snub!<br /><br />I must say, I am very disappointed. The prospect of Ken running MySQL was a light at the end of the tunnel for the community. Why? Because Ken is a MySQL insider! He knows the project, he knows the community.<br /><br />As an engine developer I have come to know Ken well over the last 4 years. He led the InnoDB team and is largely responsible for the improvements made to the engine since the Oracle acquisition. At the yearly Engine Summit he was always professional and constructive in his suggestions, with a deep technical knowledge of the subject. 
His track record shows that he has always kept his word with regard to Oracle's intentions with InnoDB, and I would trust him to do the same with MySQL.<br /><br />Goodbye Ken. This is a great loss for both the MySQL community and Oracle!Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-24359421.post-18270269099922099532009-12-31T17:41:00.000+01:002009-12-31T17:41:19.395+01:00PBXT 1.0.10, New Year Release!I have just released PBXT 1.0.10 RC4. The sources can be downloaded from <a href="http://primebase.org/download">primebase.org</a>, or from <a href="https://launchpad.net/pbxt">Launchpad</a>.<br /><br />The major feature in this release is the implementation of the <tt style="font-weight: bold;">pbxt_flush_log_at_trx_commit</tt> system variable. Similar to the InnoDB equivalent, this variable allows you to determine the level of durability of transactions.<br /><br />This is a trade-off: by decreasing durability, the speed of database update operations can be increased.<br /><br />The default <span style="font-weight: bold;">setting is 1, which means full durability</span>: the transaction log is flushed on every transaction commit.<br /><br />Setting the <span style="font-weight: bold;">variable to 2 reduces durability</span>, by just writing the log on transaction commit (no flush is done). In this case, transactions can only be lost if the entire server machine goes down (for example a power failure).<br /><br />The <span style="font-weight: bold;">lowest level of durability is 0</span>. In this case the transaction log is not written on transaction commit. Transactions can be lost if the server crashes.<br /><br />In the case of 2 and 0, the engine flushes the transaction log at least once per second. So only transactions executed within the last second can be lost.<br /><br />Ironically, PBXT started life as a "partially durable" storage engine (level 2 according to the description above). 
Almost exactly 2 years ago I started the implementation of full durability. It has taken a while to build in the original "feature" :)<br /><br />The main reason for doing this has been the Mac version, and our work with <a href="http://teamdrive.net/">TeamDrive</a>. On the Mac the <tt>fsync()</tt> operation is a <span style="font-style: italic;">fake</span>. To do a true flush to disk you have to call <tt>fcntl(of->of_filedes, F_FULLFSYNC, 0)</tt>. The problem is, the real flush is incredibly slow (about 20 times slower than <tt>fsync</tt>), but necessary to avoid any corruption.<br /><br />The advantage of a lot of applications like TeamDrive is that they can tolerate a lower level of durability. So we can look forward to an even speedier TeamDrive in the future :)<br /><br />I would love to hear from anyone testing the new version. Bugs can be <a href="https://bugs.launchpad.net/pbxt">reported on Launchpad</a>, as usual.<br /><br /><span style="font-weight: bold;">Happy New Year to you all!</span>Unknownnoreply@blogger.com6tag:blogger.com,1999:blog-24359421.post-30466477581145535302009-12-14T14:58:00.001+01:002009-12-14T14:58:52.564+01:00Monty's appeal is selfless!What many people don't get is that Monty's appeal to the MySQL community to <a href="http://monty-says.blogspot.com/2009/12/help-saving-mysql.html">help save MySQL</a> is really quite selfless.<br /><br />The fact is, Monty's own company, <a href="http://askmonty.org">Monty Program Ab</a>, stands to benefit the most from <span style="font-weight: bold;">bad stewardship of MySQL by Oracle</span>.<br /><br />If Oracle slows and closes up development, rejects community contributions and creates a commercial version of MySQL, then Monty Program's <a href="http://askmonty.org/wiki/index.php/MariaDB">MariaDB</a> fork will become very popular, very quickly.<br /><br />Which would translate into income for Monty Program Ab as customers come to his company for additions, features and bug fixes that they need to 
secure their own production.<br /><br />What Monty is concerned about is the commercial vendors of MySQL (one of which Monty Program is not).<br /><br />These vendors either:<br /><ul><li>OEM MySQL and integrate it into a commercial software or hardware product, or</li><li>they produce a closed source (or dual-license) storage engine, which is sold with a commercial version of MySQL.</li></ul>Oracle could kill both businesses, and this is Monty's main concern. As Monty explained in a phone call this morning: he sees the existence of commercial/dual-license vendors of MySQL as very important to the long-term survival of "his baby".<br /><br />Of course Oracle cannot prevent 3rd parties from continuing to offer consulting, support and training for MySQL. But closing the source and vigorous enforcement of trademarks can make things very difficult for such companies.<br /><br />Unfortunately Oracle's <a href="http://money.cnn.com/news/newsfeeds/articles/marketwire/0568514.htm">latest concessions</a> may not be enough to satisfy investors in MySQL-based technology either, because there is no guarantee of what happens after 5 years.Unknownnoreply@blogger.com15tag:blogger.com,1999:blog-24359421.post-14472740983319323802009-11-11T13:10:00.000+01:002009-11-11T13:10:41.058+01:00The EU's real problem: MySQL and Oracle do not compete!I think that most people are missing the point, Oracle included. 
The main objection of the EU is <span style="font-weight: bold;">not</span> that Oracle is swallowing up a major competitor.<br /><br />To understand this you have to read between the lines of the EU decision:<br /><br /><span style="font-style: italic;">"The regulators see a major conflict of interest in the world's largest commercial database company owning its largest open-source competitor"<br /></span><br />This should actually read: "<span style="font-style: italic;">the world's largest commercial database company owning the largest open-source </span><span style="font-style: italic;">database</span>"<br /><br />The database market is divided into 2 parts: <span style="font-weight: bold;">the back-office</span> and <span style="font-weight: bold;">the online </span>world.<br /><br />And now you know what I am going to say ... <span style="font-weight: bold;">Oracle has a near monopoly in back-office</span> and <span style="font-weight: bold;">MySQL has a monopoly in online</span> applications.<br /><br />So let's do a little maths:<br /><br />Let's assume that back-office and online applications divide the database market into 2 equal parts, that <span style="font-weight: bold;">Oracle owns 60% of the back-office</span>, and that <span style="font-weight: bold;">MySQL owns 90% of the online </span>world.<br /><br />This means that <span style="font-weight: bold;">Oracle controls 30%</span> (60% of 50%) <span style="font-weight: bold;">of the entire database market today</span>, but <span style="font-weight: bold;">after the acquisition this number will be 75%</span> (30% + 90% of 50%).<br /><br />Something to think about.Unknownnoreply@blogger.com8tag:blogger.com,1999:blog-24359421.post-10554758822468143752009-09-18T12:16:00.001+02:002009-09-21T09:57:34.554+02:00The mysterious Storage Engine Independent Test SuiteRecently <a href="http://www.facebook.com/note.php?note_id=140021445932">Mark observed</a> that we now all need a storage engine independent test suite, 
<span style="font-weight: bold;">Sun included</span>! Well, as far as I know, there is such a thing at Sun, sort of. Apparently it has been used to test PBXT and other engines, but I've heard it is not in good enough shape to be released.<br /><br /><span style="font-weight: bold;">But my question is, why not release it anyway?</span> We could turn it into an engine community project. I believe there are enough engine developers out there to get this moving forward.<br /><br />The secret is to start small, and just get a few tests to run with all engines. Then additional tests can be added step by step. Engines need a way to specify that they want to skip a test entirely (e.g. transactional tests), and it should be easy to customize results for various engines.<br /><br />An example of a simple and elegant solution can be found in Drizzle. As <a href="http://www.facebook.com/mordred">Monty Taylor</a> mentioned in a comment to <a href="http://www.facebook.com/note.php?note_id=140021445932">Mark's blog</a>: "We have some patches to test-run in Drizzle to allow running the whole test suite with a specified storage engine".<br /><br />I think it has been long enough. This could be a good opportunity to start a Sun/Community project, something like Drizzle. In other words, get something out there, even if it is incomplete, and let the community also take a large part of the responsibility.Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-24359421.post-53664379372672771052009-09-11T15:16:00.000+02:002009-09-11T15:16:40.058+02:00PBXT 1.0.09 RC3 implements XA and online backupI have just released PBXT 1.0.09 RC3. Besides bug fixes (details in the <a href="http://primebase.org/download/ChangeLog">release notes</a>), this version includes 2 Beta features:<br /><ul><li> XA/2-Phase Commit support</li><li> Native online backup Driver</li></ul>XA support has been around in MySQL for quite a while, and we all know of its usefulness, for example when sharding. 
So I was surprised to find a bug in the XA recovery: <a href="http://bugs.mysql.com/bug.php?id=47134">Bug #47134</a>. Contrary to what is reported, the crash can also occur when using XA with just the default engines installed, so watch out for that one (the good news: the bug fix is simple).<br /><br />Online backup is really cool! I have heard that it may soon be released in an upcoming version of 5.4, so let's hope that this is true.<br /><br />In a little test, I did a backup of a 10GB database in 49.26 seconds! Admittedly this was on a system with 4 15K drives in a RAID 0 configuration. But that is still fantastic, considering the tables are not even locked during this time!<br /><br />The database itself took 19 min. 56 sec. to generate. A complete restore took only 14 min. 29 sec.<br /><br /><span style="font-weight: bold;">But, it gets even better....</span><br /><br />I have been working on PBXT 1.1, where I have done a number of things to improve the I/O performance of the engine.<br /><br />In the same test as above, run with PBXT 1.1, the time to generate the database was 9 min. 35 sec., and the time to restore was 6 min 18 sec! (Time to generate the backup was identical.)<br /><br />PBXT 1.1 is available directly from Launchpad here: <a href="https://code.launchpad.net/%7Epbxt-core/pbxt/staging">lp:~pbxt-core/pbxt/staging</a>, if you are interested in trying it out. 1.1 also has full support for memory based tables.<br /><br />The new release candidate (PBXT 1.0.09) can be downloaded from <a href="http://primebase.org/download">primebase.org/download</a>. It is also available from Launchpad as the rc3 series: <a href="https://code.launchpad.net/pbxt/rc3">lp:pbxt/rc3</a>.<br /><br />Please report bugs <a href="https://bugs.launchpad.net/pbxt">here</a>.<br /><br />Any feedback is welcome! 
You can use Launchpad <a href="https://answers.launchpad.net/pbxt">questions</a> or the PBXT <a href="https://launchpad.net/%7Epbxt-discuss">mailing list</a> for this purpose.Unknownnoreply@blogger.com11tag:blogger.com,1999:blog-24359421.post-60030961172696535432009-08-21T13:44:00.000+02:002009-08-21T13:44:58.830+02:00PBXT at the OpenSQL Camp hosted by the FrOSCon 2009Vladimir will be giving a presentation on PBXT at FrOSCon 2009 in St. Augustin, near Bonn in Germany tomorrow:<br /><br /><div style="text-align: center; font-style: italic;">PBXT: Technology trends that affect your Database<br />Room: C120/OpenSQLCamp<br />Time: 22 Aug 2009, 18:15 - 18:45<br /></div><br />The talk is packed with interesting information about how the design of PBXT handles the major technological challenges of the future, including multiple cores, lots of RAM and solid state drives.<br /><br />If you are in the area, check it out! :)Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-24359421.post-50988616761879919332009-08-20T09:51:00.000+02:002009-08-20T09:51:43.339+02:00What if MySQL dropped the Dual License?In his blog <a href="http://redmonk.com/sogrady/2009/08/17/does-the-gpl-matter/">Does the GPL Matter? In a Word, Yes</a>, Stephen O'Grady makes the significant point that the dual-licensing model has a major drawback:<br /><br /><span style="font-style: italic;">Sun/MySQL can only include patches and contributions if they fully own the copyright to those changes.</span><br /><br />This gives forks like Drizzle, OurDelta, Percona and MariaDB a major advantage over the Sun version: they can include the best patches from all over. 
And it is clear that the momentum is building.<br /><br />In a follow-up blog, <a href="http://redmonk.com/sogrady/2009/08/19/does-copyright-matter-or-is-the-end-of-dual-licensing-near/">Stephen asks</a>: "what would the implications be if MySQL, of all projects, were forced to abandon the dual-licensing model it had long championed?"<br /><br />Thinking about this, there is something that really bothers me:<br /><br />Let's assume MySQL took on patches without ownership of the copyright, and thereby lost the ability to provide a commercial license to OEM customers.<br /><br />According to the GPL this would mean that nobody could ever ship a commercial product with MySQL built-in!<br /><br />To prevent this possibility from being lost to the world forever, <span style="font-weight: bold;">surely MySQL </span><span style="font-weight: bold;">would have to abandon the GPL</span>, and maybe change to LGPL or BSD!Unknownnoreply@blogger.com8tag:blogger.com,1999:blog-24359421.post-6459153870652536942009-08-11T21:28:00.000+02:002009-08-11T21:29:16.190+02:00Jeremy's article on PBXT in Linux Magazine<a href="http://jeremy.zawodny.com/blog">Jeremy Zawodny</a> of Craigslist wrote a great article on PBXT for Linux Magazine:<br /><br /><a href="http://www.linux-mag.com/cache/7462/1.html">PBXT: Your Next MySQL Storage Engine?</a><br /><br />Check it out...<br /><br />Thanks Jeremy :)Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-24359421.post-64973172132558215532009-06-30T19:56:00.001+02:002009-06-30T21:30:50.122+02:00PBXT 1.0.08 RC2 Released!The second Release Candidate of PBXT, version 1.0.08, has just been released.<br /><br />As I have mentioned in my previous blogs (<a href="http://pbxt.blogspot.com/2009/03/improving-pbxt-dbt2-performance.html">here</a> and <a href="http://pbxt.blogspot.com/2009/03/solving-pbxt-dbt2-scaling-problem.html">here</a>), I did a lot to improve performance for this version.<br /><br />At the same time I am confident that this release is 
stable, as we now have a large number of tests, including functionality, concurrency and crash recovery. But even more important, the number of users of PBXT has increased significantly since the last RC release, and that is the best test for an engine.<br /><br />So there has never been a better time to try out PBXT! :)<br /><br />You can download the source code, and selected binaries from here: <a href="http://primebase.org/download">primebase.org/download</a>.<br /><br />Vladimir and I have made a lot of changes; for details check out the <a href="http://primebase.org/download/ChangeLog">release notes</a>.<br /><br />Bugs can be reported on Launchpad, <a href="https://bugs.launchpad.net/pbxt">here</a>.<br /><br />There is also a new PBXT <a href="https://launchpad.net/%7Epbxt-discuss">mailing list</a>, so if you have any questions this is the best place for them.<br /><br />PBXT is a high-performance, MVCC-based, transactional storage engine for MySQL. The project is open source (GPL) and hosted on <a href="https://launchpad.net/pbxt">Launchpad</a>. PBXT supports referential integrity, row-level locking and is fully ACID compliant.<br /><br />For more information please go to the PBXT home at: <a href="http://primebase.org/">primebase.org</a>.Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-24359421.post-24944591650534247712009-05-13T18:52:00.000+02:002009-05-13T18:52:24.884+02:00At last we have a MySQL Foundation, it's called The Open Database AllianceJust over a year ago we registered the domain name <a href="http://www.whois.net/whois/mysqlfoundation.org">mysqlfoundation.org</a> in the hope that Sun/MySQL would actually create such an entity.<br /><br />My idea was to move the development of the MySQL Community server to the Foundation and make the development fully community orientated. The Foundation would have its own development goals and release schedule. 
Sun could then pull patches from the Foundation's Community server into the Enterprise server once they had stabilized.<br /><br />I pitched the idea to several people at Sun back then and over the last year. However, for some reason, the foundation concept just proved impossible to push through.<br /><br />I believe this would have been a great opportunity for Sun to take the leadership in the community, as the foundation idea dates back to before things really started <a href="http://developers.slashdot.org/article.pl?sid=09/03/30/228214">splitting up</a>. But Sun's loss is now that of Oracle, who perhaps doesn't care anyway.<br /><br />What is really most important is that we in the community now have an entity that is going to tie our side of things together: <a href="http://www.prweb.com/releases/2009/05/prweb2417854.htm">The Open Database Alliance</a>. For the community it is critical that things do not split up any further and that instead our efforts are bundled. I believe the Alliance can do this for us.<br /><br />So where does that leave Oracle?<br /><br />Well, as I see it, we now have a new, more relevant, community/enterprise split: the Oracle MySQL Enterprise server and the MariaDB Community server.<br /><br />And, I guess I have to stand up and say, for us (<a href="http://primebase.org/">primebase.org</a>) this difference is real and significant.<br /><br /><a href="https://launchpad.net/pbxt">PBXT</a> is already part of most community builds including <a href="https://launchpad.net/maria">MariaDB</a>, <a href="http://ourdelta.org/patches">OurDelta</a> and <a href="http://www.apachefriends.org/en/news.html">XAMPP</a>. But it is not part of the official <a href="http://dev.mysql.com/downloads/mysql/5.1.html">MySQL 5.1 Community Server</a>.<br /><br />Please note, this has nothing to do with my <span style="font-weight: bold;">many great friends</span> at MySQL! 
They help us in lots of other ways and I am very thankful for this :)<br /><br />But even with the "community" label, any download offered by Sun (now Oracle of course - no change there) is about business! That is very difficult to change, and I accept that.<br /><br />But the community does not need to change anything. It is what it is.Unknownnoreply@blogger.com7