Tuesday, July 17, 2007

The MyBS Engine and the BLOB Repository

After some consideration I have decided to move the BLOB repository from PBXT to MyBS (§). This has the advantage that any engine that does not have its own BLOB repository (or is otherwise not suitable for storing large amounts of BLOB data) can reference BLOBs in the MyBS BLOB repository.

(§) MyBS stands for "BLOB Streaming for MySQL". The BLOB Streaming engine is a new storage engine for MySQL which allows you to stream media data directly in and out of the database. More info at www.blobstreaming.org.

Lets look at an example of this. Assume my standard example table:
CREATE TABLE notes_tab (
n_id int PRIMARY KEY,
n_text longblob
) ENGINE=PBXT;
And assume we have a file called blob_eg.txt with the contents "This is a BLOB Streaming upload test".

Firstly, I can upload a BLOB to the MyBS BLOB Repository using the HTTP PUT method:

% curl -T blob_eg.txt http://localhost:8080/test/notes_tab
test/1-326-4891cdae-0


Here I uploaded a BLOB to the repository and specified the database, test, and the table, notes_tab. The URL returned, test/1-326-4891cdae-0, is the reference to the BLOB in the BLOB repository, returned by MyBS. Note that the BLOB is not yet in the table (to store the BLOB directly in the table, I would have to specify a column and a condition which identifies a particular row in the table).

However, the BLOB is already stored in the database, and I can download as follows:

% curl http://localhost:8080/test/1-326-4891cdae-0
This is a BLOB Streaming upload test


Since the BLOB is not yet referenced by a table, the MyBS BLOB repository sets a timer. If the BLOB is not retained (reference count incremented) within a certain amount of time it is removed from the BLOB repository.

To actually insert the BLOB into the table you just insert the BLOB reference, for example:

mysql> insert notes_tab values (1, "test/1-326-4891cdae-0");

On the MySQL server the notes_tab table engine will call the MyBS engine (using the server-side BLOB Streaming API) and retain the test/1-326-4891cdae-0 BLOB reference. So I can now download the BLOB by referencing the table, column and row as follows:

% curl http://localhost:8080/test/notes_tab/n_text/n_id=1
This is a BLOB Streaming upload test


Note: this example will only work with MyBS 0.5 (www.blobstreaming.org/download) or later. Coming soon!

5 comments:

Sven Paulus said...

Hm, one thing looks odd:

I'd expect (for compability with other storage engines) that an insert like this ...

mysql> insert notes_tab values (1, "test/1-326-4891cdae-0");

... would insert the ASCII string "test/1-326-4891cdae-0" into the blob column.

So if I want to add a reference to a blob using a locator I'd have to use a special blob locator reference function like ...

mysql> insert notes_tab values (1, blob_ref("test/1-326-4891cdae-0"));

HTTP GET/PUT shouldn't be the only way to access the contents of blob columns in a blob streaming enabled storage engine. It would be great if the traditional MySQL way could exist in parallel.

Otherwise this whole thing looks like a great idea :-)

Brian said...

Hi!

So how is this different from the HTTP storage engine?

Cheers,
-Brian

Tim Soderstrom said...

@brian:

As far as I am able to tell, it looks like MyHTTP is for grabbing data from a website using MySQL (or rather allowing the user to create a MySQL query).

MyBS is for manipulating BLOBs in a fast and efficient manner by streaming them from MySQL (as opposed to buffering their entire contents in RAM, for instance). Using an HTTP protocol seems to be Paul's approach at accomplishing the goals.

I think MyBS has a ways to go, but I really like the fact that you can manipulate the data using GETs and PUTs. It seems quite fast and wildly more efficient than the way I usually see people handling BLOBs.

If all that doesn't answer your question, you should check out http://www.blobstreaming.org/ as it explains all you need to know (and has links to the alpha version).

Paul McCullagh said...

Hi Brian,

As far as I know, the most basic difference is the HTTP storage engine implements an HTTP client, while MyBS implements an HTTP server.

Paul McCullagh said...

Hi Sven,

You have raised a good point. Both ways of storing BLOBs in the database should work (as far as that is possible within the size limits of the SQL interface). I will look into creating a UDF like this for MyBS.

In addition, when selecting from the table, it should be possible to specify whether you want the reference, or the data. For example, to return the BLOB data:

SELECT n_id, n_text FROM notes_tab;

or, to return the BLOB reference:

SELECT n_id, blob_ref(n_text) FROM notes_tab;

There is a problem with SELECT blob_ref() though. When the row is returned by the table engine, it needs to know whether to return the reference or the BLOB data. However, the function runs after then engine has returned the row. So basically this is too late.

What could work is this:

SELECT n_id, blob_data(n_text) FROM notes_tab;

In this case, SELECT n_text FROM notes_tab, would return the BLOB reference.

Then the table engine will always return the reference, and the function blob_data() can just use the reference to get the BLOB data from MyBS.

I'll just have to check that there is no problem returning a large amount of binary data, because UDF can only be defined as returning STRING, INTEGER or REAL values.