So here are some details of the issues raised so far:
One of the first things that Brian Aker pointed out to me was that the server-side API must make it possible to use the sendfile() system call. The call does direct disk to network transfer and can speed up delivery of a BLOB stream significantly.
The server-side API must also include a mechanism to inform the stream enabled engine that an upload is complete. Assuming the streaming protocol used is HTTP, there are 2 ways of determining the end of a download: either the connection is closed when download is complete, or the length of the BLOB data is specified in a HTTP header.
Monty was concerned about compatibility with the MySQL ODBC driver. The ODBC function SQLPutData() is used to send BLOB data, and SQLGetData() is used to retrieve BLOB data. Both functions transfer data in chunks. The implementation of these functions would have to be made aware of streamable BLOB values.
This can probably be done in a way that is transparent to the user. It may be necessary to designate certain database types as streaming BLOBs. One suggestion is to add 2 new types to MySQL. LONGBIN and LONGCHAR. These keywords are not yet used by MySQL.
However, from the OBDC side the data types LONGVARBINARY and LONGVARCHAR are already mapped to the MySQL types BLOB and TEXT respectively. But this simply means that it would be transparent to an ODBC application whether a BLOB is being streamed, or retrieved using the standard client/server connection.
Upload Before or After INSERT
There is some debate about how exactly a BLOB should be inserted. The 2 main possibilities are:
1) Upload the BLOB. The upload returns a handle (or key). Perform the INSERT setting the BLOB column to the key value.
2) Perform an INSERT, but specify in a CREATE_BLOB(
For method (1) it was also suggested that the client application specify a name, instead of using a handle returned by the server. The name could then be used as input to a "create_blob" function which is specified in the INSERT statement. The advantage of this is that the text of the insert statement is readable.
One advantage of method (1), as pointed out by Monty, is that the client does not have to wait for the upload to complete before doing the INSERT. The server could wait for the upload to complete. This would require a client side generated identifier for each BLOB uploaded.
Note that method (2) requires use of a transaction to prevent the row with the BLOB from being selected before the BLOB data has been uploaded.
Storage of BLOBs
Stewart Smith suggested that the BLOBs be stored by the streaming engine itself. This is something to be considered, and probably will be necessary in most scale-out scenarios. Otherwise the current plan is that each engine that is streaming BLOB enabled (i.e. that supports the server-side streaming API) will store the BLOBs themselves.
Also mentioned in regard to scaling is the fact that read scale-out is usually more important than write. However, there are web-sites that have a heavy write load. To be investigated would be how NBD could used in combination with the streaming engine to scale-out uploading of BLOBs.
mysqldump & mysql
As pointed out by Monty, the output of mysqldump is a readable stream. So any changes to mysqldump in order to support streamable BLOBs should ensure that this continues to be the case. This is related to the question of how to upload BLOBs using the mysql client. One idea would be to provide a command in mysql to upload a file or a block of binary data or text.
In some cases it may be necessary to check whether a client is authorized to download or upload a BLOB. The most straight forward method would be for the server to issue authorization tokens which are submitted with an upload or download. These tokens expire after a certain amount of time, or when the associated server connection is closed. However, if the streaming protocol is to be HTTP then we need to investigate the possibility of using standard HTTP authentication when retrieving and sending BLOBs.
In-place updates of BLOBs is allowed by some SQL servers (for example by using functions UPDATETEXT and WRITETEXT supported by MS SQL Server). In this case the streaming enabled engine would be responsible for locking or using some other method to ensure that concurrent reads of the BLOB remain consistent.
Georg Richter, who is responsible for the PHP Native Driver at MySQL, noted that using BLOB streaming, data could be uploaded from the Web directly into the database. The PHP extension to the standard MySQL client software should include a function to make this possible.
Georg also pointed out that it should be possible to convert BLOBs currently stored in MySQL tables into streamable BLOBs using ALTER TABLE. This would enable developers to quickly try out streaming BLOBs and test the performance characteristics, in combination with their applications.
These are the main issues raised so far. Any further ideas, comments and questions are welcome! :)