One of the workshops at our Barcamp two weeks ago dealt with the MySQL-Proxy by Jan Kneschke.
As it turned out, the proxy is rather unusable for our task. Read on to find out why.
For those who don’t know what it is: MySQL-Proxy is a blue dolphin which sits in front of a telephone exchange. You don’t believe it? Look at this link: Getting started with MySQL-Proxy.
What was our task at the Barcamp?
Build a script which copies a table into a MEMORY table on first access. After copying we have two
tables: the ORIGINAL table and the MEMORY table. READs should go to the MEMORY table, WRITEs to both tables.
Why is this a good idea?
For other databases I would say: “Hmmm, good idea, but let the optimizer handle it.” But with
MySQL in particular it’s a good idea that is worth thinking about!
The table engines in MySQL all have their pros and
cons. The InnoDB engine, for example, has very good non-blocking features, but it can’t be used if
you need a guaranteed response time. Real-time access can be achieved with the MyISAM engine. It is very
fast (compared to other databases), but only for INSERT or (exclusively!) SELECT. If you need to INSERT and
SELECT at the same time, MyISAM behaves badly, because it locks the whole table while writing. The MEMORY engine, however, is very,
very fast for all operations on not-too-big tables. Its disadvantage: after a restart of
MySQL the tables are gone.
Combining the pros of these table engines could solve previously unsolvable tasks for a MySQL database (or maybe unsolvable for any database!).
In our case: READ access should go only to the MEMORY table, while WRITEs should go to both the
ORIGINAL table and the MEMORY table.
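In plain SQL, the intended split can be sketched like this (table and column names are just examples):

```sql
-- One-time copy on first access: clone the original table into a MEMORY table
CREATE TABLE table1_memory ENGINE=MEMORY SELECT * FROM table1;

-- READs are redirected to the in-memory copy
SELECT * FROM table1_memory WHERE id = 42;

-- WRITEs have to be duplicated so both copies stay in sync
INSERT INTO table1 (id, name) VALUES (42, 'foo');
INSERT INTO table1_memory (id, name) VALUES (42, 'foo');
```

The script’s job would be to perform exactly this duplication and redirection transparently, so the client never sees the `_memory` tables.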
At first glance, this looks like a perfect use case for the MySQL-Proxy. But it
isn’t! Why? More about that later.
First steps and problems
David and I didn’t know much about the proxy when we started. We thought it would be more fun not to read
too much documentation first. Not very professional, but OK for such a weekend.
So we dropped one clanger after another. :)
- Good: I think we now know the most important parts of the proxy and how it works.
- Bad: Time-consuming. We had to rewrite the program many times.
I don’t know whether it would have gone faster if we had read the docs before we began programming, but I think the result, and the time needed for it, would have been the same either way.
I say this because from the documentation alone you cannot imagine how the proxy really works. You need to program with it to fully understand the concepts, and that understanding is necessary before you start on anything serious. In my opinion, whether you read the docs or not, you need to play with the proxy before you use it seriously.
In the end our script worked, but only for the most important cases:
- simple SELECTs
- INSERTs and simple UPDATEs
- simple table names
Where is the problem?
Why isn’t it possible to make this work for all queries?
Ok, let’s go through the algorithm step by step.
We started the proxy and then we fired a simple query like
SELECT * FROM table1;
What we now need to do in our Lua program is A) create the MEMORY table and B) rewrite the query so that it selects from the MEMORY table.
A) is easy:
CREATE TABLE table1_memory ENGINE=MEMORY SELECT * FROM table1;
This brings us to the first problem: we need to find the table name, then expand it. That works very well for
such a simple SELECT, but what about a JOIN? When we find two or more tables, all of them must be copied.
(Not to mention that it makes no sense to scan every query, even when we don’t need
to, e.g. because the table name is on a list of tables which should not be run as MEMORY tables, for
example because they are too big.)
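A naive version of steps A) and B) might look roughly like this as a proxy Lua script. This is only a sketch under the mysql-proxy 0.x Lua API as we understood it (the `read_query`/`read_query_result` hooks, `proxy.queries`, `proxy.COM_QUERY`); the `copied` table is our own bookkeeping, and the pattern matching is exactly the fragile part:

```lua
-- Sketch only: assumes the mysql-proxy Lua scripting API.
local copied = {}   -- tables we have already cloned into MEMORY

function read_query(packet)
  if packet:byte() ~= proxy.COM_QUERY then return end
  local query = packet:sub(2)

  -- Fragile: this pattern only catches the simplest "FROM table" case;
  -- it fails for JOINs, backquoted names, qualified names, subselects ...
  local tbl = query:match("[Ff][Rr][Oo][Mm]%s+(%w+)")
  if not tbl then return end

  if not copied[tbl] then
    -- A) clone the table into a MEMORY table first
    proxy.queries:append(1, string.char(proxy.COM_QUERY) ..
      "CREATE TABLE " .. tbl .. "_memory ENGINE=MEMORY SELECT * FROM " .. tbl)
    copied[tbl] = true
  end

  -- B) rewrite the query to read from the MEMORY copy
  -- (gsub blindly replaces every occurrence of the name -- also fragile)
  local rewritten = query:gsub(tbl, tbl .. "_memory")
  proxy.queries:append(2, string.char(proxy.COM_QUERY) .. rewritten)
  return proxy.PROXY_SEND_QUERY
end

function read_query_result(inj)
  if inj.id == 1 then
    -- hide the result of the injected CREATE TABLE from the client
    return proxy.PROXY_IGNORE_RESULT
  end
end
```

Even this toy version already shows where the real work hides: everything depends on extracting and rewriting table names reliably.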
B) is difficult: when I write, for example,
SELECT * FROM `table1`;
which is a valid table name, the built-in MySQL-proxy scanner finds the table name “`table1”, backquote
included, which is then translated into “`table1_memory”. Funny. We had to fix the C source code to make the tokenizer work correctly here. In MySQL it is
also possible to write qualified table names like
SELECT * FROM test.`table1`;
I think you see the problem yourself. So, as you might realize by now, B) is not just difficult, it is much more difficult. Much more difficult!
Think of JOINs. Or subselects. There are situations where you might not be able to rewrite an UPDATE query correctly at all! SQL can be very complicated. Think of JOINs without an index: those could be extremely slow. And so on and so on.
So, with the current approach of the
MySQL-proxy, the problem is not solvable for all queries! And maybe it will never make sense to do these things with the proxy at all!
PostgreSQL can do this better!
And to rub some salt into our wounds: it doesn’t make sense to do it with the MySQL-proxy! I came to this conclusion because I have since discovered a much more comfortable approach. PostgreSQL and Oracle already have such a feature. In PostgreSQL you can use CREATE RULE. This is a very fine feature which lets you
rewrite how reads and writes against a table are handled.
This is exactly the feature we would need in MySQL to solve the task described above!
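For the write-to-both-tables half of the task, such a rule could look roughly like this (a sketch; the table and column names are hypothetical, and note that PostgreSQL has no MEMORY engine, so `table1_memory` would simply be a second table — redirecting reads would need an ON SELECT rule, which in PostgreSQL effectively turns the table into a view):

```sql
-- Duplicate every INSERT into table1 to the second table as well.
CREATE RULE copy_inserts AS
  ON INSERT TO table1
  DO ALSO INSERT INTO table1_memory (id, name) VALUES (NEW.id, NEW.name);

-- Same idea for UPDATEs (columns are hypothetical examples).
CREATE RULE copy_updates AS
  ON UPDATE TO table1
  DO ALSO UPDATE table1_memory SET name = NEW.name WHERE id = OLD.id;
```

The important point is where this happens: the rewriting is done inside the database’s own query-rewrite stage, not by pattern matching on the SQL text.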
As explained above: combining the
features of the MySQL table engines could be a very good idea for many use cases.
And you can’t do this at the low level the proxy makes available: you need to hook into the table backend of MySQL,
to be precise into the part of the software which decides which tables are accessed. The proxy can’t do that!
- The proxy should optimize the workflow between servers and clients, and nothing more. (?)
I bring this up for discussion because, reading the introductions to the MySQL-Proxy, you could come to the conclusion that the proxy is the Swiss Army knife for MySQL. It isn’t. And I think the use cases for the proxy should be defined very clearly now, because otherwise there will be too many people complaining that MySQL is ugly. :)
- MySQL-proxy is not designed to rewrite queries. There is no way to do this correctly for every kind of
query. There is no MySQL parser, only a simple (but very fast) scanner. And to make things harder, a proxy programmer would need something like XPath over a parsed query, e.g. to look
up whether the query contains some special table name and to rewrite the query.
- Lua is not PHP. The learning curve is steeper. This means it is not usable by “everyone” the way PHP is. Lua is a language which needs some more background knowledge, so the target users of the proxy are very competent admins or database designers, not ordinary Unix administrators with some database knowledge …
- The workflow of the proxy makes simple things complicated. This is not a failure of the proxy, because it
is optimized for speed, but you need to know exactly how the proxy works and how things can be done before
you can even think about implementing an algorithm. It would make things much easier if there were
a “state engine” in the proxy which handled all these cases.
- No fallback mechanisms are possible. We wondered about that: suppose a database dies while
the proxy is waiting for a result; it is not possible to re-send that query to a backup host.
Or perhaps it is, but we didn’t find out how. So maybe that’s a gap in the documentation?
What does MySQL-Proxy need to become a professional tool?
Based on my experience, here are the things which I (and David) think should be implemented in the proxy before it goes 1.0
(in no particular order):
- Loading proxy “modules” without restarting the proxy. This is a must-have for production environments!
- Introduction of “(waiting) queues”
- instead of sending a query to the server, we could put it into a queue. Every queue has its own handler, and now you can do “things” with it. For example: the queue handler leaves a query in the queue for one second, then looks whether there are “equivalent” queries in the queue and handles all of them in one go.
- failed queries could go into a “fail queue” (with a max-failed counter) and could be retried by a specialized handler
- or think of write queries in a replicated server environment: I write to a table, then I put the query into a “finished writes” queue. When a read for this table comes in, I check in the queue whether enough time has passed since the write for it to have been replicated. If not, I wait a bit.
- so you also need query priorities, and so on
- Security mechanisms: handler functions for situations in which, for example, the server dies during a query.
- Some “use cases” for the proxy. I am thinking, for example, of master-slave configurations, multi-master setups, or defining specialized servers. Finding standards for server configurations is very difficult, but they should be defined to make standard situations easier to handle. This could lead to modules: a “slow-query handler”, for example, which routes slow queries to a specialized server via a queue.
- More focus on the admins. For an admin the proxy could be a savior: he has found, for example, some special queries which are slow, and wants to handle them differently. But he doesn’t want to learn Lua; he has no time for that. He wants to define a rule like “if a query looking like this appears, do that”, where “that” is a ready-made module which he configures with some small (Lua) scripting.
- Some more ways to get information:
- getting the load of my MySQL servers
- server configurations (MySQL versions, server setup, which servers should be READ ONLY, etc.)
- (more) auto-created statistics, like the average execution time of (READ/WRITE) queries …
- Workflow of the proxy: as mentioned above, I see a problem in the “statelessness” of the proxy. Maybe it isn’t necessary for the tasks the proxy was designed for, but in our simple example above it would make things much easier. This state engine could handle the query and result lists by itself and define which method should be called for which ID. Just an idea; there might be better ones.
- A “real MySQL parser”.
Of course I’m not sure whether this is really needed. The point of the proxy is not to define how data is stored in the database. As explained above, the proxy is the wrong place for those things!
The proxy should optimize the workflow between servers and clients.
But there might be situations where it makes sense to handle special cases differently. Maybe you define a “queue for long-running queries”, I mean queries which can’t be optimized. Then you need a piece of program which decides which queries should go into this queue. So maybe you could use some more intelligent parsing of queries?
If so, my idea would be a new command like “PARSE”: “PARSE AS XML SELECT NOW()” would return an
XML document which represents the query.
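Purely hypothetically, such a result might look like this (neither the command nor this format exists; it is only meant to illustrate the idea of a structured parse tree):

```xml
<!-- Hypothetical output of: PARSE AS XML SELECT NOW() -->
<query type="select">
  <select_list>
    <function name="NOW"/>
  </select_list>
</query>
```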
For the proxy, MySQLd could instead return a binary format which both (proxy and
MySQLd) understand. Then the effort of parsing the query has to be spent only once: if the proxy needs a parsed query, it sends a “PARSE AS BINARY MYQUERY” to any MySQL instance and gets the binary format
back. The proxy could then search through the query (with XPath?) and modify it (with DOM?) if necessary, and
finally send this binary format back with “EXECUTE MYBINARYDATA”.
OK, there are many things to consider (security etc.), but in general it should work well and fast enough, because the parsing of the query then also becomes scalable!
The MySQL-proxy is a rather fine piece of software which can solve very specific problems in a fast and efficient way. It should not be used for rewriting complex queries until a “real” MySQL parser (and operations on its parse tree) is implemented.
This is also my overall impression: the proxy is very useful if you
- have already played around with it.
- have a specific kind of query, I mean queries with a known, fixed syntax.
- definitely know how to make the query faster (or whatever you need to do with it). It takes too much time to try things out. (It’s an admin tool, and admins never have time.)
I think it is a long way until the MySQL-proxy can be called “ready”. Of course it is already quite usable, but being “ready” takes much more …