One of the workshops at our Barcamp two weeks ago dealt with the MySQL-Proxy by Jan Kneschke.
As it turned out, the proxy is rather unusable for our task. Read on to find out why.
For those who don’t know what it is: MySQL-Proxy is a blue dolphin which sits in front of a telephone exchange. You don’t believe it? Look at this link: Getting started with MySQL-Proxy.
What was our task at the Barcamp?
Build a script which copies a table into a MEMORY table on first access. After copying we have two
tables: the ORIGINAL table and the MEMORY table. READs should go to the MEMORY table, WRITEs to both tables.
Why is this a good idea?
For other databases I would say: “Hmmm, good idea, but let the optimizer handle it.” But with
MySQL in particular it’s a good idea that is worth thinking about!
The table engines in MySQL all have their pros and
cons. The InnoDB engine, for example, has very good non-blocking features, but it can’t be used if
you need a guaranteed response time. Real-time access can be achieved with the MyISAM engine. It is very
fast (compared to other databases), but only for INSERT or (exclusively!) SELECT. If you need to INSERT and
SELECT at the same time, MyISAM behaves badly, because it locks the whole table while writing. The MEMORY engine, however, is very,
very fast for all operations on not-too-big tables. Its disadvantage: after a restart of
MySQL the tables are gone.
Combining the pros of these table engines could solve previously unsolvable tasks for a MySQL database (or maybe unsolvable for any database!).
In our case: READ access should go only to the MEMORY table, while WRITEs should go to both the
ORIGINAL table and the MEMORY table.
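In plain SQL, the intended split can be sketched like this (table and column names are just examples):

```sql
-- One-time copy on first access: clone the original table into a MEMORY table
CREATE TABLE table1_memory ENGINE=MEMORY SELECT * FROM table1;

-- READs are redirected to the in-memory copy
SELECT * FROM table1_memory WHERE id = 42;

-- WRITEs have to be duplicated so both copies stay in sync
INSERT INTO table1 (id, name) VALUES (42, 'foo');
INSERT INTO table1_memory (id, name) VALUES (42, 'foo');
```

The script’s job would be to perform exactly this duplication and redirection transparently, so the client never sees the `_memory` tables.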
At first glance, this looks like a perfect use case for the MySQL-Proxy. But it
isn’t! Why? More about that later.
First steps and problems
David and I didn’t know much about the proxy when we started. We thought it would be more fun not to read
too much documentation first. Not very professional, but OK for such a weekend.
So we dropped one clanger after another. :)
- Good: I think we now know the most important parts of the proxy and how it works.
- Bad: Time-consuming. We had to rewrite the program many times.
I don’t know whether it would have gone faster if we had read the docs before we began programming, but I think the result, and the time needed for it, would have been the same either way.
I say this because from the documentation alone you cannot imagine how the proxy really works. You need to program with it to fully understand the concepts, and that understanding is necessary before you start on anything serious. In my opinion, whether you read the docs or not, you need to play with the proxy before you use it seriously.
In the end our script worked, but only for the most important cases:
- simple SELECTs
- INSERTs and simple UPDATEs
- simple table names
Where is the problem?
Why isn’t it possible to make this work for all queries?
Ok, let’s go through the algorithm step by step.
We started the proxy and then we fired a simple query like
SELECT * FROM table1;
What we now need to do in our Lua program is A) create the MEMORY table and B) rewrite the query so that it selects from the MEMORY table.
A) is easy:
CREATE TABLE table1_memory ENGINE=MEMORY SELECT * FROM table1;
This brings us to the first problem: we need to find the table name, then expand it. That works very well for
such a simple SELECT, but what about a JOIN? When we find two or more tables, all of them must be copied.
(Not to mention that it makes no sense to scan every query, even when we don’t need
to, e.g. because the table name is on a list of tables which should not be run as MEMORY tables, for
example because they are too big.)
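A naive version of steps A) and B) might look roughly like this as a proxy Lua script. This is only a sketch under the mysql-proxy 0.x Lua API as we understood it (the `read_query`/`read_query_result` hooks, `proxy.queries`, `proxy.COM_QUERY`); the `copied` table is our own bookkeeping, and the pattern matching is exactly the fragile part:

```lua
-- Sketch only: assumes the mysql-proxy Lua scripting API.
local copied = {}   -- tables we have already cloned into MEMORY

function read_query(packet)
  if packet:byte() ~= proxy.COM_QUERY then return end
  local query = packet:sub(2)

  -- Fragile: this pattern only catches the simplest "FROM table" case;
  -- it fails for JOINs, backquoted names, qualified names, subselects ...
  local tbl = query:match("[Ff][Rr][Oo][Mm]%s+(%w+)")
  if not tbl then return end

  if not copied[tbl] then
    -- A) clone the table into a MEMORY table first
    proxy.queries:append(1, string.char(proxy.COM_QUERY) ..
      "CREATE TABLE " .. tbl .. "_memory ENGINE=MEMORY SELECT * FROM " .. tbl)
    copied[tbl] = true
  end

  -- B) rewrite the query to read from the MEMORY copy
  -- (gsub blindly replaces every occurrence of the name -- also fragile)
  local rewritten = query:gsub(tbl, tbl .. "_memory")
  proxy.queries:append(2, string.char(proxy.COM_QUERY) .. rewritten)
  return proxy.PROXY_SEND_QUERY
end

function read_query_result(inj)
  if inj.id == 1 then
    -- hide the result of the injected CREATE TABLE from the client
    return proxy.PROXY_IGNORE_RESULT
  end
end
```

Even this toy version already shows where the real work hides: everything depends on extracting and rewriting table names reliably.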
B) is difficult: when I write, for example,
SELECT * FROM `table1`;
which is a valid table name, the built-in MySQL-proxy scanner finds the table name “`table1”, backquote
included, which is then translated into “`table1_memory”. Funny. We had to fix the C source code to make the tokenizer work correctly here. In MySQL it is
also possible to write qualified table names like
SELECT * FROM test.`table1`;
I think you see the problem yourself. So, as you might realize by now, B) is not just difficult, it is much more difficult. Much more difficult!
Think of JOINs. Or subselects. There are situations where you might not be able to rewrite an UPDATE query correctly at all! SQL can be very complicated. Think of JOINs without an index: those could be extremely slow. And so on and so on.
So, with the current approach of the
MySQL-proxy, the problem is not solvable for all queries! And maybe it will never make sense to do these things with the proxy at all!
PostgreSQL can do this better!
And to rub some salt into our wounds: it doesn’t make sense to do it with the MySQL-proxy! I came to this conclusion because I have since discovered a much more comfortable approach. PostgreSQL and Oracle already have such a feature. In PostgreSQL you can use CREATE RULE. This is a very fine feature which lets you
rewrite how reads and writes against a table are handled.
This is exactly the feature we would need in MySQL to solve the task described above!
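For the write-to-both-tables half of the task, such a rule could look roughly like this (a sketch; the table and column names are hypothetical, and note that PostgreSQL has no MEMORY engine, so `table1_memory` would simply be a second table — redirecting reads would need an ON SELECT rule, which in PostgreSQL effectively turns the table into a view):

```sql
-- Duplicate every INSERT into table1 to the second table as well.
CREATE RULE copy_inserts AS
  ON INSERT TO table1
  DO ALSO INSERT INTO table1_memory (id, name) VALUES (NEW.id, NEW.name);

-- Same idea for UPDATEs (columns are hypothetical examples).
CREATE RULE copy_updates AS
  ON UPDATE TO table1
  DO ALSO UPDATE table1_memory SET name = NEW.name WHERE id = OLD.id;
```

The important point is where this happens: the rewriting is done inside the database’s own query-rewrite stage, not by pattern matching on the SQL text.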
As explained above: combining the
features of the MySQL table engines could be a very good idea for many use cases.
And you can’t do this at the low level the proxy makes available: you need to hook into the table backend of MySQL,
to be precise into the part of the software which decides which tables are accessed. The proxy can’t do that!
- The proxy should optimize the workflow between servers and clients, and nothing more. (?)
I bring this up for discussion because, reading the introductions to the MySQL-Proxy, you could come to the conclusion that the proxy is the Swiss Army knife for MySQL. It isn’t. And I think the use cases for the proxy should be defined very clearly now, because otherwise there will be too many people complaining that MySQL is ugly. :)
- MySQL-proxy is not designed to rewrite queries. There is no way to do this correctly for every kind of
query. There is no MySQL parser, only a simple (but very fast) scanner. And to make things harder, a proxy programmer would need something like XPath over a parsed query, e.g. to look
up whether the query contains some special table name and to rewrite the query.
- Lua is not PHP. The learning curve is steeper. This means it is not usable by “everyone” the way PHP is. Lua is a language which needs some more background knowledge, so the target users of the proxy are very competent admins or database designers, not ordinary Unix administrators with some database knowledge …
- The workflow of the proxy makes simple things complicated. This is not a failure of the proxy, because it
is optimized for speed, but you need to know exactly how the proxy works and how things can be done before
you can even think about implementing an algorithm. It would make things much easier if there were
a “state engine” in the proxy which handled all these cases.
- No fallback mechanisms are possible. We wondered about that: suppose a database dies while
the proxy is waiting for a result; it is not possible to re-send that query to a backup host.
Or perhaps it is, but we didn’t find out how. So maybe that’s a gap in the documentation?
What does MySQL-Proxy need to become a professional tool?
Based on my experience, here are the things which I (and David) think should be implemented in the proxy before it goes 1.0
(in no particular order):
- Loading proxy “modules” without restarting the proxy. This is a must-have for production environments!
- Introduction of “(waiting) queues”
- instead of sending a query to the server, we could put it into a queue. Every queue has its own handler, and now you can do “things” with it. For example: the queue handler leaves a query in the queue for one second, then looks whether there are “equivalent” queries in the queue and handles all of them in one go.
- failed queries could go into a “fail queue” (with a max-failed counter) and could be retried by a specialized handler
- or think of write queries in a replicated server environment: I write to a table, then I put the query into a “finished writes” queue. When a read for this table comes in, I check in the queue whether enough time has passed since the write for it to have been replicated. If not, I wait a bit.
- so you also need query priorities, and so on
- Security mechanisms: handler functions for situations in which, for example, the server dies during a query.
- Some “use cases” for the proxy. I am thinking, for example, of master-slave configurations, multi-master setups, or defining specialized servers. Finding standards for server configurations is very difficult, but they should be defined to make standard situations easier to handle. This could lead to modules: a “slow-query handler”, for example, which routes slow queries to a specialized server via a queue.
- More focus on the admins. For an admin the proxy could be a savior: he has found, for example, some special queries which are slow, and wants to handle them differently. But he doesn’t want to learn Lua; he has no time for that. He wants to define a rule like “if a query looking like this appears, do that”, where “that” is a ready-made module which he configures with some small (Lua) scripting.
- Some more ways to get information:
- getting the load of my MySQL servers
- server configurations (MySQL versions, server setup, which servers should be READ ONLY, etc.)
- (more) auto-created statistics, like the average execution time of (READ/WRITE) queries …
- Workflow of the proxy: as mentioned above, I see a problem in the “statelessness” of the proxy. Maybe it isn’t necessary for the tasks the proxy was designed for, but in our simple example above it would make things much easier. This state engine could handle the query and result lists by itself and define which method should be called for which ID. Just an idea; there might be better ones.
- A “real MySQL parser”.
Of course I’m not sure whether this is really needed. The point of the proxy is not to define how data is stored in the database. As explained above, the proxy is the wrong place for those things!
The proxy should optimize the workflow between servers and clients.
But there might be situations where it makes sense to handle special cases differently. Maybe you define a “queue for long-running queries”, I mean queries which can’t be optimized. Then you need a piece of program which decides which queries should go into this queue. So maybe you could use some more intelligent parsing of queries?
If so, my idea would be a new command like “PARSE”: “PARSE AS XML SELECT NOW()” would return an
XML document which represents the query.
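Purely hypothetically, such a result might look like this (neither the command nor this format exists; it is only meant to illustrate the idea of a structured parse tree):

```xml
<!-- Hypothetical output of: PARSE AS XML SELECT NOW() -->
<query type="select">
  <select_list>
    <function name="NOW"/>
  </select_list>
</query>
```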
For the proxy, MySQLd could instead return a binary format which both (proxy and
MySQLd) understand. Then the effort of parsing the query has to be spent only once: if the proxy needs a parsed query, it sends a “PARSE AS BINARY MYQUERY” to any MySQL instance and gets the binary format
back. The proxy could then search through the query (with XPath?) and modify it (with DOM?) if necessary, and
finally send this binary format back with “EXECUTE MYBINARYDATA”.
OK, there are many things to consider (security etc.), but in general it should work well and fast enough, because the parsing of the query then also becomes scalable!
The MySQL-proxy is a rather fine piece of software which can solve very specific problems in a fast and efficient way. It should not be used for rewriting complex queries until a “real” MySQL parser (and operations on its parse tree) is implemented.
This is also my overall impression: the proxy is very useful if you
- have already played around with it.
- have a specific kind of query, I mean queries with a known, fixed syntax.
- definitely know how to make the query faster (or whatever you need to do with it). It takes too much time to try things out. (It’s an admin tool, and admins never have time.)
I think it is a long way until the MySQL-proxy can be called “ready”. Of course it is already quite usable, but being “ready” takes much more …