Some months ago I helped out in another project in which they had some performance problems. They had a very big table and the index of the table was bigger than the table itself. As every change in the table causes MySQL to recalculate/reload the index, this can take some seconds for such big tables.
So, I thought it would be a good idea to split that big table into very small ones. This should reduce the overhead of reloading big indices and instead reload only very small parts. And the next thought was: Is it possible to use the „new“ MySQL-Partitioning for that?
Short overview to partitioning in MySQL 5.1
First of all lets have a quick look over the partitioning features:
- Main aspect is the physical storage of the data which can be set for every partition extra
- There are some partitioning methods which define the rules of which data is stored in which partition
- Only horizontal partitioning possible
There are four „partition types“ which can be seen as the rules for „which data is stored in which partition“:
The definitions of that can be found here.
After reading through all that stuff I discovered that MySQL Partitioning is very suitable for partitioning by date. There are really several and useful functions which enables you to part tables by date. But there aren’t enough functions to calculate a partition-id from a string.
There is a limitation of 1024 partitions. In my case I had more than 400 partitions (and the project is growing), already 40 % of the limit.
I found it also very irritating, that the row, a partition expression is based on, must be part of the primary key or any other unique key. For me and in the above mentioned case this doesn’t make sense at all as this is exactly what I want to avoid with partitioning into smaller parts: I want split my problem into many smaller problems and not to share the problem. (See here Partitioning Keys, Primary Keys, and Unique Keys in partitions: „Every unique key on the table must use every column in the table’s partitioning expression.„)
Especially, this last limitation was the reason why I write now this article: for the kind of partition the MySQL-developers had in mind when implementing partitions, this limitation makes sense. But – as so often in MySQL – the idea is not obvious for application-developers like me. At first glance into documentation you may think that MySQL can handle direct partitions on a key. But it can’t. This limitation is for the chosen example so performance-limiting, that it should be written in bold at the first page of partition-documentation!
Let’s assume we have a table of „adresses of the world“:
CREATE TABLE adr ( country char(2), zipcode char(7), street varchar(50), PRIMARY KEY (country, zipcode, street) );
Very simple, just to illustrate the problem. We can assume, that this table can be very, very big, if you managed to get all adresses of the world in it.
The interesting thing is now: The index of this table gets bigger, than the table itself! And there are many thousand changes in it per day, maybe 1,000 per second. For every change the index must be recalculated. And this is a real problem, if you also have many thousand requests on this table per second.
The moment, where direct partitioning over a key begins to make sense
- SELECT * FROM adr WHERE street LIKE '%muster%'; makes no sense – with more than 1 billion records this can take days…
- Creating more indices to search faster: This new index will nearly double the total index-size necessary, doubles the required space in memory, the time to reload the index, etc.
- Analysis show also: The select for „all streets like“ is very seldom. Most queries are looking like SELECT * FROM adr WHERE country = 'de' AND zipcode = '12345%' and street like '%muster%'. This kind of query runs very fast if the table is warmed up and if there are no changes to the table.
- No matter what we optimize, the basic problem remains: If you want to speed up this table, we need to have much more RAM, more CPU, and extremly faster disks. And at some point, when the number of records increases faster than Moores Law, even this doesn’t help anymore. So the problem is: How to speed up the table for fast SELECTs WHILE MUCH INSERT/UPDATEs?
- Important logical step: As developer we are now able to say „We need to know WHERE, before we search, because otherwise it just makes no sense. Searching only for streets needs a completely different solution, but I’m currently not focused on that problem. I’m focused to search the adr-table.“
- Next important step: At this moment, we can also say „I’ve made sure, that I know at least the country code before I begin to search.„.
- With this prerequisites we are able to split our problem in as many parts as countries exists.
- This standpoint has also many other advantages:
- The total index is much smaller, because we save one „index-level“. We need less RAM and can store more records.
- An update is much smaller: Only operations to this part must wait, other selects aren’t influenced.
- Read is much faster. We read only those parts of the index from where we know that the results must be inside it.
- We can make many parallel requests; the probability, that two queries go to the same part at the same time is very low.
- This is perfectly scalable. If one machine is too slow, we can just put half of the parts on another machine and the whole application should work nearly twice as fast.
A partition is in this case maybe not a number, it is just the value of the key and not every partition must be generated manually.
This is useful, because if your application makes only SELECTs of known keys, the partition-manager doesn’t need to calculate which the correct partition is. It can directly look into the right part. No need to calculate an unnecessary index (= in which partition is my record?).
The borders in which this idea work
- Very big number of records. (So big, that a full table scan takes some time.) A search to the table only on non-indexed fields is not possible and so forbidden for this table.
- Many reads and changes to the same tables (update/delete). (I call this „real-life-data“, because in real-life everything changes at every time. :)
- The index should fit completely into the available RAM; only then we can search fast.
- The index for our search criteria needs not be unique.
- For every new partition-key, a new partition partition is created automatically.
In those cases, it makes sense to split/partition the big table into smaller pieces just by a key. This may result in not equal big partitions but this doesn’t matter much: The most important thing is, that the biggest partitioned table is much (!) smaller than the original. The pieces must be so small, that it takes in the worst case at the utmost one second to load the index of this partition into memory.
To test this kind of concept, I created a partitioned table, which implements just that. With some tricks it really worked that for every key in the table, the record was stored in another partition. In the end I had over 400 partitions. The results were really bad, so bad, that I haven’t kept them.
But this test brought me to the idea to make partitioning myself: I created as many tables as partitions before and stored every record in it’s correct table. Afterwards I made again some tests.
The results were overwhelmingly good!
- The machine was a dual quadcore with only 8 GiB RAM and some good raid on it.
- I created over 400 tables (MyISAM), with a total number of 200 million records (more was not possible due to the available amount of RAM at this time).
- The biggest table had then about 1,000,000 records, the smallest only about 5,000.
- I warmed the database up with about 100,000 SELECTs (10 minutes).
- I also created benchmarks with random SELECTs with 4 threads and one thread which INSERTs/UPDATEs random records in random tables.
- With a first test over this, I reached more than 500 Inserts/Selects per second. That was quite good. One rule I found was: The smaller the tables, the faster the access.
- With some other tricks (more partitioning for INSERT/UPDATE), I came to 4,000 INSERTs/UPDATEs and 6,000 SELECTs per second. Yes, per second! Yes, at the same time on one machine. That where the lower average values. In average and top speed it was much faster.
- Direct partitioning by a key value is a fascinating thing, which can solve problems with very big „real-life“-data-tables and it should be further looked into.
- MySQL concentrates the partitioning feature too much on „WHERE is the data stored“ and not on „WHERE does the data belong to„.
A storage has always physical limitations, a logical information of data which belongs together, however, doesn’t have any physical limitations; the storage itself can keep this information.
- MySQL isn’t currently able to handle partitions on a direct key in the table. The limitations forbid it, and even if you go over them: they are really slow.
- This was the reason why I wrote this article: it is of course possible to use the partitions in MySQL like that, but it makes no sense, because there is this need for this unique key, which creates more problems than it solves for such big tables.
- This feature should not be put together with the current „PARTITION“-implementation. I think, this is something different. Maybe we can call it „AUTO-MERGE“?
- Even if MySQL doesn’t support that kind of partition: It’s not so complicated to change your application. Make partitioning yourself! It’s easier than you think.
- It makes actually so much sense, that I want to raise my hand and say „I want this“. :) – If MySQL would support that, it would help a lot.
- Testing large partitions
- Partition by value of column list
- Partitioning on Application Side. The author here suggests to do this kind of partition in with MySQL-Proxy. I think, this is not the right place for that, because this is not parallelizable on one server.
- How MySQL does table locking in partitions.
Nice article. THX!
„With some other tricks (more partitioning for INSERT/UPDATE)“
Can you explain this part in more detail?
Yes, but not here and now in detail, because this takes me another blog-entry. :)
But in short terms: As explained, the problem with big indexes are the updates and inserts. I asked me: What if we make another table only for the inserts and a third table for the updates to this „shard“, so that the „select“-table is static and only generated new when needed (= too much changes on insert/update-table)?
Oh, yes, and at the IPC 2008 I will explain this in praxis.
Thank you for writing this entry.
Would you mind elaborating on your technique for having separate tables for select, insert, and update operations?
A few questions:
1. How will the application „see“ the new (inserts) and modified (updates) data?
2. When does the select table get generated?
I saw your slides for IPC 2008 and it didn’t discuss the details. Thanks again.
Sorry for long delay, but I’ve been ill.
To #1: There are some possibillities! a) The simplest thing: The application must do a SELECT … UNION.
b) Write a library, which creates the Queries you need to select.
c) Write a handler/deamon, wich will implement a SOAP or REST interface and does the UNION itself (or however you can do that). So it is much more scalable (endless scalable!).
To #2: The select-table must get generated a) at startup, when it doesn’t exist b) everytime, when there are too many updates/inserts to the tables (the UNION gets too slow) c) when there is time to regenerate the table (e.g. at night) d) when there is the need to split up one table into two, because it gets too big.
Fantastic article! I’ve also seen that you are holding a presentation next year about „100 billion rows with mysql“. Is this going in the same direction with self made partitioning (thats what I’m doing by the way too) or is it a mysql cluster based solution?
No Cluster. If you need speed (minimum answer time) the cluster sucks.
Cluster is very good for HA-applications and does some good scaling, but it isn’t really fast (some principal limits) and (yet) not useful for really big tables.
BTW: The presentation name is „Googol records (with MySQL)“ and with the presentation I’ve made on the IPC I’ve been a little bit too far in the future. I hope I get the chance to repeat this. But be free to ask me here questions about that.
Thanks for this article. I was looking to do partioning by key with mysql, and the part where you say At first glance into documentation you may think that MySQL can handle direct partitions on a key. But it can’t. This limitation is for the chosen example so performance-limiting, that it should be written in bold at the first page of partition-documentation!
Thanks for that, just saved me hours of re reading the docs. I am working on a oop solution that seperates data into „dataSections“ which I hopen could be mapped to mysql partions. But in the end it looks like I will map each „dataSection“ to it own mysql table.
Thanks for your article.
It’s good to see real life examples of partitioning in action, as the partitioning function seems little understood or used.
It looks like MySQL partitioning is in its infancy and it’ll take a release or two before its true power is available.
No, I wouldn’t say so. :)
It’s very powerful, when used in the customized borders. Especially partitioning by time/index etc. is told to be really good. Assume a scenario, where the log-data from the last week is stored on an SSD and the „rest“ is stored on a big RAID. As long as you are searching within this week, you have fantastic speed. If you search more into the past, the search gets slower. But only 0.1 per mille of your searches are going so far back, so the „slow“ RAID has no problems to deliver the requests and customers which are searching so far are also willing to wait longer.
I must admit, that I have not enough own experience with partitioning to give evidence in that, but for me the proves are obvious.
Indeed, that is a good example that I refer to often. Another popular one I think is IP->Country lookups. The country table stays very static, the IP range table gets updated monthly. You can partition the IP range table and get a very simple performance boost without any real/constant kind of performance hit elsewhere.
A useful link you may find interesting regarding progress of partitioning in MySQL…
Let me share with you a real-life example of what Mysql partitioning can do with huge data: http://www.tritux.com/blog/2010/11/19/partitioning-mysql-database-with-high-load-solutions/11/1
Let me share with you this great post about time/date based partitioning and subpartitioning widh huge data flows:
It demonstrate how partitioning can be a good solution for scalable architectures.