Thursday, August 9, 2012

MySQL as a Key Value Store (KVS) - Why, some background and some more tests

You might have seen my attempts at testing MySQL as a Key Value Store (KVS) (here and here), i.e. used like MongoDB, and I have had some comments to those. Above all, many wanted to know the details of my test environment. To be honest, this started off as a very simple thing, but it turned out a bigger than I thought, so I guess its reasonable to give some background. I apologize for not giving you this background before, but as I said, my initial intention was to just run bog standard MongoDB versus MySQL and see what happened.

Background

Here at Recorded Future, when I joined about 2 years ago, we used a lot of MySQL, and it was causing us serious problems. We had a pretty standard school-book style relational schema, and performance was really hurting us bad. Because of the amount and type of data we manage, because we run on Amazon and because we are still developing and changing, this schema didn't fit our needs at all.
Read - scalability was not a good option either, as we are massaging so much data that we had big issues with just having the Slaves keep up with the master. Schema changes, as we developed our application, was also difficult.

What we ended up doing was a redesign of a lot of things, ending up with something that could use a KVS, as these are easier to scale, using sharding. This had some negative implication too, like it was no longer possible to go into MySQL and issue a query for data just like that. And no, using the mongo shell for this, as not only were we now using MongoDB, we had data spread over several MongoDB stores. On the good side though, things was not much more performant and easier to scale, write scalability with MongoDB is good and easy to use and configure. Which is not to say that MongoDB solves all problems in the world.

So, having implemented MongoDB and having gotten used to it and even starting to like it a fair bit, I also started seeing some other things. Like, Why can't MySQL be used for this? I mean MySQL is a bit more mature than MongoDB and should be better at handling, say disk-IO. On the other hand, MongoDB is newer. One thing I already knew was that MongoDB really needs to be in-memory to a large extent, when it hits the disk, performance goes downhill fast, so here MySQL is better. But when we DO have enough RAM (and as we use several MongoDB datastores for different purposes, and we are on Amazon, where you can get respectable amounts of RAM, this is usually the case) what happens? If I let MySQL use RAM it can get so I don't get any disk-IO for reads at all, will it be as fast as MongoDB? Or maybe even faster.
And among the MySQL Storage Engines, the one mostly targeted as a Key Value Store is NDB? How's that going to work.

 

Test environment

So, to test all this, but to make sure I wasn't affected by Amazons ups-and-downs, I pulled one of our MongoDB datastores, some 105.000.000 rows of real world data. Initially, my plan was to use the full MongoDB datastore, but I had to exclude some fields as these were VARCHAR and putting them on Disk with NDB was causing just too much disk-I/O, as NDB stores disk data as fixed length (so a UTF-8 VARCHAR(256) field occupies some 768 bytes). I ended up with a table schema like this:
CREATE TABLE `keyvalue` (
  `id` bigint(20) NOT NULL,
  `value1` int(11) DEFAULT NULL,
  `value2` double DEFAULT NULL,
  PRIMARY KEY (`id`)
)
In the case of MongoDB, the id column was used as _id, which is MongoDBs way of saying "PRIMARY KEY".

 Having done all that, I need some tools for my test bench:
  • Homebrew server with an AMD FX-8120 8 core CPU and 16 Gb RAM on an ASUS M5A88-V Mobo. This is a neat server MoBo really, supporting SATA 6, USB 3.0 and other niceties but I use a Lite-On LNE100TX network card instead of the built-in Realtek crap that many MoBos, inparticular AMD ones, seems to be equipped with.
  • A bunch of disks, the disks where database resides is a LVM Stripe of two Seagate Barracuda 7200.11 500 Gb disks using xfs. But this is pretty insignificant as there should be no disk-I/O at all, or at least extremely limited, when testing.
  • Ubuntu 10.10 (I have some reasons for this a bit ancient version. And yes, I have patched that 16 Gb memory issue in Linux with some AMD MoBos).
  • MySQL 5.6.5 64-bit (for all MySQL tests except NDB)
  • MySQL Cluster 7.2.6 64-bit (for MySQL NDB tests) 
  • MongoDB 2.0.5 64-bit
Having all that ready I now had a slightly smaller table taking up some 4.2 Gb when exported as JSON. So this data has been imported into MongoDB and MySQL with a number of different storage engines, and I was ready to do some testing. I use a test program that kicks of SQL queries with over a specific T of threads, each SQL (with Mongo this is not SQL of course, but the test program is the same) containing a variable part, which is the ID for the row to look up, The program starts by reading a file with a number of random IDs, the IDs are spread over the threads and then each thread looks up the rows N times.

In this case I picked some numbers for my test, so the number of threads I use is 100, not an unreasonable number, but big enough to make any concurrency issues visible and the number of rounds is 10. So to be clear, the 1.000.000 IDs are distributed over the threads, so each thread processes 10.000 IDs each, running then 10 times, which means I do some 10.000.000 random reads in parallel. Not the most exciting benchmark on the planet, I know, but it was all the same interesting and also to some extent a real-world test.

Before each test run, I did a few initial runs to ensure that the cached were warmed up. Also, I did look at vmstat and stuff to ensure no disk-I/O was taking place.

 

Current state of tests

I have so far tested MongoDB, MySQL Cluster and MySQL InnoDB and since last I have also tested the MySQL MEMORY engine and MySQL with MyISAM. So far the results are like this:
  • MongoDB - 110.000 rows read per second
  • MySQL Cluster - 32.000 rows read per second
  • MySQL with InnoDB - 39.000 rows read per second
  • MySQL with MEMORY / HEAP - 43.000 rows read per second
  • MySQL with MyISAM - 28.000 rows read per second
The last two ones were very disappointing to me, and trust me, I want them to be good. I really wanted MySQL to beat MongoDB, and I really wanted to see how the much lower overhead of MyISAM would make it run circles around InnoDB, but no.

As MyISAM only caches the keys itself, not the actual data, this was an issue, but still, the MyISAM data file should be well cached by Linux for me, and rightly so, there was no disk-I/O. Could context switching when going to Linux for data be an issue? I decided to try something fun. I created a new unique index on the table, covering all columns, which meant that if I used this covering index, all data would be fulfilled by the index (but the index in itself would be bigger of course). This didn't help at all, quite the opposite. Playing with named keycaches and stuff didn't help at all, nor did using mmap. Which is not to say that these guys do not help to improve performance, all this means is that when all data is already im memory, they don't seem to help. I did figure out one thing that did work though, which was using tcmalloc. Enabling tcmalloc when starting the mysql server, which is easy these days as MySQL comes with the tcmalloc library included and mysqld_safe has an command-line option for it. Real easy, this really should be the default.

With this in place, I managed to get:
  • MySQL with MyISAM - 37.000 rows read per second
I'll try using tcmalloc with the other engines also, and see what happens.

 

In conclusion so far

Don't take me wrong here,  I'm not out to prove that MySQL or MongoDB sucks. Nor am I out to do some exact benchmarking. What I am out to do is to see if the claim that an RDBMS can perform as well as a standard Key Value Store in an environment optimized for a Key Value Store (i.e. all data in memory). And such an environment isn't that specialized actually. databases around a few 10's of gigabytes are really common and can easily fit on memory. And although I do prefer the more strict datamodel of an RDBMS to the pretty schema-free design of a NoSQL alternative such as MongoDB (MongoDB really has some reliance on schema, but in essence, it's schema-free), I have to admit that sometimes you need more performance, and distributing the load over multiple servers is a lot easier when you don't have to worry about global transactions or global consistency, and frankly, in many cases you can live without that, to an extent,

What remains to be tried is MySQL with NDB using multiple mysqld servers, InnoDB using tcmalloc and tarantool and the MySQL HANDLER interface (actually, I have already tested the latter, but I am not sure I got the tests right).

/Karlsson
Who apologizes for the omission of the test setup before publishing the first results

13 comments:

Antony T Curtis said...

I am still very surprised at your NDB performance numbers... I had achieved much more in simple testing in the past.

Anders Karlsson said...

So am I. I'd really like to see NDB higher up here, but I have a feeling that what's holding it up is more mysqld than ndbd, but we'll know that real soon, after I have tested with multiple mysqld. ndbdmt is also something to try, as well as even more NDB connections (currently I use 4).

Johan Andersson said...

The NDB numbers are a bit low. However, for this kind of app I would go for the Asynch NDBAPI or the memcached API, not SQL that is for sure.

Also, NDB does not store VAR* attributes fixed size.
This was changed several years ago. So a varchar(255) will use 6 bytes if you write 6 bytes + one or two bytes for length (which is the case for any DB) and + 2B overhead for a hash table entry (stored on the 32KB page) pointing out the start of the record's varsized section on that page + one or two bytes pointing out the offsets in that varsized section for each varsized attribute.

Long explanation but that's it IIRC :)

Matthew Montgomery said...

If you are testing NDB for KV store. Here are some things you should do. You should certainly be using ndbmtd with MaxNoOfExecutionThreads = 4. You should have multiple connections performing reads/writes against each mysqld in parallel, but you must also allocate multiple API node slots for a single mysqld and set ndb_cluster_connection_pool to remove contention for that SQL->NDB connection. Increasing ndb_autoincrement_prefetch_sz (32 or more) in the mysqld node(s) can help with multi-row insert performance.

Anders Karlsson said...

Johan: For disk data, VARCHAR is fixed size still, as far as I know, and my initial tests showed that I could not fit them into RAM. I'll revisit this though, and see if I can cram it into RAM. Amd I'll test NDBAPI eventually for sure, but that requires a fair amount of coding before I can try that. But again, NDBAPI sort of isn't my point here. If I wanted to use a non-SQL access method and not have an SQL interface, then MongoDB serves me well. But NDB sort of gives me both, but not without considerable effort, but I'll try it eventually anyway!

Matthew: I have a few thing with NDB to try for sure. I used multiple NDBAPI threads, but now when I think about it, I might have done a mistake when I configures that one, so I'll revisit that one tomorrow.

As I said, I really want this to work!

/Karlsson

hingo said...

Turns out MyISAM doesn't scale so well on modern hardware. To get better MyISAM numbers, you might need to use MariaDB, their fragmented key-cache implementation improved MyISAM performance by several x.

For all MySQL engines, yeah, SQL really adds overhead. If you are doing key-value, then try HandlerSocket (or the Memcache API in 5.6) to get 7x performance improvement. (Which proves that NoSQL really is better, even inside MySQL itself!)

Anders Karlsson said...

No, I had the multiple ndbd connection settings right, I'll try ntdmtd though.

Hingo: I'm not convinced just yet. And no, I do not think NoSQL "is better", What I might think is that a much less functional and much more simplistic design can bring some performance enhancements. If you remove all security equipment and irrelevant stuff like a spare tyre or seats from your car, it might well go faster than before. But I'm not so sure that it is "better". Or even faster for that matter :-)

/Karlsson

Ellis said...

This is how I would've set it up if I were doing this test.

Setting 'unsigned' to the 'id' and value1' columns (unless you are storing signed values in the 'value1' column, then the 'id' would be the only one unsigned). This is matter of being 'proper'.

I would have used Oracle Linux 6 instead of Ubuntu. Also I would have used a real RAID card instead of trying to do it with LVM.

I would have played around with a couple different file systems.

Anders Karlsson said...

Well, a different RAID card wouldn't have made much difference, nor would a diffent file system, as all data was in memory and there isn't any disk-I/O. Ubuntu or Unbreakable Linux? I don't think it matters much in this case, really.
As for the UNSIGNED, well I was using the real world data model, and this is what it looks like, I have just changed the name of some things to protect the innocent. And "value1" really isn't unsigned. The id column current contains only positive value, that is true, but that doesn't mean that it being SIGNED is not a concious decision.

Cheers
/Karlsson

hingo said...

No really. Surely you are aware of Yoshinori's tests with HandlerSocket: http://yoshinorimatsunobu.blogspot.fi/2010/10/using-mysql-as-nosql-story-for.html

Pay attention to the oprofile outputs. The dead time really comes from SQL parsing and optimization.

Anders Karlsson said...

HandlerSocket I am aware of, yes. Parsing and Optimization sure takes it's toll, but I am surprised that there is such a big difference between, say MongoDB and MEMORY, I hope there is more to it than that.

Mikael Ronstrom said...

Hi Anders,
I have no idea whether the numbers for NDB are ok or not since I haven't quite understood what your queries are.

Anyways some suggestions:
1) To see where your bottleneck is use top. If ndbd is around 100-110% or at least close to this then that's your bottleneck and you need to move to ndbmtd. Given that your machine has up to 16 parallel threads working, I would think you can probably set MaxNoOfExecutionThreads even to 8. I would also recommend locking ndbmtd threads using LockExecuteThreadsToCPU and mysqld threads using taskset or numactl, this always gives me a nice boost in performance.
2) mysqld can usually use about 150% of CPU per ndb cluster connections (so in your case with 4 connections you should be able to get CPU usage of mysqld up to about 600%). If this is the bottleneck then simply use more connections (should be ok at least up to 8 or 16).

Finally you mentioned in an earlier post that you used ndb_force_send=0 and found better performance with this. This means that we do more batching, this is normally better when using NDB API directly sometimes, but for MySQL Server ndb_force_send=1 is normally better, this is why this is the default.

You mentioned ndb_use_exact_count=0 makes for a better performance, this is true, even the manual says:

Forces NDB to use a count of records during SELECT COUNT(*) query planning to speed up this type of query. The default value is ON. For faster queries overall, disable this feature by setting the value of ndb_use_exact_count to OFF.

So with default setting one get very good performance of SELECT COUNT(*).

Anders Karlsson said...

Mikael!

I have been testing a few more things, and I don't get the performance I want, although it's better now. As for the quesries, this is simple stuff, I try to use MySQL / NDB as a simple key value store, so I do SELECT * FROM EHERE id = , where id is the primary key. The table has one BIGINT (id), one decimal and one double value. That's it.

With NDB I really should get better read performance than this. Fast is, as it stands now, I am writing much faster than reading. There is overhead in mysqld of cource, but thar should affect InnoDB etc as well, and NDB is behind them in terms of performance still.

I hope to have some NDBAPI tests ready soon, and I am sure I will get better numbers there. But I am still curious why I don't get better performance than this with NDB / mysqld.

Cheers
/Karlsson