Monday, April 20, 2009

Virident hardware acceleration for MySQL

Today Virident announced a set of servers, called GreenCloud, aimed at increasing performance for MySQL and memcached servers. Last week I got a chance to talk to Vijay Karamcheti and Shridar Subramanian at Virident about their technology and get a preview of what they are up to.

The technology Virident uses to improve performance is a third level of memory storage based on Flash. But it goes way beyond just adding SSD disks. To put things in perspective, look at how the resources in an average server have developed over the last 20 years or so. We now have something like 1000 times more memory and 1000 times more CPU performance, but disk performance has increased very little, maybe 5 times, and that is an optimistic number. Note that this is about disk performance; available disk storage has also increased 1000 times or so.

What does this mean then? Well, disk I/O is an issue, probably the main issue for database performance. Now, databases have still gotten faster, a lot faster, as we have more memory and can hence cache A LOT more data, which speeds things up enormously. That performance comes from the fact that we can avoid disk I/O.
There are a couple of issues here though:
  • For writes, I still need to go to the disk. Independent of how much RAM I have, a disk I/O will still need to happen, to the database or to a logfile, but it must happen. The reason is simple: if I only put my written and committed transaction in a log buffer in memory, my transaction will not be persisted.
  • Caching of databases only helps so much. Once you have cached, say, 20 % of the data in the database, further caching will not improve performance as much. The reason is of course that data access patterns are skewed; they are not evenly spread across the total size of the database.
So, then, this means that once you have some percentage of the database in the cache, and this is mostly a block cache (the Falcon storage engine has some interesting ways of dealing with this though), we are still locked into the I/O performance of the disks.
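
To make the point about skewed access a bit more concrete, here is a small sketch of how the cache hit ratio grows with cache size. It assumes that accesses follow a Zipf-like distribution and that the cache always holds the hottest rows. Both are my assumptions for illustration, not measurements from any real database, and the function name `zipf_hit_ratio` is made up for this post.

```python
def zipf_hit_ratio(n_items: int, cache_fraction: float, skew: float = 1.0) -> float:
    """Fraction of accesses served from cache when the hottest items are cached.

    Assumption: item popularity follows a Zipf-like law, where the item at
    rank r is accessed with weight 1 / r**skew.
    """
    weights = [1.0 / (rank ** skew) for rank in range(1, n_items + 1)]
    cached_items = round(n_items * cache_fraction)
    return sum(weights[:cached_items]) / sum(weights)


if __name__ == "__main__":
    # Print the hit ratio for growing cache sizes on a 100,000-item data set.
    for fraction in (0.05, 0.2, 0.4, 0.8):
        ratio = zipf_hit_ratio(100_000, fraction)
        print(f"cache {fraction:.0%} of data -> hit ratio {ratio:.3f}")
```

Under these assumptions the first 20 % of the data accounts for most of the hits, and doubling the cache after that adds comparatively little. The remaining misses still go to disk.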

But if we go back 20 years in time again, when we were compensating for slow disks by caching data in RAM, there were compromises being made: fast, random RAM access as opposed to slow, block-level disk access. But what has happened now is that there is an even bigger gap in performance between RAM and disk. So can we not fill that gap?

Looking at the attributes of the two types of storage we have considered so far, RAM:
  • Is fast and randomly accessed.
  • But RAM is also not persistent. It is this point that makes disks still so important. Having all the database in RAM is actually possible in many cases these days, but this is not useful, as that data will not be persistent.
Disk storage on the other hand:
  • Is persistent and has higher capacity.
  • But disks are also slow and use block-level I/O.
Looking at this, if we had a third memory level that was faster than disk and persistent, but possibly not as large as the disk system, that would solve a LOT of problems. A database that needs to persist something in a transaction log or a database today needs to write to disk. The key word here is persist. And that is exactly what Virident provides: a Flash memory based system with most of the attributes of RAM, although a fair bit slower (but still WAY faster than disk), which is randomly accessed, just like RAM (here is a significant difference from SSD disks), and is persistent.
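
As an illustration of what "persist" costs on an ordinary disk-based system, here is a minimal sketch of a durable log append. It is not how InnoDB or Virident does it; the function names and the one-record-per-line log format are invented for this example. The point is only that a commit is not safe until the operating system has been asked to push the data to stable storage, here with `os.fsync`.

```python
import os
import tempfile


def durable_append(log_path: str, record: bytes) -> None:
    """Append one record to a log file and force it to stable storage."""
    with open(log_path, "ab") as log:
        log.write(record + b"\n")
        log.flush()             # move data from Python's buffer to the OS
        os.fsync(log.fileno())  # ask the OS to write it to the device


def durable_append_demo() -> list:
    """Write two hypothetical commit records and read them back."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "txn.log")
        durable_append(path, b"COMMIT 1")
        durable_append(path, b"COMMIT 2")
        with open(path, "rb") as log:
            return log.read().splitlines()


if __name__ == "__main__":
    print(durable_append_demo())
```

Every call pays for a synchronous write to the device, which is exactly the cost that a faster persistent memory level would reduce.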

I want to note that there are other ways of solving this problem. One is to do what MySQL Cluster is doing, which is “semi persisting” RAM by synchronous replication between nodes.

As anyone can realize, applications really need to be aware of this “third storage media” that Virident provides to work properly. Virident has a special version of the InnoDB plug-in to handle this. And the known scalability issues with InnoDB are not really present here either, or at least to a much lesser extent than in “normal” InnoDB, as this is the InnoDB Plug-in with a lot of fixes for this same problem.

And it doesn’t end there. As I wrote above, for the developer this Flash memory has similar attributes to RAM, i.e. it is not a block-level device but random access, and there is no context switching needed! These are the two features that make this technology stand apart from just plugging SSD disks into any server!
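
To show the difference in programming model between block-level and random access, here is a rough analogy using an ordinary file. I do not know what Virident's actual interface looks like, so none of this is their API: the block size, the function names and the use of `mmap` are all my own assumptions, and a memory-mapped file on a normal disk is still backed by block I/O underneath. The sketch only contrasts how much data the application itself has to move to change 8 bytes.

```python
import mmap
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def update_via_block_io(path: str, offset: int, value: bytes) -> int:
    """Read-modify-write the whole block holding `offset`; return bytes moved.

    Assumes the value does not cross a block boundary.
    """
    block_start = (offset // BLOCK_SIZE) * BLOCK_SIZE
    with open(path, "r+b") as f:
        f.seek(block_start)
        block = bytearray(f.read(BLOCK_SIZE))
        rel = offset - block_start
        block[rel:rel + len(value)] = value
        f.seek(block_start)
        f.write(block)
    return 2 * len(block)  # one block read plus one block written


def update_via_mapping(path: str, offset: int, value: bytes) -> int:
    """Store the bytes directly at their address in a memory mapping."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mapped:
        mapped[offset:offset + len(value)] = value
        mapped.flush()
    return len(value)


def access_model_demo():
    """Apply both update styles to a scratch file and report the results."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "store.bin")
        with open(path, "wb") as f:
            f.write(bytes(2 * BLOCK_SIZE))  # two zero-filled blocks
        moved_block = update_via_block_io(path, 5000, b"ABCDEFGH")
        moved_mapped = update_via_mapping(path, 6000, b"IJKLMNOP")
        with open(path, "rb") as f:
            data = f.read()
    return moved_block, moved_mapped, data[5000:5008], data[6000:6008]


if __name__ == "__main__":
    print(access_model_demo())
```

With the block interface the application shuffles two full blocks to change 8 bytes; with the mapped interface it just stores the 8 bytes where they belong.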

All in all, I’m excited about this; there is a lot of performance potential to gain from this setup. Being able to scale write performance on a single server to a new, higher level means that technologies that are good in and of themselves, like sharding, might not be needed as much anymore. Also, any distributed technology to solve this problem, like MySQL Cluster, has limitations, such as cache invalidation and distributed locking, none of which makes for high scalability. Maybe Virident technology will be a standard component in any high-end MySQL server eventually?

/Karlsson
