- Avoid using the disk by instead putting as much data as you can in RAM
- Use faster disks (like SSDs)
Like this: As the CPU cores are getting faster, but not that much faster, anymore, due to physical limitations, we have more and more of these CPU cores instead. And suddenly, any limitation in getting those CPUs to work well together suddenly turns into a major headache! Like a mutex shared by all threads. Like the Query Cache mutex in MySQL for example!
With this in mind I am now ready to commence with the benchmarks that I wrote about in may. Yes, it took some time to get the data loaded into MySQL and in the process I managed to create a new Open Source project for exporting and importing JSON data from and to MySQL With that in place I now have something that is sort-of real world data. I had to remove a few columns (or Attributes of you are a JSON taliban) to make MySQL Cluster work with this data, this because MySQL Cluster stores VARCHAR data as fixed length data on disk which means a few things:
- There is a lot more stuff to write to disk.
- UTF-8 means that there is 3 times more data to write!
`id` bigint(20) NOT NULL,
`value1` int(11) DEFAULT NULL,
`value2` double DEFAULT NULL,
PRIMARY KEY (`id`)
As a side note, without disk data, MySQL Cluster feels MUCH more stable. The occasional data loss and other weirdness that I experienced what I tried to load the table with VARCHAR data are now completely gone. So not only does disk data limit you on in terms of data types (VARCHAR), it also seems to need some more development. And no, I wasn't in the mood of reproducing the errors I got.
Anyway, on my server here at home, having an 8-Core AMD CPU and 16 Gb RAM, waiting to run this benchmark. I'm testing MySQL with InnoDB, MySQL Cluster and MongoDB. The test program is the same in all cases, I read 1.000.000 rows 10 times distributed over 100 threads. To be fair to all, I made sure that what data I had would fit in memory and that it WAS in memory, so I did a couple of warmup runs first. In the case of NDB I used the MySQL API, not NDBAPI (I will try this eventually). The results I got was this:
- MongoDB - 110.000 reads per second
- MySQL with InnoDB - 30.000 reads per second
- MySQL with NDB - 32.000 reads per second
This is not the end of this story though, if we assume, as MySQL was way behind MongoDB in terms of performance, but InnoDB and NDB were on par with eachother, one might at least want to try the theory that it is the MySQL part that is slowing things down, and this can be tested by running MySQL / NDB with more than one mysqld, and that is the next thing to try. Then we have the HANDLER interface and proper NDBAPI also, the latter should be a lot faster, really. And yes, I really should see what MyISAM can do for me. And MariaDB.
And before I end, please MySQL Cluster developers, remove that silly 512 Mb HASH index size per partition limitation (see the manual here). As RAM get's less and less expensive, and if we, for example, wanted to avoid Network and Disk I/O as much as possible, and instead use RAM and CPU (like in an Amazon environment), then this turns into a rather sreious limitation. My tests above was on hard iron though.
Who will do some more benchmarking tomorrow.