Monday, June 4, 2012

Setting up MySQL Cluster 7.2

I decided to try the claim that MySQL Cluster is a great Key-Value store. I have been thinking about trying this for some time now; my JSON import tool for MySQL was a starting point for being able to cram some JSON data from our MongoDB Key-Value store into MySQL.

I tried this tool with an InnoDB table, and that worked great, importing 100,000,000+ records at about 10k records / s, all CPUs going at full speed. As JSON is native to MongoDB, I assumed the import would be faster and consume fewer resources there, and that was true: 22k records / s were imported into MongoDB. But I still think my tool worked well.

Now, having done all that, and having all the necessary tools ready, it was time to set up MySQL Cluster on this box. I have 8 cores and 16 GB RAM on it, so a 20 GB table shouldn't be too bad, at least not if I use Cluster disk storage. But it was some time since I last used MySQL Cluster, so after downloading Cluster 7.2.6 I decided to follow the manual. After all, I was running MongoDB in a pretty standard off-the-shelf config, and that worked fine, as did MySQL Community edition, so MySQL Cluster 7.2.6 shouldn't really give me much trouble here. Right?

Well, I was wrong. Or rather, the docs were wrong. And still, the defaults for many of the MySQL Cluster parameters seem to indicate that this is meant to run on an Android-based cellphone rather than a mid-range home-brew server. And all this is not even considering the incorrect syntax used in many examples.

After downloading MySQL Cluster 7.2.6, I followed the manual for installing the thing, and even at that stage I felt that I was in for a bumpy ride. The MySQL 5.5 manual, Section 17.2, MySQL Cluster Installation, says: "This section describes the basics for planning, installing, configuring, and running a MySQL Cluster." Fact is, it does not; rather, it gives an overview of some of the architectural aspects and some hardware requirements of MySQL Cluster, and covers nothing on planning or installing it. This is not to say the section is bad, it's just that it's not about what it says it should be about.

OK, I'll live with that, let's move on. I downloaded a tar file, and I know how to deal with that, but just to figure out if there are any specifics, I have a look in the docs; as this is Ubuntu Linux I look in Section 17.2.1.1, Installing a MySQL Cluster Binary Release on Linux. These are some of the things mentioned there that I take to heart. Not. If I did, stuff wouldn't work. Like this little command:
tar -C /usr/local -xzvf mysql-cluster-gpl-7.1.23-linux-i686-glibc23.tar.gz
Why the manual wants me to unpack a MySQL Cluster 7.1 tar file is beyond me; I don't think that would end up being a 7.2 Cluster. The management nodes are set up like this, according to the same document:
tar -zxvf mysql-5.1.61-ndb-7.1.23-linux-i686-glibc23.tar.gz
cd mysql-5.1.61-ndb-7.1.23-linux-i686-glibc23
cp bin/ndb_mgm* /usr/local/bin
This is just wrong, again. MySQL 5.1 has nothing to do with this. And why would I necessarily want the executables in /usr/local/bin?
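
For what it's worth, what I actually ran was more along these lines; the exact file and directory names depend on the platform and on where you want things, so take it as a sketch rather than gospel:

# Unpack the 7.2.6 tarball that was actually downloaded (64-bit Linux in my case)
tar -C /usr/local -xzvf mysql-cluster-gpl-7.2.6-linux2.6-x86_64.tar.gz
cd /usr/local
ln -s mysql-cluster-gpl-7.2.6-linux2.6-x86_64 mysql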

Again, I can live with it, but I don't much like it, as someone else, who actually CAN benefit from MySQL Cluster, would be put off by issues like these. But as for myself, I move on to Section 17.2.3, Initial Configuration of MySQL Cluster. Here, the config.ini settings are again incorrect; among the issues I found were that the portnumber setting (for the [TCP DEFAULT] section) is now obsolete, and that the configdir setting, which is required for the management node, is missing.
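
For reference, once the obsolete portnumber line was dropped, my config.ini ended up roughly like this; host names, paths and the node count are just what I used on this single box, and the memory settings are the ones I come back to below, so this is a sketch of my setup, not a recommendation:

# config.ini -- everything on one host in my case
[ndbd default]
NoOfReplicas=1
DataMemory=2G
IndexMemory=1G

[ndb_mgmd]
HostName=localhost
DataDir=/usr/local/mysql/mgmd-data

# one [ndbd] section per data node
[ndbd]
HostName=localhost
DataDir=/usr/local/mysql/ndbd-data

[mysqld]
HostName=localhost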

Having fixed all this, and possibly a few more issues that I have forgotten about, it was time to create some tables. As I was planning to store data on disk, I had a few more steps before I could create the table to load data into: I had to create a LOGFILE GROUP and a TABLESPACE for my disk data. There is a working example in Section 17.5.12.1, MySQL Cluster Disk Data Objects, which has a sample configuration where a disk-based table is set up. OK, fine. Time for setting up a LOGFILE GROUP:
CREATE LOGFILE GROUP lg_1
    ADD UNDOFILE 'undo_1.log'
    INITIAL_SIZE 16M
    UNDO_BUFFER_SIZE 2M
    ENGINE NDBCLUSTER;
Well, that didn't work at all: in practice it seems you cannot state 16M (for 16 megabytes), although the manual says differently. OK, again I can live with it, and I hope any other newcomers to MySQL can live with some syntax errors in the manual. 16 MB also seems very small for an undo logfile. Having fixed all that, I was able to create the LOGFILE GROUP. The same issue applies to creating TABLESPACEs, but by now I knew what was wrong and how to fix it. I could then progress to creating my table, using the usual CREATE TABLE command but with the STORAGE DISK and TABLESPACE options. Hey, it worked, I have a MySQL Cluster disk-based table!
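
What eventually worked for me looked more like this, with the sizes spelled out in bytes; the object names, the sizes and the single VARCHAR column are just what I happened to use for this test:

-- Undo log and tablespace for disk data; sizes given in bytes since the
-- M suffix from the manual would not parse for me
CREATE LOGFILE GROUP lg_1
    ADD UNDOFILE 'undo_1.log'
    INITIAL_SIZE 536870912
    UNDO_BUFFER_SIZE 67108864
    ENGINE NDBCLUSTER;

CREATE TABLESPACE ts_1
    ADD DATAFILE 'data_1.dat'
    USE LOGFILE GROUP lg_1
    INITIAL_SIZE 8589934592
    ENGINE NDBCLUSTER;

-- More data files can be added to the tablespace later if needed
ALTER TABLESPACE ts_1
    ADD DATAFILE 'data_2.dat'
    INITIAL_SIZE 8589934592
    ENGINE NDBCLUSTER;

-- Non-indexed columns go to disk; the primary key index stays in memory
CREATE TABLE keyvalue (
    id  BIGINT NOT NULL PRIMARY KEY,
    doc VARCHAR(1024)
) TABLESPACE ts_1 STORAGE DISK
  ENGINE NDBCLUSTER;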

I started loading data into MySQL Cluster. My JSON loader failed: running it seemed to hang MySQL Cluster, and I have yet to determine what the issue is, but I'll get to that real soon. Instead, I decided to load data into MySQL Cluster the native MySQL way, by first exporting the data from my InnoDB table with mysqldump and then importing it with the mysql command line client (without the CREATE TABLE statement in the export file, of course). This is running now, and it is dead slow. I'll have to figure out why. DataMemory and IndexMemory have been set to 2 GB and 1 GB respectively, which isn't much but should be enough, and I have configured MySQL Cluster without replication to speed things up.

I will eventually get this data loaded, but this method is way too slow, so I have to think up something else. Now I get a "table is full" error after some 2 million rows (I have 100 million rows to load). Memory usage isn't high, and the data files on disk should be big enough (32 GB), so I need to investigate what is going wrong. One thing is clear though: getting MySQL Cluster going is WAY more complicated than either MySQL or MongoDB. And note that I have done it before, although it was some time ago since I did it last.
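
For the record, the export/import round trip mentioned above was essentially the following; database and table names are just placeholders, and --no-create-info is what leaves the CREATE TABLE statement out of the dump:

# Dump the InnoDB table without its CREATE TABLE statement ...
mysqldump --no-create-info test jsondata > jsondata.sql
# ... and load it into the Cluster table of the same name
mysql test < jsondata.sql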

Hence the story of this benchmark continues in another blogpost.

Cheers
/Karlsson

8 comments:

Massimo Brignoli said...

Hi Anders,

Remember that all the indexes and indexed columns will be stored in memory: the ordered indexes in the data memory and the hash indexes in the index memory.

I don't know the table schema, but probably the data memory is too small to store 100 million rows.

x said...

Table full is probably from not having enough table fragments per LCP. This can be hacked by using MAX_ROWS in the CREATE TABLE or an OPTIMIZE TABLE; it's a pain sometimes. The easier option is to use the multithreaded data node, configure more LCPs per node, and run multiple nodes per machine.

Karlsson said...

Massimo!

Yes, I know how Cluster stores data, but the table in full is only some 20 GB in size. 100 million rows with a single BIGINT (which is my PK) is less than 1 GB, and I have 16 GB available; even with considerable overhead, it should fit. 20 GB of data isn't that much, mind you, and neither is 100 million rows: it's 2012 now, not 1912! Really, I'm trying to be nice to MySQL Cluster here! I'd really like it to beat MongoDB (which loaded data real nice and fast on this box, with very little, hardly any, tuning). I really want Cluster to work, but I want it to work out of the box to compete with, say, MongoDB, as a KVS.

And by the way, some of the defaults are just plain crazy. What in heaven's name is a 2 MB undo buffer size good for, except in your average wristwatch?

But I'll try. And try again.

/Karlsson

wwwted said...

Hi Anders,

Thanks for your valuable information regarding our mistakes in the documentation; we are going to fix them.

The "table is full" problem is probably limit on rows per partition, as stated in manual this limit is about 47.000.000 rows per partition so I presume you have a 2 node cluster? You can have a maximum of 8 partitions per LDM and this can be configured when you create your tables with either MAX_ROWS or CREATE TABLE ... PARTITION BY KEY(key) PARTITIONS X;

If you run ndbmtd you can specify multiple LDMs if you need more partitions; we support a max of 8 partitions per LDM.

Regarding the performance issue, I have too little information to give you any exact guidance. If you are running everything on a single box, I would suspect that the disk is fully utilized during the load of data; if you have auto_increment columns, please increase ndb_autoincrement_prefetch_sz to get more speed loading data.

If you need support you are most welcome to contact me directly!

Kindest regards,
Ted

Mikael Ronstrom said...

Looks like we should make sure our docs guys look into all the issues you found in the docs.

If you use 100 million rows, you will need about 1.5 GB of index memory, so the 1 GB index memory isn't sufficient for 100M rows. Each hash index entry takes approximately 15 bytes (pretty sure you'll find this in the docs).

For the data memory part I don't know if you used any ordered indexes on the data. For each ordered index you add on the table there will be 10-15 bytes per record needed. In addition there will be a part of the table that resides in memory; this contains a number of headers that are required for online operations and transaction handling, a reference to the disk part, and so forth. This header is around 20 bytes per record. So assuming you have 4 bytes of PK, 1 ordered index, and 16 bytes of in-memory attributes, you need 15 bytes per row in IndexMemory and 10-15 + 4 + 16 + 20 = 50-55 bytes per row in DataMemory. So to handle 100 million rows you should set IndexMemory to at least 2 GB and DataMemory to at least 6 GB. In addition you probably want to increase the config parameter for the disk page buffer, which is 64 MB by default; you probably want at least a few gigs.

Karlsson said...

Ted!

Thanx for the offer of support, but I'm OK for now.

Mikael et al.!

Thanx for the input. I was actually running with 1 GB IndexMemory per node, and I have 4 nodes on 1 box. Still, performance is way below expectations, but the disk page buffer might do the trick. That this is 64 MB by default is just plain weird though; who these days would allocate a mere 64 MB for database buffers in a server? 64 MB is nothing!

And I will continue testing, although to an extent I have accomplished my mission, which was to test whether MySQL Cluster is as good a KVS as, say, MongoDB, and it's not. It might have better performance with some tweaking (I sure hope so), but for the general KVS use case, an easy-to-set-up-and-manage environment, MySQL Cluster still falls short. But as you have all been so nice in helping me here, I will do four things:

1) Keep trying until I really have my data loaded (I ran it again, and it failed with "table is full". I don't know why, and I don't know how to figure out what was "full"; it wasn't IndexMemory or DataMemory anyway).

2) Do some simple read-only benchmarks and compare with MongoDB and MySQL with InnoDB.

3) Suggest a few improvements. What I have in mind doesn't require code changes, which is a good thing. What is bad is that what I have to suggest has been suggested many times before, without much happening. But I can be pretty persistent.

4) Keep on blogging about it. And if you believe that any marketing is good marketing, this is a good thing.

/Karlsson

hingo said...

Anders, you should save yourself some trouble and use the Severalnines configurator (or sandbox, if you want to use a single machine). I use it all the time just to get a good default configuration, and I have completely forgotten how to actually tune MySQL Cluster thanks to Severalnines. Yet people think I'm a big-shot cluster guy; I'm not. I cheat :-)

Karlsson said...

Hingo!

Valid point. Although what I wanted to test was how good MySQL Cluster is out of the box as a KVS. But maybe this will help me? I'll give it a shot. Which means I will cheat, but I'm apparently in good company :-)

/Karlsson