Saturday, June 9, 2012

No, MySQL Cluster, the table isn't bl**dy full! Not even close!

OK, to begin with, I'd really like to try MySQL Cluster as a Key-Value Store. Yes, I have already recognized that MySQL Cluster stores variable length data in fixed length format. And someone should think about how good an idea that really is; in my mind it's about as mind-bogglingly efficient, modern and smart as storing data on punched cards. Which I assume is how the MySQL Cluster source code is managed.

Yes, I really dislike the idea of fixed length strings, and more on this later. But think about it: I have 2 fields defined as VARCHAR(256) and this is a Web application, so how much space will MySQL Cluster allocate on disk? As many bytes as there are in the string, or 256 bytes? Neither, actually. This is a web app, and I know it comes as a surprise to some of you, but Unicode is here to stay, and UTF-8 is the way to go. And MySQL assumes that no one will use more than a 3-byte Unicode character (which is an incorrect assumption, but I can live with it; 4-byte Unicode charsets are actually available). Anyway, as each character in that case may occupy 3 bytes, MySQL Cluster will allocate 768 bytes for a VARCHAR(256) UTF-8 encoded field. Yes. Yuck. As I have written in many blogs before, disk space isn't much of an issue these days, but writing all that data may well be. So if I put 10 7-bit ASCII characters in a VARCHAR(256) field, I'm not writing 10 bytes, I'm writing 768 bytes, and that makes a difference in performance.
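To make the arithmetic concrete, here is a minimal sketch (table and column names are made up for illustration): with utf8 as the character set, each VARCHAR(256) gets a fixed 256 x 3 = 768 byte allocation, no matter what is actually stored in it:
CREATE TABLE kvs_example (
  id INT UNSIGNED NOT NULL PRIMARY KEY,
  val1 VARCHAR(256),
  val2 VARCHAR(256)
) ENGINE=NDBCLUSTER DEFAULT CHARACTER SET utf8;
-- Each VARCHAR(256) above is allocated 768 bytes (plus length bytes),
-- so a 10-character ASCII value still costs 768 bytes of storage.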

All this aside, I really want to load my data into MySQL Cluster to test its claims for performance and its claims as a great Key-Value Store. And now again, it tells me that "The table is full". That MySQL Cluster tablespaces store data in fixed length format is bad enough; that it insists on storing it in fixed length data files (which take forever to create) makes it even worse. But I have never been able to store more than about 60.000.000 rows in my MySQL Cluster setup. The error I constantly get is "The table '' is full". Yes, there is no table name there, the table name is empty. I have 4 data nodes. I look at my storage and I see this:
mysql> select * from ndbinfo.memoryusage;
+---------+--------------+-----------+------------+------------+-------------+
| node_id | memory_type  | used      | used_pages | total      | total_pages |
+---------+--------------+-----------+------------+------------+-------------+
|       2 | Data memory  | 497647616 |      15187 |  524288000 |       16000 |
|       2 | Index memory | 129687552 |      15831 | 1048838144 |      128032 |
|       3 | Data memory  | 498073600 |      15200 |  524288000 |       16000 |
|       3 | Index memory | 129802240 |      15845 | 1048838144 |      128032 |
|       4 | Data memory  | 497745920 |      15190 |  524288000 |       16000 |
|       4 | Index memory | 129777664 |      15842 | 1048838144 |      128032 |
|       5 | Data memory  | 497680384 |      15188 |  524288000 |       16000 |
|       5 | Index memory | 129679360 |      15830 | 1048838144 |      128032 |
+---------+--------------+-----------+------------+------------+-------------+
OK, what about my data store on disk, maybe I am running out of space there:
mysql> select SUM(FREE_EXTENTS * EXTENT_SIZE) / 1024 / 1024 As "Free MB", SUM(INITIAL_SIZE) / 1024 / 1024 "Size MB", (SUM(INITIAL_SIZE) - SUM(FREE_EXTENTS * EXTENT_SIZE)) / 1024 / 1024 AS "Used MB" FROM information_schema.files WHERE FILE_TYPE = 'DATAFILE';
+-----------------+-----------------+----------------+
| Free MB         | Size MB         | Used MB        |
+-----------------+-----------------+----------------+
| 327935.00000000 | 381472.00000000 | 53537.00000000 |
+-----------------+-----------------+----------------+
Nothing there either, there seems to be plenty of space available. So what makes this table "full"? Please tell me if you know, I'd really like to test this beast!
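For context, the disk data side of a setup like this is created with statements roughly along these lines (names and sizes here are illustrative, not my exact configuration); these are the fixed size, preallocated files I was grumbling about above:
CREATE LOGFILE GROUP lg_1
  ADD UNDOFILE 'undo_1.log'
  INITIAL_SIZE 2048M
  UNDO_BUFFER_SIZE 32M
  ENGINE NDBCLUSTER;
CREATE TABLESPACE ts_1
  ADD DATAFILE 'data_1.dat'
  USE LOGFILE GROUP lg_1
  INITIAL_SIZE 16384M
  ENGINE NDBCLUSTER;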

And then, before I end this post, let me tell you one thing: for the intents and purposes that MySQL Cluster was once designed for, telco applications, I'm sure it's great. Fact is, I know it is, and few things beat it there. Secondly, even in Web-space, there are great uses for MySQL Cluster, I know that from first-hand experience. But as, say, a substitute for MongoDB as a document store, it's not even close. Not necessarily only in terms of raw performance, it's just not a good fit for the use case. For one thing, a document is variable in size; for another, a simple KVS document store is just that, simple, and if there is one thing that MySQL Cluster is not, it's simple. If you want the Ferrari of databases, then MySQL Cluster might be it. Possibly. But if you want the Volvo of KVSes, then MySQL Cluster isn't it.

And now I'm close to closing, but there is one more thing: No, I'm not giving up. No way! But I just want to know why the table is "full", because MySQL Cluster is wrong, it's not (and when creating the table I told it to hold 150.000.000 rows, with an AVG_ROW_LENGTH set to just what it is in reality, so if MySQL Cluster has a problem with this table, it could have said so from the start, when I created the table. Right? If it ignores what I am telling it, why is the syntax there at all?).
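For reference, the table options in question look roughly like this (the column list is simplified for illustration, the AVG_ROW_LENGTH value is just an example, and ts_1 is an illustrative disk data tablespace name):
CREATE TABLE kvs (
  id BIGINT UNSIGNED NOT NULL PRIMARY KEY,
  val1 VARCHAR(256),
  val2 VARCHAR(256)
) ENGINE=NDBCLUSTER DEFAULT CHARACTER SET utf8
  TABLESPACE ts_1 STORAGE DISK
  MAX_ROWS=150000000 AVG_ROW_LENGTH=300;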

/Karlsson

9 comments:

Ivan said...

Anders,

Don't forget to tell us when you reach your 1 billion queries in a minute!

/iz

Karlsson said...

The one thing that stands out, looking at ndbinfo.memoryusage, is that DataMemory is getting close to full. Now, I did say I had disk data tables, so this shouldn't matter much, but then I remembered that DataMemory really should be called DataAndUndoMemory, so maybe this was the issue? And IndexMemory is very far from full, so maybe I am running out of UNDO space in memory here (why I couldn't get an error message indicating this is beyond me, but I guess that is the kind of stuff that makes MySQL Cluster 7.2 GA and "Enterprise"). So I have shuffled the memory settings around a bit and will try again (and why UNDO has to have this much memory, and why it can't be on disk or be dynamically allocated, I do not know. But I fully understand that allocating memory as you need it, and using disk when you don't have memory, is a very advanced concept, which requires some careful design and consideration, just like the, just as modern and technologically advanced, fins on a 1959 Cadillac).
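The kind of change I made amounts to shifting the balance in config.ini, roughly like this (the values here are only an example, not my exact settings):
[ndbd default]
NoOfReplicas=2
# More DataMemory headroom, since copy/undo data for in-flight
# transactions lives here too
DataMemory=4096M
# IndexMemory was nowhere near full, so some of it can go
IndexMemory=512M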

/Karlsson

Bernd Ocklin said...

Use MAX_ROWS on your table. Pls see FAQs:

http://dev.mysql.com/doc/refman/5.1/en/faqs-mysql-cluster.html#qandaitem-B-10-1-14

Karlsson said...

MAX_ROWS was already set. If you read this and previous blog posts you can see that I have set MAX_ROWS way above what is needed, as well as AVG_ROW_LENGTH to the max row size. This is pretty annoying: I set MAX_ROWS to 150.000.000 and the table got "full" at 30.000.000. But it seems to have been UNDO space that was "full", as I am now actually close to loading all rows (I'm at 80.000.000, out of a total of slightly more than 100.000.000).

/Karlsson

Bernd Ocklin said...

I see. Looked at your mem usage numbers again, and what could have happened is that you simply ran out of DataMemory. You are at 95% usage (15.2k pages out of 16k):

http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-ndbd-definition.html#ndbparam-ndbd-minfreepct
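A quick way to see that ratio directly from the same ndbinfo table (just a convenience query over the columns shown above):
select node_id, memory_type,
       round(100 * used / total, 1) as pct_used
from ndbinfo.memoryusage;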

Karlsson said...

Yes, but it took me a while to figure out that this was the issue, as I was using Data on Disk. But then I realized that DataMemory includes UNDO, and figured that could well be it. Once I've finished loading so I can do some serious benchmarks, I'll post some more on this.
But as I already said, this is just too much work to get a KVS working in most cases, unless it's really good (limiting the # of rows in a KVS is usually not a good idea either).

On the other hand, most of my complaints, excluding the docs, are of much less significance in a telco OEM application, which is what MySQL Cluster is still best at. My thinking here was to try MySQL Cluster for something else, so far with little success.

/Karlsson

David Kirchner said...

I am running into the same error (although I am trying to use Cluster as a regular ol' SQL server). I have plenty of memory and am just importing one of our smaller tables (~3M rows), and the import aborts at around 800,000 rows ("The table is full"). MAX_ROWS is set to 100,000,000 and avg_row_length is reasonable (470).

Disk and Index usage both report 0% (6274/983040 and 1385/1310752 respectively) and ndbinfo.memoryusage confirms there's about 32GB available (total - used). I can't imagine 800k inserts generated *that* much undo traffic, but I guess it's possible.

I watched the output of 'select * from ndbinfo.logspaces' while inserting rows. The used/total ratio definitely went up for both UNDO and REDO logs. For UNDO the ratio was 1.3% when the error occurred, up from 0.17% before I started, and below the peak of about 1.6%. The REDO logs peaked at about 15%, but were around 9% when the error occurred.
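For anyone wanting to watch the same thing, a convenience query along these lines works (assuming used and total columns, as in memoryusage above):
select node_id, log_type,
       round(100 * used / total, 1) as pct_used
from ndbinfo.logspaces;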

David Kirchner said...

Additionally, I can insert rows after the "table is full" error occurs. I'm not sure how many yet. Maybe the import is too fast?

David Kirchner said...

So what's interesting about this is that I could insert some rows but not other rows. I don't know why. But what I do know is that when you see "The table is full" you can get the actual, real error message with the query SHOW WARNINGS:

+---------+------+--------------------------------------------------------+
| Level   | Code | Message                                                |
+---------+------+--------------------------------------------------------+
| Warning | 1296 | Got error 1601 'Out extents, tablespace full' from NDB |
| Error   | 1114 | The table 'fooo' is full                               |
+---------+------+--------------------------------------------------------+

Why that error is hidden as a warning is beyond me. Hope this helps others that land on your page.