Thursday, November 28, 2013

MariaDB Database Disk snapshot backups on Windows

Using disk snapshots to perform MariaDB backups has become more and more common, but this seems to have been limited to Linux and Cloud environments. There seems to be a notion that snapshots cannot be done on native Windows, the way we can do snapshots using LVM on Linux for example. This is actually false: if you use the NTFS filesystem, snapshotting is built in and has been for a long time. The base for it is the Volume Snapshot Service (VSS), which was introduced as far back as Windows XP.

Using snapshots to back up on Linux or Windows works well, assuming that the storage engine you use can recover on startup. The backups created this way are physically consistent but not logically consistent, but InnoDB will recover from this. Also note that any warnings about using snapshots with MyISAM also apply to the mysql database, i.e. the database metadata. Usually this is not an issue, but if you want to protect yourself even more, then use FLUSH TABLES WITH READ LOCK and back up the MyISAM tables separately somehow. This is no different between Windows and Linux; it is an attribute of any kind of disk snapshot backup.
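
If you want that extra bit of protection, the classic recipe is roughly this (a minimal sketch: keep the session holding the lock connected while the snapshot is being created):
mysql> FLUSH TABLES WITH READ LOCK;
-- leave this session connected, run the snapshot (vshadow, see below) from another window, then:
mysql> UNLOCK TABLES;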

VSS is a service that runs on Windows, and to utilize it some programming is needed. Luckily that has already been done for us: there is a simple utility called VSHADOW.EXE, available from Microsoft, that can be used to access the VSS service. Some versions of Windows, such as Windows Server 2008, Windows Server 2012 and also a version of Windows that is very little used (probably there are more users of Windows 3.10 than of Windows 8), come with DISKSHADOW.EXE, a utility that basically does the same thing but has a different interface. Windows 7, which I tested on, has VSSADMIN.EXE instead, but as VSHADOW.EXE can be added, I downloaded that one as it is more functional (it is part of the Windows SDK). All this mess (c) Microsoft.

So, how does this work then? As I said, this is different than on Linux (what? Windows being different than Linux? No kidding?). Using the API you can create two types of snapshots, persistent and non-persistent. The difference is that a persistent snapshot is one that remains until specifically removed, whereas a non-persistent one goes away when the program that created it exits. As we are about to take a backup here, I guess you think we want a persistent one, but that is not so, as VSHADOW can run a command file for us before it exits.

I will use an example here. For that I have placed my MariaDB data on the E-drive in a directory called \data, which means my my.ini has this in it:
[mysqld]
datadir=E:/Data


The next question then is where the snapshot is placed. Well, the snapshot ends up as a Windows device with a name along the lines of \\?\Volume{5b3610ce-d2cb-11e2-a649-2cd05a8ad0c9}\ which is just as easy to remember and type as the novel War and Peace (yes, Tolstoy was a Windows guy, I guess you didn't know that!). Yes, again, thank you Microsoft, Windows is SOO user friendly! Anyway, yet again VSHADOW helps us out, as it can create a command file for us that contains the names of the devices and some other things. It can look something like this, and I have called it vss-setvar.cmd:
@echo.
@echo [This script is generated by VSHADOW.EXE for the shadow set {c8fcb322-ce02-409a-b609-7fc04311fad1}]
@echo.

SET SHADOW_SET_ID={c8fcb322-ce02-409a-b609-7fc04311fad1}
SET SHADOW_ID_1={11243eaf-0234-4784-92e1-9bc24668b5b4}
SET SHADOW_DEVICE_1=\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy8


OK then, so this is generated for us, but how do we use it? Well, what we do is create a command script that is called by VSHADOW, as I said before, and this script calls the generated one to get all the details. An example script, which I call e-backup.cmd, can look like this:
call vss-setvar.cmd
@echo off
dosdev T: %SHADOW_DEVICE_1%
robocopy T:\Data C:\Backup /MIR
dosdev /r /d T:


There are still a few missing bits here though. The call command calls the generated file as above (you can call this anything you want, see the example command line below). And what about the DOSDEV command? That is a simple little utility you can find on the net (Google is your friend) that allows you to create a DOS device from a device descriptor, and it can also remove one. There are other ways of dealing with this, but I chose this path for my example here. Robocopy you probably know already, and the /MIR option tells robocopy to mirror a full directory tree.

So what the command script above will do is create a DOS device called T: for my snapshot, copy that to a directory called C:\Backup and then remove the T: device.

Now we have a command script (not two, the first one I showed was an example of a generated one) and an idea of what it does. To do a shadow copy, we just need to call vshadow and it will do the work for us. We pass three parameters:
  • The name of the script to generate, containing the device names and related settings.
  • The name of the script to run.
  • The name of the drive to create a shadow device of.
Using the example above to create a backup of the E-drive, where I have my MariaDB datafiles in the \data directory, this is the command I use:
vshadow -script=vss-setvar.cmd -exec=e-backup.cmd E:
So that's about it, now you know how to create an online snapshot backup of a MariaDB database on Windows.

/Karlsson

Monday, November 25, 2013

Workaround for CURRENT_TIMESTAMP as default for a DATETIME value

Currently you cannot specify the current time as a default value for a DATETIME column. It works for TIMESTAMP but not for DATETIME, and even with TIMESTAMP this only works for one TIMESTAMP column in the table; if you have more than one TIMESTAMP column you have to specify a default for all but the first.
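
To illustrate: on a pre-5.6 / pre-10.0 server, a sketch like this (ts2 is just a made-up table name) is rejected with a complaint that only one TIMESTAMP column may use CURRENT_TIMESTAMP in a DEFAULT or ON UPDATE clause:
mysql> CREATE TABLE ts2(
  t1 TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  t2 TIMESTAMP DEFAULT CURRENT_TIMESTAMP);
Give t2 a constant default instead, like DEFAULT '2013-01-01 00:00:00', and it is accepted.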

This is fixed in MySQL 5.6 and MariaDB 10, which allow you to use a construct like this:
mysql> CREATE TABLE td1(
  c1 INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
  c2 DATETIME DEFAULT CURRENT_TIMESTAMP);

So what can you do if you are not yet on MariaDB 10 or MySQL 5.6? Well, you can use a trigger to achieve a similar result:
mysql> CREATE TABLE td1(
  c1 INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
  c2 DATETIME DEFAULT NULL);
delimiter //
mysql> CREATE TRIGGER td1_ins BEFORE INSERT ON td1 FOR EACH ROW
BEGIN
  IF new.c2 IS NULL THEN
    SET new.c2 = now();
  END IF;
END// 
delimiter ;

And then this works:
mysql> INSERT INTO td1 VALUES(NULL, '1960-01-04 16:45');
mysql> INSERT INTO td1 VALUES(NULL, NULL);
mysql> INSERT INTO td1(c1) VALUES(NULL);
mysql> SELECT * FROM td1;
+----+---------------------+
| c1 | c2                  |
+----+---------------------+
|  1 | 1960-01-04 16:45:00 |
|  2 | 2013-11-23 15:57:51 |
|  3 | 2013-11-23 16:00:55 |
+----+---------------------+
3 rows in set (0.00 sec)



Is this perfect? Nope, but it works while we wait for MariaDB 10.

/Karlsson

Tuesday, October 29, 2013

Bet the company! Just do it!

The term "bet the company" means that a company, large or small, puts everything it has in one big move. Betting the company when the company is small is one thing, this happens every day (you can say that MySQL did this, but they did it when they were small and and there wasn't really much to bet with). It is true that Henry Ford bet the company on the T-Ford, but at that time Ford really wasn't that big when Henry Ford started building the Model T using the Assembly Line which meant that the cars could be made much less expensive.

But how often do you see a major company betting itself on one single product line? Not that often I say, but that might be because I do not know industries outside IT and Cars that well, and also you tend to forget the failures and the associated companies.

One reasonably well-known failure was the Schlitz Brewing Company, which went downhill in the early 1980's after betting on new brewing technologies to produce a less expensive beer and increase margins. In the process though, the beer, even for American beers at that time, came to taste even less, if that is possible, and it also looked awful. So Schlitz lost its customers, went downhill and was acquired a few times. Their most innovative brewery, as far as technology goes, still exists, is owned by Anheuser-Busch and produces, well, you can guess it by now, Bud Light.

Anyway, enough of interesting things such as cars and beer, let's get back to boring computers again. Some of the biggest bet-the-company projects we have seen are in the IT industry; one is when Microsoft decided to go with Windows, skipping out of the development of OS/2. If you look back at it now, and consider that at that time IBM was still the dominant player and that IBM had brought Microsoft to the position it was in (by giving them the right to print money, sort of, in the shape of allowing Microsoft the rights to DOS), this was a pretty risky bet by Microsoft, but it worked out. There were more IT bet-the-company projects in the 1980's, from Commodore and Atari and so on, but they largely failed. Also, we have Apple giving us the Mac, but in this case Apple as a corporate entity never bet on the Macintosh until it had proven itself in the marketplace.

But I think that one of the most substantial bet-the-company projects ever, in any industry, was when IBM bet on the 360 architecture in the beginning of the 1960's. At the time IBM had a huge array of conflicting and incompatible lines of computers, and this was the case with the computer industry in general at the time; it was largely a custom or small-scale design and production industry. But IBM was such a large company that the problems of this were getting obvious: when upgrading from one of the smaller series of IBM computers to a larger one, the effort involved in that transition was so big that you might as well go for a competing product, as the effort was similar, you had to redo everything. So IBM was getting some serious competition from the "BUNCH" (Burroughs, Univac, NCR, CDC and Honeywell), as they were called later when General Electric and RCA left the field of computing in the late 1960's.

Yes, the IBM 360 was truly revolutionary. It was a whole new range of computers, including software, hardware and peripherals. It had a whole bunch of features we now take for granted, like 8-bit byte addressing (yes, this was not an obvious feature, and the term "byte" itself, although not invented for the 360, was first used by IBM and was made 8 bits and nothing else with the 360 series). The risk that IBM took with the 360 series was enormous, and this is another factor of what bet-the-company is all about: taking a really big risk. Looking at what Xerox did, or rather didn't do, with PARC explains a big chunk of why PARC was a failure, in terms of turning Xerox around, and why the 360 was not. The 360 really did turn IBM around, and many at IBM were not happy with the way the 360 series was developing and what it was doing to existing IBM product lines, but the IBM management at that time stood by their decisions. As for Xerox in the 1970s, they were much less supportive of PARC (and that is a gross understatement).

Anyway, the IBM management that dared to take on the 360 project is to be applauded, and I think we need more of this kind of daring management; stuff is just happening too slowly and it's time for a quantum leap. And although there is a risk to betting the company, the gains can be enormous, and NOT taking that risk is also betting the company (look at those office equipment manufacturers that didn't dare to go into computers).

I think this is one of the issues with many Open Source projects: they are run and supported by people using the project the way it works today, and hence they have limited interest in taking a huge risk on some new, unknown technology that currently has limited value for them. As Henry Ford said: "If I had asked people what they wanted, they would have said they wanted a faster horse".

/Karlsson

Wednesday, October 16, 2013

MariaDB Dynamic Columns client API

I have blogged on using MariaDB Dynamic Columns already, and I hope that was a useful introduction. I have a few more things on this subject though, and one so far little known and little used feature is the Client API for Dynamic Columns; see the MariaDB Knowledge Base for details. What this is all about is that Dynamic Columns were originally envisioned as a means of managing the "dynamic columns" used by Cassandra when using the MariaDB Cassandra Storage Engine. But as we realize, that is the server side of things (the Storage Engine); there is also a corresponding client library that is part of the MariaDB Client API.

As you have seen if you have read my previous blog on this subject, or whatever is written about MariaDB Dynamic Columns elsewhere, which is not much, MariaDB actually has no external representation of a dynamic column: either you get the binary representation, or you parse that binary representation using one of the supplied functions (like COLUMN_GET to get the value of an attribute in a dynamic column). The only exception is the function COLUMN_JSON, which retrieves the whole dynamic column and converts it to JSON, also supporting nested objects. Regrettably, there is no built-in means of adding a dynamic column in JSON format (Ulf Wendel has provided some JSON User Defined Functions, but they don't line up with Dynamic Columns on the other hand).

Now, if we assume we are not using the COLUMN_GET function or something like that, can I programmatically parse a dynamic column on the client? Well, I could sure use the server functions again, calling COLUMN_LIST and functions like that from a SELECT statement or in a Stored Procedure (I'll show this later), but what is really handy (if you are a C developer like myself) is the Dynamic Columns API functions in the MariaDB Client Library.
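
Just as a quick taste of that server-side route before we get to the client side, listing the attribute names in the users table from my earlier Dynamic Columns post would be something along these lines (it returns a comma-separated list of the attribute names in each row):
SELECT username, COLUMN_LIST(attrs) FROM users;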

To begin with, these functions are not as easy to use as it might seem from the Knowledge Base article mentioned above. There are some bugs around which files are required that currently make it kind of messy to build programs that use the Dynamic Columns API (which brings up another question: how many times can you write Dynamic Columns in one single blogpost?). This will be fixed eventually (these are known, reported bugs we are talking about here), but until it is, let me show you some examples, so you get the point of this API.

To begin with, the API assumes that data passed to it is in the form of a DYNAMIC_COLUMN struct, which is a string with a specified length, i.e. it is NOT a null-terminated string, it is really, truly binary. This might be silly, but it is just how things are. So the DYNAMIC_COLUMN struct is set up with the binary data you got from a MariaDB Dynamic Column. Still, this might not be valid Dynamic Column data, so you should call mariadb_dyncol_check() on this data to make sure it is valid Dynamic Column data.

The function that you then want to use is by far the most powerful one, mariadb_dyncol_unpack(). This will unpack a MariaDB dynamic column and retrieve the basic information on it:
  • How many attributes there are (how many "Dynamic Columns")
  • What the names of those attributes are.
  • What the values of those attributes are.
The values are returned as DYNAMIC_COLUMN_VALUE structs and the names as MYSQL_LEX_STRING structs. Most of this is described in the knowledge base article mentioned above.

I spent some time playing with this; most of that time was spent understanding how the API was meant to work and how to build an application on it. One night, after some heavy drinking, I decided that the best way to show this would be to write some code, instead of just blogging about it on and on, so still severely drunk I wrote a simple tool for extracting Dynamic Column data from MariaDB and showing it in some different ways (JSON, pretty JSON, indented dynamic columns etc). To allow you to build it yourself I used autotools, but the fact is that to do that, you also have to copy some include files from the MariaDB source distribution (the reason is those bugs I mention above).

So there is a pre-built executable (I built this on Ubuntu 10.10) that assumes you use the MariaDB 10.x Client Library (I used 10.0.4). If this doesn't work, as I said, you can always build it yourself (which isn't a thoroughly tested procedure). Also, for any of this to work, you need the Jansson JSON library.

I'll develop some more Dynamic Columns related tools eventually, but for now this is what there is in the dyncoltools toolset: the dyncoldebug tool, available on sourceforge.

/Karlsson

Friday, September 27, 2013

Optionutils 2.7 released

This is no big deal, but I have published my generic option file and commandline handler in a new version. If you have used any of my C programs that are on sourceforge, I think you have seen that most of them use this, in one version or another. It has been a long time since I did a generic upgrade of it though, and every utility seems to have had its own version embedded, so I have now created a version with most of the new things in it, and I have also updated the documentation.

The cool thing with this library is that it is generic and largely follows the MySQL / MariaDB style of options and config files. Handling commandline options and config files in C is otherwise something you largely have to do yourself. This little library handles all that for you, and it is also quite advanced and has many cool features, such as support for integers and strings with proper type checking, arrays of values, a kind of associative array, as well as configuration files in the MySQL format.

Version 2.7 of optionutil, including full documentation, is now available for download at sourceforge.

/Karlsson

Using 4-byte UTF-8 (aka 3-byte UNICODE) in MariaDB and MySQL

As I wrote in a previous post, MariaDB / MySQL has some issues with its standard UTF-8 encoding. That UTF-8 encoding limits us to 3 UTF-8 bytes, or 2 UNICODE bytes if you want to look at it that way. This is slightly limiting, but for languages it is usually pretty much OK, although there are some little used languages in the 3-byte UNICODE range. But in addition to languages, you will be missing symbols, such as smileys!

Help is on the way though, in the shape of the utf8mb4 character set that is part of both MariaDB and MySQL. This is a character set that is just like the one called plain utf8, except that this one accepts all the UNICODE characters, with up to 3 UNICODE bytes, or 4 bytes using the UTF-8 encoding.
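
A quick way to see the difference is to ask information_schema; the MAXLEN column is the maximum number of bytes per character, and should show 3 for utf8 and 4 for utf8mb4:
mysql> SELECT CHARACTER_SET_NAME, MAXLEN FROM information_schema.CHARACTER_SETS WHERE CHARACTER_SET_NAME IN ('utf8', 'utf8mb4');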

This means that there are tighter limits on how long a column can be when using utf8mb4 compared to latin1, and even compared to utf8. Let's try it:

First, I don't want MySQL to substitute any data types for me (why I would EVER want that is beyond me):
mysql> set sql_mode='strict_all_tables';
Query OK, 0 rows affected (0.00 sec)

Then we can create a table to hold a message using the latin1 character set:
mysql> CREATE TABLE msg(msgid INT NOT NULL PRIMARY KEY AUTO_INCREMENT, msgtext VARCHAR(65528) CHARACTER SET latin1);
Now, we know this isn't terribly international, so let's go with UTF-8 instead:
mysql> DROP TABLE msg;
Query OK, 0 rows affected (0.12 sec)

mysql> CREATE TABLE msg(msgid INT NOT NULL PRIMARY KEY AUTO_INCREMENT, msgtext VARCHAR(65528) CHARACTER SET utf8);
ERROR 1074 (42000): Column length too big for column 'msgtext' (max = 21845); use BLOB or TEXT instead

OK, that didn't work, so we'll reduce the size of the msg column and try again:
mysql> DROP TABLE msg;
Query OK, 0 rows affected (0.05 sec)

mysql> CREATE TABLE msg(msgid INT NOT NULL PRIMARY KEY AUTO_INCREMENT, msgtext VARCHAR(21842) CHARACTER SET utf8);
Query OK, 0 rows affected (0.29 sec)

OK, so now we have a table with a UTF-8 column in it. Let's insert some data into it:
mysql> INSERT INTO msg VALUES(NULL, 'Some test message');
Query OK, 1 row affected (0.04 sec)

mysql> INSERT INTO msg VALUES(NULL, 'Some test message with utf8 åäö');
Query OK, 1 row affected (0.01 sec)

Right. That was OK. But what happens if we insert a 3-byte UNICODE character (4 bytes as UTF-8) into this table, say a "smiling face with horns", which is UNICODE 01F608 or F09F9888 as UTF-8? We do this using the MySQL UNHEX function:
mysql> INSERT INTO msg VALUES(NULL, CONCAT('Some test message with 3-byte UNICODE ', UNHEX('F09F9888')));
ERROR 1366 (22007): Incorrect string value: '\xF0\x9F\x98\x88' for column 'msgtext' at row 1


Hmm, too bad, back to the drawing board and let's use the utf8mb4 character set:
mysql> DROP TABLE msg;
Query OK, 0 rows affected (0.08 sec)

mysql> CREATE TABLE msg(msgid INT NOT NULL PRIMARY KEY AUTO_INCREMENT, msgtext VARCHAR(21842) CHARACTER SET utf8mb4);
ERROR 1074 (42000): Column length too big for column 'msgtext' (max = 16383); use BLOB or TEXT instead

Close, but no cigar I guess. We need to further reduce the size of the msgtext column, as each utf8mb4 character potentially takes up one more byte than a utf8 one. So try this instead:
mysql> CREATE TABLE msg(msgid INT NOT NULL PRIMARY KEY AUTO_INCREMENT, msgtext VARCHAR(16382) CHARACTER SET utf8mb4);
Query OK, 0 rows affected (0.29 sec)
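
In case you wonder where the two maximums in the error messages come from (21845 and 16383), they are simply the 65535-byte limit on a VARCHAR value divided by the maximum number of bytes per character, something you can verify right from SQL:
mysql> SELECT FLOOR(65535/3), FLOOR(65535/4);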

OK, this worked better, now let's try that insert again:
mysql> INSERT INTO msg VALUES(NULL, CONCAT('Some test message with 3-byte UNICODE ', UNHEX('F09F9888')));
Query OK, 1 row affected (0.00 sec)

Cool, now we have some data in there, let's then just check if this really is a 4-byte UTF-8 encoded character in there.
mysql> SELECT LENGTH(msgtext), CHAR_LENGTH(msgtext) FROM msg;
+-----------------+----------------------+
| LENGTH(msgtext) | CHAR_LENGTH(msgtext) |
+-----------------+----------------------+
|              43 |                   40 |
+-----------------+----------------------+
1 row in set (0.00 sec)

As can be seen, LENGTH (which returns the length of the column in bytes) returns a value 3 higher than the number of characters, which is just right: the smiley counts as 1 character but takes 4 bytes!

So in conclusion, as utf8mb4 really is a superset of utf8, should I always use utf8mb4? No, you shouldn't, as in most cases you don't need those 3-byte UNICODE characters in strings for names, places, companies etc. But in a case like this, when it comes to some free-format text that the user enters, it might be useful. Also, as I have pointed out before, even though VARCHAR is variable length, there are situations where the full possible length is used, such as when the table data is sorted. And finally, as for collations, these are named slightly differently than those for utf8, but they are all there, like utf8mb4_bin, utf8mb4_general_ci, utf8mb4_unicode_ci etc.
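
If you want to see which utf8mb4 collations your particular server provides, something like this should list them:
mysql> SHOW COLLATION LIKE 'utf8mb4%';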

I'm really bored with character sets and UNICODE and stuff like that now, so my next blog will NOT be about the fascinating subject of Byte Order Marks (BOM).

/Karlsson

How MariaDB and MySQL make life with UTF-8 a bit too easy. And how to fix it...

UNICODE is getting more and more traction, and most new applications, at least web applications, support UNICODE. I have written about UNICODE and related stuff before in Character sets, Collations, UTF-8 and all that, but before I go into some more specifics, some issues, and some fixes, let me tell you about UNICODE, UTF-8 and how MySQL interprets them. See the blogpost linked to above for more information on the, surprisingly, even more boring subject of Collations.

So, let's begin with UNICODE. UNICODE is a character set that is very complete; you should be able to make yourself understood in any language using the characters from this vast character set. This is not to say that all characters from all languages are in UNICODE, some are missing here and there, and sometimes new characters make their way into languages, like different currency signs and stuff like that. In general though, you can use the more than 100,000 characters in UNICODE to express text in any language.

So, now we know UNICODE; what are UTF-8, UTF-16 and all that? Those are ways to represent the bytes that make up UNICODE. The smartest, and most common, of these, by far, is UTF-8. To represent the highest code point of a UNICODE character, you would need 3 bytes for every character, but that wastes space. Also, it would break all old program code that assumes that a character is 1 byte only (with 7 bits representing the good old US ASCII characters, although there are local variations of this, or with up to 8 bits interpreted as latin1). Ken Thompson fixed this for us though, with the UTF-8 representation.

UTF-8 is variable length, and the way this works is that if the first (highest) bit is 0, then this is a character in the range 0-127. If you stick with US ASCII, you can say it is actually already UNICODE represented as UTF-8, as the 128 "first" characters in UNICODE are the same as 7-bit US ASCII. No issues there. When there is more than 1 byte to the character, the first bit is 1 and the number of leading 1's, up to the first 0, tells how many bytes there are in total. So the first byte either starts with 0, in which case there is 1 byte only, or with 11 followed by one more 1 for each additional byte (110 means two bytes, 1110 means three, and so on). This in turn means that the first byte never starts with the bit pattern 10, so 10 is used as the leading bit pattern for each of the following bytes. Using this scheme you can't represent as many bytes as you want, as you run out of 1's in the first byte, but enough bytes for UNICODE at least! And it's US ASCII compatible.
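
You can actually watch this at work from the mysql prompt. Assuming your connection character set handles the literals properly, something like this shows a 1-byte, a 2-byte and a 3-byte UTF-8 encoding (expect 41, C3A9 and E282AC respectively):
mysql> SELECT HEX(CONVERT('A' USING utf8)), HEX(CONVERT('é' USING utf8)), HEX(CONVERT('€' USING utf8));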

Potentially you could have 8 bytes in total (all bits set to one in the first byte and then 7 following bytes). As the following bytes all start with 10, only 6 bits of each are significant, and as we have used up all the significant bits in the first byte, this means we can represent 7 * 6 = 42 bits of data, or 5 bytes plus 2 "extra" bits. But the scheme was never intended to go that far; originally it was envisioned that UNICODE might use up to 6 bytes in the UTF-8 encoding (this doesn't mean that we would have that many characters, as not all code points are used and characters are organized in separate code pages). Eventually it was agreed that UNICODE would only use up to 21 bits, or slightly less than 3 bytes, which encoded as UTF-8 requires up to 4 bytes.

So, now back to good old MySQL. MySQL does support UNICODE and it supports UTF-8, right, so we are all set then? Nope, I'm afraid not. You might have seen that MySQL makes space for 3 bytes per character in a CHAR or VARCHAR column, to make sure that a 3-byte UTF-8 character can fit. In a VARCHAR that might not be significant, as it is variable length anyway? Right? Nope, depending on the MySQL Storage Engine, VARCHAR is actually allocated to its full potential length. OK, but you use InnoDB anyway, which doesn't behave like this, so you are in the clear? Again, I have to disappoint you: unless you never sort on that column, you need to consider this, as sorting involves the potential length of the data! Also, if you use the NDB Storage Engine with Disk Data, you have the same issue (see: http://dev.mysql.com/doc/refman/5.6/en/mysql-cluster-disk-data-storage-requirements.htm).

But there is another thing that might have caught your attention in the above section. I write that MySQL allocates 3 bytes per UTF-8 encoded UNICODE character, but I also say that UNICODE, when encoded as UTF-8, might occupy up to 4 bytes? What is going on is that the MySQL utf8 encoding is limited to 3 bytes, which means that MySQL, when using utf8, cannot represent the whole UNICODE character set; specifically, it is limited to 4 + 6 + 6 = 16 bits or 2 bytes. This might not be a big deal, but it depends on your situation. I was reading up on this on Wikipedia for example, and it says "Three bytes are needed for characters in the rest of the Basic Multilingual Plane (which contains virtually all characters in common use[11]). Four bytes are needed for characters in the other planes of Unicode, which include less common CJK characters and various historic scripts and mathematical symbols.", but this isn't altogether true, although someone who implemented the original utf8 support in MySQL probably read something like this.

What one has to understand is that UNICODE isn't only about languages, it's also about symbols, such as currency signs etc. But again, most of these also live in the 3-byte UTF-8 space (even new ones do, as there is unused space in UNICODE). But then Facebook and Twitter come around and screw it all up and nothing works again :-( Suddenly the use of smileys exploded, and this is where the trouble starts. The simple smileys, like :-), all fit within 2 bytes of UNICODE, but the more advanced ones do not! Oh my!

There is help on the way though, as MySQL also supports a full UTF-8 encoding called (in MySQL lingo then) utf8mb4. Now you know what that is used for!

But I'm going to close here, and follow up real soon with a blog specifically on using utf8mb4. Fun, isn't it!
/Karlsson

Thursday, September 19, 2013

MariaDB Dynamic Columns

MariaDB has a feature called Dynamic Columns which is not in MySQL, and this feature requires some explanation. It is used for example by the Cassandra Storage Engine, which is also unique to MariaDB: as Cassandra is a schema-less database, we need a way to handle the fact that on one end MariaDB has a fixed set of columns defined by the schema, while on the other end Cassandra provides any kind of attribute that the developer feels he wants to have for a particular "row" (which is a row in MariaDB but is not called a row in Cassandra).

But ignoring the Cassandra engine for a while, let's look at what us mere mortals, brought up on mother's milk, SQL and Relational Theory under the watching eyes of E.F. Codd, can use this for, and the fact is that it can be quite useful. All in all, what Dynamic Columns provide is a means of adding non-schema data to a row in a structured way, you know, where you used to emulate an array or something in a VARCHAR, or use full-blown Anchor Modelling for properties.

Which is not to say that Dynamic Columns are the best thing since sliced bread, but they do have some good uses, so let me show you how they work. To begin with, a Dynamic Column contains structured data in the form of one or more pairs of <attribute name> = <attribute value>. In MariaDB 5.5, <attribute name> was numeric; in 10.x it is a proper name, as it should have been from the start. All examples here assume you are using MariaDB 10.x.

The attribute names are not predefined in a schema or anything; each instance of a Dynamic Column (i.e. every Dynamic Column in every row) might have different attributes with different values and types. And yes, <attribute value> is typed, but typing is automatic per instance of a Dynamic Column (i.e. every Dynamic Column in every row of the table might have a different type for <attribute value> for the same <attribute name>). The possible types are:
  • String (Binary or Normal)
  • Date
  • Datetime
  • Decimal
  • Double
  • Integer (Signed or Unsigned)
  • Time
  • Dynamic column
As can be seen from this, nesting is actually available, i.e. <attribute value> might be in the form of another Dynamic Column. So let's dig in with some simple examples.

First, let's create a table that holds some useful attributes. In this case I have a table with my users, and for each user I want to store a range of mixed attributes that the application finds useful but which are not in the normal table schema, such as birthplace, car make, etc.
CREATE TABLE users(userid INT NOT NULL PRIMARY KEY AUTO_INCREMENT, username VARCHAR(255), firstname VARCHAR(255), lastname VARCHAR(255), attrs BLOB);
Now we have to insert some data into this table. To create a Dynamic Column value you use the COLUMN_CREATE function, which takes a variable (but even) number of arguments that form name-value pairs. For example:
INSERT INTO users VALUES(NULL, 'bloggs', 'Joe', 'Bloggs', COLUMN_CREATE('shirtsize', 'L', 'colorpref', 'red'));
INSERT INTO users VALUES(NULL, 'doe', 'John', 'Doe', COLUMN_CREATE('shirtsize', 'L', 'vegetarian', FALSE));
Now we hopefully have some data in the table, and as you can see, we are mixing schema data with schema-less data in the same row. Before I end on this subject for this time, let's have a look at the data in the attrs column. There are a few different means of doing that. Either you can get a specific Dynamic Column value, like this:
SELECT COLUMN_GET(attrs, 'colorpref' AS CHAR) FROM users;
+----------------------------------------+
| COLUMN_GET(attrs, 'colorpref' AS CHAR) |
+----------------------------------------+
| red                                    |
| NULL                                   |
+----------------------------------------+
2 rows in set (0.00 sec)

Or I can get all the attributes of a dynamic column in one go, in which case I get JSON back:
SELECT COLUMN_JSON(attrs) FROM users;
+-------------------------------------+
| COLUMN_JSON(attrs)                  |
+-------------------------------------+
| {"colorpref":"red","shirtsize":"L"} |
| {"shirtsize":"L","vegetarian":0}    |
+-------------------------------------+
2 rows in set (0.00 sec)
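
And just to hint at one of the things I will get back to: adding or changing an attribute after the fact is done with the COLUMN_ADD function, along these lines:
UPDATE users SET attrs = COLUMN_ADD(attrs, 'vegetarian', 1) WHERE username = 'bloggs';
SELECT COLUMN_JSON(attrs) FROM users WHERE username = 'bloggs';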

That's it for now, but I'll be back with some more details and some cool things you can do with MariaDB Dynamic Columns!

/Karlsson

Thursday, August 22, 2013

Big Data.. So what? Part 2

Sorry for this delay in providing part 2 of this series, but stuff happened that had really high priority, and in addition I was on vacation. But now I'm back in business!

So, last time I left you with some open thoughts on why Big Data can be useful, but also on why we need new analysis tools as well as new ways of visualizing data for it to be truly useful. As for analysis, let's have a look at text, which should be simple enough, right? And sometimes it is simple. One useful analysis tool that is often overlooked is Google. Let's give it a shot, just for fun, with two fierce competitors, of sorts, that we can compare: Oracle and MySQL. Oracle is much older, both as a technology and as a company, and in addition owns the MySQL brand these days. But on the other hand, the Web is where MySQL has its sweet spot. Just Googling for MySQL and Oracle shows that MySQL seems to be much more discussed (and no, I haven't turned stupid just because I was on vacation, so I realize that there are many sites which don't talk about MySQL but have a "powered by MySQL" text or something, or that it is actually not shown to the end user, but anyway): "Oracle" gets 165 000 000 hits, whereas MySQL gets 226 000 000.

One issue is that this is not terribly interesting in this case. If I am a small company with a little known brand and I am trying to make my name known, then this is not a bad way of getting some kind of measurement on how I am doing, but in any case, this tells me nothing about what is being said, if it's good or bad, if it's said by some powerful dude, like "Vladimir Putin says that MySQL sucks" for example, or if it's just yours truly: "Swedish Ponytailed Geek says MySQL is actually pretty much OK, but not as OK as MariaDB". You see my point here, we have the data but we must analyze it better, even though this simple analysis is useful in some cases.

So, what kind of analysis do we do here? Text analysis, you say. Maybe we could analyze the text and look for negative or positive sentiments. This is exactly what advanced text analysis tools do, but they aren't foolproof, and surprisingly, much less so than they used to be. The reason is cyber-language. Analyzing an editorial article in the New York Post is what these technologies rock at, but things such as blogs, Facebook and Twitter are much more difficult.

But I don't give up easily on my tools, so let's, for fun, try to use Google to look for sentiments. Let's search for "Oracle sucks" and "MySQL sucks" and see what we get. We already know that a simple search for MySQL has many more hits than plain Oracle. So when I add a slightly negative word to my search, will this relationship remain? The answer is no: "Oracle sucks" gets 1 950 000 hits, whereas "MySQL sucks" gets 1 650 000 hits, i.e. significantly fewer than Oracle. Does this tell us something then? Well yes, to an extent it actually does. Not so much that I want to bet all my money on it, but combined with some other knowledge, this can turn out to be useful.

Then there is another dimension of data, one that Google isn't terribly good for, but we can at least play with it a bit, and using Google isn't too expensive either! What we are to look at here is something that is difficult to determine from a document, even when using advanced analysis tools, which is the date and time. Look at a standard article in some newspaper: just look at the top of the page and you see when the article was published. But then the article itself might mention some previous date that the article really is about, and not be about the current time at all. This is easy to figure out with the naked eye, but analyzing it programmatically is much more difficult, and it gets even worse with relative dates, like "yesterday" or "in a few days" or so.

But let's give Google a shot here. I go to Google to search, but instead of just searching I first click on the "Any time" drop-down, select "Custom range", pick 2003-01-01 till 2003-12-31 and then search for Oracle and MySQL. Now I see Oracle getting 243 000 hits and MySQL 204 000. Then we see what the situation is today and do the same search for "Past year", and this time Oracle gets 94 000 000 results and MySQL only 17 000 000.

Why are these numbers so much smaller for MySQL now? I guess because the high numbers come from MySQL being referenced on pages that really have no actual MySQL content. I don't know, I'm just guessing here. Or maybe we are reaching the limit of what Google can do for us.

See, I have now written a whole blog on just text analysis, and I have given some simple examples. Think of what can be done with image and video analysis. Or with some more dimensions, such as who wrote the text or shot the video? Right? And if we combine all this, can we agree that we get some interesting insights? And still, we have not really visualized it in some cool way. In my mind, the most interesting thing with all this is that what we get back isn't hard facts, far from the normal Data Warehouse statistics ("We had 25 751 orders last year"), and as the result is fuzzy, where we get some insights but far from all, it triggers my curiosity. What query can I ask now that I am armed with this new, fuzzy insight? Yes, Big Data and Big Analysis really are there to trigger us to think up new questions to ask.

I'll do another blog in this series in a few weeks or so!

Cheers
/Karlsson

Monday, August 5, 2013

Don't let Technophobia kill innovation

What? Me? technophobic? I have the latest iPhone, my office is jam packed with USB gadgets and my car is a Prius, how much more techno friendly can one get?

That is all fine: fun technologies that we play with just for fun, or natural but cool and useful evolutions, come to most of us easily. But can you honestly say (I can't) that you always look at the promise of a new technology, at the obvious new advantages it will bring once it has developed into something useful, and that you have never judged a new technology only by its first, shaky implementation?

When I was in my early teens (which occurred around the time just after the Mayflower had arrived in New England) my family moved into our first own house. My parents were running a restaurant at the time (they ran one or another all through my childhood) and my mother had seen most of the weirdo Heath Robinson designed (TM) commercial and domestic kitchen appliances, so when we first entered our new home and mum looked in the kitchen and realized there was a dishwasher in there, her first reaction was "Well, I'm never going to use that one". One month later, the dishwasher was running more or less daily, and my mum never did any dishes by hand again.

Many years later, I, her only son, having spent the better part of my life playing with SQL based relational databases (and looking at some of the code in them, I suspect that Heath Robinson is still around, now as a software engineer), started to look at NoSQL databases, and my reaction was largely that of my mum's when she saw the dishwasher: "Nah, I'm not going to use anything like that. Eventual consistency? What kind of silly idea is that?".

Yes, I was wrong, but I am still convinced that NoSQL databases (yes, I know NoSQL is a bad term, but this is Monday morning and I don't have enough energy to think up something better) will not replace SQL based systems. What I do think is that we need both.

Just as I think my mum got it wrong twice: yes, the dishwasher really is a good idea, but some things are better handled without it. The result is an abundant lack of sharp knives in my mum's house (as a dishwasher is a really effective knife-unsharpener). Myself, I use a dishwasher, but knives and beer glasses are still, to this day, washed by hand by yours truly (beer glasses because I don't want any leftover enzymes in my beer, as they are used to kill bacteria, including the really tasty bacteria that give beer its distinctive taste).

Too many words have so far been used to say this: the world needs both SQL and NoSQL databases working together, serving different purposes and applications. As for Eventual Consistency, I still think the term is bogus; just call it what it is, no consistency, and live with it. MongoDB, Cassandra and LevelDB are still very useful technologies, as is MySQL. And in many cases you need ACID properties and atomic transactions and all that, but in many other cases that is gross overkill.

Look at something like Virtualization. In that case, I think I looked at it in the right way, looking at the potential of the new features this brought, and not ignoring, but thinking less about the issues with the first implementations (slow I/O, slow networking, complexity of use, complexity of installation etc) and looking at what it could do in terms of cost reduction, effective systems management etc.

Back then, when I was a big Virtualization supporter, many were opposing me with the obvious issues with databases (which is the field where I work, if that wasn't already obvious), namely that I/O was slow and unreliable. Yes it was, but that can be fixed. This is not a flaw in the technology per se, but in the specific implementation and the limitations of the underlying technology at the time. Not everyone needs the highest of high performance, many can do with less. And some can easily scale out to more machines. All in all, many can benefit from Virtualization, maybe more than you think. These days, I think no one doubts that Virtualization is useful.

This is not to say I am always right, but I am not so technophobic that everything I don't already know is something that sucks. Also, we should be careful when comparing things. We often compare based on attributes of existing technologies and tend to forget that new technologies might well have virtues of their own (which we do not use for comparison, as we are unfamiliar with these features since they don't exist in the technologies we currently use).

I think one technology that is currently seen as inferior in this way is Cloud technology. We evaluate a cloud by taking something we run on hard iron in-house, throwing it at Amazon and looking at the result. Maybe we should build our applications and infrastructure differently to support clouds, and maybe, if we do that, a Cloud might well be more cost-effective, scalable and performant than the stuff we run in our in-house data center.

So don't let new innovative technologies die just because they lack a 9600 baud modem or a serial port. Or because they are no good for washing beer glasses (even if that is a very important dishwasher feature).

/Karlsson

Big Data.. So what? Part 1

This is the first blog post in a series where I hope to rise a bit above the technical stuff and instead focus on how we can put Big Data to effective use. I ran a SkySQL Webinar on the subject recently that you might also want to watch; a recording is available here: http://bit.ly/17TTQnJ

Yes, so what? Why do you need or want all that data? All data you need from your customers you have in your Data Warehouse, and all data you need on the market you are in, you can get from some analyst? Right?

Well, yes, that is one source of data, but there is more to it than that. The deal with data is that once you have enough of it, you can start to see things you haven't seen before. Trend analysis is only relevant when you have enough data, and the more you have, the more accurate it gets. Big Data is different from the data you already have in that it is bigger, hence the name, but not only that. Big Data also contains much more diverse types of data, such as images, sound, metadata and video. Also, Big Data has much more new data coming in and is constantly shifting. Research says that each day some 25 quintillion bytes of data are created; this is 25 000 000 000 000 000 000 bytes if you ask (which is some 25 000 petabytes or 25 000 000 terabytes). And yes, that is every day. (And yes, this is using 1000 bytes per kB, not 1024.)

As I already said, what is interesting with such huge amounts of data is that once the volumes are high enough, you can infer things that you couldn't with smaller or more focused data. You can infer changes that you couldn't before and, in some sense, make qualified predictions about what will happen in the world. Does this sound like rocket science? Well, it shouldn't, and the truth is that we have been doing this in at least one field for a very long time, since the 1950's or so, and it was one of the first applications for computers. And no, I'm not talking about Angry Birds here.

What I am talking about is weather forecasting. Using knowledge about how winds blow, temperatures, geographies and statistics, we can reasonably well predict how the weather will be. As we all know, these forecasts aren't always right, but even when they go wrong, we get to know why they went wrong. The way these predictions work is to combine large amounts of data with experience and hard facts on how the weather behaves, and the data isn't directly related to the area where we try to predict the weather either. We can do very little to influence the weather, except of course plan a picnic which is sure to create thunderstorms.

In the case of, say, sales of some consumer product, we are actually able to influence things more than we can influence the weather. And if we then add our knowledge of our market and its dynamics and combine that with truckloads of related and semi-related data, why shouldn't we be able to make some predictions? Not in the sense of knowing exactly what will happen in the future, but at least having an idea of what is the most likely thing to happen and an idea of the likelihood that this will be so. Which is how weather forecasts work.

But this isn't all there is to it. Let's pop back to weather forecasting for a second. The analysis done on weather systems is a lot more complex than that done in most data warehouses; there is more to it than some summaries and averages. Also, the way it is presented, using a map with an overlay of symbols (a sun, a cloud, some poor soul planning a picnic), is different from how we are used to seeing trend data in our data warehouses.

Conclusion:
  • We need ways of dealing with large amounts of fast moving data - Big Data
  • We need new, better and more specialized analysis - Big Analytics
  • We need new ways to view data - Visualizations
I'll be back soon with something more specific on this subject, so don't touch that dial!

/Karlsson

Saturday, June 22, 2013

Some things never go out of style

Such as SQL I guess. And nerdiness and vi and emacs. Not to mention new potatoes!
Yes, Swedish new potatoes. To be honest, this was the first batch this year, but it was the first proper batch for me, with dill and some salt, eaten with some butter and an extra pinch of salt. Yummy, better than candy! And yes, this was yesterday, on midsummer eve!

And not only are the potatoes a tradition, my blog about them is one too, I guess. For you newcomers, this started out in the MySQL days with a mailing to life@mysql.com, and since I left MySQL in 2010 I have continued it as a blog.

And as a final word, this year's new potatoes were just as good as ever. My twins are real cute and love them dearly, but they do take a lot of time and energy from me and the wife (in particular the wife); on the other hand, we now have someone to pass the new-potato tradition on to.

And maybe some of you were thinking there weren't going to be any potatoes this year. Well, you were wrong, the blog about them was just a bit late.

Cheers folks, and have a great summer
/Karlsson

Friday, May 31, 2013

MyQuery 3.5.2 Beta available

I have just posted version 3.5.2 of MyQuery on sourceforge. This version adds just one feature: the ability to run LOAD DATA LOCAL INFILE, and in the process it adds a few niceties to this, such as a progress window and the ability to stop a running load.
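
For those who haven't used it, this is the kind of statement we are talking about (the file name, table and field layout here are just made-up examples):
LOAD DATA LOCAL INFILE 'users.csv' INTO TABLE users FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';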

Download and enjoy!

/Karlsson

Monday, May 27, 2013

JSON datatype in MariaDB prototype

I have patched up a MariaDB version with JSON support, just for the fun of it. This is not the best version of MariaDB around, as I am not a MariaDB developer by any means, and although I have played with the MySQL and MariaDB sources before, I have never attempted to pretend I know them in detail. So although this works, it's probably full of memory leaks, errors and bad code (my additions, that is).

That said, it actually works. Surprise! As a simple prototype, that is, to show off a specific feature.

So, this is what I have: I grabbed the MariaDB 10.0.2 sources and worked from there. To support the JSON specifics, I included the Jansson C library for JSON. So far so good; then I wanted a JSON datatype, which was the first step. Adding a new datatype to MariaDB / MySQL is a pretty major undertaking though, so I decided to try something different: I decided to kidnap an existing datatype, preferably one with attributes that I could reuse for my JSON type, so I ended up grabbing the little used MEDIUMBLOB datatype.

I also decided to use plain JSON text as the representation, and not some compacted binary representation; the only compaction I do is to remove duplicate members and unnecessary spaces. So for a JSON object like this:
{"fname":  "John", "dept": {  "name":  "accounting", "id": 1 }, "lname": "Doe", "fname":  "George"}
I will store
{"fname":"George","lname":"Doe","dept":{"name":"accounting","id":1}}
Which is no big deal I guess. So with this in hand, I can now create a table like this:
CREATE TABLE appuser(id VARCHAR(10) NOT NULL PRIMARY KEY, name JSON);

The JSON datatype checks that the input string is valid JSON, which is different from the way dynamic columns work. Also, the data is compacted as mentioned above.

Let's insert some data to test this out:
INSERT INTO appuser VALUES('johndoe', '{"fname": "John", "lname": "Doe", "dept": {"name": "accounting", "id": 1}, "role": "admin"}');
INSERT INTO appuser VALUES('susanscott', '{"lname": "Scott", "fname": "Susan", "dept": {"name": "services", "id": 2}}');
INSERT INTO appuser VALUES('janeroe', '{"lname": "Roe", "fname": "Jane", "dept": {"name": "accounting", "id": 1}}');
INSERT INTO appuser VALUES('tomdraper', '{"fname": "Tom", "lname": "Draper", "dept": {"name": "management", "id": 3}, "subordinates": ["johndoe", "susanscott"]}');

Following this, we can do some simple JSON manipulation like this;
SELECT id FROM appuser WHERE json_object_get(name, 'role') = 'admin';

And if the name of the 'admin' role changes to 'administrator' we can do this:
UPDATE appuser SET name = json_object_set(name, 'role', 'administrator') WHERE json_object_get(name, 'role') = 'admin';

And to find whoever is supervising Susan Scott, we can do this:
SELECT id FROM appuser WHERE json_array_contains(json_object_get(name, "subordinates"), "susanscott");

Finally, John Doe is actually the supervisor for Jane Roe, so let's fix that:
UPDATE appuser SET name = json_object_set(name, "subordinates", '["janeroe"]') WHERE id = 'johndoe';

I have a few more things in there, and there are many things missing; this was done just to get a feel for things. The main tool for all this is Jansson, as mentioned above, and Jansson has a bunch of cool tricks up its sleeve. We could easily implement merging of objects and of arrays, more array management functions and stuff like that.

Again, what do you think? Is this useful? I sure think this is easier to use than the Dynamic Columns (but these have other advantages). I still have things to do, both functions to be added as well as SELECT ... INTO OUTFILE and LOAD DATA INFILE and stuff like that.

Does anyone want to play with this? I do have a x86_64 build if someone wants to try it.

/Karlsson

Sunday, May 26, 2013

SQL and JSON, what do you think?

As you might know, I'm a big fan of JSON. One big reason is that I believe that JSON is closer to most developers' view of data, whereas the relational, SQL based model is closer to the view of someone working with the data itself or with infrastructure. What I mean here is that neither view is wrong, but they are different.

So, given that, can we merge the object/JSON world with the relational model? Well, not JSON exactly, but Hibernate does it quite well. This is one of my objections to the NoSQL world: the data model is closely linked to the application at hand, and less so to the data itself and to other applications. Stuff such as accounts, privileges, accounting data, orders and many other things are global and not tied to a specific application, but in many NoSQL applications, this is what they end up being.

And there are not that many good solutions, how can I easily explore data in a NoSQL database, where the data model is application based, without me knowing the application? See what a high-ranking 10gen person, Dwight Merriman (Chairman of the board) has to say on the subject in an interview in 2011.

On the other hand, we relational folks have a few things to answer for. We insist that we know how data is to be managed (and I think we do), and we seem to ignore the fact that development tools and environments have changed since the 1980s (we still insist that proprietary client/server protocols are the way to go, and that SQL rules). In my mind, SQL and the relational model rule for data, but not necessarily for applications, and we (including yours truly) should wake up and appreciate what is going on here: we need better, more developer focused means of accessing data, even if that data is relational data.

The NoSQL camp is getting ready to fix what they can fix: proper query languages (Cassandra CQL being one), better data access methods and probably much more that I am not aware of right now.

Which brings me to what we can do to be more developer friendly on the SQL and relational side of things. And the answer is, as I have already hinted, JSON support! MariaDB has a few tricks up its sleeve, like the Dynamic Columns support. But this is nowhere near enough; the idea is OK, but the way a developer has to use dynamic columns doesn't help much. Rather, the interface is more targeted at making the Cassandra Storage Engine features easier to access from the MariaDB SQL interface. But I'll do a blog on the Dynamic Columns support and JSON later on anyway.

What I would want, but maybe that's just me, is a proper JSON datatype, with JSON functions, syntax and semantics. That is a starting point I think, and it shouldn't be too difficult. But many other JSON related things that we could do spring to mind:

  • SQL / JSON language mix - What I mean here is that JSON isn't just a string datatype, but that we can include plain JSON, where appropriate, in the SQL:
    INSERT INTO table VALUES(57, {"empno": 1, "ename": "Scott"})
  • JavaScript stored procedures
  • ROW data as JSON support in the MySQL API
  • JSON column data as ROW data in the MySQL API and tools.
  • A JSON Storage Engine
  • JSON support in LOAD DATA INFILE
  • JSON support in SELECT .. INTO OUTFILE
  • REST / JSON interface to complement the MySQL interface
  • JSON attribute indexing
And then again some stuff I haven't thought of so far. A reason for this blogpost is to get your opinion: what do you think? Would this be useful? Would it make MySQL easier to use?

An example of why this is useful: instead of having to add columns to a table for every new attribute, attributes that are useful mostly for the application could just be added to a JSON column in the table, and the application can put whatever it wants in that column. We have all seen this, haven't we? Some odd column that the application wants to keep track of in some table, a column whose contents are pretty uninteresting as data, but which is still needed by the application? This is where a proper JSON type would help. The MariaDB Dynamic Columns also help with that, but I think JSON would be a far more elegant solution.
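To make that concrete, here is a minimal sketch using a hypothetical customer table: today you would typically approximate this with a plain TEXT column holding a JSON string, which the server knows nothing about, and that is exactly the gap a proper JSON datatype would close.

-- A hypothetical table where application-specific attributes live in one column.
-- With a real JSON datatype, the server could validate and index app_attrs;
-- today it is just an opaque string.
CREATE TABLE customer (
  id        INT PRIMARY KEY,
  name      VARCHAR(64) NOT NULL,
  app_attrs TEXT
);

INSERT INTO customer VALUES
  (1, 'Scott', '{"ui_theme": "dark", "last_seen_tab": "orders"}');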

/Karlsson

Sunday, May 19, 2013

What's the deal with NoSQL?

Everybody seems to be looking at and debating NoSQL these days, and so am I, so I thought I'd say a few words about it. Which is not to say I haven't said stuff before, but then I was mainly targeting specific attributes of many NoSQL solutions, like "eventual consistency" or, as you might call it, "instant inconsistency". What I was opposing is the notion that "eventual consistency" has anything to do with just that, consistency. Rather, what it means is that at any point in time the system may be inconsistent, and even if it happens to be consistent, you cannot rely on it being so. Which is fine, but don't call it consistency, call it inconsistency. Allowing a database to be somewhat inconsistent doesn't necessarily mean that there is something wrong with it.

All this said, what is going on here, why are we MySQL and MariaDB users seeing so many MongoDB, Cassandra and LevelDB applications pop up? Come on, these are typically less functional database implementations than even the most basic MySQL setup: no transactions, no joins, no standards, etc. The answer, if you want to hear what I have to say, is ease of use. So let's explore that a bit.

Think back to the Object Orientation frenzy of the 1990s, when any application project ended up consisting of endless sessions modeling objects, usually involving expensive consultants dressed in expensive blue suits. And when that was done (which took years!) you had a way cool object model, but no money left to do the actual implementation, i.e. the real programming (shiver), so you went on to some other project and the nicely dressed object design consultant left to find another OO sucker.

Now, objects are much more standard, even non-OO languages have a big chunk of OO features, and these are used to enhance programmer productivity and produce better code and design. Which is fine (except if you were one of those OO consultants, which means you are now out of a job, as such mundane tasks as writing code are not something you would ever do, that dirty stuff being better left to "programmers". Oh no, I forgot that you are now an ITIL consultant, that just slipped my mind), but how does this relate to MySQL and MariaDB? The answer is that MySQL, which was once considered really easy to use, no longer is as easy as it used to be. The relational data model is still brilliant when you look at data as data, and that is how many of us look at it, so we go through the process of mapping data to objects, if that is what it takes. SQL and Java, PHP or whatever merge, and the application now contains a layer mapping objects to real data. Or we use Hibernate, which does this automatically for us.

But a new cadre of developers is emerging, and they look at OO as natural and they look at objects as data (which they are not. Data, in my mind, should be independent of the application using it; objects, on the other hand, are closely tied to the application at hand). With which I do not mean that there is something wrong with building applications using objects, quite the opposite. But if all you know is objects, then using relational technology turns difficult, and SQL, for all the good things about it, seems old-fashioned and arcane, which it is (but it is so widely used you cannot avoid it). So you go with something that treats objects as all you need, and presents them in some object format. Like JSON.

And again, there is nothing wrong with that. But if we who are on the SQL and relational track just discard these NoSQL technologies, we are not making any friends. We have to accept that MySQL and MariaDB really aren't that easy to use anymore, at least not for newcomers.

And then there is another thing: some data, like Big Data, has attributes that really don't fit well in a relational model. Data where the attributes of a value can't easily be determined once and for all, so you need to reprocess that data (large text objects, images and maps are some examples). In this case, you really need to extend the relational model, somehow.

But SQL-based relational isn't going away. The relational model is still one of the best ways to look at data, it's just that we also need some other ways of looking at data. And it needs to be easier to access. And we shouldn't have to push SQL down the throat of every single developer trying to build an application with some OO technology. The answer is that we need both, and these technologies need to interoperate. I want to use SQL for my data. But I also want JSON and REST for my data. And there shouldn't be much of a performance overhead. All in all, we SQL folks need to wake up and make data easy to use again. We know data better than the Cassandra and MongoDB folks. We know transactions better than them too. But they know how to work with developers who don't know who The Beatles were, and they make data easy to use for them without forcing SQL on them (and now you have to listen to a tirade about today's youngsters not knowing what real music is and that it died with John Lennon! What? You don't know who John Lennon was? That's exactly what I mean, you have no taste at all!).

Just my 2 cents...

/Karlsson

Friday, May 17, 2013

MyQuery 3.5.1 beta released!

After a lot of fuss, I am now releasing MyQuery version 3.5.1. This version introduces one big feature, a brand new Dictionary viewer. In addition to that, there are numerous bug fixes and the removal of one feature, the option to run with just 1 connection: in this version, 2 connections will always be used. I had some good reasons to stop making this optional; fact is, running with 1 connection was hard to diagnose, caused a lot of problems and had no real benefit, just drawbacks.

So, for you Windows users, MyQuery 3.5.1 is now out there, but it really is a beta. The beta status is due to the new Dictionary viewer; the rest should be pretty stable.

Download it from sourceforge.

Happy SQLing
/Karlsson

Monday, May 13, 2013

Version 1.6 of mysqljsonimport now available

Yes, finally! This took some time, but I have been so busy with other things, work-related as well as domestic, that I just haven't had the time for this. But finally version 1.6 is available for download from sourceforge. The downloads are, as usual, the autoconf-enabled source code and the documentation in PDF.

So, what is new, you ask? Well, there is one big new feature, which took a lot more effort than I expected. When this program was first written, I still had the table use-case in mind. What this means is that I envisioned JSON objects being mapped directly to rows in a table. This is not how programmers view JSON, but it is how data is viewed in many databases, even NoSQL ones such as MongoDB. So I wanted an import tool for simple row-structured JSON objects.

Now, there is a different way to look at things, which is to see the data in the JSON file as objects, where each member becomes one or more rows in a table; together those rows sort of make up an object (yes, this is very simplistic, but you get the point). So data might look like this:
[
{"nodename": "server1",
 "users": [{"id": "joe", "name": "Joe Bloggs"},
    {"id": "sue", "name": "Sue Bloggs"}
  ],
  "hosts": [{"name": "internal", "address": "192.168.0.78"},
    {"name": "external", "address": "11.186.19.177"}
  ]
  },
{"nodename": "server2",
 "users": [{"id": "dick", "name": "Rickard Bloggs"}
  ],
  "hosts": [{"name": "internal", "address": "192.168.0.75"},
    {"name": "external", "address": "11.186.19.161"}
  ]
  }
]
Here we would be loading into the tables users and hosts, and we would load some 7 rows into those two tables. I think it is also clear that there is a whole bunch of stuff that could make this smarter, like letting other fields of the object affect the data that is loaded, either by being added to the data or by filtering what data is loaded. But none of that is in place right now; for this version, this is just a simple object-to-table load. The old row-by-row formats are still supported (plain JSON format and Array format).
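To make the mapping concrete, here is a sketch of roughly what such a load would produce, assuming hypothetical users and hosts tables whose columns simply match the member attributes in the file above (the actual table definitions are of course yours to choose):

-- Hypothetical target tables matching the member attributes in the example file.
CREATE TABLE users (id VARCHAR(32), name VARCHAR(64));
CREATE TABLE hosts (name VARCHAR(32), address VARCHAR(64));

-- Roughly the 7 rows a plain object-to-table load of the file above would produce:
INSERT INTO users VALUES
  ('joe',  'Joe Bloggs'),
  ('sue',  'Sue Bloggs'),
  ('dick', 'Rickard Bloggs');
INSERT INTO hosts VALUES
  ('internal', '192.168.0.78'),
  ('external', '11.186.19.177'),
  ('internal', '192.168.0.75'),
  ('external', '11.186.19.161');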

Also, something cool to add would be support for MariaDB dynamic columns. I have some ideas here, but I have yet to write the code.

In addition, this release adds a --dry-run option, which allows you to test config files and settings before starting to load.

I'm planning to write more about MySQL / MariaDB and JSON here eventually, and also about plain JSON, but for now, have fun, take care and happy SQLing.

/Karlsson

Tuesday, April 30, 2013

JSON with MariaDB and MySQL Slides available

As you may know, I'm a big fan of JSON, and I am working on releasing a new version of my JSON tools real soon. Meanwhile, at the SkySQL Solutions Day on April 26 I gave a talk on JSON with MySQL and MariaDB, and the slides are available on slideshare.

/Karlsson

Friday, April 26, 2013

MySQL on Amazon AWS 101 slides are now online

The slides are here: http://www.slideshare.net/blueskarlsson/my-sq-lonaws101 and if you don't know it already, tomorrow, Friday April 26, is the SkySQL Solutions Day. Registration and beer are free! I'll be talking on MySQL and JSON, but there are many other interesting talks. The venue is the same as for the Percona Live conference!

/Karlsson

Thursday, April 25, 2013

SkySQL Solutions day on Friday April 26! Free! As in free beer! Really!

Yes, Percona Live ends on Thursday, but on Friday, at the same location, there is the SkySQL Solutions Day. Attendance is free, and among the highlights, yours truly will be speaking! Register here for this free event: http://info.skysql.com/roadshow2013-0 Another highlight is that it ends with a Biergarten. Free!

/Karlsson

Wednesday, April 24, 2013

In Santa Clara now. 2 talks coming up

I'm in Santa Clara for Percona Live now, and things are looking good! The announcement of the merger of Monty Program and SkySQL is a good one! If you are around, I'll be speaking on MySQL on AWS on Thursday at 1:50 PM in Ballroom F. On Friday at the SkySQL Solutions Day (if you are at Percona Live and don't know about this, registration is free! Come see us, the program is here: http://www.skysql.com/content/mysql-cloud-database-solutions-day-schedule) I'll be talking about MySQL and MariaDB with JSON at 11:15 AM in Grand Ballroom B!

I'm also about to release a new version of my MySQL JSON tools real soon!

/Karlsson

Friday, March 29, 2013

See you at the UC in April!

I'll be speaking at the MySQL Conference and Expo on April 22-25 in Santa Clara. On April 25 at 1:50 PM I'll be talking about using MySQL in the Amazon AWS cloud, but to be honest, I haven't done much formal preparation. I will prepare some slides, but fact is that most of this session will be practical, hands-on stuff. Largely, I'll show things that I used when I was Database Architect and Admin for a reasonably large AWS installation.

Anyone telling you that Amazon AWS is just like any other environment, except that disk I/O is slower, doesn't get it; there is much more to AWS than that. By using the services that come as part of AWS, there are loads of things you as a DBA or devops engineer can do to simplify and automate everyday tasks. Backups, slave provisioning and availability are things that can make really good use of AWS. So armed with an AWS account and some MySQL instances running there, I'll be showing you some real-world examples.

Also, you may ask why I haven't been blogging much recently? If this worries you, I think you should get a day job. Jokes aside, I was testing MySQL replication in real life, but it sort of failed on me, and I ended up with twins alright, but one boy and one girl. These two have taken a lot of my time recently, and the joys of blogging and writing code in my spare time have been exchanged for the joy of changing diapers on these little babies.

I will do some more blogging again now, though; I have promised myself that, but I have loads of other things to do as well.

/Karlsson

Monday, January 21, 2013

Talking at the SkySQL Roadshow in Stockholm

The SkySQL Roadshow is coming to Stockholm on Feb 7; come by and meet us. I'll be ending the day with a talk on Big Data, a fairly generic Big Data talk with some MySQL relevance, but with the focus on Big Data in general.

I haven't been blogging much recently, but there are reasons for that. Since Dec 1, I am the proud father of twins, a little boy and a little girl. I have yet to teach them to write proper SQL, they have particular issues with subqueries, but we'll get there. In order to create the usual mess of things and to make sure things stay at the brink of running out of control, we decided to renovate our flat in the middle of all this. But I'll get there, and once we have a new kitchen installed, I'll do some more blogging; I have some things piled up to write about.

/Karlsson

Friday, January 4, 2013

MySQL JSON import / export tools updated

A user of mysqljsonimport, Josh Baird, reminded me of a feature which I should have added from the start, but which was forgotten. The deal is that when you put a bunch of JSON objects in a file, you have a couple of options on how to do this.

The most obvious is maybe to export as a JSON array of objects, like this:
[
{"id":1, "name": "Geraint Watkins"},
{"id":2, "name": "Kim Wilson"}
]
But this is not what mysqljsonimport supported, and it is not how, say, MongoDB exports JSON by default. The reason is that for large amounts of data this is cumbersome, as what is in the file is actually one big JSON object containing all the data. This is difficult to parse and requires that a lot of data is read and that the object as a whole is kept in memory, unless some clever processing is done. And even if we are clever, it is still not efficient. Rather, what was supported by mysqljsonimport, and how MongoDB exports to JSON, is multiple objects without separators, i.e. you read an object, process it, and then read some optional whitespace until you reach the next object, like this:

{"id":1, "name": "Geraint Watkins"}
{"id":2, "name": "Kim Wilson"}

The latter is more efficient, but the former is often used as well. So mysqljsonimport now supports both formats, and mysqljsonexport can optionally export as a single JSON array of objects in a file.
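For completeness, here is a sketch of the kind of table those two objects would map to, assuming a hypothetical table definition (the tool naturally works against whatever table you point it at):

-- Hypothetical target table for the two example objects above.
CREATE TABLE people (id INT PRIMARY KEY, name VARCHAR(64));

-- Importing either of the two file formats would give you these rows:
INSERT INTO people VALUES
  (1, 'Geraint Watkins'),
  (2, 'Kim Wilson');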

Download the most recent version from Sourceforge: mysqljsonimport 1.5 and mysqljsonexport 1.2

Cheers
/Karlsson

Wednesday, January 2, 2013

Amazon AWS for MySQL folks - Speaking at Percona Live 2013

I'll be speaking at the Percona Live Conference and Expo in Santa Clara (April 22-25, 2013), and this time I'll do a different talk from what I usually do. The plan here is to be low-level, dirty and practical, showing stuff using the Amazon AWS APIs, writing scripts using them and showing how to use them together with MySQL. I have said it before and I'll say it again: to get the most from your cloud, you have to understand and use the unique features of the cloud environment you use.

Can you create an elastic MySQL setup on Amazon? What about HA? How can you add slaves seamlessly? And automatically? I'll try to cover and show as much of this as possible, but the presentation is far from ready, so I am happy to accept suggestions on specifics to cover. See some more details on my talk here.

Hope to see you in Santa Clara in April!

/Karlsson