Wednesday, December 29, 2010

Ubuntu Upstart for automatic MySQL start and stop - Part 2

I've done a little change to the upstart script I published in a recent blogpost. The issue is with mysqld_safe which has a few issues it seems, but so far I haven't tracked them down exactly. Anyway, if mysqld_safe is not in your path and you run it with the full path (as I do in the Upstart script which is run by root that may well not have the mysql bin directory in it's path), then mysqld_safe will behave strange in at least 5.5 when called with the full path, i.e.
/mnt/mysql/bin/mysqld_safe --defaults-file=/etc/mysql/my.cnf &
will not work. If I then put the appropriate directory in my path, like this:
PATH=/mnt/mysql/bin:$PATH
/mnt/mysql/bin/mysqld_safe --defaults-file=/etc/mysql/my.cnf &
It still will not work as expected. In the first case, it complains that it cannot find my_print_defaults, and then just goes on, seemingly using some default values, error logging to /usr/local/mysql/data and trying to run /usr/local/mysql/bin/mysqld, and before you ask, yes I have set things like ledir and basedir in the [mysqld_safe] section in the .cnf file.

Anyway, the fix for all this is to run mysqld_safe with the current directory being the same as basedir (i.e. one step "above" where mysqld_safe is located). To tell Upstart to run in a specific directory, you use the chdir stanza. So my scripted ended up looking like this now:
#
# MySQL Service for Recorded Future
#
description "MySQL Server"
author "Anders Karlsson, Recorded Future"

start on (net-device-up
and local-filesystems
and runlevel [2345])
stop on runlevel [016]

expect fork
kill timeout 2

# Set variables.
env MYSQL_ETC=/etc/mysql
env MYSQL_PIDFILE=/var/run/mysql.pid
env MYSQL_HOME=/usr/local/mysql5.5
env MYSQL_INSTANCE=my
umask 007

chdir /usr/local/mysql5.5
exec $MYSQL_HOME/bin/mysqld_safe --defaults-file=$MYSQL_ETC/$MYSQL_INSTANCE.cnf >> /tmp/x.out &

post-start script
loop=600
# Wait for MySQL to start.
while [ $loop -gt 0 ]; do
if $MYSQL_HOME/bin/mysqladmin --defaults-file=$MYSQL_ETC/$MYSQL_INSTANCE.cnf ping; then
break
fi
loop=$(($loop - 1))
sleep 1
done
exit 0
end script

# Send a soft SIGTERM to MySQL before Upstart will kill it.
# A Sigterm to mysqld will cause a controlled shutdown.
pre-stop script
exec kill -SIGTERM `cat $MYSQL_PIDFILE`

# Wait for MySQL to end. Flushing buffers and all.
loop=600
while [ $loop -gt 0 ]; do
# If the pidfile is found, then continue waiting.
if [ -e $MYSQL_PIDFILE ] ; then
loop=$((loop - 1))
sleep 1
continue
fi
break
done
end script
And lastly, again before you ask, don't try the easy way out here:
chdir $MYSQL_HOME
It will not work. And as usual, debugging this with Upstart is nearly impossible. If you try using the above syntax, all you get is an information that the script would not run (and, yes, the console Upstart stanza is more or less useless. Why I cannot tell Upstart to output to the current terminal is beyond me, and why there is no verbose mode, I do not know either).

I will check up the issue with mysqld_safe and send a bug report to MySQL (if it isn't there already). As for upstart, I am so annoyed by it at this point in time, that I'm not sure I have the energy to post a bug report, as I cannot even tell what so-called "feature" is most annoying. But again, this may be because of my limited upstart exposure. Upstart is sure different from working with chef though (which I am also new to). One single, minor, error in chef, and you get an error listing the size of Encyclopedia Britannica. Whereas Upstart on the other side would say something like "job failed", and nothing more, if it was managing the Titanic.

/Karlsson

Saturday, December 25, 2010

Ubuntu Upstart for automatic MySQL start and stop

Here at Recorded Future we use Ubuntu (running on Amazon EC2), but so far we have not explored Ubuntu Upstart that much. During the holidays I made an effort to get acquainted with Upstart and to implement proper MySQL start and stop with it.

If you do not know Upstart, this is the way you start and stop services in Ubuntu, and it serves the same purpose as the old /etc/init.d scripts, but are a bit more structured and powerful. That said, Upstart is regrettably far from complete, although the functionality is much better and Upstart has some cool features, some things do not work that well. For one thing, documentation, where it exists, is useless, at best. Secondly, there is very limited ability to test and develop Upstart scripts. And this is made worse by the fact that the documentation is so bad. Another thing is that Upstart insist on stopping services, by default, by sending a brutal kill signal. Not good for databases, mostly.

In the /etc/init directory are the Upstart scripts you have. In difference to the old init.d scripts, you cannot disable a service in Upstart curenntly. If it is in /etc/init it will be started at system start. That's it. And this is something that I am sure will be fixed, but for now, again, is something we have to live with. Upstart scripts have the suffix .conf (don't ask me why), so the default MySQL Upstart script, for example, is called /etc/init/mysql.conf.

In an Upstart script, there are Stanzas that determine what to do. Like the exec Stanza that runs a program for example. And you may then ask, when is it run? Startup? Shutdown? And the answer is startup. For shutting things down, as I said before, Upstart will by default just send a kill -9 signal.

The minimal startup script you can have, and this actually works in a reasonable way, is to just have one line with an exec stanza, like this:
exec /usr/bin/mydaemon
Which will start the daemon. For stopping the daemon, Upstart will send a -9 signal to the started process by default, and nothing more is needed in the Upstart script.

For MySQL, we need to make things a bit more complicated. The default mysql.conf Upstart script really is not good. For one thing, it will not do a controlled shutdown of MySQL (this is possible even if Upstart will eventually send a kill -9 anyway). Secondly, this script assumes that what we use is a standard Ubunty installed MySQL distribution, so if you have installed MySQL in /usr/bin/mysql5147 or somethings like that, you are out of luck.

So what I wanted to create was an Upstart script for MySQL that fullfilled these requirements:
  1. Starts MySQL automatically.
  2. Waits for MySQL to be available before exiting.
  3. Be configurable to support different MySQL install locations, data directories etc.
  4. Do a clean shutdown of MySQL when stopping the MySQL services.
Before I show you what I ended up with, I want to comment on the points 2 and 4 above. With Upstart, you can define a script or command to run just before or after a services has been started or stopped, and this is what I use to wait for MySQL to become available, and to send a SIGTERM to the MySQL Server when stopping (which will do a clean MySQL shutdown).

So here we go, a complete MySQL Upstart script, the way I want it to work:

#
# MySQL Service for Recorded Future
#
description "MySQL Server"
author "Anders Karlsson, Recorded Future"

start on (net-device-up
and local-filesystems
and runlevel [2345])
stop on runlevel [016]

expect fork
kill timeout 2

# Set variables.
env MYSQL_ETC=/etc/mysql
env MYSQL_PIDFILE=/var/run/mysql.pid
env MYSQL_HOME=/usr/local/mysql5.5
env MYSQL_INSTANCE=my
umask 007

exec $MYSQL_HOME/bin/mysqld_safe --defaults-file=$MYSQL_ETC/$MYSQL_INSTANCE.cnf >> /tmp/x.out &

post-start script
loop=600
# Wait for MySQL to start.
while [ $loop -gt 0 ]; do
if $MYSQL_HOME/bin/mysqladmin --defaults-file=$MYSQL_ETC/$MYSQL_INSTANCE.cnf ping; then
break
fi
loop=$(($loop - 1))
sleep 1
done
exit 0
end script

# Send a soft SIGTERM to MySQL before Upstart will kill it.
# A Sigterm to mysqld will cause a controlled shutdown.
pre-stop script
exec kill -SIGTERM `cat $MYSQL_PIDFILE`

# Wait for MySQL to end. Flushing buffers and all.
loop=600
while [ $loop -gt 0 ]; do
# If the pidfile is found, then continue waiting.
if [ -e $MYSQL_PIDFILE ] ; then
loop=$((loop - 1))
sleep 1
continue
fi
break
done
end script


To be honest, this is not what I create for all our MySQL servers. Instead I used this to create a chef template, chef is what we use for configuration management here (see http://www.opscode.com/ for more on chef), and here it is put to good ude to generate an Upstart script for MySQL. The above is just an example.

/Karlsson

Monday, December 20, 2010

Using the right tool for the job at hand - MongoDB, MySQL and Sphinx

You may have seen my posting regarding "eventual consistency" some months ago, and you may have come to the conclusion that I was insisting that a SQL based RDBMS is the way to go for just about anything. Tell you what, that is not so. And nether am I against using. say, MongoDB, where it is appropriate.

The whole deal with Eventual consistency is something that I am still opposed to, I want to know if my data is consistent. And I am not not sure that you cannot have a fully consistent, distributed system either. But I guess that debate goes on. And I still want my base data to be consistent. Like in RDBMS-SQL-Foreign-keys-all-over-the-place-and-not-a-bl**dy-bit-lost-in-the-MyISAM-swamp consistent. That is what I want the base data to look like. And if there are compromises with this, which it may well be, then I want to know about those too.

So, having covered that, what am I trying to say? Well, if you properly normalize your data, then the more you normalize and the more strict you are with data quality, the more troublesome management of that data is going to be, and that is something we have to live with I guess. But if you then are to ask some hefty queries on that data, where the data is organized in such a way to make data real consistent, and the queries just want to data, and the query-side of things really doesn't care about normalization at all, how do you deal with that? One way of course may be to replicate to something more query-friendly, possibly a second MySQL-server or possibly even a bunch of such servers in a scale-out scenario. But your data structures still look really complex, having being built to support storage, update, maintenance and consistency requirement foremost.

At Recorded Future we have taken a different path in our latest release: Choose the best tool for the job at hand. We use MySQL with InnoDB for our data loading and storage. And for that, MySQL worka real well. So we have the data we have collected and processed and organized, structed nicely in an RDBMS.

Now, on the other side of things, where queries are made, things look different, there we want to fulfil 2 needs, basically:
  • Fast quering for data, in out case these are instances.
  • Fast retrieval of attributes of the instances that was retrieved.
The above is fulfilled with 2 supporting technologies, Sphinx for instance querying and MongoDB for instance attribute (anwering the typical kind of key-value-store query: Here is a key, gimme all the data). This is what we are live with now, and so far we are really happy. Mind you, this didn't come that easy, both Sphinx and MongoDB has caused some problems.

Forgetting Sphinx for now, what we do with Sphinx is actually really simple, and concentrating on MongoDB, where we also do pretty simple things, but the requirements and the scale in the case of MongoDB is higher for us. MongoDB so far has performed well for us. We are running in an Amazon EC2 environment, and that has issues of it's own (in particular this seemed to be the case with Sphinx, but they are on the case). As for Mongo, this is so common in EC2 environments so I guess ot has been more tested.

I admit to liking MongoDB. It is lightweight, easy to manage and has some cool features, such as automatic, and reasonably transparent, sharding. MongoDB at first seems to have good documentation, there is a lot of examples and samples. What is lacking is a good reference though, and a good administration guide is also missing and the Sharding support is still not well documented from admin point-of-view. The JavaScript orientation in MongoDB at first was annoying to me, but increasingly I find it useful. One thing to note about MongoDB is that it needs it's memory: the more the better. For good performance with MongoDB you really want to avoid disk accesses (in particular in EC2 environments). What we do to avoid disk-accesses as much as we can, we have gzipped the data items. We store data in JSON format (yes, we are a JSON shop, sorry XMLers out there) and compressing data fast and having a fast JSON parser is crucial. And yes, we DID think about the case with frequently accessed data: we have the option, on an instance-by-instance basis, to store data as plain JSON or compressed. Plain JSON has advantages, such as as MongoDB is using JavaScript, JSON is pretty much Native to MongoDB, making certain operations real easy.

We are always on the lookout for new technologies, and we do try many things, but the current setup is really useful and we do get much better performence and scalability. And yes, we do get both, with the same number of servers, we get better performance, and much better distributed load of the machines. Now we are waiting for Amazon to fix their disk IO and Network issues.

Your truly
/Karlsson
Hope to see you in Santa Clara in April at the MySQL UC!

Thursday, December 2, 2010

Working with MySQL on Windows - Part 1

This is the first post in an attempt to write about how to get along with MySQL on Windows. Before I start, I want to stress that the focus here is seasoned Windows users getting started with MySQL. As for MySQL, I am prett much Operating System agnostic, having used Linux / Unix for some 25 - 30 years ( started using Unix as of Version 6. That's OLD). But I have also been a Windows user and developer since Windows 3.0 days.

Also, when I say I'm a Windows developer, let me tell you where I come from. I come from learning Win16 in C. Hardcoded message loops and message handlers and that kind of stuff. And that is how I still develop on Windows, but these days with Win32 / Win64 and exposing more Windows APIs. So I am no C# or .NET expert by any means.

So, you are ready to get started with MySQL on Windows? OK, first you should download it, possibly from here: http://dev.mysql.com/downloads/mysql/. And no, you don't have to pay anything, MySQL is free for us developers. And no, you will not need to compile anything from source or something, quite the opposite, MySQL comes with a very nice installer. Just download the MSI installer package for your platform (32 or 64 bit that is), and get started.

Usually when Installing MySQL, you can choose most of the defaults. After the installation, a configuration application starts, and you can usually choose most of the defaults for a starter here also. One things that your might want to look at is the character set selection. Latin1 is one option that is usually OK and that is similar odl and tried Windows CP1252. It's good enough as a start. If you plan to develop Web applications, then you might want to choose UTF8.

OK, assuming all went well, where do a get started with really using MySQL? Well, there isn't much in terms of a Windows GUI for MySQL installed as part of MySQL. And not only that, if you are a Visual Studio user, possibly developing in C#, there is very little in Visual Studio that shows that there is a new database server installed? Well, this we will fix, all this stuff sure is available from MySQL, you just have to know where to look.

First, as for a MySQL GUI for Windows, you have a few choices. If you want a complete MySQL design and SQL tool, then look no further than MySQL Workbench from MySQL themselves. This is a database design tool for MySQL that is loaded with cool features. Again, you can download it and use it at no cost, it is available here: http://dev.mysql.com/downloads/workbench/5.2.html

There are alternatives to MySQL Workbench, and many of them are also free and Open Source. HeidiSQL is one such option, and I have a tool myself called MyQuery. Both of these are a little less ambitious than MySQL Workbench, but also has features of their own. HeidiSQL is very much DBA focused and has loads of DBA style functions. My own MyQuery is focused on working with MySQL Scripts and shines when it comes to extensibility among other things. MyQuery can be downloaded here: http://sourceforge.net/projects/myquery/

But what about your applications? How is a database connection to MySQL achieved? Well, there are several ways to connect to MySQL. When you download the MySQL Server, it comes with a C API. If you are an old man like me, that might well be the way to go. The C API is OK and is well documented here: http://dev.mysql.com/doc/refman/5.5/en/c.html. The C API comes in static and dynamic shape. If you don't know the the difference or don't really care, then use the dynamic build. Using the static library is more demanding. MySQL itself, including the C API, is built with Visual Studio, so if that is what you use for your application (and I guess you are), that is OK. And you can do with the Express editions of Visual Studio.

Now, chances are you are not an old C hacker like myself, but a more modern person, using C#, Visual Basic or Java. If that is the case, MySQL provides help even here. If you are using Visual Studio with any of these languages, you want to download and install MySQL Connector/Net. A Connector is what MySQL calls their database drivers. And yes, even if you are on Java you want the .Net connector. The reason is that the .Net connector also includes the Visual Studio integration, although here it doesn't seem that the Express editions are enough. Connector/.Net can be downloaded here: http://dev.mysql.com/downloads/connector/net/.

And then for Java, you need Connector/J, available here: http://dev.mysql.com/downloads/connector/j/

Now, one word of caution before I leave you for this time: Both Connector/Net and Connector/J does some cool things for you. If you look at how MySQL works, or have a look at the MySQL C API, you realize that MySQL has some very particular ways of dealing with some things, like array inserts, prepared statements and things like that. To make life as a MySQL developer easier, both of these connectors do some very cool things to isolate you from that. Also, these connectors to an extent address some things beyond MySQL itself, such as high availability. And thirdly, tuning the Connectors is often just as important as tuning the server! What I am saying here is, in short, something as unknown and arcane as read the documentation. In particular this is true for Connector/J, which has many parameters, is very advanced, but also very fast and can help you speed up your application if you know how to work it.

Bye for now, I'll be back in a jiffy with some more Windows specific MySQL notes
/Karlsson