Wednesday, December 29, 2010

Ubuntu Upstart for automatic MySQL start and stop - Part 2

I've done a little change to the upstart script I published in a recent blogpost. The issue is with mysqld_safe which has a few issues it seems, but so far I haven't tracked them down exactly. Anyway, if mysqld_safe is not in your path and you run it with the full path (as I do in the Upstart script which is run by root that may well not have the mysql bin directory in it's path), then mysqld_safe will behave strange in at least 5.5 when called with the full path, i.e.
/mnt/mysql/bin/mysqld_safe --defaults-file=/etc/mysql/my.cnf &
will not work. If I then put the appropriate directory in my path, like this:
PATH=/mnt/mysql/bin:$PATH
/mnt/mysql/bin/mysqld_safe --defaults-file=/etc/mysql/my.cnf &
It still will not work as expected. In the first case, it complains that it cannot find my_print_defaults, and then just goes on, seemingly using some default values, error logging to /usr/local/mysql/data and trying to run /usr/local/mysql/bin/mysqld, and before you ask, yes I have set things like ledir and basedir in the [mysqld_safe] section in the .cnf file.

Anyway, the fix for all this is to run mysqld_safe with the current directory being the same as basedir (i.e. one step "above" where mysqld_safe is located). To tell Upstart to run in a specific directory, you use the chdir stanza. So my scripted ended up looking like this now:
#
# MySQL Service for Recorded Future
#
description "MySQL Server"
author "Anders Karlsson, Recorded Future"

start on (net-device-up
and local-filesystems
and runlevel [2345])
stop on runlevel [016]

expect fork
kill timeout 2

# Set variables.
env MYSQL_ETC=/etc/mysql
env MYSQL_PIDFILE=/var/run/mysql.pid
env MYSQL_HOME=/usr/local/mysql5.5
env MYSQL_INSTANCE=my
umask 007

chdir /usr/local/mysql5.5
exec $MYSQL_HOME/bin/mysqld_safe --defaults-file=$MYSQL_ETC/$MYSQL_INSTANCE.cnf >> /tmp/x.out &

post-start script
loop=600
# Wait for MySQL to start.
while [ $loop -gt 0 ]; do
if $MYSQL_HOME/bin/mysqladmin --defaults-file=$MYSQL_ETC/$MYSQL_INSTANCE.cnf ping; then
break
fi
loop=$(($loop - 1))
sleep 1
done
exit 0
end script

# Send a soft SIGTERM to MySQL before Upstart will kill it.
# A Sigterm to mysqld will cause a controlled shutdown.
pre-stop script
exec kill -SIGTERM `cat $MYSQL_PIDFILE`

# Wait for MySQL to end. Flushing buffers and all.
loop=600
while [ $loop -gt 0 ]; do
# If the pidfile is found, then continue waiting.
if [ -e $MYSQL_PIDFILE ] ; then
loop=$((loop - 1))
sleep 1
continue
fi
break
done
end script
And lastly, again before you ask, don't try the easy way out here:
chdir $MYSQL_HOME
It will not work. And as usual, debugging this with Upstart is nearly impossible. If you try using the above syntax, all you get is an information that the script would not run (and, yes, the console Upstart stanza is more or less useless. Why I cannot tell Upstart to output to the current terminal is beyond me, and why there is no verbose mode, I do not know either).

I will check up the issue with mysqld_safe and send a bug report to MySQL (if it isn't there already). As for upstart, I am so annoyed by it at this point in time, that I'm not sure I have the energy to post a bug report, as I cannot even tell what so-called "feature" is most annoying. But again, this may be because of my limited upstart exposure. Upstart is sure different from working with chef though (which I am also new to). One single, minor, error in chef, and you get an error listing the size of Encyclopedia Britannica. Whereas Upstart on the other side would say something like "job failed", and nothing more, if it was managing the Titanic.

/Karlsson

Saturday, December 25, 2010

Ubuntu Upstart for automatic MySQL start and stop

Here at Recorded Future we use Ubuntu (running on Amazon EC2), but so far we have not explored Ubuntu Upstart that much. During the holidays I made an effort to get acquainted with Upstart and to implement proper MySQL start and stop with it.

If you do not know Upstart, this is the way you start and stop services in Ubuntu, and it serves the same purpose as the old /etc/init.d scripts, but are a bit more structured and powerful. That said, Upstart is regrettably far from complete, although the functionality is much better and Upstart has some cool features, some things do not work that well. For one thing, documentation, where it exists, is useless, at best. Secondly, there is very limited ability to test and develop Upstart scripts. And this is made worse by the fact that the documentation is so bad. Another thing is that Upstart insist on stopping services, by default, by sending a brutal kill signal. Not good for databases, mostly.

In the /etc/init directory are the Upstart scripts you have. In difference to the old init.d scripts, you cannot disable a service in Upstart curenntly. If it is in /etc/init it will be started at system start. That's it. And this is something that I am sure will be fixed, but for now, again, is something we have to live with. Upstart scripts have the suffix .conf (don't ask me why), so the default MySQL Upstart script, for example, is called /etc/init/mysql.conf.

In an Upstart script, there are Stanzas that determine what to do. Like the exec Stanza that runs a program for example. And you may then ask, when is it run? Startup? Shutdown? And the answer is startup. For shutting things down, as I said before, Upstart will by default just send a kill -9 signal.

The minimal startup script you can have, and this actually works in a reasonable way, is to just have one line with an exec stanza, like this:
exec /usr/bin/mydaemon
Which will start the daemon. For stopping the daemon, Upstart will send a -9 signal to the started process by default, and nothing more is needed in the Upstart script.

For MySQL, we need to make things a bit more complicated. The default mysql.conf Upstart script really is not good. For one thing, it will not do a controlled shutdown of MySQL (this is possible even if Upstart will eventually send a kill -9 anyway). Secondly, this script assumes that what we use is a standard Ubunty installed MySQL distribution, so if you have installed MySQL in /usr/bin/mysql5147 or somethings like that, you are out of luck.

So what I wanted to create was an Upstart script for MySQL that fullfilled these requirements:
  1. Starts MySQL automatically.
  2. Waits for MySQL to be available before exiting.
  3. Be configurable to support different MySQL install locations, data directories etc.
  4. Do a clean shutdown of MySQL when stopping the MySQL services.
Before I show you what I ended up with, I want to comment on the points 2 and 4 above. With Upstart, you can define a script or command to run just before or after a services has been started or stopped, and this is what I use to wait for MySQL to become available, and to send a SIGTERM to the MySQL Server when stopping (which will do a clean MySQL shutdown).

So here we go, a complete MySQL Upstart script, the way I want it to work:

#
# MySQL Service for Recorded Future
#
description "MySQL Server"
author "Anders Karlsson, Recorded Future"

start on (net-device-up
and local-filesystems
and runlevel [2345])
stop on runlevel [016]

expect fork
kill timeout 2

# Set variables.
env MYSQL_ETC=/etc/mysql
env MYSQL_PIDFILE=/var/run/mysql.pid
env MYSQL_HOME=/usr/local/mysql5.5
env MYSQL_INSTANCE=my
umask 007

exec $MYSQL_HOME/bin/mysqld_safe --defaults-file=$MYSQL_ETC/$MYSQL_INSTANCE.cnf >> /tmp/x.out &

post-start script
loop=600
# Wait for MySQL to start.
while [ $loop -gt 0 ]; do
if $MYSQL_HOME/bin/mysqladmin --defaults-file=$MYSQL_ETC/$MYSQL_INSTANCE.cnf ping; then
break
fi
loop=$(($loop - 1))
sleep 1
done
exit 0
end script

# Send a soft SIGTERM to MySQL before Upstart will kill it.
# A Sigterm to mysqld will cause a controlled shutdown.
pre-stop script
exec kill -SIGTERM `cat $MYSQL_PIDFILE`

# Wait for MySQL to end. Flushing buffers and all.
loop=600
while [ $loop -gt 0 ]; do
# If the pidfile is found, then continue waiting.
if [ -e $MYSQL_PIDFILE ] ; then
loop=$((loop - 1))
sleep 1
continue
fi
break
done
end script


To be honest, this is not what I create for all our MySQL servers. Instead I used this to create a chef template, chef is what we use for configuration management here (see http://www.opscode.com/ for more on chef), and here it is put to good ude to generate an Upstart script for MySQL. The above is just an example.

/Karlsson

Monday, December 20, 2010

Using the right tool for the job at hand - MongoDB, MySQL and Sphinx

You may have seen my posting regarding "eventual consistency" some months ago, and you may have come to the conclusion that I was insisting that a SQL based RDBMS is the way to go for just about anything. Tell you what, that is not so. And nether am I against using. say, MongoDB, where it is appropriate.

The whole deal with Eventual consistency is something that I am still opposed to, I want to know if my data is consistent. And I am not not sure that you cannot have a fully consistent, distributed system either. But I guess that debate goes on. And I still want my base data to be consistent. Like in RDBMS-SQL-Foreign-keys-all-over-the-place-and-not-a-bl**dy-bit-lost-in-the-MyISAM-swamp consistent. That is what I want the base data to look like. And if there are compromises with this, which it may well be, then I want to know about those too.

So, having covered that, what am I trying to say? Well, if you properly normalize your data, then the more you normalize and the more strict you are with data quality, the more troublesome management of that data is going to be, and that is something we have to live with I guess. But if you then are to ask some hefty queries on that data, where the data is organized in such a way to make data real consistent, and the queries just want to data, and the query-side of things really doesn't care about normalization at all, how do you deal with that? One way of course may be to replicate to something more query-friendly, possibly a second MySQL-server or possibly even a bunch of such servers in a scale-out scenario. But your data structures still look really complex, having being built to support storage, update, maintenance and consistency requirement foremost.

At Recorded Future we have taken a different path in our latest release: Choose the best tool for the job at hand. We use MySQL with InnoDB for our data loading and storage. And for that, MySQL worka real well. So we have the data we have collected and processed and organized, structed nicely in an RDBMS.

Now, on the other side of things, where queries are made, things look different, there we want to fulfil 2 needs, basically:
  • Fast quering for data, in out case these are instances.
  • Fast retrieval of attributes of the instances that was retrieved.
The above is fulfilled with 2 supporting technologies, Sphinx for instance querying and MongoDB for instance attribute (anwering the typical kind of key-value-store query: Here is a key, gimme all the data). This is what we are live with now, and so far we are really happy. Mind you, this didn't come that easy, both Sphinx and MongoDB has caused some problems.

Forgetting Sphinx for now, what we do with Sphinx is actually really simple, and concentrating on MongoDB, where we also do pretty simple things, but the requirements and the scale in the case of MongoDB is higher for us. MongoDB so far has performed well for us. We are running in an Amazon EC2 environment, and that has issues of it's own (in particular this seemed to be the case with Sphinx, but they are on the case). As for Mongo, this is so common in EC2 environments so I guess ot has been more tested.

I admit to liking MongoDB. It is lightweight, easy to manage and has some cool features, such as automatic, and reasonably transparent, sharding. MongoDB at first seems to have good documentation, there is a lot of examples and samples. What is lacking is a good reference though, and a good administration guide is also missing and the Sharding support is still not well documented from admin point-of-view. The JavaScript orientation in MongoDB at first was annoying to me, but increasingly I find it useful. One thing to note about MongoDB is that it needs it's memory: the more the better. For good performance with MongoDB you really want to avoid disk accesses (in particular in EC2 environments). What we do to avoid disk-accesses as much as we can, we have gzipped the data items. We store data in JSON format (yes, we are a JSON shop, sorry XMLers out there) and compressing data fast and having a fast JSON parser is crucial. And yes, we DID think about the case with frequently accessed data: we have the option, on an instance-by-instance basis, to store data as plain JSON or compressed. Plain JSON has advantages, such as as MongoDB is using JavaScript, JSON is pretty much Native to MongoDB, making certain operations real easy.

We are always on the lookout for new technologies, and we do try many things, but the current setup is really useful and we do get much better performence and scalability. And yes, we do get both, with the same number of servers, we get better performance, and much better distributed load of the machines. Now we are waiting for Amazon to fix their disk IO and Network issues.

Your truly
/Karlsson
Hope to see you in Santa Clara in April at the MySQL UC!

Thursday, December 2, 2010

Working with MySQL on Windows - Part 1

This is the first post in an attempt to write about how to get along with MySQL on Windows. Before I start, I want to stress that the focus here is seasoned Windows users getting started with MySQL. As for MySQL, I am prett much Operating System agnostic, having used Linux / Unix for some 25 - 30 years ( started using Unix as of Version 6. That's OLD). But I have also been a Windows user and developer since Windows 3.0 days.

Also, when I say I'm a Windows developer, let me tell you where I come from. I come from learning Win16 in C. Hardcoded message loops and message handlers and that kind of stuff. And that is how I still develop on Windows, but these days with Win32 / Win64 and exposing more Windows APIs. So I am no C# or .NET expert by any means.

So, you are ready to get started with MySQL on Windows? OK, first you should download it, possibly from here: http://dev.mysql.com/downloads/mysql/. And no, you don't have to pay anything, MySQL is free for us developers. And no, you will not need to compile anything from source or something, quite the opposite, MySQL comes with a very nice installer. Just download the MSI installer package for your platform (32 or 64 bit that is), and get started.

Usually when Installing MySQL, you can choose most of the defaults. After the installation, a configuration application starts, and you can usually choose most of the defaults for a starter here also. One things that your might want to look at is the character set selection. Latin1 is one option that is usually OK and that is similar odl and tried Windows CP1252. It's good enough as a start. If you plan to develop Web applications, then you might want to choose UTF8.

OK, assuming all went well, where do a get started with really using MySQL? Well, there isn't much in terms of a Windows GUI for MySQL installed as part of MySQL. And not only that, if you are a Visual Studio user, possibly developing in C#, there is very little in Visual Studio that shows that there is a new database server installed? Well, this we will fix, all this stuff sure is available from MySQL, you just have to know where to look.

First, as for a MySQL GUI for Windows, you have a few choices. If you want a complete MySQL design and SQL tool, then look no further than MySQL Workbench from MySQL themselves. This is a database design tool for MySQL that is loaded with cool features. Again, you can download it and use it at no cost, it is available here: http://dev.mysql.com/downloads/workbench/5.2.html

There are alternatives to MySQL Workbench, and many of them are also free and Open Source. HeidiSQL is one such option, and I have a tool myself called MyQuery. Both of these are a little less ambitious than MySQL Workbench, but also has features of their own. HeidiSQL is very much DBA focused and has loads of DBA style functions. My own MyQuery is focused on working with MySQL Scripts and shines when it comes to extensibility among other things. MyQuery can be downloaded here: http://sourceforge.net/projects/myquery/

But what about your applications? How is a database connection to MySQL achieved? Well, there are several ways to connect to MySQL. When you download the MySQL Server, it comes with a C API. If you are an old man like me, that might well be the way to go. The C API is OK and is well documented here: http://dev.mysql.com/doc/refman/5.5/en/c.html. The C API comes in static and dynamic shape. If you don't know the the difference or don't really care, then use the dynamic build. Using the static library is more demanding. MySQL itself, including the C API, is built with Visual Studio, so if that is what you use for your application (and I guess you are), that is OK. And you can do with the Express editions of Visual Studio.

Now, chances are you are not an old C hacker like myself, but a more modern person, using C#, Visual Basic or Java. If that is the case, MySQL provides help even here. If you are using Visual Studio with any of these languages, you want to download and install MySQL Connector/Net. A Connector is what MySQL calls their database drivers. And yes, even if you are on Java you want the .Net connector. The reason is that the .Net connector also includes the Visual Studio integration, although here it doesn't seem that the Express editions are enough. Connector/.Net can be downloaded here: http://dev.mysql.com/downloads/connector/net/.

And then for Java, you need Connector/J, available here: http://dev.mysql.com/downloads/connector/j/

Now, one word of caution before I leave you for this time: Both Connector/Net and Connector/J does some cool things for you. If you look at how MySQL works, or have a look at the MySQL C API, you realize that MySQL has some very particular ways of dealing with some things, like array inserts, prepared statements and things like that. To make life as a MySQL developer easier, both of these connectors do some very cool things to isolate you from that. Also, these connectors to an extent address some things beyond MySQL itself, such as high availability. And thirdly, tuning the Connectors is often just as important as tuning the server! What I am saying here is, in short, something as unknown and arcane as read the documentation. In particular this is true for Connector/J, which has many parameters, is very advanced, but also very fast and can help you speed up your application if you know how to work it.

Bye for now, I'll be back in a jiffy with some more Windows specific MySQL notes
/Karlsson

Tuesday, November 30, 2010

More on the sqlstats SQL Statement monitor plugin

There was some questions on how sqlstats really works, so I'll provide some more details here. Before I begin though, if you want to play with it, it can be downloaded from Sourcefore here: http://sourceforge.net/projects/sqlstats/

OK. So lets start then. SQLStats is a plugin for MySQL 5.5 and up, it will not work at all with previous versions. What it does is reasonable simple, the reason it hasn't been done before is that MySQL just wasn't instrumented to support it, but the effect is really usuful anyway. The 4 things it does, in short, are:
  • Pick up all SQL Statements executing in MySQL. This is done using the MySQL Audit plugin interface.
  • The SQL Statement is then normalized, which in this case means that all literals in the statements are taken away, so that SELECT * FROM orders WHERE orderid = 57 becomes SELECT * FROM orders WHERE orderid = ?. Then we place this statement in a list of most recently executed statements, if the normalized version is already in the list, the execute counter is incremented and the statement is popped to the top (as it is most recently executed), else it is inserted into the list (and popped to the top), in which case, if the list of full, the least recently executed statement leaves the list.
  • Then we check a second list, the toplist of most executed statements. If the execute count of the normalized statement will make it appear on this list, it is added there in the same way.
    As a sidenote, both the recently executed list as well as the top executed list are simple in-memory datastructures. They are not stored on disk or anything, they are just lists of C-structs allocated dynamically.
  • The last thing the Plugin do is to expose two INFORMATION_SCHEMA plugins, so that the recently executed list and the toplist can be examined as normal MySQL tables.
All three plugins (the Audit plugin that will pick up and process the statements, and the two INFORMATION_SCHEMA tables) are in the same plugin library, so that they can share the same data: the lists of SQL statements. The lists are protected by an internal mutex, so that multiple threads can use the plugin simultaneously without breaking anything.

One magic aspect here is the Audit plugin API? What is that you ask? It is actually pretty well documented these days (see http://dev.mysql.com/doc/refman/5.5/en/audit-plugins.html). Basically how I use it in this instance is to pick up the statements every time they are written to, or would be written to, the general log. Note that the general log does NOT have to be enabled anywhere for the plugin to work! That's a releif, right?

So, once the plugin is installed, if you follow the documentation, you can see the most recently executed statements in the SQLSTATS_LASTSQL tables in the INFORMATION_SCHEMA schema, like this for example:
SELECT * FROM INFORMATION_SCHEMA.SQLSTATS_LASTSQL;
The columns are described in the documentation, but the most important ones are:
  • STATEMENT - The normalized statement.
  • SAMPLE_STATEMENT - The first real statement that was turned into the normalized statement in the STATEMENT column.
  • NUM_EXECUTES - The number of times the statement has been executed.
  • NUM_ROWS_TOTAL - The total number of rows retrieved by all NUM_EXECUTES executions of the statement.
The SQLSTATS_TOPSQL table looks the same, but has different rows in it of course.

You can turn collection of statements on and off by setting the sqlstats_enable global variable:
SET GLOBAL sqlstats_enable=1; -- Turn SQLSTATS on
The installation script will turn it on. You may also adjust the size of the lists using the global variables sqlstats_lru_stmts and sqlstats_top_stmts global variables. The way these work now, they will not really be adjusted down below the current size of the list.

To install the plugin, just follow the documentation. As you have to link with the MySQL sourcecode, the build process is slightly iffy, but once built, the plugin can be installed and deinstalled and recinfigured, without shuting down the server or anything.

I think that the Audit API is good enough reason to go with MySQL 5.5, if you ask me. My testing, and some semi-production testing here at RF, has shown that the overhead of the plugin is hardly noticable.

/Karlsson

What? Me Grumpy? Me?? or MySQL on Windows, what's the deal?

So you have seen my writings on MySQL on Windows. Most recently regarding MySQL 5.5 and before that on Cluster and 5.1. So, what is the deal? Am I a mean person attacking MySQL? Am I a Windows hater trying to show how badly Windows works? Or maybe a Linux / Unix hater who want to keep those bearded 1970's terminal junkies out of my GUI Windows world?

Actually none of those. My ambition is to point out this that should be addressed in MySQL for Windows to make it more Windows friendly, for Windows users and developers. As MySQL on Windows stands right now,, it is OK for the Linux / Unix user user who also wants to run on Windows, and for those with cross-platform knowledge and ambition who want to run MySQL across the range. And I put myself among those. But I also want the Windows hard-core users to find out about, use and love MySQL. And MySQL sure could do much more in that area, but before that happens, we (yes we. We who are cross platform folks, we who who can take the time to check things out a bit and has somewhat of an understanding of what those rough edges are and we who care about MySQL and it's adoption on ALL platforms).

I really do not think that a Windows user, who is not used to Linux / Unix, whould have to learn some Linux stuff just to use MySQL on Windows. Come on, MySQL is a database system, using the SQL standard and that runs fine on Windows. Yes it DOES! And there are some good Windows integration, like the VS plugin, .NET connector and much more. MySQL 5.5 really does do away with many MySQL on Windows limitations also.

I recently reported some bugs on this matter, really minor bugs, but things that could easily be fixed and that would make MySQL on Windows a bit more polished. One issue I reported was that there are still Perl-scripts distributed with MySQL for Windows, even in 5.5. Now, installing Perl just to use MySQL is bad enough. But the thing is that many Windows users / developers don't even know what a .pl file is, and that a Perl runtime is needed! If MySQL on Windows really needs Perl to be fully functional (I don't think this should be the case. If Oracle can do without Perl, then MySQL should be able to do this also, but this is a different issue), then ship a Perl runtime with MySQL!

The reason I am reporting these minor bugs on MySQL on Windows is not that I dislike or hate this or that technology. I am waaay too old to get that emotional about technology in and of itself. The reason I am doing it is that this is low-hanging fruit on the way to make MySQL work, look and behave really good on Windows.

/Karlsson

Monday, November 29, 2010

Announcement: MyQuery 3.4.1 Released

MyQuery 4version 3.4.1 is now released on Sourceforge: http://sourceforge.net/projects/myquery/. This is a pretty minor release with just a few fixes, but the first of them pretty significant:
  • Windows 7 User Access Control (UAC) issue fixed - When running on Windows 7, the Windows UAC would prohibit MyQuery from starting, complaining about registry access. For once, this was not yet another UAC annoyance, but an actual bug that I hadn't caught before. Now, this should work OK even on Windows 7.
  • SQL Statement normalization issues - The SQL Statements dialog got statements wrong at times, and could cause a crash. This was due to a bug in the normalization of SQL Statements and this is now fixed, including refining the normalization a bit.
  • Optionally Show SQL SELECT tool output vertically - This will turn the output from the SQL command in a SELECT tool around. This is for running statements that will produce multiple columns instead of multiple rows, such as SHOW SLAVE STATUS and SHOW MASTER STATUS. The output from these commands is weird, the multi-column layout really isn't useful, but this little feature fixes that.
Happy SQL Coding!
/Karlsson

Saturday, November 27, 2010

Perl is probably a fine scripting language, but on Windows...

Please do not assume Perl is always installed on all platforms. In particular not on Windows! Just as it is probably a bad idea to assume that Windows DOS commands work on Linux! Come on now!

Why am I making a fuzz you ask, do I have a problem with you writing stuff in Perl (personally, I don't even like Perl, but that's another issue)? Nope. But somewhat modern software really should not assume that you have Perl, or DOS, or some other arcane, not terribly standardized piece of software around for a server to do it's job. And MySQL still have a way to go to be reasonably good on Windows. Downloading the lastest RC, 5.5.7, MySQLhas managed to screw up at least two things, at least, in my mind. But maybe that's just me:

  • The optimzied. non debug libraries, are now in /lib under the distribution, not in /lib/opt where they used to be. I see no good reason for this, to be honest (place the libraries where you want, of course, but once a release is out, don't move them around at random in upcoming releases). This has been reported as bug #58532
  • To make it realhard for us in the community to test this release Candidate version of MySQL, to iron out the bugs, MySQL decided not to include dynamic debug client libraries, only the static ones. Why this makes things difficult, you have to do some development on Windows to userstand. For a bunch of reasons (not necessarily good reasons, but if you are Microsoft, you are allowed to do whatever you choose, no reason needed, except that maybe it's the 7th hour on the 7th month and it's afull moon coming up or something), Micosoft compiles stuff so that a static library depends on stuff included by the compiler itself. So if the compiler version changes, the library will not work properly anymore. In short, you need to build your applications with the same version of Visual Studio as the MySQL Client library was built with. Yikes! (For more Yikes, see MySQL bug #58531)
None of this is any fun. And to add insult to injury, in the bin directory of the MySQL distribution on Windows, in 5.5.7, I find the following files: mysql_config.pl, mysqld_multi.pl and 4 more. Yes, Perl programs. No, I do not have Perl install on my Windows boxes. I'm about to test this and see how it worls aout. Above all, I want to test if mysql_config works as expected.

All in all: Please remove Perl depenent client programs from the MySQL distribution! And mysqld_multi seems rather superflous on Windows in general, if you ask me (not that it even works, check bug #58553).

/Karlsson

Tuesday, November 23, 2010

Monitoring MySQL SQL statements the way it SHOULD be done!

You may have read a previous post of mine, back in April this year, where I wrote about using the MySQL 5.5 Audit interface to SQL Statement monitoring. There was a bunch of comments and some ideas, but not much happened. Until now that is.

Hereby I release the first version of SQLStats, which is a much enhanced version of what I described in the previous post. This is a MySQL Plugin for MySQL 5.5 that allows you to monitor the most recently executed, and the most frequently executed statements using 2 INFORMATION_SCHEMA tables. The thing is not very complex, to be honest, but it does do the job. So what was the job then? Well, looking at what this plugin does, it goes something like this:
  • Allows you to monitor ALL SQL statements executed by the server.
  • The SQL statements are "normalized", meaning that literals / constants are removed before comparison.
  • Data is saved in memory. No disk access and very little overhead.
  • Data is retrieved from INFORMATION_SCHEMA tables, just a simple SELECT and you know what is going on in the server.
I have done some basic testing of the overhead of this plugin, and it wasn't even noticable. If you have used the general query log for this stuff before, know what overhead I'm talking about. With SQLStats, the overhead is close to 0.

There is no need for MySQL Proxy or anything like that. There is no need to change something in the Client or in the Connector. To be honest, there are a couple of things I want to add to the plugin eventually, but this is a starting point at least. To use it: download it, install the tomcat / mysql monitoring server, install the monitoring agent and ... No wait, that was MySQL Enterprise Monitor, this is how you do it: download, build, install and use it. That's it.

So where can I get it, you ask (or maybe you don't, but I'm gonna tell you anyway). It's on sourceforge, and you can download it from here:
http://sourceforge.net/projects/sqlstats/

There are two files to download: A simple PDF documents with some basic usage and configuration tips, and a source-code package (which also includes the same PDF).

What would I want from you? Ideas for future development, bug reports and a few beers, that's it, not too much to ask, eh?

/Karlsson
Also, did I mention that the overhead is VERY low...

Friday, November 19, 2010

MyMonitor 1.1 released. Finally!

I know some people have used mymonitor on and off over the years, but it was a long ago since I created it, and things has moved on and changed. Among the things that didn't work well was the build process, and the fact that there were a few bugs in the normalization of SQL statements. Now this is fixed.

If you have NOT used MyMonitor, it is a small ncurses based tool for monitoring SQL statements in your server, and work by polling the PROCESSLIST and the normalizing and sorting and grouping the result. What is "normalizing" then? It's the term I use to describe the process where a statement has all the literals replaced with ? in them, so I can see that:
SELECT * FROM mytable WHERE id = 861
is actually the same statement as
SELECT * FROM mytable WHERE id = 387
And by the way, this is also the same statement as
SELECT * FROM mytable WHERE id = 177

MyMonitor can look like this in action:

You get the point I think. MyMonitor has a few more tricks up it's sleeve, like the ability to write out the monitored statements to a file and to do some simple timing on statements. All in all, this is a simple tool that can be real helpful at times. Many times less powerful than, say, MySQL Enterprise Monitor or MONYog, but still useful.

If you are wondering: No GUI needed. What IS needed though is ncurses. And Windows support? Nope, but have a look at my MyQuery project instead, it has many more features, including staement monitoring.

Download mymonitor from Sourceforge here: http://sourceforge.net/projects/mymonitor/

Cheers
/Karlsson

Sunday, November 14, 2010

MySQL configuration management using Chef

Here at Recorded Future we use Chef for Configuration Management, and I am a complete newcomer to this. Chef is, like Puppet, built in Ruby, Ruby sure seems popular for Configuration Management things.

As for the so called "recipes", which is what Chef names a things used for installing some software package, such as MySQL, is really not complete in the case of MySQL at least, but that is really not a big deal. I actually am beginning to like Chef, there are many good points to it, although I can't escape the feeling that I am spending more time writing Ruby than actually installing stuff and getting MySQL running. But I beleive that this most a result of the fact that we are still a startup and do not yet have that many MySQL servers running.

I my opinion, and I can be pretty opinionated as you know, I think the one big thing with Chef that I do not like is Ruby. I have to admit I really have to work hard to get used the idiosycratic means that this language works in. Mind you, I'm no big fan of scriping languages in general, but I have written fair amounts of PHP, for example is most of my website papablues built by myself in PHP as Joomla modules, components, templates and plugins.

In many ways, I think Ruby started of pretty well, the basics of it are promissing. But the way they have complicated Object Orientation seems to me to be not just an overkill, but an overkill without really good uses. There are so many different means and syntaxes for the same thing that you just get confused. And then you do what I do. Learn one tecnique and use that and nuthin' else (which means the code will be hard to read for someone using some other tecnique: Tough sh*t!)

That said, all this may be just because I'm not used to it. I used to hate PHP also, but sort of got used to it (I still don't like it that much, to be frank). But my one and only favourite programming language remains C. Which you can learn by reading the thin "The C programming language", not 1000 pages of "Idiots introduction to the basics of PHP" or some similar utterly useless publication.

All this though, Chef is beginning to be fun, and I will share some more experiences with you regarding the Chef work we are doing here at recorded Future. Also, I was thinking that we should share some experiences with other MySQL Chefs, maybe as a BoF session at the upcoming MySQL UC (It can't be a full blown talk, the deadline for CFP is over).

Cheers for now
/Karlsson
Who sticks to his old way of working, even with Chef: Code should be commented! And yes, I am also aware I am old fashioned.... And by the way, Chef documentation leaces a bit to be desired (and that way todays gross understatement).

Wednesday, November 10, 2010

MySQL 5.5.7 - Can we trust it being RC, or?

I just saw that MySQL 5.5.7 RC had been released, and reading the releasenotes made me more than a fair bit suspicious. In some kind of general agreement on what constitutes a "beta" release, this is when the software has reached a level of maturity when no more major features are to be introduced. MySQL (and many others) has broken that rule at times, and the rule is not enforced or something.

What constitutes an RC release though, in my mind, but I really want to know what you think, is software that is really 100% feature complete. There may be, but hopefully there aren't, even any major bugs to iron out. In short, it is "A Candidate to Release", and as close to GA as you can get. I have not seen this rule broken much, really.

With MySQL 5.5.7, this is an rc, as was the previous release, 5.5.6, and this time there is a really major feature introduced between these two release, pluggable authentication. And before I go on, let me stress that this feature per se is not what I am questioning here, quite the opposite, this is a very useful feature.

What I am questioning though is:
  • Why is MySQL introducing new major features in an RC release, even in between 2 RC releases? This means, if I am not mistaken, that this very important feature (authentication deal with security, mind you), that it might go live (the C is for Candiate) without having been beta tested?
  • What made MySQL 5.5.6 a Release Candidate? What I mean here is that if we assume that this major new features was conceived, written and performed in just a few weeks between 5.5.6 and 5.5.7, MySQL knew that 5.5.6 wasn't feature complete, and hence in no way a release candidate (C is for Candidate, if you had forgotten that little fact).
  • Why does MySQL insist on having major important changes to the security setup be tested the least in the server before GA? Fact is, what MySQL is telling us here is that there may not be any testing at all (as 5.7.7 is RC (where C is for Candidate) which means is could possibly be GA.
My conclusion is that MySQL 5.5 is not to be treated as GA (is MySQL even considering a GA release as feature complete, or are they about to introduce more features again in that line of releases) just not yet. Which is a problem for me personally I I just recommended us to go with 5.5.6, hey, it's RC (you know what the C in RC is for now, right?), and that it would be as close to solid for production use as you can get.

To be clear, I will stick with 5.5.6 for now. Not 5.7.7 or even 5.5 GA for a while, until I have tested that pluggable authentication is secure and solid for production use. I really want 5.5, and I am not alone, so I do not understand why MySQL had to screw around with this. I do understand why pluggable authentication should go into MySQL, for sure, but not in 5.5 or at least not in the midst of a RC cycle.

/Karlsson
Who is not saying he will now change to Postgres. Nope, I will not not act stupid. And watch me run with 5.5.6 for a while yet, I will not be alpha testing MySQL security in a live production site, no way, José.

Sunday, October 24, 2010

It's about time

TIMESTAMP and DATETIME is how dates and times, or temporal, data is represented ion MySQL. These types are combined with a bunch of operators on these types, such as ADDATE and TIME_FORMAT and such. So is this enough to effectively manage temporal data then? Nope, in my mind, far from it.

To begin with, time isn't such a simple type of data as we might look at it. Is a date just a count of the number of seconds since 00:00:00 on Jan 1 1970 or something like that? And we then convert it to a calendar and all is fine? Is time data really linear in that sense? No, it's not. Dec 24 2010 is Christmas Eve and when you get your Christmas gifts here in Sweden. Is this significant? Also, it's a Friday. That this data is a Friday can be calculated from the date or the seconds-since-jan-1-1970 counter, but that is a different thing. Looking at that counter, you cannot see that there is no way to look at that number and figure out that this day is special, but it is.

To manage this, we have the date and time functions in MySQL, which help us figure things like the above out, but there are issues with it still. To begin woth, the constructs quickly turn difficult. Secondly, you cannot really do an indexed search on temporal data, except pretty simple cases, such as searching for a data and time, before or later than a given data and things like that. A search for "all saturdays" will sure not use an index.

There are more issues though, like that fact that, as I have shown above, a temporal value as we know them above, contains hidden data, something that is considered bad (like the day of the week, the month, the fact that a data is a leap day etc).

And it doesn't end there, there are different calendars in use, in different parts of the world and in different inductries. If you have a mortage on your house, chances are that the interest you pay is the same each month, despite the fact that the interest in calculated over the full year and different months have different number of days. The reason is that interest uses a "360 day" calendar, where each month is assumed to have 30 days. And different financial instruments use diferent calendars (for example the one called 366 day, which is our normal calendar and the 365 day calendar. which is the normal calendar with the exception that there is no leap day). If you try to match one financial instrument to another, when they use different calendars, is not a lot of fun (I've done it, so I know).

But chances are that many of you now think that this is not a big problem for you, as you don't see people asking questions on data with complex temporal relationships. And to that I say that, there is a big chance that this is because:
  • Temporal data is difficult. More so than simple integers and strings, disregarding the simple before date X or something like that. But we just don't ask the questions like "how much of the sales per quarter happen on the last day of the quarter" and things like that. Which doesn't mean the information isn't interesting, but the questions is difficult to formalize and even more so to execute in a database. Not to mention checking for overlapping dates and things like that.
  • We are used to not being able to ask these kinds of questions.
But really we should be able to ask, query and get data with a lot more temporal information. When was the last time you asked a temporal data query in Google (not that you CAN do this). Like, how has the matching or IBM and Pressrelease changed over the last 10 years, i.e. how many Press releases did IBM post in 2001, in 2002 etc. Yes, the data is there somewhere, querying it is difficult. And using temporal data combined with data you get on the web should allow you to determine more than what you can today (i.e. IBM and Press release matches X number of documents in Google right now). Adding a temporal axis to web data should allow you to determine what might happen.

One issue with all this is that databases are hardly geared up to support this, and a second is that the data to support this isn't available in large quantities yet. This is one of the things we are looking at at Recorded Future, but there is more stuff coming. The web is there with masses of data, now we need to make sense of all that data. One aspect which is there but which hasn't yet been explored much is just the temporal aspect of data.

/Karlsson

Tuesday, October 19, 2010

MyQuery 3.4.0 Released!

I have not finished MyQuery 3.4.0, after a long time of weekend and spare-time development. This version adds some features that are basically Scintilla featires that I have integrated with. This includes error indicators, margin markers and some other things. In particular the margin markers changes a few things in the interface: When running up to, or continue from a point in teh script, then use the margin marker instead of the current line! If you don't like this, then don't set the margin marker and the current line will be used just like in old releases.

Another addition this release is Online help. I get the help contents from the database that you connect to, which is installed with most MySQL installations in the tables called help_xxx in the mysql schema.

External program has a few more options, and in this release the installer installs a few of those, just to show how it works, like the MySQL schema in an editor window and direct access to the MySQL commandline.

In addition to this, and a few more minor features, there is a whole bunch of bugfixes in this release, which doesn't mean that some doesn't remain. Above all, I have reengineered how I handle multiple access to the MySQL connection.

Download it from Soureforge! I will follow up this post with some more on how to use MyQuery, how to build extensions and plugins and some things like that!

Happy SQLing
/Karlsson
PS. If you have issues with MyQuery, drop me an email, but note that I am now at anders@recordedfuture.com!