Monday, November 28, 2011

Former Swedish Pirate Party leader and MySQL US Speaker on the list of the worlds top 100 thinkers

For those of you who was at the MySQL User Conference in 2008, you might remember that the then leader of the swedish Pirate Party, Rick Falkvinge, was one of the keynote speakers, giving a talk on Copyright Regime vs. Civil Liberties. The MySQL User Conference has a history of inviting really interesting speakers, often slightly off what is expected at such a tech-focus conference. Falkvinge did a great talk there, and at least impressed me and got me thinking even more on the issues pf free and open speach, copyright, software licensing etc.

Now Falkvinge has stepped down as leader of the Pirate Party, but he is still very much out there and promoting openness. He has managed to achive such a reputation that he is now number 98 of the list of Top Global Thinkers published by the Foreign Policy magazine. I'm not sure how much influence the MySQL UC keynote meant in terms of getting on that list, but it is impressive and the MySQL UC organizers deserve a pat on the back!

/Karlsson

Friday, November 25, 2011

Cloud Tech Day in Stockholm Tue Nov 29

I'll be doing the keynote at Cloud Tech Day here in Stockholm on tuesday. I'll be speaking a bit about what Recorded Future is up to, about Clouds at Amazon and what it is like, about databases, like MySQL and MongoDB, in the Clouds and about Big Data in the Cloud! Really big data in Mongo, in MySQL and the lot.

As usual, I will express my opinions in no uncertain terms. What works? What doesn't work? What really should work, but which doesn't! What is considered new and waay cool but what is really some old technology that didn't use to work and has little chance of working now. And stuff like that, you know what it's like and maybe you even know what I am like :-)

Hope to see you on tuesday
/Karlsson

Wednesday, November 23, 2011

Nov 23: At Cloud Camp Stockholm

I am Cloud Camp in Stockholm today. Some interesting ideas are bounced around, pretty cool stuff.

One thing hit me today though: the lack in innovation, in IT as a whole and in databases in particular is stunning. I have thus decided to write a few blogpost on this I think should, and probably eventually has to change, but noone wants to change it, and few even see it as a problem.

That said, I still got a few interesting ideas today, and I will test some products I saw here, and I will write a few blogpost on some of them.

I think the good usecases for clouds is also getting clearer, and that is a good thing. In difference to the current IT trends, IT press and many high-profile bloggers as well as IT influencers, I do not think that cloud computing will help resolve the conflict in the middle east. Also, I do not believe that the introduction of cloud computing, in difference to what many IT security folks seem to think, will cause all the credit card info, all the personal data and everything else suddenly to be available to everyone on the net. Taking my own stand as usual, and in this case this is a real different view,I beleive that Cloud computing is great for some, but not for all. And I also do not think (you are sitting down now, right? This is revolutionary, ground-breaking thinking) there is no such thing as a silver bullet. Tough!

/Karlsson

Tuesday, November 15, 2011

MyQuery 3.4.3 GA Released

I have had MyQueru 3.4.2 as beta for quite a while now. During this time, a few minor bugs has appeared, and they have now been fixed. This means I can proudly announce MyQuery 3.4.3 as GA! Download it from Sourceforge!

Except a few bugfixes, the only major change is that the NDB Monitor plugin is no longer part of the prebuilt binaries. There are three reasons for this:
  • The plugin relies on NDB API, which still is still not available from Oracle in binary form, and if they insist on not shipping these binaries, although the source exists, I have to assume they do not want me to use them.
  • I just can't be bothered to build MySQL luster form source, just to get these binaries.
  • The way Windows binaries are built with NDB is not properly documented, so I had to guess and use some trial-and-errors. This was not something I wanted to put in a GA release.
If you don't know what MyQuery is, this is a Windows-only MySQL Query tools, that concentrates on flexibility, extensibility and SQL-scripting. It has a colour coded text editor, using Scintilla and includes user-defined, if you want to, keyword lists. There are several means of extending the tool, from just running simple SQL statements to using a C API. The tool comes with complete docutemtation and API samples.

/Karlsson

Monday, November 14, 2011

My take on the "warning" against using MongoDB...

We have seen the "warning" against using MongoDB a few times now, and I have to say that this reminds me of other such warnings:
In a sense, most of them were right. If you had, in the 1920's, asked the movie going public if they wanted "talkies", chances are most of them would have said no. If you had told my mom and dad in the late 1970's that within 20 - 30 years, everyone would have a computer at home, with some resemblence to what their nerdy near-20 year old boy was tinkering with in the basement of their house, they probably would have laughed, at best.

But that's not the thing here. True innovation moves things forward. It introduces new things and new ways of doing things in a way that we have not heard of before, and the rest of the world has not a got a good view on it. Look at Virtual computing. This was considered so slow that it was close to useless some 10 years ago or so, but today it just cannot be ignored and is put to good use all over the place (I am now disregarding the fact that this technology is way older than this, I am talking Virtual computing in the field where I spend most of my time).

If something that provides new and unique features, and new ways of doing things, are still slow, when compred to traditional means of acheving similar results, isn't strange:
  • New means not fully developed. What you want to demonstrate with something completely new isn't that it performs as well as existing technologies, then why would anyone change? No, you want to show the new features and demonstrate hwo unique this new thing is.
  • The way we measure performance or whatever we use to measure existing technologies, is usually tied to measuring just that: the performance or whatever of existing technologies, not that of a new technology and a new way of doing things.
In the early 20th centrury, steam cars were mach faster and more reliable than internal combustion powered cars, the Stanley Steamer was more performant and reliable than most competitors.

Getting back to MongoDB then: My main gripe with it is not that's it's not in all aspect mature (it's not. face it, if you use MongoDB you used leading egde stuff. It will break, live with it!). Neither do I have any issues with many of the other attributes of MongoDB and neither that it really isn't even innovative (it's not, live with it). No, my main gripe is this: MongoDB and NoSQL isn't really that new, and this means it is probably a stop-gap solution. In the 1980's running a SQL database on a PC was possible, but slow (I was working for Oracle at the time, so I know), DBase was in the case of a single PC easier to use, faster and more developer friendly. And the way you used Ashton-Tate DBase, by the way, wasn't that much different from how MongoDB is used today.

But the SQL Based relational databases, like Oracle, Informix etc. had more features and was more flexible and standardized, and once the PCs got more powerful, DBase was history.

What really wins then, in my mind, is features, flexibility, scalability and broad spectrum of usacases. SQL Based relational databases, has this, to an extent, but what most of them lacks is scalability across servers in a cloud. In this aspect, they have some of this scalability, but they don't scale nearly as much or as easily as, say, MongoDB or the other NoSQL databases (yes, I hate that term. Find a better one fpor me that is broadly accepted and I start using it).

So what am I saying here? Let me summarize it:
  • No. MongoDB doesn't suck, no way, but in terms of maturity it has some way to go.
  • Yes, a relational RDBMS usualy has more flexibility and broader set of usecases than a NoSQL solution.
  • And Yes: There are places where the RDBMS software industry got caught with their pants down: Cloud environment scalability for example. Also, licensing, if Oracle or MySQL or whoever could figure out a proper means of pricing cloud services, I'd be happy (insteaad of using the pricing models for software that was introduced with Auto-Flow in the early 1960. This is insane. The world has changed since then, guy!)
  • Would I rather use MySQL than MongoDB here at Recorded Future? Yes, probably, but I don't insist, and it just wouldn't work, as MySQL will not scale in the way you can scale a MongoDB solution, far from it.
  • Will MongoDB or the other NoSQL solutions mean that the era of SQL based databases is reaching an end? Nope, no way, José. Eventually some RDBMS vendor will get it and understand the issues and build a viable solution (like ScaleDB or NuoDB or something like that).
  • What is my main issue with MongoDB? That it sacrifices features and functionality for performance. And it cannot add the features and flexibility of an RDBMS without sacrificing performance (look at this: access method: B+-Tree. Come on, how innovative is THAT?) . To be honest, running MongoDB without sharding seems like a useless excercise to me. If you don't need the scalability that this setup can provide you with, you have better options. (but this is just me talking here.)
  • With MongoDB hang around for long? Yes, probably, but it will not be that hot for as long at SQL based databases. The reason is that compared to a traditional RDBMS it provides just one big advantage (a big advantage, yes, but just one): Performance. Which is why we use it. Remember the Stanley Steamer? Old hat today, but it was the hottest thing you could drive some 100 years ago or so. And this is what cloud computing is all about, a constant change of technology to get the best value for money right now, and to be on constant lookout for new technologies that drives features and performance.
/Karlsson

Wednesday, November 2, 2011

Clouds in Stockholm

I'll be at Cloud Camp here in Stockholm on November 23. Some familiar faces will be there, beyond yours truly then. I will discuss and present some real-live Database Cloud experiences, but as this is an unconference, don't expect slides, rather I will talk from my heart and give you some annoying and upsetting views on how things really are. Really!

I hope to see you there, pop by and say hello!
/Karlsson

MongoDB for MySQL folks part 3 - More on queries and indexes

Last time I wrote about MongoDB for MySQL DBAs I described some of the basics of MongoDB querying, and this time I'll follow that up with some more on querying.

As we saw last time, the basic format of a MongoDB query is:
db.find(<query>,<attributes>)
Note that you do NOT replace db with the name of the database you want to query here, you just make the database you want to use the current one and issue the query, such as:
> use test
> db.mycoll.find()
The example above will find all objects in the mycoll collection, and will include all the object attributes and also the key (_id), like this:
{ "_id" : ObjectId("4eb0634807b16556bf46b214"), "c1" : 1 }
{ "_id" : ObjectId("4eb0634a07b16556bf46b215"), "c2" : 1 }
{ "_id" : ObjectId("4eb0635607b16556bf46b216"), "c1" : 2, "c2" : 2 }
{ "_id" : ObjectId("4eb0635e07b16556bf46b217"), "c3" : 3 }
The Object id is generated by MongoDB itself here, although you can set it yorself if you want to, as long as it's unique. The insert method is used to insert data:
> db.mycoll.insert({c3: 4, c4: 'some string'})
> db.mycoll.find()
results in;
{ "_id" : ObjectId("4eb0634807b16556bf46b214"), "c1" : 1 }
{ "_id" : ObjectId("4eb0634a07b16556bf46b215"), "c2" : 1 }
{ "_id" : ObjectId("4eb0635607b16556bf46b216"), "c1" : 2, "c2" : 2 }
{ "_id" : ObjectId("4eb0635e07b16556bf46b217"), "c3" : 3 }
{ "_id" : ObjectId("4eb063d307b16556bf46b218"), "c3" : 4, "c4" : "some string" }
And as you can see, typing is automatic, or you can look at it as being type agnostic. Now, this wasn't much more than we saw last time, what we want is to select some specific objects and possibly get some specific columns from it, this is done by specifying one or two arguments to the find() method. For example, if I only want to get the object back that I inserted last above, I'd do this:
> db.mycoll.find({c3: 4})
{ "_id" : ObjectId("4eb063d307b16556bf46b218"), "c3" : 4, "c4" : "some string" }
And this wasn't really complicated, right? The condition is passed as a Java Script object notation, and that is fairly uncomplicated. But what happens for something slightly more than this really simple example, like a rangesearch? To get all objects where the c3 member is 4 or higher (which results in the same object as above by the way), you would write something like this, and :
> db.mycoll.find({c3: {$gt: 3}})
{ "_id" : ObjectId("4eb063d307b16556bf46b218"), "c3" : 4, "c4" : "some string" }
I will show some more $-operations beyond $gt in a later post, for now just accept that they exist and are documented here: Advanced Queries

The _id column is just annoying here, right now, but it is always shown by default, as are all the other object. To get rid of it for now, this will do the trick:
> db.mycoll.find({c3: {$gt: 3}}, {_id: 0})
{ "c3" : 4, "c4" : "some string" }
Not too bad, right, and kinda easy to understand. The falgs you pass for each field in the second argument may have one of three values:
  • 1 - Include this field. This is the default.
  • 0 - Do not include this field.
  • -1 - Include no fields except this one and the ObjectId. You may have more of these, in which case all the -1 flagged fields will be included.
Let's try a more advanced version. I want to the the c1 and c2 attributes, and nothing else, then I do this:
> db.mycoll.find({},{c1: -1, c2: -1, _id: 0})
{ "c1" : 1 }
{ "c2" : 1 }
{ "c1" : 2, "c2" : 2 }
{ }
{ }
As you can see, I have to explicitly exclude the _id field.

Online help
The mongo commandline tool for once has decent online help. Typing just help will show the options. For help on database specific operations, type db.help() and for collection specific operations, type db..help(), such as db.mycoll.help(). In JavaScript, a function is just another script, and adding arguments to the function will execute the function, but maybe you want to see how the function is implemented? The just type the name of the function, like this:
> db.mycoll.find
function (query, fields, limit, skip) {
return new DBQuery(this._mongo, this._db, this, this._fullName, this._massageObject(query), fields, limit, skip);
}

DBA Work - Indexing data and explain
What would a mongo DBA want to do? Let's try creating an index. Let's say we want an index on the c1 attribute in the mycoll collection as above, then we must use the ensureIndex() method on the collection in question, telling what columns I want to index, like this:
> db.mycoll.ensureIndex({c1: 1})
And that's it. Let's try to query that collection again, this time using the c1 column as an argument, and hopefully the index will be used:
{ "_id" : ObjectId("4eb0634807b16556bf46b214"), "c1" : 1 }
Right. But is the index used? I want to know that it is for a fact, or if it isn't, so I have something to complain to my developers about. In MySQL, you want use the EXPLAIN command and figure out what index are being used, but with mongo? Easy. Use the explain method, like this:
> db.mycoll.find({c1: 1}).explain()
{
"cursor" : "BtreeCursor c1_1",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"c1" : [
[
1,
1
]
]
}
}
Hey, that's prett cool, right! The index is a standard B-tree index (the only index type available in MongoDB). An index can also be unique, like this:
> db.mycoll.ensureIndex({c2: 1}, {unique: true})
Which will create a unique index on the c2 attribute, but in our case it will not work:
E11000 duplicate key error index: test.mycoll.$c2_1 dup key: { : null }
What's going on here? Well, the c2 attribute isn't included in all objects, and but the index will include all objects, and MongoDB considers NULL a duplicate here (unlike an SQL NULL in which case this is not the case). So the real question here is, what do you want? As MongoDB is schema-free, and you can have any kind of attributes, and also looking at the data above, what I would probably want is an index on the c2 attrbute that makes sure that c2 is unique WHEN INCLUDED, if the c2 attribute isn't part of the object, then please mr. Indexer, ignore it. This is called a sparse index in MongoDB, and what it means is an index that just indexes the objects where the attribute is included.

Note that this may not always be what you want with non-unique indexes, but it often it is, and it makes seaching and inserting faster (as the index is smaller). In the case you have an attribute that is only rarely part of the object, and you want to find the objects where it IS included, this is just what you want.

In our case, the index is created like this:
> db.mycoll.ensureIndex({c2: 1}, {unique: true, sparse: true})
And this time we had no errors. Let's see how it works, first get some data:
> db.mycoll.find({}, {c2: 1, _id:0})
{ }
{ "c2" : 1 }
{ "c2" : 2 }
{ }
{ }
Now, let's see if the unique index on c2 will guarantee uniqueness by inserting a new row with an existing value for c2:
> db.mycoll.insert({c2: 1})
E11000 duplicate key error index: test.mycoll.$c2_1 dup key: { : 1.0 }
Yo! That worked as expected! As does this (which gives no errors):
> db.mycoll.insert({c2: 3})

That's it for now, I'll be back soon with some more MongoDB DBA stuff: Sharding!
/Karlsson