Thursday, July 23, 2009

Black box programming in C. Why? and a little How.

I'm not too hot on C++, I admit that, although I am hot on some of the advantages of C++! One of them is the way you can program using a class as a "black box", where the developer of that "black box" decides what is public, what is protected and what is private.

The reason this is a good thing is simple: it makes it possible to control what in a complex data structure can be accessed, and how, and that is determined by the developer of the data structure, not by its users. Without this, you would have to document which parts of the structure may be used by anyone using it, and even that might not be enough, because who needs to read the spec!

This is particularly important with Open Source software. If I work on an open source project and a simple C struct is available to me, I will use it. Hey, it's there and I can see it. And if there are any ill effects of using a particular member of the struct, I can check that by reading the code! Right?? Nah, not really. It may well be that a particular member of a published struct is distinctly private and is planned for other uses beyond what is currently implemented. And when that happens, your code may break. I think this is an issue that we are fighting with at MySQL right now, but we are not unique. I have worked with other RDBMS projects (written in C), even commercial ones, and have seen the same effect. Just because something is not Open Source doesn't mean you are protected; it just means that you have some more control (you know WHO writes the code, and if they screw up, you know where to find them).

I think this is a very important issue with Open Source in general. How can you be sure that all developers who write code in your project follow the ground rules? Not that they are unintelligent or mean, but really, getting to grips with a large chunk of complex code like MySQL is difficult. The nice thing with C++ here is that it allows you to document the rules in the source itself, sort of, and the compiler will, to an extent, also enforce them.

But I am no fan of C++ in general. Or rather, I don't mind C++ in itself that much, but all the different frameworks, discussions, features that are not agreed on and that work differently (templates) and non-standardized aspects (name mangling, for example) drive me crazy.

So, can I achieve something similar in C then? Yes you can! And it works surprisingly well, at least in small / mid-size projects.

To begin with, do NOT, repeat NOT, put struct definitions in header files! This is just plain bad! This doesn't mean that structs shouldn't be published, it just means that the implementation should be "hidden" from its users, and only a pointer type to it is declared in the header file.

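mycode.h
/* Users only ever see a pointer to an incomplete type. */
typedef struct tagMYSTRUCT *PMYSTRUCT;

mycode.c
#include "mycode.h" /* PMYSTRUCT comes from here */

/* The full definition is visible in this file only. */
typedef struct tagMYSTRUCT
{
    int nId;
    char *pName;
} MYSTRUCT;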

This works so well that I use it all the time, even in my own small projects where I am the only developer! Hey, even I lose track of my own code at times!
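To make the idea a bit more concrete, here is a minimal sketch of what such a "black box" could look like once it has some functions around it. The function names (mystruct_create, mystruct_destroy, mystruct_get_id, mystruct_get_name) are made up for this illustration, and the two files are shown as one compilable unit with the file boundaries marked by comments.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ---- mycode.h: everything a user of the type gets to see ---- */
typedef struct tagMYSTRUCT *PMYSTRUCT;

/* Hypothetical API, invented for this sketch. */
PMYSTRUCT mystruct_create(int nId, const char *pName);
void mystruct_destroy(PMYSTRUCT p);
int mystruct_get_id(PMYSTRUCT p);
const char *mystruct_get_name(PMYSTRUCT p);

/* ---- mycode.c: the only file that knows the layout ---- */
struct tagMYSTRUCT
{
    int nId;
    char *pName;
};

/* Allocate a new object and copy the name; returns NULL on failure. */
PMYSTRUCT mystruct_create(int nId, const char *pName)
{
    PMYSTRUCT p;

    assert(pName != NULL);
    p = malloc(sizeof(*p));
    if (p == NULL)
        return NULL;
    p->pName = malloc(strlen(pName) + 1);
    if (p->pName == NULL) {
        free(p);
        return NULL;
    }
    strcpy(p->pName, pName);
    p->nId = nId;
    return p;
}

/* Release the object and the name it owns; NULL is accepted. */
void mystruct_destroy(PMYSTRUCT p)
{
    if (p != NULL) {
        free(p->pName);
        free(p);
    }
}

/* Accessors: the only way for outside code to reach the members. */
int mystruct_get_id(PMYSTRUCT p)
{
    assert(p != NULL);
    return p->nId;
}

const char *mystruct_get_name(PMYSTRUCT p)
{
    assert(p != NULL);
    return p->pName;
}
```

Code that only includes mycode.h can pass a PMYSTRUCT around and call these functions, but an attempt to write p->nId there will not compile, as the struct is an incomplete type outside mycode.c.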

Now, this of course means a few things. To begin with, if the whole project resides in one implementation file, mycode.c in this case, this doesn't help at all. So the lesson is to keep the project split into several files, and to implement each piece of functionality as one or more types, usually structs, that are controlled by code in a single implementation file.

Another advantage of having only forward-declared structs in header files is that include file ordering becomes much less important, and the issue of include file X requiring Y while Y also requires X is much easier to solve and much less of an issue in general.
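A small sketch of the X and Y case, assuming two made-up modules whose types point at each other. The names (x.h, y.h, tagX, tagY and the four functions) are hypothetical, and again everything is shown as one compilable unit with the file boundaries as comments. The point is that neither header needs to include the other, as a forward declaration of the struct tag is enough to declare a pointer.

```c
#include <assert.h>
#include <stddef.h>

/* ---- x.h (hypothetical): mentions Y by tag only, no #include "y.h" ---- */
struct tagX;
struct tagY;
void x_set_partner(struct tagX *pX, struct tagY *pY);
struct tagY *x_get_partner(const struct tagX *pX);

/* ---- y.h (hypothetical): mentions X by tag only, no #include "x.h" ---- */
struct tagX;
struct tagY;
void y_set_owner(struct tagY *pY, struct tagX *pX);
struct tagX *y_get_owner(const struct tagY *pY);

/* ---- x.c: the only place where the layout of X is known ---- */
struct tagX
{
    struct tagY *pPartner;
};

void x_set_partner(struct tagX *pX, struct tagY *pY)
{
    assert(pX != NULL);
    pX->pPartner = pY;
}

struct tagY *x_get_partner(const struct tagX *pX)
{
    assert(pX != NULL);
    return pX->pPartner;
}

/* ---- y.c: the only place where the layout of Y is known ---- */
struct tagY
{
    struct tagX *pOwner;
};

void y_set_owner(struct tagY *pY, struct tagX *pX)
{
    assert(pY != NULL);
    pY->pOwner = pX;
}

struct tagX *y_get_owner(const struct tagY *pY)
{
    assert(pY != NULL);
    return pY->pOwner;
}
```

Since repeating a forward declaration like "struct tagX;" is harmless in C, the two headers can be included in any order, or only one of them, without either one dragging in the other.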

The disadvantage here is that for every member that is accessed, I must provide an accessor function. This is usually considered good practice in C++ as well, but there it is a bit easier, as the implementation can be inlined (performance) and placed in the header file (documentation). Can we solve this with C then?

Well, yes, to an extent. If we assume that we really would like to have the nId member in my sample struct above public, we could do something like this:

mycode.h
typedef struct tagMYSTRUCT
{
int nId;
} MYSTRUCT, *PMYSTRUCT;

mycode.c
/* Must start with the same members, in the same order, as MYSTRUCT. */
typedef struct tagMYSTRUCTPRIVATE
{
int nId;
char *pName;
} MYSTRUCTPRIVATE, *PMYSTRUCTPRIVATE;

In this case, the implementation file needs to cast a PMYSTRUCT to a PMYSTRUCTPRIVATE, which only works as long as the private struct begins with exactly the same members as the public one. This is not such a big issue, and it is a workable solution, although it has a few disadvantages, like the multiple include issue. Also, I wrote above that you shouldn't put struct definitions in include files; hey, I'm just trying to be practical here. This way of doing things is a bit iffy though, I guess.
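One way to make the cast a little less iffy, and note that this is a variation on the layout above, not the same thing, is to embed the public struct as the first member of the private one instead of repeating its members. C guarantees that a pointer to a struct, suitably converted, points to its first member and the other way around, so the cast is then well defined. The function names and the member name pub below are made up for this sketch, and the two files are shown as one compilable unit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ---- mycode.h: the public part, nId is open to everyone ---- */
typedef struct tagMYSTRUCT
{
    int nId;
} MYSTRUCT, *PMYSTRUCT;

/* Hypothetical API, invented for this sketch. */
PMYSTRUCT mystruct_create(int nId, const char *pName);
const char *mystruct_get_name(PMYSTRUCT p);
void mystruct_destroy(PMYSTRUCT p);

/* ---- mycode.c: the private part wraps the public one ---- */
typedef struct tagMYSTRUCTPRIVATE
{
    MYSTRUCT pub;   /* must stay the FIRST member for the casts to be valid */
    char *pName;
} MYSTRUCTPRIVATE, *PMYSTRUCTPRIVATE;

/* Allocate the private struct but hand out a pointer to its public part. */
PMYSTRUCT mystruct_create(int nId, const char *pName)
{
    PMYSTRUCTPRIVATE pPriv;

    assert(pName != NULL);
    pPriv = malloc(sizeof(*pPriv));
    if (pPriv == NULL)
        return NULL;
    pPriv->pName = malloc(strlen(pName) + 1);
    if (pPriv->pName == NULL) {
        free(pPriv);
        return NULL;
    }
    strcpy(pPriv->pName, pName);
    pPriv->pub.nId = nId;
    return &pPriv->pub;
}

/* Cast back to the private type to reach the hidden member. */
const char *mystruct_get_name(PMYSTRUCT p)
{
    assert(p != NULL);
    return ((PMYSTRUCTPRIVATE)p)->pName;
}

void mystruct_destroy(PMYSTRUCT p)
{
    if (p != NULL) {
        PMYSTRUCTPRIVATE pPriv = (PMYSTRUCTPRIVATE)p;
        free(pPriv->pName);
        free(pPriv);
    }
}
```

The catch is that this only holds for objects that really were created by mystruct_create. If a user declares a MYSTRUCT of their own and passes its address in, the cast reads past the end of it, so the struct definition in the header is still something to be careful with.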

Another neat C++ feature is inheritance. This can, to an extent, be done in C too, although this is also iffy... I'll get into this one later. But I guess that you have figured it out already, if you have read this far.

/Karlsson
Getting out into the sun. Time for some open air motoring!

2 comments:

Antony said...

If you look at the public interface for the pluggable stored procedure stuff I did for MySQL, you should notice that I have similar stuff, except with both public and private parts of the C structs, and when compiling with C++ they also have a compatible C++ style interface for no extra charge.
I did that so a plugin may be written in C or C++ without compromising style in either language.

Karlsson said...

Antony!

I expected nothing else from you. And this might explain why some existing, "built-in", engines were harder than expected to make pluggable: They didn't conform to the interface and no one, not even the compiler, noticed or bothered with it.

/Karlsson