Nothing is Something

When it comes to writing correct, efficient, usable, extensible, readable, and supportable programs, many things come into play.  Coherent algorithms, usage of proper data structures, intelligent application of OOP principles, application of existing standards and conventions, expertise in the domain or business logic, etc.  We can probably keep brainstorming here until we pass out, and produce a list of platitudes long enough to gag a horse.

What I’d like to do is to get back to two simple things that can aid tremendously in some, if not all, of the criteria I use to rate a great program.  They both pertain to what is essentially nothing.  I.e., they do not contribute to the solution of the problem, nor do they increase a program’s efficiency.  They do not make the program more reliable, nor particularly more usable.  In fact, if you never have to really read the code, they make no difference to you at all, because by the time the program is running, the compiler has discarded them.

The two items I’m talking about are comments and space.  Beyond the utter disregard they suffer at the hands of the compiler, they share a kind of symbiosis.  In some ways, good spacing is commenting, and good commenting can add to better spacing.

Back in the mists of time, when I was learning assembly language, my instructor fanatically insisted that I put a comment on every line, without exception.  There was some rationale for this.  Assembler by its nature is very concise, almost cryptic.  It is also very low-powered: it takes a lot of code to do some very simple functions.  Without a clear description of the thought process for these atoms of logic, assembler can be virtually impossible to read.  Most of us today can avoid the challenges that assembler presents, and so a religious dedication to commenting every single line is no longer necessary, and frankly, can detract from readability if applied too scrupulously.

Today we have some languages in our toolboxes that bring us close to the Holy Grail of self-documenting code.  (Something COBOL was supposed to be, but I never got the chance to find out.  Reports from the front were not good!)  Some of the aids:

  • Variable names of unlimited lengths
  • Naming standards for variables (e.g. as we covered to death here and here)
  • Auto-commenting features (/// in C#, /** in Java, ''' in VB, e.g.)
  • IDE plugins to assist in comment headers
  • A convergence of common statements, i.e. if, while, for, switch, etc. are all very similar among C, C++, C#, Java, Perl, PHP, JavaScript, etc.

In the code I write today in C#, I find that the vast majority of my lines really do not require any comments.  If I am adhering to reasonable variable naming conventions (e.g. using NumberOfItems, not noits), and have provided solid comment headers at the top of my functions, then comments within the body of the function code are only necessary in these cases:

  • Tricky code–code that needs further explanation because it’s obscure.
  • Explanations why an obvious method was not used, e.g. I use a series of if…else if…else statements rather than a switch because one of my cases would require a non-constant value, and C# insists that case labels be compile-time constants.
  • Warnings about problems that might occur if the business logic changes.
  • Warnings about lines whose proper order is critical to correct processing.
  • Use of any “magic numbers” or “magic strings.”  Although these should always be defined as consts someplace, it’s always good to remind the reader of their significance.
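To make a few of those cases concrete, here is a minimal sketch. The class, the const value, and the business rule are all invented for illustration; the point is only where the comments sit: a header on the function, a note on the magic number, and a note on why the obvious switch was not used.

```csharp
public static class ShippingRules
{
    // Magic number, defined once. Hypothetical value: pretend a carton holds 50 items.
    private const int k_MaxItemsPerCarton = 50;

    /// <summary>
    /// Classifies an order by its item count. CustomerLimit is supplied by
    /// the caller at run time (an assumption of this sketch).
    /// </summary>
    public static string DescribeOrder(int NumberOfItems, int CustomerLimit)
    {
        // An if/else-if chain rather than a switch: CustomerLimit is not a
        // compile-time constant, so it cannot be used as a case label.
        // WARNING: order matters. The limit test must come before the carton test.
        if (NumberOfItems == 0)
            return "Empty";
        else if (NumberOfItems == CustomerLimit)
            return "At customer limit";
        else if (NumberOfItems > k_MaxItemsPerCarton)
            return "Needs more than one carton";
        else
            return "Normal";
    }
}
```

Notice that none of the return statements carries a comment. The names already say what they do.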

It is possible to overdo the commenting, to be sure.  E.g. a comment such as the one below is pointless:

if (ItemWasProcessedOK)
   NumberOfItems++;   // Increment the number of items if it was handled successfully

It’s silly because the comment adds nothing to the understanding of the statement.  Any reader should be expected to know what the ++ operator does, and the variable names are sufficient to explain the intent of the coder.  In this case, the comment actually detracts from the readability of the program because it slops up the code space with useless characters.  This kind of thing is almost as bad as not explaining your tricky code–almost.

Spacing is something of a more refined art.  There is good space and bad space.  Good space breaks up a block of code into more readable sub-blocks.  Bad space just spreads out the code so it expands beyond the window of your screen and makes you scroll to read the function, which, as mentioned last time, is a Bad Thing.

E.g. a block of code such as this:

if (x == 9)
    // Do something
else
    // Do something else
var y = new Thing1();
var z = new Thing2();
if (a == 10)
    // Do something
else
    // Do something else

Can be rendered more readable by the judicious application of blank lines, like this:

if (x == 9)
    // Do something
else
    // Do something else

var y = new Thing1();
var z = new Thing2();

if (a == 10)
    // Do something
else
    // Do something else

However the following spacing:

if (x == 9)

    // Do something
else
    // Do something else

var y = new Thing1();

var z = new Thing2();

if (a == 10)
    // Do something
else

    // Do something else

Indicates one of the following:

  • The coder is careless or sloppy.
  • The coder has some kind of handicap (pun intended) where s/he cannot control how many times s/he strikes the ENTER key before typing the next statement.
  • The coder is demonstrating that s/he writes code as a stream-of-consciousness exercise, and never goes back over what was written.
  • The coder intended to add some more statements but could not bear to go back and add them because it would have been, like … a total drag, man.
  • The coder just doesn’t give a crap.

None of the above is the kind of coder I want writing code for my company.

If for some reason you feel strongly that the following code:

if (iNumberOfItems == 0)
{
   Console.WriteLine("There were no items");
   return false;
}

Is rendered more readable by doing this:

if (iNumberOfItems == 0)
{

   Console.WriteLine("There were no items");
   return false;

}

(Which leads me to question your sanity, but OK.)  Then, dammit, there’d better be an extra line below every left brace and an extra line above every right brace in your code, or else you need to go to the bullet list above and pick one.

In case you cannot pick up on my subtle humor here, I am defiantly against inserting gratuitous blank lines for no good reason, because they violate the principle of What You See Is What You “Get.” And while one blank line can be useful, two or more do not make it even better.  Frankly, if Visual Studio did not default to the first code snippet’s format above, I’d probably still be rendering it like this:

if (iNumberOfItems == 0) {
   Console.WriteLine("There were no items");
   return false;
}

And saving myself a line.  (A grudging nod here to JavaScript which really wants that brace on the same line.  Unfortunately, if you put it on the next line after a return, JavaScript’s automatic semicolon insertion ends the statement right there and returns nothing, which is just wrong.)  Remember, you have about a 2:1 ratio of horizontal to vertical space at the 75 lines by 150 columns we discussed before.  Vertical space is way more precious.  Consume it carefully; don’t waste it.

Hell, I still pull this kind of stuff in Perl, mostly because it pisses me off that I have to use those braces:

if ($x == 0) {
    $Message = "Zero";
} else {
    $Message = "Not zero";
}

(OK, so I’m kind of fast and loose in Perl, but c’mon, Perl is a fast and loose language, so, ya know, When in Rome…)

My point is that C# and VS’s default editing settings force you to put each brace on a line by itself.  Sure, I can change the setting, but if you’re working in a group, and you’re the only one who’s twiddled with the default edit settings, then your code sticks out like the proverbial Sore Thumb.  So there’s already a lot of vertical spacing for readability built into the IDE.  And, yes, it helps if all the code in the project looks similar.  But don’t go crazy adding extra blank lines where none are needed.

One last thing to mention is: don’t underestimate the power of horizontal spacing.  This takes some extra dedication, and sometimes the IDE battles you on this, but e.g. a lot of assignments can be rendered way more readable with some extra spacing:

string Division = "American League East";  // The toughest division in the game
string First = "Yankees";  // What else is there to say?
string Second = "Blue Jays";  // Good team, maybe Toronto should move to the west a bit
string Third = "Orioles";  // Maybe they will surprise themselves
string Fourth = "Rays";  // They look good down here
string Fifth = "Red Sox";  // May they remain here forever!
var When = new DateTime(2011, 4, 16);

Looks a lot better when you do this:

string Division = "American League East"; // The toughest division in the game
string First    = "Yankees";              // What else is there to say?
string Second   = "Blue Jays";            // Good team, maybe Toronto should move to the west a bit
string Third    = "Orioles";              // Maybe they will surprise themselves
string Fourth   = "Rays";                 // They look good down here
string Fifth    = "Red Sox";              // May they remain here forever!
var When        = new DateTime(2011, 4, 16);

(I suppose one grand assumption I’m making here is that your font selection in the IDE is mono-spaced. If you use a proportional font, this won’t matter, and your code will suffer from that no matter what you do!  I honestly don’t understand why anyone would use a proportional font.)  A similar mechanism works well for function calls with a large number of arguments.  I realize there are some hazards doing this, what with Visual Studio always wanting to re-format your code blocks when you close them, and some tedious typing is involved, but I think the clarity it brings to the code page is worth it.  I suppose this is some of my assembler background coming through!
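For function calls, one way to read "a similar mechanism" is to stack repeated calls so that the arguments form columns. This is only a sketch: FormatLine is a hypothetical helper, and the win/loss numbers are placeholders, not real standings.

```csharp
using System.Globalization;

public static class Standings
{
    // Hypothetical helper invented for this sketch: builds one fixed-width line.
    public static string FormatLine(string Team, int Wins, int Losses, double GamesBehind)
    {
        return string.Format(CultureInfo.InvariantCulture,
                             "{0,-10}{1,3}{2,3}{3,5:F1}",
                             Team, Wins, Losses, GamesBehind);
    }

    // Placeholder numbers. The aligned columns let the eye compare call to call.
    public static string[] BuildLines()
    {
        return new[]
        {
            FormatLine("Yankees",   10,  5, 0.0),
            FormatLine("Blue Jays",  9,  6, 1.0),
            FormatLine("Orioles",    8,  7, 2.0),
        };
    }
}
```

With the columns lined up, a swapped argument or a missing one tends to jump out at you, which is the whole point of spending the extra keystrokes.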

There was a lot here today amongst the wise cracks, so to summarize:

Use space judiciously, not gratuitously; you have more horizontal columns than you do vertical lines, so use the latter carefully; comment but don’t comment too much or for no reason; line things up to help the reader’s eye.

What You See Is What You “Get”

This is not about WYSIWYG editors, or about how I wish what I’m typing here in WordPress’s visual tab would resemble to some degree what I see on the web page without a lot of farting around with style sheets and other hideous elements.

No, this is about real estate.  Not the kind that everyone says is a good investment because they’re not making any more of it (except maybe in Holland), but rather the real estate on your monitor.  When you are reading code, there is a limit on how much of a program you have within your immediate view.  That limit is set by the monitor you use, the font size your eyes prefer, the layout of your IDE, whether your window is maximized, etc.

My contention is that what you see before your eyes is what your brain is capable of digesting and analyzing.  Barring photographic memory, which few of us, myself included, possess, it is only the code under your eyeballs that you can effectively ponder, debug, and otherwise grok properly.  Computer programmers realize this fairly early on in their education when they are encouraged to break up large code blocks into functions and subroutines.  Once the principles of object-oriented programming became embedded in our coding DNA, there were all kinds of neat ways to distribute code into minuscule bits and pieces in methods and properties.

So this is all nifty and keen.  Everyone knows you don’t write a function that spans pages of code.  Everyone knows that, like prisoners on a chain gang making little ones out of big ones, you try to break up big code chunks into little code chunks.  Everyone knows that.  So why do I constantly encounter code vomit that spans pages and pages?  (That’s what’s called a rhetorical question because if I were to answer it, I’d probably go off on a tear about how so many programmers just don’t give a crap…  But I’m not answering it, it was rhetorical.)

Here’s a really simple rule: if you are writing a function, it cannot scroll.  It should all fit in a single window-full of lines.  “OK, smart guy,” you say, “how do you define a window-full?  Whose monitor do you use as a guide?”  Well, OK, if I have to choose, then I’ll say mine.  My main monitor at work is a Dell 2007FP, and it is now the Standard Definition for a Window-full of Code.  It shows about 75 lines and 150 columns, using Fixedsys 10 point font with a menu and single toolbar row in VS2008.  So there is the definition of a window-full: 75 lines and 150 columns.  Do you like that?  No?  So then why did you ask for a definition?

You could do worse than using my monitor as a standard.  It’s not wide-screen, and it’s only 20 inches diagonally, so chances are most coders have access to a monitor at least that size.  Of course YMMV, what with font sizes and various toolbars.  And one more thing:  even worse than having to scroll lines vertically is scrolling horizontally.  Nothing removes the heart of the code out from under your eyeballs faster than scrolling left to see that long-ass line from someone who forgot how to use the Enter key.  (I mean, c’mon, if we’re not going to break up our lines, couldn’t I at least stop typing all these semicolons?)

So, yea, until I get promoted from Code Curmudgeon to Code King, I cannot really come up with a fixed number of lines and columns for a window-full of code, and yea, you could go out and buy some big-ass monitor that would finally contain that function you wrote with ten nested ifs, but if you do that, you probably should stop reading this blog, ’cause you and I, we’re not on the same page–or screen for that matter.

“OK then,” you ask again, “does every function you’ve written since you’ve had this magic monitor fit within the boundaries of your so-called Window-full of Code?”  Of course not.  There are a few cases where I might have blown the 75 line limit by a dozen or so.  And, ssssh, yea, I occasionally let a line extend to the right farther than it should, especially if it’s a message I’m building or a comment I’ve appended.  But I would say that 95% of my functions observe the rules, and probably half of those are well under the limit.

It’s when these limits are blown by multiples that I think the alarm bells should sound.  I still find myself reviewing functions, hitting the vertical scroll bar twice, three, four times or more.  This is when I can safely say there is a problem.  A problem for which there is no excuse.

I loathe ifs nested within ifs nested within even more ifs.  And it also pisses me off if someone clearly does not understand short-circuiting, e.g.:

private void HandleArray (string[] InputArray)
{
   if (InputArray != null)
   {
      if (InputArray.Length > 0)
      {
         // ...
      }
   }
}

Should always be rendered as:

if (InputArray != null && InputArray.Length > 0)
{
   // ...
}

No, the above will not throw a null-reference exception if InputArray is null, just don’t switch the order of the tests!  Nested ifs are a necessary evil in coding, as are nested loops and the various combinations of them.  But you need to give the code reader a break.  If your top level if or loop spans a screen page, then, yes, you have to refactor this mess.
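If you don’t believe me about the order of the tests, here is a small sketch you can try. Both method names are made up for the demonstration; the only difference between them is which test comes first.

```csharp
public static class ShortCircuitDemo
{
    // && evaluates its right operand only when the left one is true,
    // so Length is never read on a null array.
    public static bool HasElements(string[] InputArray)
    {
        return InputArray != null && InputArray.Length > 0;
    }

    // Same two tests, wrong order: Length is read before the null check,
    // so a null array throws a NullReferenceException.
    public static bool HasElementsWrongOrder(string[] InputArray)
    {
        return InputArray.Length > 0 && InputArray != null;
    }
}
```

Calling HasElements(null) quietly returns false. Calling HasElementsWrongOrder(null) blows up, which is exactly why the null test has to go on the left.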

There is a school of thought that says that every function should have a single exit point, and I think this is a nice design rule to shoot for.  But it is one that I will break to improve readability.  Going back to our silly HandleArray function, you would like to be able to do something like this:

if (InputArray != null && InputArray.Length > 0)
{
   // do something with the array...
}
else
{
   // Issue a message or throw or something...
}

But if “do something with the array” is long and involved, and you have to scroll down two screens to get to the else, then you have lost the reader’s ability to understand what the original check was about. Better to simply get the sanity checks out of the way and get to the meat of the function:

if (InputArray == null || InputArray.Length == 0)
{
   // Issue some error message or throw an exception
   return; // Unless you throw
}

// Now do your long-winded processing--no need for an 'else' here!

If there are multiple parameters to check, perhaps with different error messages or throws, then you can see that you’ll start nesting ifs to the detriment of your reader.  If all you want to do with your if checks is test the validity of the parameters, do so up top and sneak out of the function ASAP if the caller messed up.  Don’t nest the ifs.
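Here is a minimal sketch of that with two parameters. The function, its parameters, and the messages are all invented for illustration; what matters is that every check sits at the same indentation level and leaves immediately.

```csharp
using System;

public static class OrderProcessor
{
    // Hypothetical function: adds up the quantities on a customer's order.
    public static int TotalQuantity(string CustomerName, int[] Quantities)
    {
        // Sanity checks up top, each with its own message, each leaving at once.
        if (string.IsNullOrEmpty(CustomerName))
            throw new ArgumentException("Customer name is required", "CustomerName");

        if (Quantities == null)
            throw new ArgumentNullException("Quantities");

        if (Quantities.Length == 0)
            return 0;   // Nothing to add up

        // The meat of the function, still at one level of indentation.
        int Total = 0;
        foreach (int Quantity in Quantities)
            Total += Quantity;

        return Total;
    }
}
```

Written with nested ifs, the loop would sit three levels deep and the error handling would trail off below it, far from the tests that triggered it.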

Nested fors and foreachs are tools we would no sooner give up than a carpenter would his hammer, but judicious use of these tools demands that we don’t nest too deeply.   Think of nesting like a splitter for your TV cable.  Each split drops the signal strength by 3 dB, which, because dB is a logarithmic unit, is half of its strength.  After going through 3 splitters, the signal has dropped by 9 dB, but that means that the signal is at one-eighth of its original power.  Likewise, each nested if, for, etc., drops the comprehensibility of your code by half.
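For anyone who wants to check the splitter arithmetic, a loss in dB converts to a power ratio as 10 raised to the power (-dB / 10). A tiny sketch, with a made-up class and method name:

```csharp
using System;

public static class SplitterMath
{
    // Fraction of the original power that remains after a loss of LossInDb decibels.
    public static double PowerRatio(double LossInDb)
    {
        return Math.Pow(10.0, -LossInDb / 10.0);
    }
}
```

PowerRatio(3) comes out a shade over one half, and PowerRatio(9) a shade over one eighth, so the 3 dB figure is a handy approximation rather than an exact halving.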

The easiest function to grasp contains one line of code.  (As long as it does not horizontally scroll, of course.  If you make me scroll horizontally to follow your logic then, I’m sorry, but you should be flogged.)  We can’t always be so lucky to be able to solve the problem we’re addressing with the function in a single line.  You need to add some checks and loops, not to mention trys and locks, and maybe really do some work besides.  But if you start coding away and notice that the function’s starting and ending braces are disappearing from the screen as you code, then it’s time to go Back on the Chain Gang, and start making little ones out of big ones.  Because I usually can’t remember what I ate for lunch, and I’m certainly not going to remember what you wrote in the code that scrolled off the screen a minute ago.  So…

Keep it short, because what I see is all I’m gonna get!

What’s in a Name? (Part II)

Our story so far:

  • Mono-cased variable and function names are holdovers from programming’s Jurassic Period, and should be regarded as extinct.
  • Mixed case variable and function names are a good idea (duh).
  • Hungarian notation has its lure, and maybe its usefulness in some limited ways, but as a rule, not a good idea.

Can’t believe all those words boiled down to just those three things.  (Maybe I am a bit too verbose. :-P ) Last time I mentioned that mixed case variables are superior, and the world of development generally would agree that NumberOfItems is preferable to:

  • numberofitems
  • NUMBEROFITEMS
  • NUMBER_OF_ITEMS (but I bet there’s some COBOL geek out there who sees nothing wrong here)
  • number_of_items (but I bet there’s some UNIX geek out there thinking this ain’t so bad!)
  • noits
  • ni

as well as whatever other ways you might masticate the words for a variable that’s supposed to contain a count of the number of items you have.  And yes, iNumberOfItems can’t come to the party either, it’s just a bit old fashioned and should be left to float out on a raft among the ice floes.  Sorry.

Now that we’re signed up for the Mixed Case Army, we can all just march along in lock step and always enjoy the same convention no matter what language or platform we code in, right? Not so fast.  There seem to be two schools of thought, even in this obscure cul-de-sac of nit-pickery:  the School of Camel Case, and the School of Pascal Case.  Using my favorite variable name from above, the two schools would render the name this way:

  • Camel Case: numberOfItems
  • Pascal Case: NumberOfItems

Yes, that’s right, there’s but one difference, regarding the case of the first character, “n” in this case.  (No pun intended–it just happened.)  The origins for the terms “Camel” and “Pascal” are known to the cognoscenti, but are nevertheless too pedantic to review, even here.

If you had to pick up code samples from various platforms and languages, you might find that Camel Case is prevalent in the UNIX/Java world, while Pascal Case seems to be more prevalent in the Windows/C# world.  There are exceptions.  E.g. the aforementioned .Net Framework coding guidelines recommend Pascal Case for everything, except parameter names.  The also aforementioned Java guidelines dictate Camel Case be used for variable and function names, but Pascal Case for class and interface names.

But I have a small problem with Camel Case, and it’s this:  it makes no sense.  In English (and let’s face it, most, if not all, computer languages are based in the English language) you never begin something with a lower-case letter.  (OK, e. e. cummings aside, but what the hell did he know about programming, anyway?)  A sentence starts with a capital letter.  A title starts with a capital letter and every other major word (other than articles and prepositions) is capitalized.  The menu items on my WordPress (not wordPress) admin panel begin with capitals.

In our Hungarian days, variable names did begin with a lower case letter, but that was because it was a prefix and not part of the name proper.  That made some kind of sense to keep the actual name more in focus.  But to arbitrarily dictate that only the internal words of a variable’s name are capitalized seems like a throwback to those halcyon days when they just did everything in lower case because upper case letters were just, oh I dunno, so LOUD.

Why start a variable or function name this way?  Why not make the last character uppercase, or only vowels, or some other arbitrary selection?  Readability can’t be a factor here.  I’d concede that neither Camel nor Pascal is more readable, but with a gun to my head, I’d pick Pascal, only because it’s more natural.  So I just cannot fathom the mindset of the first person who decided that Camel case was a good idea, nor the herd of sheep that have followed him down this path.

Let me lay off the UNIX/Java stuff for now, before I say something really stupid, and just focus on the Windows/C# side to make my point.  (Let me give a shout out to VB here who has always, since I was doing VB4 anyway, carried the torch for first-letter-caps, even to the language keywords.  W00t, w00t!)

So the .Net Framework guidelines specify 11 types of identifier: Class, Enumeration type, Enumeration values, Event, Exception class, Read-only static field, Interface, Method, Namespace, Parameter, and Property.  Of these Pascal Case is recommended for 10 and Camel Case for 1–Parameter.  Hmm.  Why just one?  Why is that situation so special that it demands a special (and illogical) requirement? I have one idea.  I run into this a lot when I am coding a class, particularly constructors.  I think I can make the simplest case by devising a small, immutable class.  Let’s call this the Person class, and all it has are public properties of FirstName, LastName, and PhoneNumber.  (Not really useful, but play along with me.) So in C#, I might define this this way:

public class Person
{
    public readonly string FirstName;
    public readonly string LastName;
    public readonly string PhoneNumber;
}

Since I’m digging immutable objects these days, I can only set the properties via the constructor, so I might code it this way:

public Person(string FirstName, string LastName, string PhoneNumber) // Ctor #1
{
   this.FirstName = FirstName;
   this.LastName = LastName;
   this.PhoneNumber = PhoneNumber;
}

IMO, this is a perfectly simple, readable, clear piece of code. Yet, the Framework camels would insist you do this:

public Person(string firstName, string lastName, string phoneNumber) // Ctor #2
{
   this.FirstName = firstName;
   this.LastName = lastName;
   this.PhoneNumber = phoneNumber;
}

If you’re paying attention, you will realize in this case that “this.” is unnecessary since C# is case-sensitive (another abomination I will rail about in some later post), but the guidelines clearly state that you should not have two variables that are distinguished only via case, i.e. FirstName and firstName.  Now if you were using Hungarian notation for the parameters (e.g. strFirstName),  then there would be no conflict, but I think we’ve put that to rest.  (Well, at least I’ve tried.)

If we had private member variables or fields behind the public properties, then scope prefixes for the member variables (e.g. m_FirstName) would work as well, but staying with our original simple case, I think Ctor #1 is good enough as it is, whilst Ctor #2 is painful to behold, even more so if you eliminate the “this.” prefix and force the reader to do a double-take.
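For the record, here is roughly what that scope-prefix variant might look like. It is a sketch, not a recommendation: the m_ fields and the read-only properties are my assumption of how you would flesh it out. The parameters stay Pascal cased, and neither “this.” nor a case-only distinction is needed.

```csharp
public class Person
{
    // Private fields behind the public properties, marked with a scope prefix.
    private readonly string m_FirstName;
    private readonly string m_LastName;
    private readonly string m_PhoneNumber;

    public Person(string FirstName, string LastName, string PhoneNumber)
    {
        // Inside the constructor the bare names refer to the parameters.
        m_FirstName   = FirstName;
        m_LastName    = LastName;
        m_PhoneNumber = PhoneNumber;
    }

    // Read-only properties keep the object immutable from the outside.
    public string FirstName   { get { return m_FirstName; } }
    public string LastName    { get { return m_LastName; } }
    public string PhoneNumber { get { return m_PhoneNumber; } }
}
```

A caller would write something like new Person("Ada", "Lovelace", "555-0100") and read the values back through the properties (the sample values are placeholders, of course).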

I don’t want to beat a dead horse here (or a dead camel), so can we please stop the nonsense with this camel case once and for all?  Pascal case does its job just fine, and I cannot see any reason for inflicting an unnatural convention in one of eleven cases (parameters) in the .Net world.  Even though Java has more uses for camel case, prevalence does not cure absurdity.  But since I don’t have to work in that arena, I will opt out of that particular area of combat.

Perhaps we as developers need to focus on making clear, concise, and descriptive identifiers and stop trying to sneak added significance into them via Hungarian notation or obscure casing conventions.  We’re not writing code with notepad or vi any more.  (Yes, I know there are still legions of Luddites out there who swear that vi or emacs is the only code editor they need.   I pray for them.)  We have, or should have, an IDE that remembers for us the type and scope of all our identifiers.  If not, then we’re not working with the right tools.  Sure, a skilled carpenter can still build a bookcase using hand tools, and that bookcase might be very beautiful and functional, but if I’m that carpenter, and I want to compete in today’s business environment, I’d better get out to Home Depot and buy me some power tools, otherwise I soon will be a very skilled and very broke carpenter.

I think that’s about it for now.  I hope I have killed the camel.  My guess is that if I were to get any replies, they would excoriate me for this wish, as well as some of the other platform-insensitive remarks I’ve made.  (I might even get some brickbats from the PETA folks.)  So be it.  To this I would just say…

What’s in a name?  The description of what it does, no more no less.

What’s in a Name? (Part I)

What’s in a name? That which we call a rose
By any other name would smell as sweet;

William Shakespeare, Romeo and Juliet

Just because I am quoting one of literature’s most over-quoted lines, don’t take me to be a Shakespeare Snob.  (Not that I wouldn’t want to be a Shakespeare Snob–I consider it one of my life’s failings not to have read and seen more of his plays, but I digress…)  I just thought it would be a good way to ease into the topic of variable names.  I need to go slowly because getting into this topic is like wading into a lake filled with piranha.

There is something in the Bard’s words that is applicable to programming, that might be stated thus:

What’s in a variable name? That which we call NumberOfItems
By any other name would still be able to count the number of items.

Kind of rolls off the tongue, doesn’t it?  The compiler certainly doesn’t care if we call a variable iNumberOfItems, NrItms, or xxxxx.  So who does?  Well, maybe your boss does, if he came up through the nerd ranks and has tried to impose some variable naming standards.  Maybe the poor soul who has to read and debug your code long after you’ve moved off to your next job does when s/he cannot remember what noits means.  But really, it should be you who cares!

Now there are a plethora of naming standards which you can find out there.  I’ve dealt with several ad hoc standards in my day, even tried to devise some of my own.  In Microsoft’s .Net Framework there is a codified set of rules here.  A quick web search turned up a set for Java here.  I’m sure I could find more.

For me it’s a constant battle to avoid the sin of using abbreviations in variable names.  Perhaps it’s the emotional scarring of having spent so many years with IBM assembler where a label could never be more than eight characters.  Back in the mists of history I even managed to run into a Basic interpreter that would only permit one-character variables, $A, $B, %I, etc.  (The horror, the horror!)

I’ve come a long way since the days of uppercase-only variables from FORTRAN, PL/I, and 360 Assembler.  When UNIX and C came along, there seemed to have been a backlash against all of the upper-cased-ness of the IBM world, despite the fact that UNIX was UNIX and not unix.  But I found it all kind of silly, the idea that your program in C was more readable because everything was in lowercase, and my program in PL/I was less readable because all the keywords and variables were in uppercase.  I say silly because I believe that Humanity as a collective had decided that mixed-case writing was superior to mono-cased writing about the time of… well, since people have been writing. (You know, back in the day, with that IBM 026 card punch, all they had was uppercase letters, and we were damn lucky to have those, sonny!)

As a kind of modern guild, we developers seem to have arrived at a consensus that mixed case variables, function names, labels, etc. are better.  Sure, there are some exceptions.  In C/C++/C# the names established via #define are conventionally all uppercase.  (Although C# has done its best to take #define out behind the garage and put a bullet in its head–rightfully so.)  And it seems that language keywords will probably forever more be in all lower case, despite the occasional holdout (e.g. me when coding SQL).

But variable and function naming conventions, while dovetailing into the mixed-case school of design, are still fractured along other lines, leaving room for more controversy, and yes, silliness.

Hungarian notation, a legacy of Charles Simonyi, was a wonderful idea, particularly when using untyped or weakly-typed languages.  While it seems to be somewhat disparaged nowadays, it still has its proponents.  So it’s still likely you’ll run into a set of corporate coding standards somewhere, dictated by one of this discipline’s disciples.

The OCD programmer in me is still lured by Hungarian.  Almost as soon as I type “int” for an integer variable, the middle digit of my right hand is atop the I key, and something deep within me wants the variable name to begin with a lower case “i”.  But I should have known something was horribly wrong by the time I got to the lpsz prefix in Win32 C.  (For those never traumatized by this, it stands for “long pointer to a zero-terminated string.”)  And this is where Hungarian notation falls on its ass.  Where do you stop?  Sure, ints, bools, chars, and strings–they all have easy selections for a prefix (i, b, c, s).  But what about byte?  Eh, can’t use “b”, that’s already taken by bool.  How about “y”?  Great, except only you will know that means byte.  What about “by”?  Well, hell, you’re already halfway there, may as well just use “byte”.

OK, so you spend a day or two, get the complete list of all the native types in your language and come up with clever alternatives for the prefix collisions: byte/bool, string/short, decimal/double, etc.  Put them up on the Wiki, and… Mission Accomplished!

Not so fast.  What about those secondary types you use all the time?  In my job I use way more .NET DateTime variables than I do chars.  OK, we’ll use “dt” for that.  I sure deal with a lot of DataSets.  OK, use “ds” for that.  There are DataTables, too, so we’ll use “dt” for that–oh no, we can’t; we used that for DateTime already.  The .NET Framework contains thousands of classes and structs.  You mean to say you’re going to take another few days and figure out which are the ones your team uses and then assign prefixes to them?  Push ‘em out to the Wiki, send an email indicating that All Must Follow These Hungarian Notation Prefixes, start code reviews for compliance checks…  If your co-workers are nice, they will laugh at you and recommend you get professional help.  If they are not, they will kill you, and rightfully so.

Even if this worked, so what?  A significant fraction of the variable names in my code are for classes that are defined in the application itself.  This means you will never stop updating that Wiki.  And those snickers behind your back won’t stop until the day your boss decides enough is enough, and the two of you head down to HR for a short chat.

So Hungarian is a slippery slope.  Perhaps you could try to draw the line at only basic or native types, but I think that’s just a gateway drug.  Look, it’s 2010, and we just have to say “NO” to Hungarian!

Except…

OK, so here’s where I come clean.  I find going Hungarian particularly helpful in one or two situations.  I know, it’s kind of like saying, “I only shoot up on weekends,” but when I am programming a UI (WinForms in my case, but a Web UI would apply here as well), I find it very helpful to use short prefixes for the controls on the form, e.g. btn=Button, txt=TextBox, chk=CheckBox, etc.  Since the list of controls in my IDE (VS2008) is arranged alphabetically, this keeps them grouped into types.  With scores of controls on a complex form, this can be pretty useful for finding a particular control quickly.  It also lets me easily find controls I may have dropped onto the form and neglected to rename from the boilerplate name the IDE assigned.  So I don’t think I’m going to stop this any time soon, despite its flying in the face of the orthodoxy I proposed in the previous paragraph.
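Here is a rough sketch of why the prefixes help in an alphabetical list.  The control names are invented, and I am assuming a plain ordinal sort, which may not be exactly how any particular IDE orders its list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class with invented control names.
public static class ControlNameDemo
{
    public static List<string> SortedNames()
    {
        // "textBox1" stands in for a control that was dropped on the form
        // and never renamed from its designer-assigned default.
        var names = new List<string>
        {
            "txtLastName", "btnSave", "chkActive",
            "textBox1", "btnCancel", "txtFirstName"
        };

        // An alphabetical (ordinal) sort puts controls that share a type
        // prefix next to each other.
        names.Sort(StringComparer.Ordinal);
        return names;
    }
}
```

After sorting, the two btn names sit together, as do the two txt names, and textBox1 is the only entry without one of the three-letter prefixes, so it is easy to spot.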

My second sin in this department is that I cannot get away from “scope” prefixes.  You’ve probably seen these in some form or another.  One of the more common is to use “m_” to prefix a private or protected member variable.  (Alternatively you might see just a plain “_”, but I did C for too long and will always fear starting a name with an underscore.)  I use “k_” for a const, and “s_” for a static.  That’s pretty much the list.  Since I’m doing .NET, I do tend to avoid these for public names, since the Framework is clearly anti-Hungarian, and to a consumer of my class, I don’t want to present a different paradigm.
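A minimal sketch of how these scope prefixes look in practice.  The RetryPolicy class and all of its members are hypothetical, invented only to show the three prefixes side by side with an unprefixed public surface.

```csharp
using System;

// Hypothetical class; the names exist only to illustrate the prefixes.
public class RetryPolicy
{
    private const int k_MaxAttempts = 3;   // k_ marks a const
    private static int s_instanceCount;    // s_ marks a static
    private int m_attemptsMade;            // m_ marks a private instance member

    public RetryPolicy()
    {
        s_instanceCount++;
    }

    // Public names carry no prefix, so consumers see ordinary .NET naming.
    public int AttemptsRemaining
    {
        get { return k_MaxAttempts - m_attemptsMade; }
    }

    public static int InstanceCount
    {
        get { return s_instanceCount; }
    }

    // Records one attempt; returns false once the limit has been reached.
    public bool TryRecordAttempt()
    {
        if (m_attemptsMade >= k_MaxAttempts)
        {
            return false;
        }
        m_attemptsMade++;
        return true;
    }
}
```

Inside the class, a glance at the prefix tells me where a name lives; outside the class, nobody ever sees the prefixes at all.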

I realize that, at least as far as the .NET coding guidelines go, I’m committing some sins here, especially with the “m_” business.  The recommendation from the .NET guidelines is to use “camel case” for what they call “fields.”  I’m going to continue this in the next post because (1) this has gone on long enough, and (2) this whole “camel case” thing is another rat hole that I’m going to dive into, and I need to get my strength up for that adventure.  Until then…

Just say “NO” to Hungarian, except in certain cases where you can’t bear to let go.

Posted in Naming conventions, Variables

All Computer Languages Suck

Why The Code Curmudgeon?  Well, I’ve been coding for decades now.  I’ve forgotten more languages than most people know, and one thing I still believe is (as my friend Bill once said to me):  “All computer languages suck.”  But as bad as these computer languages are, the people who code in them are worse!  (Of course I mean the way they code, not that they are bad people, because in most cases, they are not. ;-) )

Here’s the Rogues Gallery of languages that have been implanted into my brain at one time or another, in rough chronological order:

  • FORTRAN
  • Assembler for the CDC 8090
  • PL/I
  • Basic
  • Algol
  • LISP
  • APL
  • Assembler for IBM 360/370/XA
  • REXX
  • SQL
  • C
  • Pascal
  • C++
  • MLINK Script
  • Visual Basic
  • Perl
  • C#

Two glaring omissions: COBOL–OK, I may have coded a program or two for a course in that ridiculous language, but I consciously made an effort to avoid it.  It just seemed like way too much typing for my taste.  And then there’s Java–something I might really have gotten into if I’d not been drawn in by the Siren Song of C# and .NET.

The last thing I want to do is get started on a comparative language war.  I might do better arguing religion, something I’ve not had a lot of luck with in the past.  So the Code Curmudgeon is not about “My language is better than your language,” because remember, “All computer languages suck.”  Some of them really suck because they are outdated.  I mean, PL/I was an amazing language back in the ’70s, but I’m not quite sure why anyone would choose it now.  Some languages don’t suck so bad because they have nice IDEs, and you don’t have to work as hard as you used to (C# and VB, you know who you are).  Some languages suck but are good for certain applications that I’m just glad I don’t have to ever code (Hello, LISP and APL).  Some languages just plain suck, and I have no idea why they’ve survived to this day other than mass insanity (JavaScript, table for one?).

But this is not about slandering any particular language, as much as I might like to complain about the hideous syntax conventions that so many languages have inherited from C, or why I’ll never be able to remember all of the special variables in Perl, or how a man can simultaneously love and hate the C++ STL.

No, this blog is really a screed against bad coding in any language.  You can write good, clean, readable code in the worst of languages, or you can write sloppy, dense, indecipherable code in the best of them.  Ultimately it’s not about the tool, it’s about the craftsman.  I’ve worked for many companies, in many industries, read code from many programmers, and I’ve come to the conclusion that most professional programmers are fair to poor coders.  Why are they so inept as a rule?  Lack of training, lack of caring, lack of time, lack of effort–who knows?  All of the above.  (Most of my work has been in the USA with a mix of American and non-American programmers, but mostly the former.  I cannot say if things are any better outside the US.  My guess is probably not.)

When I was in grad school, the bible of Computer Science was Knuth’s The Art of Computer Programming, a planned seven-volume set of which only three volumes had been produced.  No matter that I probably only digested a tenth of the material, or that his choice of expressing algorithms in his artificial assembly language, MIX, may not have been his best idea, or that he has yet to complete the work.  What was inspiring about it then, and still is today, is the idea that programming is an art, albeit with quite a healthy dollop of science.  I think it’s easy to convince people that programming is scientific (if for no other reason than we have all these great buzzwords and acronyms to confuse outsiders–and ourselves–with), but no one really seems to think there is anything artistic about writing a piece of code.  Too often programmers are driven to get code out the door that is “good enough,” and all the art, and a good chunk of science, gets dumped overboard.

My contention is that coding is an art and a science.  You need both.  Just like a great architect needs an eye for what makes a building attractive and the skill to keep it from falling down, a great programmer needs to produce a program that does what it’s supposed to do without being a morass of spaghetti code that no one, including the author, can understand two weeks after it’s written.

My criteria for a great program or application are, in order of importance:

  1. Correctness: It does what it is supposed to do–it works, at least most of the time.  (And when it fails, it doesn’t leave the environment it runs in a smoking ruin.)
  2. Performance: It does this without consuming inordinate amounts of resources:  memory, CPU, disk I/O, network I/O, etc.
  3. Usability:  The user of the program can interact with it naturally and without confusion, if there is a human user.  For a non-human user (e.g. another program using an API), there are similar, if less anthropomorphic, requirements.
  4. Extensibility: The program can be changed to add new functions, or fixed without breaking old functionality, in a reasonable amount of time.
  5. Readability:  The program can be understood by another programmer with similar technical background within a reasonable amount of time, given the size and complexity of the code.
  6. Supportability:  The program documents the significant steps in its mission and can provide detailed information as needed to aid support technicians investigating reported failures or incorrect behavior.  It also includes a set of analysis tools to assist in these investigations.

Now these are just off the top of my head, and I’d be surprised if someone else had not already itemized these and perhaps other criteria before.  My intent is not to claim these as some kind of commandments, just to provide a framework for my subsequent posts.  I suppose the order of importance among the items is debatable, but that’s not a fight I want to waste time on.  I suppose one could add more items as well, although I suspect I could rhetorically shoehorn them into one of the six above without too much effort.  But I’m not so concerned whether the list is complete.

I want to discuss coding from a very micro-level–from the level of the individual lines on the screen that you see when you open up a code file.  I want to provide concrete examples of what to do and what not to do with a program so you can meet one or more of the goals above.  I’m going to talk about seemingly inconsequential things sometimes, particularly when it comes to improving readability.  Like a house made of bricks, a program is built by the keystroke, by the keyword, by the variable, and if you don’t line up the bricks and mortar, the house will look rather shabby, it will be drafty, and it might even fall down.

Despite my intent to be language agnostic, I will primarily provide examples in C#.  It’s my current language of choice (well, actually my employer’s language of choice :-) ), but I think that’s incidental.  One of my conceits is that C, C++, Java, C#, and Perl are all so closely related that it shouldn’t matter.

That’s the idea, anyway, as of this inaugural post.  Who knows how this will pan out, but I guess the motto of this blog will be:

All computer languages suck, so it’s up to you, the programmer, to make sure that your code does not.

Posted in Introduction