Friday, April 20, 2007

Using descriptive variable names

If I've learned anything in my programming career, it is that your coding style changes all the time. [Not always for the better, I have to admit; there was a weird phase when if..then..else alignment ... no, I'm not yet able to discuss this. Too scary.] It is not only alignment that is affected (hanging begin..end, etc.) but also the way you split stuff into classes, units, and methods, and how you name entities.

Recently I made a change in the latter. I'm a big fan of long entity names (even my 'for' variables are usually named iSomething), but I recently noticed that I cannot always pack enough meaning into a name. I was doing some DVB transport stream manipulations and noticed that, most of the time, entity names conveyed only half of their real-world semantics. For example, I had an originalBitrate variable, but there was no way of telling whether this bitrate was specified in bits per second, bytes per millisecond, or maybe kilobits per second. Or I had a VideoStart property and no idea whether it was specified in milliseconds, in PCR units (the basic time unit in transport streams), or even as a byte offset from the start of the transport stream. I had to look into my documentation, or even into the code, to see how the entity in question was initialized.

That was clearly Not Good and I needed a better way. I started decorating names with descriptive suffixes.

Nowadays I'm using originalBitrate_kb_s and VideoStart_PCR, and I can immediately tell that the former is stored in kilobits per second and the latter in PCR units. I use this approach whenever a simple entity name is not enough to describe its contents. For example, I use the _pct suffix when a variable holds a percentage of something, _UTC when TDateTime field contents are stored as universal rather than local time, and even _ref when a pointer/object variable does not own some data but only holds an external reference to it.
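Here is a minimal Delphi sketch that makes the _pct and _ref conventions concrete. It should also compile under Free Pascal in Delphi mode. TReportBuilder, LossPct and the field names are hypothetical. The point is that the suffix tells the reader which value is a percentage (0..100, not 0..1) and which object reference must not be freed by the class that holds it.

```pascal
program SuffixNaming;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Classes, SysUtils;

type
  // Hypothetical class: FLines is owned, FLog_ref is only borrowed.
  TReportBuilder = class
  private
    FLines: TStringList;  // owned: created in Create, freed in Destroy
    FLog_ref: TStrings;   // _ref: external reference, never freed here
  public
    constructor Create(log_ref: TStrings);
    destructor Destroy; override;
    procedure AddLoss(lost, total: Integer);
  end;

// _pct: the result is a percentage (0..100), not a fraction (0..1)
function LossPct(lost, total: Integer): Double;
begin
  Result := 100.0 * lost / total;
end;

constructor TReportBuilder.Create(log_ref: TStrings);
begin
  inherited Create;
  FLines := TStringList.Create;
  FLog_ref := log_ref;
end;

destructor TReportBuilder.Destroy;
begin
  FLines.Free;  // we own it, so we free it
  // FLog_ref is deliberately NOT freed: the _ref suffix says someone else owns it
  inherited Destroy;
end;

procedure TReportBuilder.AddLoss(lost, total: Integer);
var
  loss_pct: Double;
begin
  loss_pct := LossPct(lost, total);
  FLines.Add('loss_pct=' + FloatToStr(loss_pct));
  FLog_ref.Add('loss recorded');
end;

var
  log: TStringList;
  report: TReportBuilder;
begin
  log := TStringList.Create;
  try
    report := TReportBuilder.Create(log);
    try
      report.AddLoss(3, 200);  // 1.5 percent
    finally
      report.Free;             // frees FLines, leaves log alone
    end;
    Writeln(log.Count);        // log is still valid here: prints 1
  finally
    log.Free;
  end;
end.
```

Without the suffix, a maintainer reading the destructor might "fix" the apparent leak by freeing FLog_ref. That would destroy an object the caller still uses.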

Besides documenting the code, suffixes also improve its readability. I can immediately tell that the assignment bitrate_b_ms := someOtherBitrate_B_s is wrong and that someTime := otherTime_UTC is wrong, or at least suspicious. I can even tell that the formula Result := base_PCR + MSToPCR(offset_B / bitrate_B_ms) makes some sense: [B] / [B/ms] gives [ms], which MSToPCR somehow converts to [PCR] units, which are then added to some base timestamp, also stored in [PCR] units. Check.
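The unit check above can be traced in a small Delphi sketch. The post does not show how MSToPCR is implemented. This version is a hypothetical stand-in that assumes "PCR units" means ticks of the 27 MHz MPEG-2 system clock, which is 27000 ticks per millisecond. OffsetToPCR is also a hypothetical wrapper around the formula from the post.

```pascal
program PcrOffset;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

const
  // Assumption: PCR units are 27 MHz system clock ticks -> 27000 per ms
  PCR_PER_MS = 27000;

// Hypothetical stand-in for MSToPCR: [ms] -> [PCR]
function MSToPCR(time_ms: Double): Int64;
begin
  Result := Round(time_ms * PCR_PER_MS);
end;

// [PCR] + MSToPCR([B] / [B/ms]) = [PCR] + MSToPCR([ms]) = [PCR]
function OffsetToPCR(base_PCR, offset_B: Int64; bitrate_B_ms: Double): Int64;
begin
  Result := base_PCR + MSToPCR(offset_B / bitrate_B_ms);
end;

begin
  // 188000 bytes at 1000 B/ms (8 Mbit/s) take 188 ms = 188 * 27000 ticks
  Writeln(OffsetToPCR(0, 188000, 1000));  // prints 5076000
end.
```

Keep in mind that Delphi identifiers are case-insensitive. To the compiler, bitrate_b_ms and bitrate_B_ms are the same variable, so the b (bits) versus B (bytes) distinction helps only the human reader.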

Decorators can be helpful, but you should still use them sparingly. A variable named bitrate_kb_s_unverified_data_reported_from_external_dll is not such a good idea. Comments are still the right tool when you have to document that level of semantics.



  1. Anonymous, 13:04

    I like your idea. I’ll consider using it.

    By the way, I’ve also noticed that the way I format my code changes over time. The only constant is my paranoid habit of adding empty parentheses () to procedure/function calls and declarations that take no parameters (C++-like syntax).
    dummy:= foo();
    This is how I distinguish variables/properties from methods.
    Also, I hate with a fierce hatred adding a semicolon before “end”.
    procedure p();
    if dummy = 1 then

  2. I don't feel a need to distinguish properties from methods, but I do strongly distinguish variables from everything else by always starting a variable with a lowercase letter and everything else with an uppercase letter.

  3. Anonymous, 15:41

    Very good idea, and fairly obvious.

    For local variables I tend to add a v in front (vStartDate), while for fields in a class I use Delphi's default F (FStartDate). Constants always get a c (cDefault).
    But in the bad code I am currently working on, what any variable means is usually a wild guess, and two lines later it might have changed. (Note: that was before I touched the code.
    Oh, the joys (sic) of inheriting code at a new job.)

  4. You could also do something like this:

    type
      TSpeed_bit_s = type Double;
      TSpeed_kb_s = type Double;

    function To_bit_s(aSpeed: TSpeed_kb_s): TSpeed_bit_s;
    function To_kb_s(aSpeed: TSpeed_bit_s): TSpeed_kb_s;

    and let the compiler check that you never make the wrong assignment.

  5. Anonymous, 21:37

    Using long, suffixed variable names is sometimes an indication that the programmer didn't apply object-oriented design to his code. For example, you noted that the long names make the following assignment suspicious:

    someTime := otherTime_UTC

    That is true, but a more object-oriented approach would be to declare a Time class and let the compiler catch all wrong assignments.

    The same is true of your originalBitrate variable. Your post makes it clear that it is a numeric type, an int or a float, but declaring it as its own class would go a long way towards both readability and type safety.

  6. Object orientation has its limits; sometimes you have to stop and do something useful.

    Creating a subtype for every such type is a great example of overkill.

  7. Anonymous, 23:08

    I love long descriptive names. I hate having to guess what the hell something does. Attaching units to a name is a great idea; it makes unit analysis of math statements easier, too.

    Admittedly, many of my for loop variables are just loop, innerloop, that sort of thing (unless it really, REALLY matters; normally it doesn't... loop usually does).

    Something about my code leaves my C programmers in screaming fits, which seems fair: finding out that a 4-letter variable is actually 16 different variables can be a big motivator for some of my own screaming fits...

  8. Anonymous, 12:26

    Welcome to Hungarian notation. (Yes, according to the Wikipedia article, the original goal of Hungarian notation was to describe the meaning and usage of a variable rather than its type.)

  9. Agreed. In a way, this is very close to the spirit of Hungarian notation.
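Comment 4 relies on the compiler to catch mixed-up units, but the `type Double` declarations it shows do not quite do that. In Delphi, such a declaration creates a distinct type identity. However, any two real types remain assignment-compatible, so the compiler objects only when a mismatched value is passed to a var parameter. One way to get the checking that comment asks for, and some of the type safety comment 5 wants without a full class, is to wrap each unit in its own record. The following is a sketch under stated assumptions. Speed_kb_s is a hypothetical helper, and the factor 1 kb = 1000 bits assumes kb means kilobits.

```pascal
program TypedSpeeds;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

type
  // Each unit gets its own record type. Unlike "type Double",
  // two different record types are never assignment-compatible.
  TSpeed_bit_s = record
    Value: Double;
  end;

  TSpeed_kb_s = record
    Value: Double;
  end;

// Hypothetical helper that tags a raw number with its unit.
function Speed_kb_s(aValue: Double): TSpeed_kb_s;
begin
  Result.Value := aValue;
end;

// Assumption: kb means kilobits, 1 kb = 1000 bits (not 1024).
function To_bit_s(aSpeed: TSpeed_kb_s): TSpeed_bit_s;
begin
  Result.Value := aSpeed.Value * 1000;
end;

function To_kb_s(aSpeed: TSpeed_bit_s): TSpeed_kb_s;
begin
  Result.Value := aSpeed.Value / 1000;
end;

var
  link_kb_s: TSpeed_kb_s;
  stream_bit_s: TSpeed_bit_s;
begin
  link_kb_s := Speed_kb_s(8000);
  // stream_bit_s := link_kb_s;         // rejected: incompatible types
  stream_bit_s := To_bit_s(link_kb_s);  // explicit conversion compiles
  Writeln(stream_bit_s.Value:0:0);      // prints 8000000
end.
```

The price is that every arithmetic expression now goes through .Value, or through operator overloads on the records in Delphi versions that support them. That is exactly the kind of overhead comment 6 considers overkill.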