Code Readability Part 2, Code Structure

2 Comments

After reading some of the comments from
Part 1 I've really been motivated to continue this series (I'm sorry
for being late to approve comments.) While writing comments can
really speak to the frame of mind, and the intent, at the time of
writing it can only go so far to ensure that the reader and/or
maintainer of your code can clearly read and understand the
creativity that you spent so long to write. I've had the pleasure of
maintaining a legacy application developed by people who were past
deadline the second they had their assignment handed to them in the
past and it can get really interesting rather quickly when you see
how sloppy you can get when you are in such a hurry. Here are some
guidelines I've given myself to ensure that the structure is correct
at the end of the day.

 

  • Break large files apart into
    smaller, more digestible pieces. This has become rather more
    important these days, I have sifted through a 20,000 line source
    file in the past, and there are some instances where grep just can't
    help. Object Oriented code makes this significantly easier as you
    can easily write a small object that you can interact with and
    separate it from your business logic. While I'm sure I could easily
    come up with a simple example of what I'm talking about I think it
    would take up too much space here. Suffice it to say that generally
    you should try to cut file length at about 1,000 lines, it makes
    searching a file a little more manageable and consuming said file
    more efficient for future developers as our brains like things
    conveniently cut down to size.

  • Using naming conventions is an
    integral part of making code easier to read and write, for example,
    I ran into an error the other day where I wasn't thinking and called
    the class Transferobject, I was able to quickly catch that error
    upon reviewing the source code as we use a CamelCaps convention for
    naming our classes. There is reason for this madness, we use
    autoloaders (not necessarily the best idea for scalability but it
    works for our purposes) as far as the Unix file system is concerned
    capitalize and lowercase letters differ significantly so calling
    TransferObject differs from transferObject, transferobject, and
    Transferobject. A standard will make sure that you can look at
    variable, class, interface, function, etc names and see where an
    error might be.

  • While we are on Variable, Class,
    Function, etc names, make sure your name, at least loosely,
    describes what it contains. I've seen countless references in a
    past coworkers code that called variables as $array, $arrayer,
    $arrayor, $a, $b, $c etc. The reason why you want to describe what
    you're containing should be obvious but let me expand just a bit.
    Let's say that I'm interacting with a function that was predefined
    and I'm trying to understand what input it takes
    [sourcecode lang='php']
    <?php
    function someFunction($a,
    $b) { … }
    ?>
    [/sourcecode]
    is not going to yield as
    much information to me as perhaps
    [sourcecode lang='php']<?php
    function
    someFunction($startDate, $endDate) { … }
    ?>

    [/sourcecode]
    or
    [sourcecode lang='php']
    <?php
    function someFunction(Date
    $startDate, Date $endDate) { … }
    ?>
    [/sourcecode]
    would
    be. In the first function all I can see is that this function takes
    two arguments, much reading and digesting will occur now (provided
    the writer didn't strongly comment his function which even then I
    have to go read the DocBlocks.) The second example is a bit more
    intuitive, now I know I'm working with a date range that starts with
    the first argument and ends with the second; however, I don't know
    that I'm dealing with an Object (in this case PEAR Date) which is
    what example 3 conveys, it type hints the arguments and as such also
    gives us a bit of error handling.

  • Structure your directories, I'm
    sure this one will bring questions of “How do directories interact
    with code readability” well that's an easy one to answer, if you
    separate out your View, Models, Controllers, Base Libraries,
    Templates, etc it will be a lot easier to find the specific thing
    you're looking for. MVC, MVP, and N-Tier does make this a lot easier
    as it gives us better specifications of how to lay out the
    directories and how to effectively use them.

  • Structure your source, for each
    level of execution you should indent a uniform amount of characters
    (personally I go with a tab for every level.) It can be very painful
    to try to read
    [sourcecode lang='php']
    <?php
    if (somecondition) {
    if (someothercondition)
    if (lastcondition) {
    ...
    } else {
    ...
    }
    } elseif (someothercondition2) {
    } else {
    ...
    }
    } else {
    }
    ?>
    [/sourcecode]
    rather than
    [sourcecode lang='php']
    <?php
    if (somecondition) {
        if (someothercondition) {
            if (lastcondition) {
                ...
            } else {
                ...
            }
        } elseif (someothercondition2) {
            ...
        } else {
            ...
        }
    } else {
    ...
    }
    ?>
    [/sourcecode]
    incidentally there is a error in the first execution block, see if you can find it.

  • While on the subject on
    conditional statements, I've seen Error Checking and Input
    Validation as a pet peeve for a long time. There are two major
    things I've noticed, most developers I've worked with either
    completely skip Input Validation and Error Checking or write these
    ugly and long conditonals:
    [sourcecode lang='php']
    <?php
    /**
     * Recursive version of in_array
     *
     * @param string $needle
     * @param array $haystack
     * @param bool $strict
     * @return bool
     */
    function in_rarray($needle, $haystack, $strict = false) {
        //check to ensure that passed haystack is an array for recursion
        if (is_array ( $haystack )) {
            $array = new RecursiveIteratorIterator ( new RecursiveArrayIterator ( $haystack ) );
            foreach ( $array as $element ) {
                //if strict mode enabled use the strict comparison operate
                if ($strict == true) {
                    if ($element === $needle) {
                        return true;
                    }
                } else {
                    if ($element == $needle) {
                        return true;
                    }
                }
            }
            return false;
        } else {
            throw new exception ( 'Haystack is not an array' );
        }
    }
    ?>
    [/sourcecode]
    Where it could be written as:
    [sourcecode lang='php']
    <?php
    /**
     * Recursive version of in_array
     *
     * @param string $needle
     * @param array $haystack
     * @param bool $strict
     * @return bool
     */
    function in_rarray($needle, $haystack, $strict = false) {
        //check to ensure that passed haystack is an array for recursion
        if (!is_array($haystack)) { throw new exception('Haystack is not an array'); }
        $array = new RecursiveIteratorIterator ( new RecursiveArrayIterator ( $haystack ) );
        foreach ( $array as $element ) {
            //if strict mode enabled use the strict comparison operate
            if ($strict == true) {
                if ($element === $needle) {
                    return true;
                }
            } else {
                if ($element == $needle) {
                    return true;
                }
            }
        }
        return false;
    }
    ?>
    [/sourcecode]
    The
    lesson to be learned here, write your flow of execution in such a
    way that it stops right away during a check should it fail rather
    than basing execution on a condition and adding an unnecessary level
    of complexity.

 

That concludes part 2 of my
mini-series, in part 3 I will be discussing how to build decent user
requirements and API Documentation. I hope everyone is having as
much fun reading this as I am writing it. I hope to do better with
comments in the near future and will be looking for a way to handle
spam blocking without having to validate everyones comments.

 

Chuck Burgess

Regarding the "return early" mantra, which I've seen a lot in PEAR feedback lately, I'm sort of torn...

Granted, I'm a bit older-school than many, so the "only have ONE return point from a function" is something I have to consciously consider. On the other hand, I can also see how "returning early" based on conditions not being met serves some code efficiency and potentially readability.

To me, though, the readability savings depend on how complex the method itself is. In the case of a long method, (one that I'd probably say is "too long" anyway), I'd more likely choose to keep the nested conditionals, simply because I can see the direct correlation between the logic that needs that condition to be met, rather than checking the condition *all-by-itself* earlier in the method and returning without further explanation. By keeping the return-due-to-unmet-condition paired with the logic-for-the-met-condition, I keep the "why" for both pieces more visibly coupled. Consider an example method where you could put five return-due-to-unmet-condition operations right at the beginning, where its logic-for-the-met-condition peers are scattered throughout the remaining 200 lines of the method. (now my head hurts from imagining a 200-line method)

Now, in a properly short method, readability of this coupled condition-met/condition-notmet might not suffer much from decoupling it into a standalone return-due-to-unmet-condition. In this instance, I'd say personal preference between the two schools of thought can be seen as effectively equal.

2009-01-19 10:18:58

Shawn Stratton

In regards to your comment Chuck,
the function I used here was set to serve as an example and not really a working one, I used a function I had written earlier and added error checking to it. The thing about it is though, beyond taking out multiple levels of nesting and pushing error checking to the front you can easily see what conditions a method has to meet in order to execute easily which is why I believe in it. I would recommend using it in a short method for no other reason than to get into the practice of it, because more often than not things are not short and not clean.

2009-01-19 13:30:21