After reading some of the comments from
Part 1 I've really been motivated to continue this series (I'm sorry
for being late to approve comments.) While writing comments can
really speak to the frame of mind, and the intent, at the time of
writing it can only go so far to ensure that the reader and/or
maintainer of your code can clearly read and understand the
creativity that you spent so long to write. I've had the pleasure of
maintaining a legacy application developed by people who were past
deadline the second they had their assignment handed to them in the
past and it can get really interesting rather quickly when you see
how sloppy you can get when you are in such a hurry. Here are some
guidelines I've given myself to ensure that the structure is correct
at the end of the day.
Break large files apart into
smaller, more digestible pieces. This has become rather more
important these days, I have sifted through a 20,000 line source
file in the past, and there are some instances where grep just can't
help. Object Oriented code makes this significantly easier as you
can easily write a small object that you can interact with and
separate it from your business logic. While I'm sure I could easily
come up with a simple example of what I'm talking about I think it
would take up too much space here. Suffice it to say that generally
you should try to cut file length at about 1,000 lines, it makes
searching a file a little more manageable and consuming said file
more efficient for future developers as our brains like things
conveniently cut down to size.Using naming conventions is an
integral part of making code easier to read and write, for example,
I ran into an error the other day where I wasn't thinking and called
the class Transferobject, I was able to quickly catch that error
upon reviewing the source code as we use a CamelCaps convention for
naming our classes. There is reason for this madness, we use
autoloaders (not necessarily the best idea for scalability but it
works for our purposes) as far as the Unix file system is concerned
capitalize and lowercase letters differ significantly so calling
TransferObject differs from transferObject, transferobject, and
Transferobject. A standard will make sure that you can look at
variable, class, interface, function, etc names and see where an
error might be.While we are on Variable, Class,
Function, etc names, make sure your name, at least loosely,
describes what it contains. I've seen countless references in a
past coworkers code that called variables as $array, $arrayer,
$arrayor, $a, $b, $c etc. The reason why you want to describe what
you're containing should be obvious but let me expand just a bit.
Let's say that I'm interacting with a function that was predefined
and I'm trying to understand what input it takes
[sourcecode lang='php']
<?php
function someFunction($a,
$b) { … }
?>
[/sourcecode]
is not going to yield as
much information to me as perhaps
[sourcecode lang='php']<?php
function
someFunction($startDate, $endDate) { … }
?>
[/sourcecode]
or
[sourcecode lang='php']
<?php
function someFunction(Date
$startDate, Date $endDate) { … }
?>
[/sourcecode]
would
be. In the first function all I can see is that this function takes
two arguments, much reading and digesting will occur now (provided
the writer didn't strongly comment his function which even then I
have to go read the DocBlocks.) The second example is a bit more
intuitive, now I know I'm working with a date range that starts with
the first argument and ends with the second; however, I don't know
that I'm dealing with an Object (in this case PEAR Date) which is
what example 3 conveys, it type hints the arguments and as such also
gives us a bit of error handling.Structure your directories, I'm
sure this one will bring questions of “How do directories interact
with code readability” well that's an easy one to answer, if you
separate out your View, Models, Controllers, Base Libraries,
Templates, etc it will be a lot easier to find the specific thing
you're looking for. MVC, MVP, and N-Tier does make this a lot easier
as it gives us better specifications of how to lay out the
directories and how to effectively use them.Structure your source, for each
level of execution you should indent a uniform amount of characters
(personally I go with a tab for every level.) It can be very painful
to try to read
[sourcecode lang='php']
<?php
if (somecondition) {
if (someothercondition)
if (lastcondition) {
...
} else {
...
}
} elseif (someothercondition2) {
} else {
...
}
} else {
}
?>
[/sourcecode]
rather than
[sourcecode lang='php']
<?php
if (somecondition) {
if (someothercondition) {
if (lastcondition) {
...
} else {
...
}
} elseif (someothercondition2) {
...
} else {
...
}
} else {
...
}
?>
[/sourcecode]
incidentally there is a error in the first execution block, see if you can find it.While on the subject on
conditional statements, I've seen Error Checking and Input
Validation as a pet peeve for a long time. There are two major
things I've noticed, most developers I've worked with either
completely skip Input Validation and Error Checking or write these
ugly and long conditonals:
[sourcecode lang='php']
<?php
/**
* Recursive version of in_array
*
* @param string $needle
* @param array $haystack
* @param bool $strict
* @return bool
*/
function in_rarray($needle, $haystack, $strict = false) {
//check to ensure that passed haystack is an array for recursion
if (is_array ( $haystack )) {
$array = new RecursiveIteratorIterator ( new RecursiveArrayIterator ( $haystack ) );
foreach ( $array as $element ) {
//if strict mode enabled use the strict comparison operate
if ($strict == true) {
if ($element === $needle) {
return true;
}
} else {
if ($element == $needle) {
return true;
}
}
}
return false;
} else {
throw new exception ( 'Haystack is not an array' );
}
}
?>
[/sourcecode]
Where it could be written as:
[sourcecode lang='php']
<?php
/**
* Recursive version of in_array
*
* @param string $needle
* @param array $haystack
* @param bool $strict
* @return bool
*/
function in_rarray($needle, $haystack, $strict = false) {
//check to ensure that passed haystack is an array for recursion
if (!is_array($haystack)) { throw new exception('Haystack is not an array'); }
$array = new RecursiveIteratorIterator ( new RecursiveArrayIterator ( $haystack ) );
foreach ( $array as $element ) {
//if strict mode enabled use the strict comparison operate
if ($strict == true) {
if ($element === $needle) {
return true;
}
} else {
if ($element == $needle) {
return true;
}
}
}
return false;
}
?>
[/sourcecode]
The
lesson to be learned here, write your flow of execution in such a
way that it stops right away during a check should it fail rather
than basing execution on a condition and adding an unnecessary level
of complexity.
That concludes part 2 of my
mini-series, in part 3 I will be discussing how to build decent user
requirements and API Documentation. I hope everyone is having as
much fun reading this as I am writing it. I hope to do better with
comments in the near future and will be looking for a way to handle
spam blocking without having to validate everyones comments.

Chuck Burgess
Regarding the "return early" mantra, which I've seen a lot in PEAR feedback lately, I'm sort of torn...
Granted, I'm a bit older-school than many, so the "only have ONE return point from a function" is something I have to consciously consider. On the other hand, I can also see how "returning early" based on conditions not being met serves some code efficiency and potentially readability.
To me, though, the readability savings depend on how complex the method itself is. In the case of a long method, (one that I'd probably say is "too long" anyway), I'd more likely choose to keep the nested conditionals, simply because I can see the direct correlation between the logic that needs that condition to be met, rather than checking the condition *all-by-itself* earlier in the method and returning without further explanation. By keeping the return-due-to-unmet-condition paired with the logic-for-the-met-condition, I keep the "why" for both pieces more visibly coupled. Consider an example method where you could put five return-due-to-unmet-condition operations right at the beginning, where its logic-for-the-met-condition peers are scattered throughout the remaining 200 lines of the method. (now my head hurts from imagining a 200-line method)
Now, in a properly short method, readability of this coupled condition-met/condition-notmet might not suffer much from decoupling it into a standalone return-due-to-unmet-condition. In this instance, I'd say personal preference between the two schools of thought can be seen as effectively equal.
2009-01-19 10:18:58