Wednesday, 18 June 2008

Early Out

A quick post on multiple return points. Some people will argue that they are a Bad Thing(TM), but I would like to post my reasons for using them in my code. First of all, what do I mean by multiple return points? I mean that a function has more than one return statement in it.

function quiteMessy(param1, param2, param3)
{
if (precondition)
return 0;

for (i = 1 to 10)
{
if (something(i))
return i * param1;
}

doSomeStuff();
doSomeMoreStuff();

if (condition(param2))
return param2 * param3;

doEvenMoreStuff();
andMoreStuff();

val = doSomethingWith(param3);
return val;
}


In this code, there are no less than 4 return points. The trouble happens when you have temporal coupling (requiring that things happen in a certain order).

If this code was written with the requirement that doSomeStuff() was called before exiting, and then the precondition / return 0 code was written afterwards, then the requirement is not met. So yes, in this case, it was wrong to have the multiple exit points.

But I would argue that there are good reasons to write code with multiple exit points.

Yesterday's post recommended that a function should only do one thing. If when calling the quiteMessy() function we expect to calculate a value and call doSomeStuff then that's two things, not one. If we put the two things into separate functions, we would remove the temporal coupling within the quiteMessy() function.

Ok, so reducing the amount we do in one function makes it easier for us to return early on in the function, but I still haven't given you my reason for multiple exit points in a function.

My reason is to improve readability of the function. I tend to base my functions on two possible patterns.

The first is that the entire function's purpose happens at the end of the function. I see the rest of the function as a crescendo, leading up to the one thing that the function does.

If I was going to return early from a function, I'd try to do it as early as possible. So I end up with some kind of precondition at the top of the function (check if it has already been done, check if it's valid to do this now, etc), followed by any setup code which is necessary for the build up to the final action of whatever it is that I want the function to do.

function calculateSomething(param1)
{
// preconditions

if (nonExistent)
return null;

if (invalidParam(param1)
return null;

// setup result

foo = preprocess(param1);
bar = barify(foo);
baz = bar.getBaz();

// return result
result = baz.getValue();
return result;
}


In the code sample above, even though there are three return points in the function, I've kept within the guidelines of the pattern. The first two return points are within the first two statements of the function - the preconditions, there are no return points during the setup of the result, and the main return point is on the last line of the function.

If that sample was written without multiple return points, I would have to nest the calculation within both preconditions.

function calculateSomething(param1)
{
result = null;

if (exists)
{
if (validParam(param1)
{
foo = preprocess(param1);
bar = barify(foo);
baz = bar.getBaz();

result = baz.getValue();
}
}

return result;
}


It no longer feels as though calculating the result is the important thing. I prefer to keep my focus at the top level of the function, rather than nesting several layers in. Nested code is a deviation from the straightforward path of the function, and makes the code more difficult to understand.

The second pattern I base my functions on is doing the thing that I called the function for at the first opportune moment.

One possible example of this would be finding a value.

function findThing(param1)
{
for (ix = 1 to itemCount)
{
candidate = item[ix];

if (candidate.name == param1)
return candidate;
}

return Not_found;
}



Searching through the items in this code sample, I would like to return the result as soon as I find it, but I also have a return point at the end of the function in case the item wasn't found. Without multiple return points, I would have to store the result in a temporary value before returning it - what a waste of time that would be!

Another example of this pattern would be if I had a cached value, or both quick and slow methods of getting the result. (Presumably the quick method has drawbacks, otherwise you'd never use the slow method!).

function calculateOptimal(param1)
{
cachedResult = cached(param1);
if (cachedResult != null)
return cachedResult;

if (canCalcQuickly(param1))
return calcQuickly(param1);

// otherwise

return calcSlowly(param1);
}


I dread to think what the single return point version of this code would look like!

Earlying out of a function is fine, as long as it makes the code easier to read and understand.

No comments: