ArticleS. MichaelFeathers.
ConceptsThatDontLikeToBeNamed [add child]

Concepts That Don't Like To Be Named

Standard C has a function called atoi. It takes an int and converts it to an ASCII string.

There's another function called itoa that does the reverse. Unfortunately, it isn't part of standard C, it's just implemented roughly the same way in many C libraries. It takes a value to convert, a buffer to write the string representation into, and a radix. The return value is just a pointer to the buffer that you pass in:

char *itoa(int value, char *out_buffer, int radix);

Nice generality there with the radix, but if you don't have itoa and you only care about a radix of ten, you may be tempted to write something simpler, an itoa10 function.. something you can knock off very quickly.. and along the way you might find concepts that don't like to be named..

 You can find out more by clicking here..
 And here..
 And, so on..


 Sun, 28 May 2006 12:04:08, Les Button,
Hi Michael, it seems to me that there's a concept lurking here called "least significant digit" it is an important concept because your implementation immediately jumps to the use of the modulus operator to generate the digits in the result, which naturally gets them in "least significant" to "most significant" order. This is arguably "simplest", and is certainly most efficient for the computer. But it's easy (for me) to imagine a 5th grader who's mastering "powers of ten" working out the digits in the opposite (and unreversed) order. I also tend to look at the most significant digits first (on bank statements, in experimental data, etc :)

I suppose it is a matter of taste whether introducing a simple "explaining variable" helps to explain what's going on, or at least points the reader of the code to the right technical idea to understand ...something like

char *itoa10(int value, char *buffer)
char *result = buffer;
int remainder = value;
int lst_sig_dig = abs(remainder % radix);
const int radix = 10;

do {
*buffer++ = lst_sig_dig + '0';
remainder /= radix;
lst_sig_dig = abs(remainder % radix);
} while (remainder != 0);
if (value < 0)
*buffer++ = '-';
*buffer = '\0';
return strrev(result);

I really like this example because it reminds me of two things that are easy to forget:
1) Sometimes it is really hard to make the code express everything, and from my experience this can be especially true for mathematical algorithms. Just look at code for FFTs, for Gaussian elimination, for numerical solution of differential equations: there exists amazingly good code for carrying out these tasks, but their usability hinges crucially on the quality of documentation and background theory that comes with them.
2) I can't imagine any simple path for refactoring from a "most significant digit first" algorithm to this clean, efficient one. Programmers shouldn't forget the "Substitute Algorithm" refactoring!

''Les, I don't see a easy way of going there either. It's great to have the passing tests to help in the move though.
 Sun, 28 May 2006 20:19:12, ,
 Sun, 28 May 2006 20:20:27, Les Button,
Oops, of course "lst_sig_dig" should be declared after "radix" in the snippet I provided, sigh.
 Wed, 31 May 2006 22:33:15, Anonymous Coward, Concepts that don't like to be explained
This article is more than a tad obfuscated: You never actually explain what it is you mean by "concepts that don't like to be named;" yet you use the term twice. I'm left with the impression that you think that the notion of producing a reversed string is a sufficiently obtuse concept that it should not be dignified with a named symbol in the program.

However, I've since decided that the concept you didn't want to name is the fact that your 2nd to last program fails the test case for MIN_INT/INT_MIN. As it turns out, -INT_MIN == INT_MIN; so you'll need to convert nonneg_int_to_revstring() to accept unsigned int, or at least use an unsigned radix. Fortunately, your final version passes the test because it uses abs() around the signed modulo operator. ;-)

Seriously though: What do you really mean by "Concepts that don't like to be named?"

Yes, I was referring to fact that a reverse string is pretty obtuse, not the concept itself, but it seems very "low weight" to have functions that operate on it. Re the cases for INT_MIN, that's the way TDD is done. You don't get to the case for INT_MIN until you get to the case for INT_MIN. If, for instance, I was writing a Stack, I might go through a couple cases for pop() before ever getting to a case for 'pop on empty.' If you look at the intermediate implementations for pop() you'd go "oh wow it has a bug" but not really, it's just not done yet. It's the price you pay for doing things in small steps. The benefit you get is feedback on your partial implementations. -- Michael Feathers
 Tue, 27 Jun 2006 15:14:49, Tim Ottinger, Doesn't sscanf do this?
 Thu, 6 Jul 2006 13:08:54, , Yeah, yeah
I don't see a problem with reversing strings. I do that all the time, and I've found clever ways to do that in 1 liners, reversing in place.

The only thing that I dislike of your itoa implementation, is that the buffer must be big enough to hold the data. And the buffer must be passed as parameter, so how do I know if the buffer I'm passing is big enough? Will it core dump?

The caller has 3 options:

1. Call malloc before calling itoa, then free when done:

char * data = (char *) malloc( 10 + number );
char * result = itoa( number, data, 10 );
... use the result ...
free( data );

Notice I've removed the check of data == null after malloc for brevity.

2. Use an array and never care to return the memory because it is automatic.

char data[2048];
char * result = itoa( number, data, 10 );
... use the result ...

The problem with this is that if you call itoa several times, you need to make sure you are using differente data to hold the result. This makes the code incredibly hard to use and make you wish you worked in Java or Smalltalk.

3. Create an String class (in C++) which handles all that stuff automatically, calling itoa when apropiate and returning the memory as needed.

Unfortunately, a lot of this is just "life in C." There are some people who work in C environments and they, for some reason political or technical, can't use C++. For better or worse, in C you have to know your bounds. -- MichaelFeathers