6/26/08

Write a function that reverses the order of the words in a string.( Words not the individual charecterS)

For example, your function should transform the string “Do or do not, there is no try.” to “try. no is there not, do or Do”. Assume that all words are space delimited and treat punctuation the same as letters.

Just for variety, solve this problem in C, not Java or C#, and assume that you’re only dealing with ASCII characters that can be safely stored in byte arrays.

You probably already have a pretty good idea how you’re going to start this problem. Because you need to operate on words, you have to be able to recognize where words start and end. You can do this with a simple token scanner that iterates through each character of the string. Based on the definition given in the problem statement, your scanner will differentiate between nonword characters - namely, the space character - and word characters, which for this problem are all characters except space. A word begins, not surprisingly, with a word character and ends at the next nonword character or the end of the string.

The most obvious approach is to use your scanner to identify words, write these words into a temporary buffer, and then copy the buffer back over the original string. To reverse the order of the words, you will have to either scan the string backward to identify the words in reverse order or write the words into the buffer in reverse order (starting at the end of the buffer). It doesn’t matter which method you choose; the following discussion identifies the words in reverse order.

As always, consider the mechanics of how this works before you begin coding. First, you need to allocate a temporary buffer of the appropriate size. Next, you’ll enter the scanning loop, starting with the last character of the string. When you find a nonword character, you can write it directly to the buffer. When you find a word character, however, you can’t write it immediately to the temporary buffer. Because you’re scanning the string in reverse, the first word character you encounter is the last character of the word, so if you were to copy the characters in the order you find them, you’d write the characters within each word backward. Instead, you need to keep scanning until you find the first character of the word and then copy each character of the word in the correct, nonreversed order. When you’re copying the characters of a word, you need to identify the end of the word so that you know when to stop. You could do this by checking whether each character is a word character, but because you already know the position of the last character in the word, a better solution is to continue copying until you reach that position.


Tip

You may think you could avoid this complication by scanning the string forward and writing the words in reverse. However, you then have to solve a similar, related problem of calculating the start position of each word when writing to the temporary buffer.

An example may help to clarify this. Suppose you are given the string “piglet quantum.” The first word character you encounter is ‘m’. If you copy the characters as you found them, you end up with the string “mutnauq telgip”, which is not nearly as good a name for a techno group as the string you were supposed to produce, “quantum piglet”. To get “quantum piglet” from “piglet quantum” you need to scan until you get to ‘q’ and then copy the letters in the word in the forward direction until you get back to ‘m’ at position 13. Next, copy the space character immediately because it’s a nonword character. Then, just as for “quantum”, you would recognize the character ‘t’ as a word character, store position 5 as the end of the word, scan backward to ‘p’, and finally write the characters of “piglet” until you got to position 5.

Finally, after you scan and copy the whole string, write a null character to terminate the string in the temporary buffer and call strcpy to copy the buffer back over the original string. Then you can deallocate the temporary buffer and return from the function.

It’s obviously important that your scanner stop when it gets to the first character of the string. Although this sounds simple, it can be easy to forget to check that the read position is still in the string, especially when the read position is changed at more than one place in your code. In this function, you move the read position in the main token scanning loop to get to the next token and in the word scanning loop to get to the next character of the word. Make sure neither loop runs past the beginning of the string.

Programmed in C, the algorithm described so far looks like the following:

bool reverseWords( char str[] ){
char *buffer;
int tokenReadPos, wordReadPos, wordEnd, writePos = 0;

/* Position of the last character is length - 1 */
tokenReadPos = strlen(str) - 1;

buffer = (char *) malloc(tokenReadPos + 2);
if( !buffer )
return false; /* reverseWords failed */

while( tokenReadPos >= 0 ){

if( str[tokenReadPos] == ' ' ){ /* Non-word characters */
/* Write character */
buffer[writePos++] = str[tokenReadPos--];

} else { /* Word characters */
/* Store position of end of word */
wordEnd = tokenReadPos;

/* Scan to next non-word character */
while( tokenReadPos >= 0 && str[tokenReadPos] != ' ' )
tokenReadPos--;

/* tokenReadPos went past the start of the word */
wordReadPos = tokenReadPos + 1;

/* Copy the characters of the word */
while( wordReadPos <= wordEnd ){
buffer[writePos++] = str[wordReadPos++];
}
}
}
/* null terminate buffer and copy over str */
buffer[writePos] = '\0';
strcpy(str, buffer);

free(buffer);

return true; /* ReverseWords successful */
}

The preceding token scanner-based implementation is the general-case solution for this type of problem. It is reasonably efficient, and its functionality could easily be extended. It is important that you are able to implement this type of solution, but the solution is not perfect. All the scanning backward, storing positions, and copying forward is somewhat lacking in algorithmic elegance. The need for a temporary buffer is also less than desirable.

Often, interview problems have obvious general solutions and less-obvious special-case solutions. The special-case solution may be less extensible than a general solution, but more efficient or elegant. Reversing the words of a string is such a problem. You have seen the general solution, but a special-case solution also exists. In an interview, you might have been steered away from the general solution before you got to coding it. The general solution is followed through to code because token scanning and string scanning are important techniques to illustrate.

One way to improve an algorithm is to focus on a particular, concrete deficiency and try to remedy that. Because elegance, or lack thereof, is hard to quantify, you might try to eliminate the need for a temporary buffer from your algorithm. You can probably see that this is going to require a significantly different algorithm. You can’t simply alter the preceding approach to write to the same string it reads from - by the time you get halfway through you will have overwritten the rest of the data you need to read.

Rather than focus on what you can’t do without a buffer, you should turn your attention to what you can do. It is possible to reverse an entire string in place by exchanging characters. Try an example to see whether this might be helpful: “in search of algorithmic elegance” would become “ecnagele cimhtirogla fo hcraes ni.” Look at that! The words are in exactly the order you need them, but the characters in the words are backward. All you have to do is reverse each word in the reversed string. You can do that by locating the beginning and end of each word using a scanner similar to the one used in the preceding implementation and calling a reverse function on each word substring.

Now you just have to design an in-place reverse string function. The only trick is to remember that there’s no one-statement method of exchanging two values in C - you have to use a temporary variable and three assignments. Your reverse string function should take a string, a start index, and an end index as arguments. Begin by exchanging the character at the start index with the character at the end index, and then increment the start index and decrement the end index. Continue like this until the start and end index meet in the middle (in a string with odd length) or end is less than start (in a string with even length) - put more succinctly, continue while end is greater than start.

In C, these functions would look like the following:

void reverseWords( char str[] ){
int start = 0, end = 0, length;

length = strlen(str);
/* Reverse entire string */
reverseString(str, start, length - 1);

while( end < length ){
if( str[end] != ' ' ){ /* Skip non-word characters */

/* Save position of beginning of word */
start = end;

/* Scan to next non-word character */
while( end < length && str[end] != ' ' )
end++;
/* Back up to end of word */
end--;

/* Reverse word */
reverseString( str, start, end );
}
end++; /* Advance to next token */
}
}

void reverseString( char str[], int start, int end ){
char temp;
while( end > start ){
/* Exchange characters */
temp = str[start];
str[start] = str[end];
str[end] = temp;

/* Move indices towards middle */
start++; end--;
}
}

This solution does not need a temporary buffer and is considerably more elegant than the previous solution. It’s also more efficient, mostly because it doesn’t suffer from dynamic memory overhead and doesn’t need to copy a result back from a temporary buffer.

No comments:

ITUCU