Convert white space to tabs
I recently decided to learn C, so I started going through K&R, but I got stuck on Problem 21 in Chapter 1. You are supposed to write a program, which given a string without tabs and a certain tabwidth, converts all white space into the equivalent spacing using tabs and white space.
So开发者_如何学Go far, I've got this:
void entab (char from[], char to[], int length, int tabwidth)
{
int i, j, tabpos, flag, count;
j = tabpos = flag = count = 0;
for (i = 0; from[i] != '\0' && j < length - count - 2; i++) {
if (from[i] == ' ') {
// If you see a space, set flag to true and increment the
// whitespace counter. Don't add any characters until you reach the
// next tabstop.
count++;
tabpos = (tabpos + 1) % tabwidth;
flag = 1;
if (count >= tabwidth - tabpos) {
to[j] = '\t';
j++;
count = count - tabwidth + tabpos;
tabpos = 0;
}
} else {
if (flag == 1) {
// if you see something other than a space and flag is true,
// there weren't enough spaces to reach a tabstop. Add count
// spaces to the string.
flag = 0;
tabpos = (tabpos + count + 1) % tabwidth;
while (count > 0) {
to[j] = ' ';
j++;
count--;
}
} else {
tabpos = (tabpos + 1) % tabwidth;
}
count = 0;
to[j] = from[i];
j++;
}
}
to[j] = '\0';
return;
}
which, unfortunaly, seems to produce a slightly larger spacing than it's supposed to. Any ideas about where I screwed up?
PS I've looked at other solutions online and I understand that there is a much better approach to the problem, but I would really like to fix the error in mine as well.
EDIT: Setting tabwidth=4 and using:
foobar
foo bar foo bar
foo bar
as input, I get:
/t/tfoobar
/t/t/t/tfoo bar/t/t/t foo bar
/t/tfoo/t/t bar
as output, while the correct output would be:
/tfoobar
/t/tfoo/t bar/t/t foo bar
/tfoo/t/tbar
This much code in a single function and with that much nesting (four levels deep) is hard to get right and even harder to maintain.
I would suggest you first refactor what you have into more manageable pieces. For example:
- create a function called findTab that takes the string to test, the tab width (# of spaces) and returns the index of the first occurrence of a tab.
- Then create a function called something like replaceChars that takes the right of args and performs the replacement
Those are just a couple of ideas. What you will end up with is a handful of much shorter and much more manageable functions and most likely find your bug(s) along the way!
Btw, i tried a quick Google search for some C refactoring articles but alas came away empty handed. These days refactoring is all about OOP languages like Java, C# and C++. Still, a few rules of thumb will go a long way for you:
- Minimize nesting of code blocks:
if
statements andfor
&while
loops - Minimize the number of lines per function (within reason) keep it less than 60 tops
- (borrowed from OOP classes) make sure that each function does one very limited thing very well
- Name your variables and methods more verbosely - seems like a pain when you're doing it but it will pay dividends in the future, like now.
Good luck and welcome to SO!
if (count >= tabwidth - tabpos)
This starts outputting tabs too soon. Consider the input string:
"aa "
with 8 as tabwidth. You get to character 5 (i will be 4, count will be 3) when tabpos becomes also 5, therefore triggering the condition. You don't want to output tabs before your i variable reaches tabwidth.
I won't fix your code either. But instead of making general statements like "refactor your code" which is just a 21st century way of saying "rewrite your code", I can point out one obvious mistake. You don't need the tabpos variable. Just use (i % tabwidth). From here on things should start falling into place.
One thing that's extremely suspicious about your code is the way in which you increase tabpos. You increment it (mod tabwidth
) for every space twice. Once after incrementing count
, and again when flag == 1
.
When debugging, especially code that has conditionals that affect future iterations of a loop, it's good to think about invariants. These are statements that should always be true whenever you execute a particular line of code.
Another thing: give your variables better names. flag
and count
are especially bad, but tabpos
is kind of confusing as well. My first thought was "the position of which tab?" but it appears that you actually want it to be the number of character-cells since the last tab-stop (or equivalently, your current position within a tab-column).
In any case, they should have clearer names. If the concept is too hard to name well then that may be a sign that you need to rethink your algorithm, or you should at least comment the variables. It's generally good to figure out invariants for variables too, like "tabpos is the number of character-cells since the last tab-stop" (which your code violates == bug).
Finally, you could try stepping through your code in a debugger, or even using printf debugging to see why it misbehaves on certain inputs.
精彩评论