
Why does bash ignore newlines when doing a for loop over the contents of a C-style string?

Why does the following...

c=0; for i in $'1\n2\n3\n4'; do echo iteration $c :$i:; c=$[c+1]; done

print out...

iteration 0 :1 2 3 4:

and not

iteration 0 :1:
iteration 1 :2:
iteration 2 :3:
iteration 3 :4:

From what I understand, the $'STRING' syntax should allow me to specify a string with escape characters. Shouldn't "\n" be interpreted as a newline so that the for loop echoes four times, once for each line? Instead, it seems as if the newline is interpreted as a space character.

I took unwind's suggestion and tried setting $IFS. The results were same.

IFS=$'\n'; c=0; for i in $'1\n2\n3\n4'; do echo iteration $c :$i:; c=$[c+1]; done; unset IFS;

iteration 0 :1 2 3 4:

William Purssel says in a comment that this did not work because IFS was being set to newline... but the following did not work either.

IFS=' '; c=0; for i in '1 2 3 4'; do echo iteration $c :$i:; c=$[c+1]; done; unset IFS;

iteration 0 :1 2 3 4:

Using IFS=' ' on a newline-separated string resulted in an even bigger mess...

IFS=' '; c=0; for i in $'1\n2\n3\n4'; do echo iteration $c :$i:; c=$[c+1]; done; unset IFS;

iteration 0 :1
2
3
4:

Setting IFS to '\n' rather than $'\n' had the same effect as IFS=' ' ...

IFS='\n'; c=0; for i in $'1\n2\n3\n4'; do echo iteration $c :$i:; c=$[c+1]; done; unset IFS;

iteration 0 :1
2
3
4:

There's only one iteration, but the newline is visible in the echo for some reason.

What did work is first storing the string in a variable then looping over the contents of the variable (without having to set IFS):

c=0; v=$'1\n2\n3\n4'; for i in $v; do echo iteration $c :$i:; c=$[c+1]; done

iteration 0 :1:
iteration 1 :2:
iteration 2 :3:
iteration 3 :4:

Which still does not explain why there is this problem.

Is there a pattern here? Is this the expected behavior of IFS as defined in unwind's link?

unwind's link states... "The shell scans the results of parameter expansion, command substitution, and arithmetic expansion that did not occur within double quotes for word splitting."

I guess that explains why string literals don't get split for for-loop iteration no matter what escape characters are used. It only works when the literal is first assigned to a variable and that variable is then expanded unquoted, so that the result of the expansion is split for the for loop. I guess the same applies to command substitution.

Examples:

Result of command substitution is split

c=0; for i in `echo $'1\n2\n3\n4'`; do echo iteration $c :$i:; c=$[c+1]; done

iteration 0 :1:
iteration 1 :2:
iteration 2 :3:
iteration 3 :4:

Portion of the string that was expanded is split, rest is not.

c=0; v=$'1 \n\t2\t3 4'; for i in $v$'\n5\n6'; do echo iteration $c :$i:; c=$[c+1]; done

iteration 0 :1:
iteration 1 :2:
iteration 2 :3:
iteration 3 :4 5 6:

When expansion happens in double quotes, no splitting occurs.

c=0; v=$'1\n2\n3 4'; for i in "$v"; do echo iteration $c :$i:; c=$[c+1]; done

iteration 0 :1 2 3 4:

Any sequence of SPACE, TAB, NEWLINE is used as delimiter for splitting.

c=0; v=$'1 2\t3 \t\n4'; for i in $v; do echo iteration $c :$i:; c=$[c+1]; done

iteration 0 :1:
iteration 1 :2:
iteration 2 :3:
iteration 3 :4:

I will accept unwind's answer as his link yields the answer to my question.

I still have no clue as to why the behavior of echo within the for loop changes with the value of IFS.

EDIT: extended to clarify.


Bash doesn't do word splitting on quoted strings in this context. For example:

$ for i in "a b c d"; do echo $i; done
a b c d

$ for i in a b c d; do echo $i; done
a
b
c
d

$ var="a b c d"; for i in "$var"; do echo $i; done
a b c d

$ var="a b c d"; for i in $var; do echo $i; done
a
b
c
d

In a comment, you stated "IFS='\n' also works. What doesn't work is IFS=$'\n'. I'm very very confused right now."

In IFS='\n', you're setting the separators (plural) to the two characters backslash and "n". If you insert an "X" in the middle of a "\n", you can see what happens: the unquoted $i is split at the literal backslash and at the literal "n", while the real newlines that $'' produced are left alone:

$ IFS='\n'; for i in $'a\Xnb\nc\n'; do echo $i; done; unset IFS
a X b
c

Edit 2 (in response to the comment):

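It sees '\n' as two characters (backslash and "n", not a newline). $'a\Xnb\nc\n' is a literal string of 8 characters: "\X" is not a recognized escape sequence, so that backslash stays as-is, while the two "\n" sequences become real newlines. Since the string is quoted, the for loop sees it as one string rather than words delimited by $IFS. The unquoted $i in the echo command is then split at the backslash and at the literal "n", which yields a, X, and the remainder with its newlines intact; echo prints those newlines unchanged because a newline is not in this $IFS.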
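To check what each quoting style really stores, here is a small sketch (assuming bash, since $'...' is not available in every POSIX shell). The variable names single, ansi, and odd are made up for this illustration:

```bash
#!/usr/bin/env bash
# Compare what the two quoting styles actually store.
single='\n'         # single quotes: a backslash and an n, taken literally
ansi=$'\n'          # ANSI C quoting: one real newline character
odd=$'a\Xnb\nc\n'   # \X is not a recognized escape, so its backslash stays

echo "single: ${#single} characters"   # single: 2 characters
echo "ansi:   ${#ansi} characters"     # ansi:   1 characters
echo "odd:    ${#odd} characters"      # odd:    8 characters

# Dump the raw bytes so the real newlines in $odd become visible.
printf '%s' "$odd" | od -c
```

The od -c dump is only there for eyeballing; the character counts are what tell the two forms apart.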

Try these for further comparison:

$ c=0; for i in "a\nb\nc\n"; do echo -e "iteration $c :$i:"; c=$[c+1]; done
iteration 0 :a
b
c
:

$ c=0; for i in "a\nb\nc\n"; do echo "iteration $c :$i:"; c=$[c+1]; done
iteration 0 :a\nb\nc\n:

$ c=0; for i in a\\nb\\nc\\n; do echo -e "iteration $c :$i:"; c=$[c+1]; done
iteration 0 :a
b
c
:

$ c=0; for i in a\\nb\\nc\\n; do echo "iteration $c :$i:"; c=$[c+1]; done
iteration 0 :a\nb\nc\n:

Setting IFS has no effect on the above.

This works (note that $var is unquoted in the for statement):

$ var=$'a\nb\nc\n'
$ saveIFS="$IFS"   # it's important to save and restore $IFS
$ IFS=$'\n'        # set $IFS to a newline using $'\n' (not '\n')
$ c=0; for i in $var; do echo -e "iteration $c :$i:"; c=$[c+1]; done
iteration 0 :a:
iteration 1 :b:
iteration 2 :c:
$ IFS="$saveIFS"


Change your $IFS setting to change how bash splits text into words.

Editor's note:
This answer was accepted, because it provides a link to information that ultimately explains the underlying issues.
Note, however, that the OP's problem can not be solved simply by changing $IFS, because $IFS doesn't apply to quoted strings.
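A quick way to see the point of the note is to count iterations with $IFS set to a newline in both cases. This is only a sketch for bash; the counter and variable names are invented for the example:

```bash
#!/usr/bin/env bash
old_ifs=$IFS        # save the current separators
IFS=$'\n'           # split on newlines only

# Quoted literal: never subject to word splitting, whatever $IFS says.
literal_count=0
for i in $'1\n2\n3\n4'; do literal_count=$((literal_count + 1)); done

# Unquoted variable expansion: split on $IFS.
v=$'1\n2\n3\n4'
variable_count=0
for i in $v; do variable_count=$((variable_count + 1)); done

IFS=$old_ifs        # restore the separators

echo "quoted literal:    $literal_count iteration(s)"    # 1
echo "unquoted variable: $variable_count iteration(s)"   # 4
```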


Two reasons:

  1. Your for loop loops only once: there is only one element to loop on, which is the $'1\n2\n3\n4' string. If you want to loop four times, you have to change $IFS, as suggested by unwind.

  2. echo takes this string, and interprets it as four arguments separated by newlines. It then displays all arguments separated by whitespaces. If you don't want the input string to be interpreted this way, put it in double quotes, as in echo "$i".

Edit, after question edit:

  • I tried changing $IFS: it worked, but I used export $IFS='\n'

  • In your second case, $v gets interpreted by bash in for command which interprets it as four arguments separated by newlines. If you want to get your first problem again, just use for f in "$v" instead of for f in $v.


Try:

c=0; for i in $'1\\n2\\n3\\n4'; do echo -e iteration $c :$i:; c=$[c+1]; done

The extra backslashes preserve the escapes for the newlines, and echo -e tells echo to expand the escapes.


Dennis Williamson's helpful answer fully explains the symptoms, and even the question itself now mostly does; mouviciel's answer boils the issues down well, but (as of this writing) contains incorrect information about $IFS.
Therefore, let me attempt a summary of the rules that apply, followed by a detailed analysis:

  • With quoted strings, irrespective of the quoting style, IFS, the Internal Field Separator never comes into play.

    • A quoted string as the sole driver of a for loop always results in a single iteration, with the (potentially expanded) string getting assigned as a whole to the loop variable.
  • Splitting strings into words by the separator characters specified in $IFS (word-splitting) only applies to the results of unquoted expansions, namely:

    • unquoted variable references ($var), called parameter expansions (which includes transformations such as prefix and suffix removal, substitutions, ...)
    • unquoted command substitutions ($(...) or old-style `...`)
    • unquoted arithmetic expansions ($(( ... )) - note that syntax $[...] is obsolete and should be avoided).
  • In order to assign control characters such as <newline> and <tab> to $IFS, use an ANSI C-quoted string ($'...'), which understands escape sequences such as \n and \t; e.g., IFS=$'\n'; by contrast, IFS='\n' would assign 2 literal characters: literal \ and literal n (single-quoted strings always use their content literally).

Note that if the echo command in the original code had used a single, double-quoted argument (echo "iteration $c :$i:"), then $IFS would not have applied at all, which would have avoided the confusion.
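The rules can be tried out with a tiny helper that reports how many arguments it received. This is a sketch for bash with the default $IFS; count_words and the n_* variables are hypothetical names, not part of any standard tool:

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the number of arguments it was given.
count_words() { echo "$#"; }

v=$'a b\tc\nd'

n_literal=$(count_words $'a b\tc\nd')             # quoted literal: no splitting
n_quoted=$(count_words "$v")                      # double-quoted expansion: no splitting
n_unquoted=$(count_words $v)                      # unquoted parameter expansion: split
n_cmdsub=$(count_words $(printf 'a b\nc'))        # unquoted command substitution: split
n_arith=$( IFS=0; count_words $((100 + 1)) )      # arithmetic expansion: 101 split at the 0

echo "literal=$n_literal quoted=$n_quoted unquoted=$n_unquoted cmdsub=$n_cmdsub arith=$n_arith"
# literal=1 quoted=1 unquoted=4 cmdsub=3 arith=2
```

The IFS=0 case is contrived, but it shows that even the result of an arithmetic expansion is split when it is left unquoted.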


Analysis of the command from the question:

c=0; for i in $'1\n2\n3\n4'; do echo iteration $c :$i:; c=$[c+1]; done

  • $IFS and word-splitting only apply to the echo command, not the for loop.

  • ANSI C-quoted string $'1\n2\n3\n4' as the loop driver results in the following 4-line string assigned to $i:

    1
    2
    3
    4
    
  • echo iteration $c :$i:, due to having only unquoted arguments, makes the shell subject them to word-splitting as well as globbing (filename expansion; although that has no effect in this particular case):

    • $c, due to containing just 0 (in the one and only iteration) is not modified in the process.

    • :$i:, by contrast, based on $IFS containing <space><tab><newline> by default, is split into 4 separate words: :1, 2, 3, and 4: - note how the enclosing : became part of the first and last word.

    • Note: To use a variable's value as-is, always double-quote the variable reference.
      Word splitting and globbing are instances of shell expansions, which is the umbrella term for the up-front interpretation of arguments by the shell.

  • echo is therefore handed 6 individual arguments: iteration, 0, and :1, 2, 3, and 4:. On output, echo concatenates its arguments with a single space (unrelated to $IFS), yielding iteration 0 :1 2 3 4:
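To make the individual arguments visible, printf can stand in for echo, because it reuses its format string once per argument. A minimal sketch for bash with the default $IFS (unquoted_args and quoted_args are names chosen for this example):

```bash
#!/usr/bin/env bash
i=$'1\n2\n3\n4'
c=0

# Unquoted: the shell splits :$i: before the command ever sees it.
unquoted_args=$(printf '<%s>' iteration $c :$i:)
echo "$unquoted_args"     # <iteration><0><:1><2><3><4:>

# Double-quoted: one single argument, embedded newlines preserved.
quoted_args=$(printf '<%s>' "iteration $c :$i:")
echo "$quoted_args"
```

Each pair of angle brackets marks one argument, so the first line shows the 6 arguments and the second output shows a single argument spanning four lines.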


How to write the loop robustly

Note the double-quoting of the string passed to echo, and the embedded arithmetic expansion that combines reporting the current value of $c with incrementing it afterwards ($((c++))).

If the iteration values are known in advance:

# Simply use an unquoted, space-separated list (the indiv. elements may be quoted, however).
c=0; for i in 1 2 3 4; do echo "iteration: $((c++)) :$i:"; done

# Alternative, with an array:
vals=( 1 2 3 4 )
c=0; for i in "${vals[@]}"; do echo "iteration: $((c++)) :$i:"; done

# If the iteration values form a range of numbers, you can also use
# brace expansion (`for i in {1..4}...`) or, better for larger ranges
# and required for variable-based endpoints, a C-style loop (`for ((i=0;i<4;++i))...`)
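The range variants mentioned in the comments above could look roughly like this in bash; sum, total, n, and literal are illustrative names only:

```bash
#!/usr/bin/env bash
# Brace expansion with literal endpoints.
sum=0
for i in {1..4}; do sum=$((sum + i)); done
echo "brace expansion: $sum"     # brace expansion: 10

# C-style loop with a variable endpoint.
n=4
total=0
for ((i = 1; i <= n; ++i)); do total=$((total + i)); done
echo "C-style loop: $total"      # C-style loop: 10

# Brace expansion runs before variable expansion, so {1..$n} is not a range:
# the loop runs once with the literal text.
for i in {1..$n}; do literal=$i; done
echo "$literal"                  # {1..4}
```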

If the iteration values are NOT known in advance:

Using for to loop over lines of input is ill-advised, because the use of an unquoted expansion would require you to deal with possibly unwanted word-splitting and globbing, and because the entire input must be read into memory as a whole before the loop starts.
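The globbing part of that warning is easy to overlook, so here is a sketch of how it can bite even when $IFS is restricted to a newline. It assumes bash and mktemp, and it creates two throwaway files (file1, file2) in a temporary directory purely for the demonstration:

```bash
#!/usr/bin/env bash
start_dir=$PWD
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch file1 file2            # files for the stray * to match

input=$'first line\n*'       # the second "line" is a glob pattern

IFS=$'\n'                    # split on newlines only
for_items=()
for i in $input; do          # unquoted: split, then globbed
  for_items+=("$i")
done
unset IFS                    # back to default splitting behavior

printf '[%s]\n' "${for_items[@]}"
# [first line]
# [file1]
# [file2]

cd "$start_dir" || exit 1
rm -r "$workdir"
```

The two-line input yields three iterations, because the * line was replaced by the matching file names.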

A while loop to which the lines are provided via stdin is the better choice (<<< is a here-string, a string that is passed via stdin):

c=0; while IFS= read -r i; do echo "iteration: $((c++)) :$i:"; done <<<$'1\n2\n3\n4'

read reads line by line, and -r combined with IFS= (disabling word-splitting by setting it to the null string) ensures that each line is read in full, as-is.
Note that by prepending IFS= directly to read, its value is localized to that command, without changing the current shell's $IFS value - this is a generic mechanism in POSIX-compatible shells.
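Both claims (lines are read as-is, and the shell's own $IFS is untouched) can be verified with a short sketch. It assumes bash because of the here-string and the array; before and lines are names picked for this example:

```bash
#!/usr/bin/env bash
before=$IFS                  # remember the shell's separators

lines=()
while IFS= read -r line; do
  lines+=("$line")
done <<<$'  indented\nback\\slash\n*'

printf '[%s]\n' "${lines[@]}"
# [  indented]
# [back\slash]
# [*]

# The IFS= prefix applied to read only; the shell's value is unchanged.
[ "$IFS" = "$before" ] && echo "IFS unchanged"
```

Leading whitespace survives thanks to IFS=, the backslash survives thanks to -r, and the * is never globbed because "$line" is always used double-quoted.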
