Why does bash ignore newlines when doing a for loop over the contents of a C-style string?
Why does the following...
c=0; for i in $'1\n2\n3\n4'; do echo iteration $c :$i:; c=$[c+1]; done
print out...
iteration 0 :1 2 3 4:
and not
iteration 0 :1:
iteration 1 :2:
iteration 2 :3:
iteration 3 :4:
From what I understand, the $'STRING' syntax should allow me to specify a string with escape characters. Shouldn't "\n" be interpreted as newline so that the for loop echos four times, once for each line? Instead, it seems as if the newline is interpreted as a space character.
I took unwind's suggestion and tried setting $IFS. The results were the same.
IFS=$'\n'; c=0; for i in $'1\n2\n3\n4'; do echo iteration $c :$i:; c=$[c+1]; done; unset IFS;
iteration 0 :1 2 3 4:
William Purssel says in a comment that this did not work because IFS was being set to newline... but the following did not work either.
IFS=' '; c=0; for i in '1 2 3 4'; do echo iteration $c :$i:; c=$[c+1]; done; unset IFS;
iteration 0 :1 2 3 4:
Using IFS=' ' on a newline-separated string resulted in even more of a mess...
IFS=' '; c=0; for i in $'1\n2\n3\n4'; do echo iteration $c :$i:; c=$[c+1]; done; unset IFS;
iteration 0 :1
2
3
4:
Setting IFS to '\n' rather than $'\n' had the same effect as IFS=' ' ...
IFS='\n'; c=0; for i in $'1\n2\n3\n4'; do echo iteration $c :$i:; c=$[c+1]; done; unset IFS;
iteration 0 :1
2
3
4:
There's only one iteration, but the newline is visible in the echo for some reason.
What did work is first storing the string in a variable then looping over the contents of the variable (without having to set IFS):
c=0; v=$'1\n2\n3\n4'; for i in $v; do echo iteration $c :$i:; c=$[c+1]; done
iteration 0 :1:
iteration 1 :2:
iteration 2 :3:
iteration 3 :4:
Which still does not explain why there is this problem.
Is there a pattern here? Is this the expected behavior of IFS as defined in unwind's link?
unwind's link states... "The shell scans the results of parameter expansion, command substitution, and arithmetic expansion that did not occur within double quotes for word splitting."
I guess that explains why string literals don't get split for for-loop iteration, no matter what escape characters are used. It only works when the literal is first assigned to a variable and that variable is then expanded (unquoted) in the for loop. I guess the same goes for command substitution.
Examples:
Result of command substitution is split
c=0; for i in `echo $'1\n2\n3\n4'`; do echo iteration $c :$i:; c=$[c+1]; done
iteration 0 :1:
iteration 1 :2:
iteration 2 :3:
iteration 3 :4:
Portion of the string that was expanded is split, rest is not.
c=0; v=$'1 \n\t2\t3 4'; for i in $v$'\n5\n6'; do echo iteration $c :$i:; c=$[c+1]; done
iteration 0 :1:
iteration 1 :2:
iteration 2 :3:
iteration 3 :4 5 6:
When expansion happens in double quotes, no splitting occurs.
c=0; v=$'1\n2\n3 4'; for i in "$v"; do echo iteration $c :$i:; c=$[c+1]; done
iteration 0 :1 2 3 4:
Any sequence of SPACE, TAB, NEWLINE is used as delimiter for splitting.
c=0; v=$'1 2\t3 \t\n4'; for i in $v; do echo iteration $c :$i:; c=$[c+1]; done
iteration 0 :1:
iteration 1 :2:
iteration 2 :3:
iteration 3 :4:
I will accept unwind's answer as his link yields the answer to my question.
No clue as to why behavior of echo within for-loop changes with value of IFS.
EDIT: extended to clarify.
Bash doesn't do word splitting on quoted strings in this context. For example:
$ for i in "a b c d"; do echo $i; done
a b c d
$ for i in a b c d; do echo $i; done
a
b
c
d
$ var="a b c d"; for i in "$var"; do echo $i; done
a b c d
$ var="a b c d"; for i in $var; do echo $i; done
a
b
c
d
In a comment, you stated "IFS='\n' also works. What doesn't work is IFS=$'\n'. I'm very very confused right now."
In IFS='\n', you're setting the separators (plural) to the two characters backslash and "n". If you do this (inserting an "X" in the middle of a "\n"), you can see what happens: the unquoted $i passed to echo is split at the backslash and at the "n", while the real newlines produced by $'' are no longer separators and pass through untouched:
$ IFS='\n'; for i in $'a\Xnb\nc\n'; do echo $i; done; rrifs
a X b
c
Edit 2 (in response to the comment):
Bash sees '\n' as two characters (not a newline). $'a\Xnb\nc\n' is expanded before the loop runs: each "\n" becomes a real newline, while "\X" is not a recognized escape and stays as a backslash followed by "X". Since the string is quoted, for sees it as one word rather than words delimited by $IFS, so the loop runs once. Inside the loop, the unquoted $i is split on backslash and "n" into the words a, X, and b<newline>c<newline>, and echo prints them joined by spaces, with the embedded newlines output as-is.
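A minimal sketch for checking what each IFS value contains and how many words the same string is split into. The helper name count_words and the variable names are made up for illustration; the splits run in subshells so that IFS does not leak into the calling shell.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report how many words its arguments were split into.
count_words() { echo "$#"; }

s=$'a\Xnb\nc\n'      # \X is not a known escape, so it stays as backslash + X

literal='\n'         # two characters: backslash and n
newline=$'\n'        # one character: a real newline
len_literal=${#literal}
len_newline=${#newline}

# Each split runs in a subshell so the IFS change cannot leak out.
# set -f disables globbing for the unquoted expansion.
fields_literal=$(IFS='\n'; set -f; count_words $s)
fields_newline=$(IFS=$'\n'; set -f; count_words $s)

printf '%s\n' "single-quoted: $len_literal chars in IFS, $fields_literal words"
printf '%s\n' "ANSI C-quoted: $len_newline char in IFS, $fields_newline words"
```

With the two-character IFS the string breaks at the backslash and at the "n"; with a real newline in IFS it breaks at the line ends instead.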
Try these for further comparison:
$ c=0; for i in "a\nb\nc\n"; do echo -e "iteration $c :$i:"; c=$[c+1]; done
iteration 0 :a
b
c
:
$ c=0; for i in "a\nb\nc\n"; do echo "iteration $c :$i:"; c=$[c+1]; done
iteration 0 :a\nb\nc\n:
$ c=0; for i in a\\nb\\nc\\n; do echo -e "iteration $c :$i:"; c=$[c+1]; done
iteration 0 :a
b
c
:
$ c=0; for i in a\\nb\\nc\\n; do echo "iteration $c :$i:"; c=$[c+1]; done
iteration 0 :a\nb\nc\n:
Setting IFS has no effect on the above.
This works (note that $var is unquoted in the for statement):
$ var=$'a\nb\nc\n'
$ saveIFS="$IFS" # it's important to save and restore $IFS
$ IFS=$'\n' # set $IFS to a newline using $'\n' (not '\n')
$ c=0; for i in $var; do echo -e "iteration $c :$i:"; c=$[c+1]; done
iteration 0 :a:
iteration 1 :b:
iteration 2 :c:
$ IFS="$saveIFS"
Change your $IFS setting to change how bash splits text into words.
Editor's note:
This answer was accepted, because it provides a link to information that ultimately explains the underlying issues.
Note, however, that the OP's problem cannot be solved simply by changing $IFS, because $IFS doesn't apply to quoted strings.
Two reasons:
- Your for loop loops only once: there is only one element to loop on, which is the $'1\n2\n3\n4' string. If you want to loop four times, you have to change $IFS, as suggested by unwind.
- echo takes this string and interprets it as four arguments separated by newlines. It then displays all arguments separated by spaces. If you don't want echo to interpret the input string, put it in double quotes, as in echo "$i".
Edit, after question edit:
- I tried changing $IFS: it worked, but I used export $IFS='\n'
- In your second case, $v gets interpreted by bash in the for command, which interprets it as four arguments separated by newlines. If you want to get your first problem again, just use for f in "$v" instead of for f in $v.
Try:
c=0; for i in $'1\\n2\\n3\\n4'; do echo -e iteration $c :$i:; c=$[c+1]; done
The extra backslashes preserve the escapes for the newlines; the echo -e tells echo to expand the escapes.
Dennis Williamson's helpful answer fully explains the symptoms, and even the question itself now mostly does; mouviciel's answer boils the issues down well, but (as of this writing) contains incorrect information about $IFS.
Therefore, let me attempt a summary of the rules that apply, followed by a detailed analysis:
- With quoted strings, irrespective of the quoting style, IFS, the Internal Field Separator, never comes into play.
  - A quoted string as the sole driver of a for loop always results in a single iteration, with the (potentially expanded) string getting assigned as a whole to the loop variable.
- Splitting strings into words by the separator characters specified in $IFS (word-splitting) only applies to the results of unquoted expansions, namely:
  - unquoted variable references ($var), called parameter expansions (which includes transformations such as prefix and suffix removal, substitutions, ...)
  - unquoted command substitutions ($(...) or old-style `...`)
  - unquoted arithmetic expansions ($(( ... )) - note that syntax $[...] is obsolete and should be avoided).
- In order to assign control characters such as <newline> and <tab> to $IFS, use an ANSI C-quoted string ($'...'), which understands escape sequences such as \n and \t; e.g., IFS=$'\n'; by contrast, IFS='\n' would assign 2 literal characters: literal \ and literal n (single-quoted strings always use their content literally).
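As a rough illustration of these rules, the following sketch counts how many words each form produces. It assumes $IFS still has its default value; the helper nargs and the variable names are invented for this example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the number of arguments it received.
nargs() { echo "$#"; }

v=$'1\n2\n3\n4'

quoted_literal=$(nargs $'1\n2\n3\n4')            # ANSI C-quoted literal: never split
quoted_var=$(nargs "$v")                         # double-quoted expansion: never split
unquoted_var=$(nargs $v)                         # unquoted expansion: split on $IFS
unquoted_cmd=$(nargs $(printf '1\n2\n3\n4\n'))   # unquoted command substitution: split

printf 'literal=%s quoted=%s unquoted=%s cmdsubst=%s\n' \
  "$quoted_literal" "$quoted_var" "$unquoted_var" "$unquoted_cmd"
```

The first two forms arrive as a single argument; the last two arrive as four.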
Note that if the echo command in the original code had used a single, double-quoted argument (echo "iteration $c :$i:"), then $IFS would not have applied at all, which would have avoided the confusion.
Analysis of the command from the question:
c=0; for i in $'1\n2\n3\n4'; do echo iteration $c :$i:; c=$[c+1]; done
- $IFS and word-splitting only apply to the echo command, not the for loop.
- ANSI C-quoted string $'1\n2\n3\n4' as the loop driver results in the following 4-line string assigned to $i:
1
2
3
4
- echo iteration $c :$i:, due to having only unquoted arguments, makes the shell subject them to word-splitting as well as globbing (filename expansion; although that has no effect in this particular case):
  - $c, due to containing just 0 (in the one and only iteration) is not modified in the process.
  - :$i:, by contrast, based on $IFS containing <space><tab><newline> by default, is split into 4 separate words: :1, 2, 3, and 4: - note how the enclosing : became part of the first and last word.
  - Note: To use a variable's value as-is, always double-quote the variable reference.
  - Word splitting and globbing are instances of shell expansions, which is the umbrella term for the up-front interpretation of arguments by the shell.
- echo is therefore handed 6 individual arguments: iteration, 0, and :1, 2, 3, and 4:. On output, echo concatenates its arguments with a single space (unrelated to $IFS), yielding iteration 0 :1 2 3 4:
How to write the loop robustly
Note the double-quoting of the string passed to echo, and the embedded arithmetic expansion that combines reporting the current value of $c with incrementing it afterwards ($((c++))).
If the iteration values are known in advance:
# Simply use an unquoted, space-separated list (the indiv. elements may be quoted, however).
c=0; for i in 1 2 3 4; do echo "iteration: $((c++)) :$i:"; done
# Alternative, with an array:
vals=( 1 2 3 4 )
c=0; for i in "${vals[@]}"; do echo "iteration: $((c++)) :$i:"; done
# If the iteration values form a range of numbers, you can also use
# brace expansion (`for i in {1..4}...`) or, better for larger ranges
# and required for variable-based endpoints, a C-style loop (`for ((i=0;i<4;++i))...`)
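A short sketch of why the C-style loop is needed for variable-based endpoints: brace expansion happens before variable expansion, so a variable inside the braces is not turned into a range. The variable names here are illustrative only.

```bash
#!/usr/bin/env bash
n=4

# {1..$n} is not a valid range when brace expansion runs, so it is left alone;
# $n is expanded afterwards, leaving the single literal word {1..4}.
brace=$(for i in {1..$n}; do printf '%s ' "$i"; done)

# The C-style loop evaluates n arithmetically and iterates as intended.
cstyle=$(for ((i = 1; i <= n; i++)); do printf '%s ' "$i"; done)

printf 'brace:  %s\ncstyle: %s\n' "$brace" "$cstyle"
```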
If the iteration values are NOT known in advance:
Using for to loop over lines of input is ill-advised, because the use of an unquoted expansion would require you to deal with possibly unwanted word-splitting and globbing, and because the entire input must be read into memory as a whole before the loop starts.
A while loop to which the lines are provided via stdin is the better choice (<<< is a here-string, a string that is passed via stdin):
c=0; while IFS= read -r i; do echo "iteration: $((c++)) :$i:"; done <<<$'1\n2\n3\n4'
read reads line by line, and -r combined with IFS= (disabling word-splitting by setting it to the null string) ensures that each line is read in full, as-is.
Note that by prepending IFS= directly to read, its value is localized to that command, without changing the current shell's $IFS value - this is a generic mechanism in POSIX-compatible shells.
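A minimal sketch of the difference, using an invented input line with surrounding spaces and a backslash. It compares a plain read with IFS= read -r and checks that the shell's own $IFS is the same before and after.

```bash
#!/usr/bin/env bash
# First line has leading/trailing spaces and a backslash; variable names are illustrative.
input=$'  indented\\line  \nsecond'

ifs_before=$IFS

# Plain read: trims surrounding IFS whitespace and consumes the backslash as an escape.
read plain <<<"$input"

# IFS= read -r: the first line arrives unmodified.
IFS= read -r exact <<<"$input"

ifs_after=$IFS   # the IFS= prefix applied to that one read command only

printf 'plain=[%s]\nexact=[%s]\n' "$plain" "$exact"
```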