Why is my Bash script adding <feff> to the beginning of files?

2022-12-14 20:46 问答作者：

I've written a script that cleans up .csv files, removing some bad commas and bad quotes (bad, means they break an in house program we use to transform these files) using sed:

# remove all commas, and re-insert the good commas using clean.sed
sed -f clean.sed $1 > $1.1st

# remove all quotes
sed 's/\"//g' $1.1st > $1.tmp

# add the good quotes around good commas
sed 's/\,/\"\,\"/g' $1.tmp > $1.tmp1

# add leading quotes
sed 's/^/\"开发者_开发问答/' $1.tmp1 > $1.tmp2

# add trailing quotes
sed 's/$/\"/' $1.tmp2 > $1.tmp3

# remove utf characters
sed 's/<feff>//' $1.tmp3 > $1.tmp4

# replace original file with new stripped version and delete .tmp files
cp -rf $1.tmp4 quotes_$1

Here is clean.sed:

s/\",\"/XXX/g;
:a
s/,//g
ta
s/XXX/\",\"/g;

Then it removes the temp files and viola we have a new file that starts with the word "quotes" that we can use for our other processes.

My question is:

Why do I have to make a sed statement to remove the feff tag in that temp file? The original file doesn't have it, but it always appears in the replacement. At first I thought cp was causing this but if I put in the sed statement to remove before the cp, it isn't there.

Maybe I'm just missing something...

U+FEFF is the code point for a byte order mark. Your files most likely contain data saved in UTF-16 and the BOM has been corrupted by your 'cleaning process' which is most likely expecting ASCII. It's probably not a good idea to remove the BOM, but instead to fix your scripts to not corrupt it in the first place.

To get rid of these in GNU emacs:

Open Emacs
Do a find-file-literally to open the file
Edit off the leading three bytes
Save the file

There is also a way to convert files with DOS line termination convention to Unix line termination convention.

继续阅读：bash cp sed

Why is my Bash script adding <feff> to the beginning of files?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？