Processing a tab-delimited file with a shell script

Normally I would use Python/Perl for this, but (for political reasons) I find myself having to pull it off in a bash shell.

I have a large tab-delimited file that contains six columns, and the second column is integers. I need a shell script that verifies that the file really does have six columns and that the second column really does contain integers. I am assuming that I would need to use sed/awk here somewhere. The problem is that I'm not that familiar with sed/awk. Any advice would be appreciated.

Many thanks! Lilly


gawk:

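BEGIN {
  FS="\t"    # split fields on tabs only
}

# Stop with a non-zero exit status at the first line that does not have
# six fields or whose second field is not an integer value.
(NF != 6) || ($2 != int($2)) {
  exit 1
}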

Save it as colcheck.awk and invoke as follows:

if awk -f colcheck.awk somefile
then
  # is valid
else
  # is not valid
fi


Well, you can tell awk the field delimiter directly (the -F option). Inside your awk script, the NF variable gives the number of fields in each record.

Oh, and you can check the second field with a regex. The whole thing might look something like this:

awk -F'\t' '
{ if (NF != 6 || $2 !~ /^[0-9]+$/) print "Format error, line " NR; }
' thefile

That's probably close but I need to check the regex because Linux regex syntax variation is so insane. (edited because grrrr)
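
For use inside a larger script, the same NF and regex checks can be wrapped so that they report the bad lines and also set an exit status. This is only a sketch: the function name check_tsv is made up for illustration, and it assumes that "integer" means an optional minus sign followed by digits.

```shell
# check_tsv FILE   (hypothetical helper name, not from the thread)
# Prints one message per bad line and returns non-zero if any line failed.
# Assumption: an integer is an optional "-" followed by one or more digits.
check_tsv() {
  awk -F'\t' '
    NF != 6            { printf "line %d: expected 6 fields, got %d\n", NR, NF; bad = 1; next }
    $2 !~ /^-?[0-9]+$/ { printf "line %d: column 2 is not an integer: %s\n", NR, $2; bad = 1 }
    END                { exit bad + 0 }
  ' "$1"
}

# Usage:
#   if check_tsv somefile; then echo "valid"; else echo "not valid"; fi
```

Unlike a script that exits on the first bad line, this reads the whole file, which is slower on large input but lists every problem in one pass.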


Here's how to do it with awk:

awk -F'\t' 'NF!=6 || int($2)!=$2 {print "error"}' file


Pure Bash:

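infile='column6.dat'
lno=0

# Split on tabs only; -r keeps backslashes literal.
# Note: consecutive tabs are collapsed, so empty fields are not counted.
while IFS=$'\t' read -r -a line ; do
  ((lno++))
  if [ "${#line[@]}" -ne 6 ] ; then
    echo "line $lno has ${#line[@]} elements"
  fi
  if ! [[ ${line[1]} =~ ^[0-9]+$ ]] ; then
    echo "line $lno column  2 : not an integer"
  fi
done < "$infile"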

Possible output:

line 19 has 5 elements
line 36 column  2 : not an integer
line 38 column  2 : not an integer
line 51 has 3 elements
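
One caveat with read -a: tab counts as IFS whitespace, so consecutive tabs are collapsed and empty columns go uncounted. If empty columns can occur, a per-line check that counts the tab characters themselves might look like the sketch below. The function name check_line is invented for illustration, and it assumes an integer is a run of digits only.

```shell
# check_line LINE   (hypothetical helper name, not from the thread)
# Succeeds only if LINE has exactly six tab-separated fields (empty ones
# included) and the second field consists of digits only.
check_line() {
  local tab=$'\t' line=$1
  local tabs=${line//[^$tab]/}      # delete everything except the tabs
  (( ${#tabs} + 1 == 6 )) || return 1
  local rest=${line#*"$tab"}        # drop field 1 and its trailing tab
  local second=${rest%%"$tab"*}     # keep everything up to the next tab
  [[ $second =~ ^[0-9]+$ ]]
}

# Usage, inside a loop over the file:
#   while IFS= read -r l; do check_line "$l" || echo "bad line"; done < "$infile"
```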
