Filling space/tab separated, empty columns with 0

I have a huge file, and in the output some columns don't have a value. I need to fill these columns with 0 for further analysis. I can separate the columns with spaces or tabs; at the moment they are tab-separated.


This is really a job for a CSV parser, but if it has to be a regex, and you never have tabs within quoted CSV entries, you could search for

(^|\t)(?=\t|$)

and replace with

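${1}0

(backreference 1 followed by a literal 0; some tools accept $10 here, but others read that as group 10)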

So, in Perl:

($ResultString = $subject) =~ 
s/(    # Match either...
   ^   # the start of the line (preferably)
   |   # or
   \t  # a tab character
  )    # remember the match in backreference no. 1
  (?=  # Then assert that the next character is either
   \t  # a(nother) tab character
   |   # or
   $   # the end of the line
  )    # End of lookahead assertion
/${1}0/xg;

This will transform

1   2       4           7   8
    2   3       5   6   7   

into

1   2   0   4   0   0   7   8   
0   2   3   0   5   6   7   0
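
For anyone who wants to try the same substitution outside Perl, here is a minimal sketch in Python. It assumes Python 3.7 or later (where re.sub handles the empty match at the start of a line the way Perl does) and that each line has already had its newline stripped. The name fill_empty is made up for the example.

```python
import re

# Same pattern as above: start of line or a tab, followed by a tab or end of line.
EMPTY_FIELD = re.compile(r"(^|\t)(?=\t|$)")

def fill_empty(line):
    """Put a 0 into every empty tab-separated field of one line (no newline)."""
    # In the replacement, group 1 is followed by a literal 0.
    # The \g<1> form is needed because a bare \10 would mean group 10.
    return EMPTY_FIELD.sub(r"\g<1>0", line)

print(repr(fill_empty("1\t2\t\t4\t\t\t7\t8")))
print(repr(fill_empty("\t2\t3\t\t5\t6\t7\t")))
```

The two sample lines are the ones from the example above, with real tab characters between the fields.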


For a tab-separated file, this AWK snippet does the trick:

BEGIN { FS = "\t"; OFS = "\t" }   # read and write tab-separated fields
{
    for (i = 1; i <= NF; i++) {
        if ($i == "") { $i = 0 }  # fill only fields that are really empty
    }
    print $0
}
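
The same field-by-field idea can be sketched with a real CSV parser, which is what the regex answer recommends anyway. This is only an illustration in Python using the standard csv module; fill_empty_fields and its filler parameter are hypothetical names, and it assumes the input really is tab-delimited.

```python
import csv
import io

def fill_empty_fields(src, dst, filler="0"):
    """Copy tab-separated rows from src to dst, replacing empty fields."""
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for row in reader:
        # Only fields that are exactly the empty string get the filler.
        writer.writerow([field if field != "" else filler for field in row])

# Small in-memory demonstration; real use would pass open file objects.
source = io.StringIO("1\t2\t\t4\t\t\t7\t8\n\t2\t3\t\t5\t6\t7\t\n")
result = io.StringIO()
fill_empty_fields(source, result)
print(repr(result.getvalue()))
```

Unlike the regex, the parser keeps a quoted field that contains a tab in one piece. For a huge file, passing file objects keeps memory use flat because rows are streamed one at a time.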


Here's a sed solution. Note that some versions of sed don't like \t.

sed 's/^\t/0\t/;:a;s/\t\t/\t0\t/g;ta;s/\t$/\t0/' inputfile

or

sed -e 's/^\t/0\t/' -e ':a' -e 's/\t\t/\t0\t/g' -e 'ta' -e 's/\t$/\t0/' inputfile

Explanation:

s/^\t/0\t/    # insert a zero before a tab that begins a line
:a            # top of the loop
    s/\t\t/\t0\t/g    # insert a zero between a pair of tabs
ta            # if a substitution was made, branch to the top of the loop
s/\t$/\t0/    # insert a zero after a tab that ends a line
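
Why the loop is needed: a single global pass replaces non-overlapping pairs of tabs, so in a run of three tabs the third tab is left sitting next to the one just re-inserted. A rough Python sketch of the same three steps (fill_with_loop is a made-up name, and str.replace stands in for sed's s///g):

```python
def fill_with_loop(line):
    """Mirror the sed script on one line that has no trailing newline."""
    if line.startswith("\t"):
        line = "0" + line                     # s/^\t/0\t/
    while "\t\t" in line:                     # :a ... ta
        line = line.replace("\t\t", "\t0\t")  # s/\t\t/\t0\t/g
    if line.endswith("\t"):
        line += "0"                           # s/\t$/\t0/
    return line

# One pass is not enough when two empty fields are adjacent:
one_pass = "1\t\t\t4".replace("\t\t", "\t0\t")
print(repr(one_pass))
print(repr(fill_with_loop("1\t\t\t4")))
```

Like the sed script, this leaves a completely empty line untouched.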


Deleting my answer after re-reading the original post. There are no tabs as data, just delimiters. If there is no data, a double delimiter will appear to align the columns.
It can't be any other way. So if a line consists of a single delimiter, it separates two empty fields. "" = 1 empty field, "\t" = 2 empty fields. I got it now.

Tim Pietzcker had the correct answer all along. +1 for him.
It could be written alternatively as s/ (?:^|(?<=\t)) (?=\t|$) /0/xg;, but it's the same thing.
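
A quick way to convince yourself that the two patterns agree is to run both over the same lines. A small Python sketch (Python 3.7+ assumed; the function names are made up for the example):

```python
import re

capturing = re.compile(r"(^|\t)(?=\t|$)")          # consumes the tab, puts it back
zero_width = re.compile(r"(?:^|(?<=\t))(?=\t|$)")  # matches only the empty spot

def with_capture(line):
    return capturing.sub(r"\g<1>0", line)

def with_lookbehind(line):
    return zero_width.sub("0", line)

samples = ["1\t2\t\t4\t\t\t7\t8", "\t2\t3\t\t5\t6\t7\t", "\t\t\t", ""]
for line in samples:
    # Both variants should fill exactly the same fields.
    assert with_capture(line) == with_lookbehind(line)
```

The lookbehind version never consumes a tab, so there is nothing to put back in the replacement.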


If and only if your data contains only numbers and you have a clearly defined field separator FS, you can use the following trick:

awk 'BEGIN{FS=OFS="\t"}{for(i=1;i<=NF;++i) $i+=0}1' file

By adding zero, we convert strings to numbers. Empty strings will be converted to the number zero. You can define your field separator to anything you like.

This, however, might be a bit slow, since awk has to rebuild $0 from its fields every time you reassign a field $i.

A faster way is the solution of Dennis Williamson.
