Filling space/tab separated, empty columns with 0

2023-02-03 19:41

I have a huge file, and in the output some columns don't have a value. I need to fill these columns with 0 for further analysis. I can separate the columns with either spaces or tabs; below, they are separated with tabs.

This is really a job for a CSV parser, but if it has to be a regex, and you never have tabs within quoted CSV entries, you could search for

(^|\t)(?=\t|$)

and replace with

${1}0

So, in Perl:

($ResultString = $subject) =~ 
s/(    # Match either...
   ^   # the start of the line (preferably)
   |   # or
   \t  # a tab character
  )    # remember the match in backreference no. 1
  (?=  # Then assert that the next character is either
   \t  # a(nother) tab character
   |   # or
   $   # the end of the line
  )    # End of lookahead assertion
/${1}0/xg;

This will transform

1   2       4           7   8
    2   3       5   6   7

into

1   2   0   4   0   0   7   8
0   2   3   0   5   6   7   0
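
As a rough illustration of the CSV-parser route mentioned at the top of this answer, here is a minimal Python sketch using the standard csv module. The function name fill_empty_fields and the in-memory streams are assumptions made for the example; they are not part of the original answer.

```python
import csv
import io

def fill_empty_fields(text, filler="0"):
    """Return tab-separated text with every empty field replaced by filler."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        # An empty field arrives as the empty string; swap it for the filler.
        writer.writerow([field if field != "" else filler for field in row])
    return out.getvalue()

sample = "1\t2\t\t4\t\t\t7\t8\n\t2\t3\t\t5\t6\t7\t\n"
print(fill_empty_fields(sample), end="")
```

Unlike the regex, a parser also copes with quoted entries that contain tabs. One difference to be aware of: a completely blank line is read as a row with no fields, so it stays blank instead of becoming a single 0.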

For a tab-separated file, this AWK snippet does the trick:

BEGIN { FS = "\t"; OFS="\t" }
{
    for(i = 1; i <= NF; i++) {
         if($i == "") { $i = 0 }    # only touch fields that are really empty
    }
    print $0
}

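Here's a sed solution. Note that some versions of sed don't recognize \t. The two commands below are equivalent; the second passes each command as a separate -e expression, which helps with sed versions that don't accept a semicolon after a label.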

sed 's/^\t/0\t/;:a;s/\t\t/\t0\t/g;ta;s/\t$/\t0/' inputfile

sed -e 's/^\t/0\t/' -e ':a' -e 's/\t\t/\t0\t/g' -e 'ta' -e 's/\t$/\t0/' inputfile

Explanation:

s/^\t/0\t/    # insert a zero before a tab that begins a line
:a            # top of the loop
    s/\t\t/\t0\t/g    # insert a zero between a pair of tabs
ta            # if a substitution was made, branch to the top of the loop
s/\t$/\t0/    # insert a zero after a tab that ends a line
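
The reason for the loop may not be obvious: a global substitution only replaces non-overlapping matches, so a run of three or more tabs still contains an empty field after one pass. The Python sketch below mimics the four sed steps to show this; the function name fill_tabs_like_sed is made up for the illustration, and it expects a line without its trailing newline.

```python
import re

def fill_tabs_like_sed(line):
    """Mimic the sed script's steps on one line (newline already removed)."""
    line = re.sub(r"^\t", "0\t", line)        # zero before a leading tab
    while True:                                # plays the role of :a ... ta
        line, count = re.subn(r"\t\t", "\t0\t", line)
        if count == 0:                         # nothing replaced: leave the loop
            break
    line = re.sub(r"\t$", "\t0", line)        # zero after a trailing tab
    return line

# One global pass leaves a pair of adjacent tabs behind.
single_pass = re.sub(r"\t\t", "\t0\t", "a\t\t\tb")
print(repr(single_pass))                       # 'a\t0\t\tb'
print(repr(fill_tabs_like_sed("a\t\t\tb")))    # 'a\t0\t0\tb'
```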

Deleting my answer after re-reading the original post. There are no tabs as data, just delimiters. If there is no data, a double delimiter will appear to align the columns.
It can't be any other way. So if a single delimiter is there, it will separate two empty fields. "" = 1 empty field, "\t" = 2 empty fields. I got it now.

Tim Pietzcker had the correct answer all along. +1 for him.
It could be written alternatively as s/ (?:^|(?<=\t)) (?=\t|$) /0/xg;, but it's the same thing.
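
For comparison, the lookbehind form of the pattern can be tried outside Perl as well. The sketch below applies it with Python's re module, one line at a time; the names EMPTY_FIELD and fill_empty are assumptions for the example. It expects lines without their trailing newline.

```python
import re

# Zero-width match at every empty field: at the start of the line or right
# after a tab, and directly before a tab or the end of the line.
EMPTY_FIELD = re.compile(r"(?:^|(?<=\t))(?=\t|$)")

def fill_empty(line):
    """Insert "0" into every empty tab-separated field of a single line."""
    return EMPTY_FIELD.sub("0", line)

for line in ["1\t2\t\t4\t\t\t7\t8", "\t2\t3\t\t5\t6\t7\t"]:
    print(fill_empty(line))
```

Because every match is zero-width, the replacement is just 0 and no backreference is needed. A completely empty line counts as one empty field, so it becomes a single 0.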

If and only if your data contains only numbers and you have a clearly defined field separator FS, you can use the following trick:

awk 'BEGIN{FS=OFS="\t"}{for(i=1;i<=NF;++i) $i+=0}1' file

By adding zero, we convert strings to numbers. Empty strings will be converted to the number zero. You can define your field separator to anything you like.

This, however, might be a bit slow, since every assignment to a field $i causes awk to rebuild $0 from its fields.

A faster way is the solution of Dennis Williamson.
