Filling space/tab separated, empty columns with 0
I have a huge file, and in its output some columns don't have a value. I need to fill these columns with 0 for further analysis. I can separate the columns with spaces or tabs; below they are shown separated by tabs.
This is really a job for a CSV parser, but if it has to be a regex, and you never have tabs within quoted CSV entries, you could search for
(^|\t)(?=\t|$)
and replace with
${1}0 (group 1 followed by a literal 0; written as $10, many regex flavors would read it as group 10)
So, in Perl:
($ResultString = $subject) =~
s/(      # Match either...
   ^     # the start of the line (preferably)
  |      # or
   \t    # a tab character
  )      # remember the match in backreference no. 1
  (?=    # Then assert that the next character is either
   \t    # a(nother) tab character
  |      # or
   $     # the end of the line
  )      # End of lookahead assertion
  /${1}0/xg;
This will transform
1 2 4 7 8
2 3 5 6 7
into
1 2 0 4 0 0 7 8
0 2 3 0 5 6 7 0
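For anyone who wants to try the substitution outside Perl, here is a minimal Python sketch of the same pattern. The name fill_empty_fields is made up for illustration, and the two sample lines are a reconstruction of the example above with the tabs written out as \t, which is an assumption about the original input.

```python
import re

# Same pattern as above: the start of the line or a tab,
# followed (lookahead) by another tab or the end of the line.
EMPTY_FIELD = re.compile(r"(^|\t)(?=\t|$)")


def fill_empty_fields(line: str) -> str:
    """Put a 0 into every empty tab-separated field (hypothetical helper)."""
    # \g<1>0 means "group 1, then a literal 0"; a bare \10 would be group 10.
    return EMPTY_FIELD.sub(r"\g<1>0", line)


# Assumed reconstruction of the sample input, tabs made explicit.
print(repr(fill_empty_fields("1\t2\t\t4\t\t\t7\t8")))
print(repr(fill_empty_fields("\t2\t3\t\t5\t6\t7\t")))
```

The lookahead is what lets consecutive empty fields be filled in a single pass: the tab that ends one empty field is not consumed, so it can start the next match.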
For a tab-separated file, this AWK snippet does the trick:
BEGIN { FS = "\t"; OFS = "\t" }
{
    for (i = 1; i <= NF; i++) {
        # test for the empty string, so values such as 0.0 are left as they are
        if ($i == "") { $i = 0 }
    }
    print $0
}
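The same split, fill and rejoin idea can be sketched in Python. This is only an illustration: fill_empty_fields_split and its sep parameter are hypothetical names, and a field is treated as empty only when it is exactly the empty string.

```python
def fill_empty_fields_split(line: str, sep: str = "\t") -> str:
    """Split on the separator, fill empty fields with 0, join again."""
    fields = line.split(sep)  # keeps empty strings between adjacent separators
    return sep.join(field if field != "" else "0" for field in fields)


# Tabs written out explicitly; the sample values are made up.
print(repr(fill_empty_fields_split("1\t\t3\t")))
```

Unlike a regex, this keeps every field in a list, so it is easy to extend (for example to pad short rows), at the cost of building that list for each line.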
Here's a sed solution. Note that some versions of sed don't like \t.
sed 's/^\t/0\t/;:a;s/\t\t/\t0\t/g;ta;s/\t$/\t0/' inputfile
or
sed -e 's/^\t/0\t/' -e ':a' -e 's/\t\t/\t0\t/g' -e 'ta' -e 's/\t$/\t0/' inputfile
Explanation:
s/^\t/0\t/ # insert a zero before a tab that begins a line
:a # top of the loop
s/\t\t/\t0\t/g # insert a zero between a pair of tabs
ta # if a substitution was made, branch to the top of the loop
s/\t$/\t0/ # insert a zero after a tab that ends a line
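To see why the loop is needed, here is a rough Python imitation of the three steps. The function fill_like_sed is a hypothetical name, and str.replace stands in for the s///g command because both replace non-overlapping matches from left to right.

```python
def fill_like_sed(line: str) -> str:
    """Imitate the sed script step by step (illustrative sketch only)."""
    if line.startswith("\t"):      # s/^\t/0\t/
        line = "0" + line
    while "\t\t" in line:          # :a ... ta (repeat until nothing changes)
        line = line.replace("\t\t", "\t0\t")   # s/\t\t/\t0\t/g
    if line.endswith("\t"):        # s/\t$/\t0/
        line += "0"
    return line


# One pass over three tabs in a row still leaves a pair of tabs behind:
one_pass = "a\t\t\tb".replace("\t\t", "\t0\t")
print(repr(one_pass))
print(repr(fill_like_sed("a\t\t\tb")))
```

A single global pass consumes the first two tabs of a run of three, so the third tab can no longer pair with the second; the second pass picks it up.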
Deleting my answer after re-reading the original post. There are no tabs as data, just delimiters. If there is no data, a double delimiter will appear to align the columns.
It can't be any other way. So if a single delimiter is there, it will separate two empty fields. "" = 1 empty field, "\t" = 2 empty fields. I get it now.
Tim Pietzcker had the correct answer all along. +1 for him.
It could alternatively be written as s/ (?:^|(?<=\t)) (?=\t|$) /0/xg; but it's the same thing.
If and only if your data contains only numbers and you have a clearly defined field separator FS, you can use the following trick:
awk 'BEGIN{FS=OFS="\t"}{for(i=1;i<=NF;++i) $i+=0}1' file
By adding zero, we convert strings to numbers. Empty strings will be converted to the number zero. You can define your field separator to anything you like.
This, however, might be a bit slow, since awk has to rebuild $0 from its fields after a field $i is reassigned.
A faster way is the solution of Dennis Williamson.