
Extracting particular column name values using sed/awk/perl

Say I have an input file such as:

a=1 b=2 c=3 d=4
a=2 b=3
a=0 c=7
a=3 b=9 c=0 d=5
a=4 d=1
c=9

Assume that the order of column names (a, b, c and d) remains the same. How do I write a script/command which will help me extract values specific to columns b and d? So my output should be:

b=2 d=4
b=3

b=9 d=5
d=1

I could write a "not-so-good" awk command that filters these out with pipes and multiple -F delimiters, but I am sure there is a more elegant way to do this too.

Kindly help.


sed 's/[^bd]=[0-9]* *//g'


perl -pe 's/[^bd]=\d+ *//g' data_file


# awk '{ for(i=1;i<=NF;i++){ if($i~/(b|d)=/) printf "%s ", $i } print ""}' file
b=2 d=4
b=3

b=9 d=5
d=1


Here is the one-liner version:

$ perl -lpe '@x=/([bd]=[0-9])/g; $_="@x"' test.txt

m//g in list context returns all the matches as a list.
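For example, a minimal snippet (the string here is just the first sample line):

my @matches = "a=1 b=2 c=3 d=4" =~ /([bd]=[0-9])/g;
# @matches now holds ("b=2", "d=4")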

#!/usr/bin/perl
use strict; use warnings;

while ( <DATA> ) {
    if( my @cols = /([bd]=[0-9])/g ) {
        print "@cols";
    }
    print "\n";
}

__DATA__
a=1 b=2 c=3 d=4
a=2 b=3
a=0 c=7
a=3 b=9 c=0 d=5
a=4 d=1
c=9

Output:

C:\Temp> t.pl
b=2 d=4
b=3

b=9 d=5
d=1


Sed will do it pretty nicely:

sed -e 's/[^bd]=[^ ]* *//g' -e 's/^ *//' -e 's/ *$//' < filename

The first regex clears out the unwanted fields (everything except b and d), so that's where to modify it if you change your mind. The other two remove leading and trailing whitespace.
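For example, to keep columns a and c instead, only the character class changes:

sed -e 's/[^ac]=[^ ]* *//g' -e 's/^ *//' -e 's/ *$//' < filename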


In Ruby:

#!/usr/bin/env ruby
filename = ARGV[0]
fields = ARGV[1..-1]

File.open(filename) do |file|
  file.each_line do |line|
    pairs = line.split(' ').map { |expression| expression.split('=') }
    value_hash = Hash[pairs]

    requested_fields = []

    fields.each do |field|
      requested_fields << "#{field}=#{value_hash[field]}" unless value_hash[field].nil?
    end

    puts requested_fields.join(' ')
  end
end

Call using ruby ruby_script_name.rb input_file.txt field1 field2.

I like how short the sed/perl solution is -- but how easily can it be modified to take longer field names? Seems like the regex would become messy quickly... Anyway, that strategy would be applicable here as well, if you'd want to use it.
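One possible sketch for longer names (the field names width and depth are hypothetical, and the key=value pairs are assumed to stay whitespace-separated): whitelist the names with a negative lookahead instead of a character class.

perl -pe 's/\b(?!(?:width|depth)=)\w+=\S* *//g' data_file

The names to keep go in the alternation; every other key=value pair, along with its trailing spaces, is stripped.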


Assuming you may want to do something with the values in the future, other than just filtering, you could use this as a basis.

#! /usr/bin/env perl
use warnings;
use strict;

my @lines;

while(<>){
  my %kv = /([a-z])=([0-9])/ig;
  push @lines, \%kv;
}

for my $kv (@lines){
  # $kv->{a} ||= 1;
  # next unless $kv->{c};

  print "b=$kv->{b} " if defined $kv->{b};
  print "b=$kv->{d} " if defined $kv->{d};
  print "\n";
}
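
Since the script reads from <>, call it with the data file as an argument (extract.pl is just a placeholder name):

perl extract.pl data_file

The commented-out lines mark where per-record defaults or filters could be added.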


Clearly, PostScript is the way to go ... XD

(%stdin) (r) file
{
    dup 100 string readline not {exit} if
    {
        dup () eq {pop exit} if
        token pop 3 string cvs
        dup 0 get << 98 / 100 / >> exch known
        {print ( ) print} {pop} ifelse
    } loop
    / =
} loop

Usage: gs -q -dNOPROMPT -dNODISPLAY -dBATCH thisfile.ps < input

Notes: Replace the << 98 / 100 / >> with the appropriate ASCII values (98 = b, 100 = d), each followed by a space-delimited slash (though you don't have to use the slash; it's just a dummy object). For example, to select 'c', 'e', and 'f', use << 99 / 101 / 102 / >>

Each line can be at most 100 characters; if your lines are longer replace the 100 string with some larger number. Likewise, replace the 3 string if your x=# entries are longer than three characters. This doesn't work if the x is more than one character, though.
