Text pattern processing in paragraphs with Unix/Linux utilities

I have a file with the following pattern (please note that this file was itself generated by processing with sed, awk, grep, etc.). Part of the input file is as follows.

filename1,
BASE=a/b/c
CONFIG=$BASE/d
propertiesfile1=$CONFIG/e.properties
EndOfFilefilename1
filename2,
BASE=f/g/h
CONFIG=$BASE/i
propertiesfile1=$CONFIG/j.properties
EndOfFilefilename2
filename3,
BASE=k/l/m
CONFIG=$BASE/n
propertiesfile1=$CONFIG/o.properties
EndOfFilefilename3

I want the output like

filename1,a/b/c/d/e.properties,
filename2,f/g/h/i/j.properties,
filename3,k/l/m/n/o.properties,

I could not find a solution with sed, awk, or grep, so I am stuck. Please let me know if you know a solution using these Unix utilities or any other language or platform.

Regards,

Suhaas


Assuming you generated the original file, and therefore it is safe to execute it as a script:

sed -e 's/^.*,/FILE=&/'                   \
    -e 's/^.*=\$CONFIG/PROPFILE=$CONFIG/' \
    -e 's/^EndOfFile.*/echo "$FILE$PROPFILE,"/' < yourInputFile | sh

This converts each section of your file into the form:

FILE=filename1,
BASE=a/b/c
CONFIG=$BASE/d
PROPFILE=$CONFIG/e.properties
echo "$FILE$PROPFILE,"

... and then sends it into a shell for processing.

Line-by-line explanation:

Line 1: Searches for the lines ending in a comma (the filenames), and sets FILE to the name.
Line 2: Searches for lines that set the properties file, and renames the variable to PROPFILE.
Line 3: Replaces the EndOfFile lines with a command to echo the file name and the properties file, then pipes it into a shell.
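
Because this approach executes whatever the input contains, it can be worth looking at the generated script before handing it to sh. The following is a minimal sketch, assuming a POSIX shell. The helper name to_script and the file name sample.txt are made up for illustration, and the echo line is written to print the exact format asked for in the question.

```shell
# to_script: hypothetical helper that prints the generated shell script
# instead of running it, so it can be reviewed first.
to_script() {
    sed -e 's/^.*,/FILE=&/' \
        -e 's/^.*=\$CONFIG/PROPFILE=$CONFIG/' \
        -e 's/^EndOfFile.*/echo "$FILE$PROPFILE,"/' "$1"
}

# One section of the sample input from the question (sample.txt is an assumed name).
cat > sample.txt <<'EOF'
filename1,
BASE=a/b/c
CONFIG=$BASE/d
propertiesfile1=$CONFIG/e.properties
EndOfFilefilename1
EOF

to_script sample.txt        # review the generated script
to_script sample.txt | sh   # run it once it looks safe
```

If the input ever contains lines that were not produced by your own tooling, the review step is the place to catch them.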


This is an excellent use case for structural regular expressions (SREs), which have been implemented as a Python library, among other places. Here's an article which describes how to emulate SREs in Perl.


And here is an awk script to process that input and generate what you want:

BEGIN {
FS="="
state = 0;
base = "";
config = "";
prop = "";
filename = "";
dbg = 0;
}
/^BASE=/ {
if (dbg) {
    print "BASE";
    print $0;
}
if (state != 1) {
    print "Error base!";
    exit 1;
}
state++;
base = $2;
if (dbg > 1) printf ("BASE = %s\n", base);
}
/^CONFIG=/ {
if (dbg) {
    print "CONFIG";
    print $0;
}
if (state != 2) {
    print "Error config!";
    exit 1;
}
state++;
config = $2;
sub (/\$BASE/, base, config);
if (dbg > 1) printf ("CONFIG = %s\n", config);
}
/^propertiesfile1=/ {
if (dbg) {
    print "PROP";
    print $0;
}
if (state != 3) {
    print "Error pF!";
    exit 1;
}
state++;
prop = $2;
sub (/\$CONFIG/, config, prop);
}
/^EndOfFile/ {
if (dbg) {
    print "EOF";
    print $0;
}
if (state != 4) {
    print "Error EOF!";
    print state;
    exit 1;
}
state = 0;
printf ("%s%s,\n", filename, prop);
}
/,$/{
if (dbg) {
    print "FILENAME";
    print $0;
}
if (state != 0) {
    print "Error filename!";
    print state;
    exit 1;
}
state++;
filename = $1;
}
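
The state checks in the script above guard against malformed input. If the input is known to be well formed, the same substitution idea fits in a few lines. This is only a sketch: the function name join_props is invented here, and it assumes that every section has exactly the BASE, CONFIG and propertiesfile1 lines shown in the question and that the values contain no "=" or "&" characters.

```shell
# join_props: hypothetical helper; reads the sectioned input from the
# given files (or stdin) and prints one "name,path," line per section.
join_props() {
    awk -F= '
        /,$/                { name = $0 }                             # e.g. "filename1,"
        /^BASE=/            { base = $2 }                             # e.g. "a/b/c"
        /^CONFIG=/          { cfg = $2;  sub(/\$BASE/, base, cfg) }   # expand $BASE
        /^propertiesfile1=/ { prop = $2; sub(/\$CONFIG/, cfg, prop) } # expand $CONFIG
        /^EndOfFile/        { print name prop "," }                   # emit the joined line
    ' "$@"
}
```

It would be called as join_props yourInputFile. Unlike the longer script, it silently produces wrong output if a section is missing a line.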


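With gawk, using paragraph mode (this needs a blank line between sections, as in the sample file shown under the output):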

gawk -vRS= 'BEGIN{FS="BASE[=]?|CONFIG|\n"}
{
 s=$1 
 for(i=1;i<=NF;i++){
    if($i~/\// ){ s=s $i }
 }
 print s
 s="" 
}' file

output

$ more file
filename1,
BASE=a/b/c
CONFIG=$BASE/d
propertiesfile1=$CONFIG/e.properties
EndOfFilefilename1

filename2,
BASE=f/g/h
CONFIG=$BASE/i
propertiesfile1=$CONFIG/j.properties
EndOfFilefilename2

filename3,
BASE=k/l/m
CONFIG=$BASE/n
propertiesfile1=$CONFIG/o.properties
EndOfFilefilename3

$ ./shell.sh
filename1,a/b/c/d/e.properties
filename2,f/g/h/i/j.properties
filename3,k/l/m/n/o.properties
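
If the input has no blank lines between sections, as in the question, one possible way to add them before the gawk step is sketched below. The helper name add_blank_lines is hypothetical, and it assumes every section ends with a line starting with EndOfFile.

```shell
# add_blank_lines: hypothetical helper; copies its input and prints an
# empty line after every EndOfFile line so that paragraph mode can
# split the sections into separate records.
add_blank_lines() {
    awk '{ print } /^EndOfFile/ { print "" }' "$@"
}
```

The result can then be piped into the gawk command in place of the file argument, for example add_blank_lines yourInputFile | gawk -vRS= '...'.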


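A Perl script that does what you want would look something like this (note: this is untested):

while (<>) {

  # Remember the most recent BASE and CONFIG values.
  $base = $1 if (m/^BASE=(.+)/);
  $config = $1 if (m/^CONFIG=(.+)/);

  if (m/^propertiesfile1=(.+)/) {

    $props = $1;
    # Expand the variables with s/// (substitute), not m// (match).
    $props =~ s/\$CONFIG/$config/;
    $props =~ s/\$BASE/$base/;

    print $ARGV . "," . $props . ",\n";
  }
}

Pass the original file names to the script as arguments; $ARGV holds the name of the file currently being read.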


It takes several steps, but it works:

    grep -E ',|/' yourInputFile | \
    sed -e 's/^.*=//' -e 's/\$.*\(\/.*\)/\1/' | \
    awk '{if ($0 ~ "properties") print $0; else printf "%s", $0}'

The grep -E keeps only the lines containing a "," or a "/", which eliminates the EndOfFile line:

filename1,
BASE=a/b/c
CONFIG=$BASE/d
propertiesfile1=$CONFIG/e.properties

The sed reduces the output to:

filename1,
a/b/c
/d
/e.properties

The awk portion reassembles the line to:

filename1,a/b/c/d/e.properties