Text Pattern Processing in paragraph with unix linux utilities
I have a file with the following pattern (please note that this file was itself generated by processing with sed, awk, grep, etc.). Part of the input file is as follows.
filename1,
BASE=a/b/c
CONFIG=$BASE/d
propertiesfile1=$CONFIG/e.properties
EndOfFilefilename1
filename2,
BASE=f/g/h
CONFIG=$BASE/i
propertiesfile1=$CONFIG/j.properties
EndOfFilefilename2
filename3,
BASE=k/l/m
CONFIG=$BASE/n
propertiesfile1=$CONFIG/o.properties
EndOfFilefilename3
I want the output to look like this:
filename1,a/b/c/d/e.properties, filename2,f/g/h/i/j.properties, filename3, k/l/m/n/o.properties,
I could not find a solution with sed, awk, or grep, so I am stuck. Please let me know if you know a solution using these Unix utilities or any other language or platform.
Regards,
Suhaas
Assuming you generated the original file, and therefore it is safe to execute it as a script:
sed -e 's/^.*,/FILE=&/' \
-e 's/^.*=\$CONFIG/PROPFILE=$CONFIG/' \
-e 's/^EndOfFile.*/echo $FILE $PROPFILE/' < yourInputFile | sh
This converts each section of your file into the form:
FILE=filename1,
BASE=a/b/c
CONFIG=$BASE/d
PROPFILE=$CONFIG/e.properties
echo $FILE $PROPFILE
... and then sends it into a shell for processing.
Line-by-line explanation:
Line 1: Searches for the lines ending in a comma (the filenames), and sets FILE to the name.
Line 2: Searches for lines that set the properties file, and renames the variable to PROPFILE.
Line 3: Replaces the EndOfFile lines with a command to echo the file name and the properties file, then pipes it into a shell.
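To see the generated script before it is executed, the pipeline can be split in two. This is a minimal sketch, assuming the input has one item per line; the function name `to_script` and the sample data are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the input into shell assignments plus an echo.
to_script() {
    sed -e 's/^.*,/FILE=&/' \
        -e 's/^.*=\$CONFIG/PROPFILE=$CONFIG/' \
        -e 's/^EndOfFile.*/echo $FILE $PROPFILE/'
}

# Sample block (single quotes keep $BASE and $CONFIG literal).
sample='filename1,
BASE=a/b/c
CONFIG=$BASE/d
propertiesfile1=$CONFIG/e.properties
EndOfFilefilename1'

# Inspect the generated script first ...
printf '%s\n' "$sample" | to_script
# ... then run it. Only do this with input you trust.
printf '%s\n' "$sample" | to_script | sh
# last command prints: filename1, a/b/c/d/e.properties
```

Note that the shell expands the variables itself, so any other shell syntax in the input would be executed as well.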
This is an excellent use case for structural regular expressions, which have been implemented as a Python library, among other places. Here's an article which describes how to emulate SREs in Perl.
And here is an awk script to process that input and generate what you want:
BEGIN {
    FS="="
    state = 0;
    base = "";
    config = "";
    prop = "";
    filename = "";
    dbg = 0;
}
/^BASE=/ {
    if (dbg) {
        print "BASE";
        print $0;
    }
    if (state != 1) {
        print "Error base!";
        exit 1;
    }
    state++;
    base = $2;
    if (dbg > 1) printf ("BASE = %s\n", base);
}
/^CONFIG=/ {
    if (dbg) {
        print "CONFIG";
        print $0;
    }
    if (state != 2) {
        print "Error config!";
        exit 1;
    }
    state++;
    config = $2;
    sub (/\$BASE/, base, config);
    if (dbg > 1) printf ("CONFIG = %s\n", config);
}
/^propertiesfile1=/ {
    if (dbg) {
        print "PROP";
        print $0;
    }
    if (state != 3) {
        print "Error pF!";
        exit 1;
    }
    state++;
    prop = $2;
    sub (/\$CONFIG/, config, prop);
}
/^EndOfFile/ {
    if (dbg) {
        print "EOF";
        print $0;
    }
    if (state != 4) {
        print "Error EOF!";
        print state;
        exit 1;
    }
    state = 0;
    printf ("%s%s,\n", filename, prop);
}
/,$/ {
    if (dbg) {
        print "FILENAME";
        print $0;
    }
    if (state != 0) {
        print "Error filename!";
        print state;
        exit 1;
    }
    state++;
    filename = $1;
}
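If the ordering checks and debug output are not needed, the same substitutions can be written more compactly. This is a rough sketch rather than a replacement for the script above; the function name `resolve_props` is made up for illustration, and it assumes every block has the five lines shown in the question.

```shell
#!/bin/sh
# Hypothetical helper: same variable substitution, without the state checks.
resolve_props() {
    awk -F= '
        /,$/                { filename = $0 }   # the "filename1," line
        /^BASE=/            { base = $2 }
        /^CONFIG=/          { config = $2; sub(/\$BASE/, base, config) }
        /^propertiesfile1=/ { prop = $2; sub(/\$CONFIG/, config, prop) }
        /^EndOfFile/        { printf "%s%s,\n", filename, prop }
    '
}

printf '%s\n' 'filename1,' 'BASE=a/b/c' 'CONFIG=$BASE/d' \
    'propertiesfile1=$CONFIG/e.properties' 'EndOfFilefilename1' | resolve_props
# prints: filename1,a/b/c/d/e.properties,
```

The trade-off is that malformed input (a missing BASE line, for example) silently reuses values from the previous block instead of stopping with an error.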
Using gawk:
gawk -vRS= 'BEGIN{FS="BASE[=]?|CONFIG|\n"}
{
    s=$1
    for(i=1;i<=NF;i++){
        if($i~/\// ){ s=s $i }
    }
    print s
    s=""
}' file
Output:
$ more file
filename1,
BASE=a/b/c
CONFIG=$BASE/d
propertiesfile1=$CONFIG/e.properties
EndOfFilefilename1
filename2,
BASE=f/g/h
CONFIG=$BASE/i
propertiesfile1=$CONFIG/j.properties
EndOfFilefilename2
filename3,
BASE=k/l/m
CONFIG=$BASE/n
propertiesfile1=$CONFIG/o.properties
EndOfFilefilename3
$ ./shell.sh
filename1,a/b/c/d/e.properties
filename2,f/g/h/i/j.properties
filename3,k/l/m/n/o.properties
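Setting RS to the empty string puts awk in paragraph mode, where records are separated by blank lines. If the actual file has no blank lines between blocks, one way to adapt the one-liner is to insert them first. This is a minimal sketch, assuming every block ends with an EndOfFile line; the function name `join_blocks` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper: add a blank line after each EndOfFile marker so that
# paragraph mode (RS="") sees one record per block, then join the path pieces.
join_blocks() {
    sed -e '/^EndOfFile/G' |
    awk -v RS= 'BEGIN { FS = "BASE[=]?|CONFIG|\n" }
    {
        s = $1                        # the "filename1," field
        for (i = 2; i <= NF; i++) {
            if ($i ~ /\//) {          # keep only fields that contain a slash
                s = s $i
            }
        }
        print s
    }'
}

printf '%s\n' 'filename1,' 'BASE=a/b/c' 'CONFIG=$BASE/d' \
    'propertiesfile1=$CONFIG/e.properties' 'EndOfFilefilename1' | join_blocks
# prints: filename1,a/b/c/d/e.properties
```

Without the blank lines, the whole file is read as a single record and everything ends up on one output line.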
A Perl script that does what you want would be something like this (note that it is untested):
while (<>) {
    $base   = $1 if (m/BASE=(.+)/);
    $config = $1 if (m/CONFIG=(.+)/);
    if (m/propertiesfile1=(.+)/) {
        $props = $1;
        # Substitute with s///: expand $CONFIG first, then the $BASE it contains.
        $props =~ s/\$CONFIG/$config/;
        $props =~ s/\$BASE/$base/;
        print $ARGV . ", " . $props . "\n";
    }
}
The script takes the file names as arguments and prints $ARGV, the name of the file currently being read, as the first column.
Multi-step, but it works!
cat yourInputFile | egrep ',|/' | \
sed -e 's/^.*=//g' -e 's/\$.*\(\/.*\)/\1/g' | \
awk '{if($0 ~ "properties") print $0; else printf "%s", $0}'
The egrep keeps only the lines containing a "," or a "/", which eliminates the last (EndOfFile) line of each block:
filename1,
BASE=a/b/c
CONFIG=$BASE/d
propertiesfile1=$CONFIG/e.properties
The sed reduces the output to:
filename1,
a/b/c
/d
/e.properties
The awk portion reassembles the line to:
filename1,a/b/c/d/e.properties