Awk filtering with multiple fields
Suppose that I have the following text file (it can have more states, cities, and colleges):
begin_state
New York
end_state
begin_cities
Albany
Buffalo
Syracuse
end_cities
begin_colleges
Cornell
Columbia
Stony Brook
end_colleges
begin_state
California
end_state
begin_cities
San Francisco
Sacramento
Los Angeles
end_cities
begin_colleges
Berkeley
Stanford
Caltech
end_colleges
I want to use awk to filter all the cities and list them under the states, or select all the colleges and list them under the states. For example, if I want the cities, they should be output as follows:
**New York**
Albany
Buffalo
Syracuse
**California**
San Francisco
Sacramento
Los Angeles
Any suggestions are welcome.
Here are two solutions in awk. The first is naive and repetitive but easier to follow and learn from. The second is an attempt at reducing the repetition.
Both solutions are fragile with respect to errors in your data file, such as a misspelled marker line. If you are free to choose the implementation language, I suggest you do this in something like Ruby, Perl or Python.
Save to a file (e.g. showinfo.sh) and invoke with a single argument, "cities" or "colleges", to select the mode. You must also redirect the data file into stdin.
Example invocation (for either solution):
./showinfo.sh cities < states.txt
./showinfo.sh colleges < states.txt
The naive solution:
#!/bin/bash
set -e
set -u
# mode is "cities" or "colleges"
mode=$1
awk -v mode="$mode" '
# Marker lines only switch the current section (st); they are never stored.
/begin_state/    {st="states"; next}
/end_state/      {next}
/begin_cities/   {st="cities"; next}
/end_cities/     {next}
/begin_colleges/ {st="coll"; next}
/end_colleges/   {next}
{
    # Data line: remember the state name, or append to the list for that state.
    if (st=="states") {
        sn=$0
    }
    else if (st=="cities") cities[sn]=cities[sn]"\n"$0
    else if (st=="coll") colleges[sn]=colleges[sn]"\n"$0
}
END {
    # Note: for (sn in ...) visits the states in an unspecified order.
    if (mode=="cities") {
        for (sn in cities) { print "=="sn"=="cities[sn] }
    }
    else if (mode=="colleges") {
        for (sn in colleges) { print "=="sn"=="colleges[sn] }
    }
    else { print "set mode either cities or colleges" }
}'
Second solution, with repetition removed:
#!/bin/bash
set -e
set -u
mode=$1
awk -v mode="$mode" '
# A begin_ marker names the current section; an end_ marker clears it.
/begin_/ {st=$1; next}
/end_/   {st=""; next}
{
    if (st=="begin_state") { sn=$0 }
    else { data[st, sn]=data[st, sn]"\n"$0 }
}
END {
    # Keys have the form section SUBSEP state; split them back apart.
    for (combo in data) {
        split(combo, sep, SUBSEP)
        type = sep[1]
        state_name = sep[2]
        if (type == "begin_"mode) {
            print "==" state_name "==" data[combo]
        }
    }
}'
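If the states should come out in the same order as in the input (awk does not guarantee any particular order for a for-in loop over an array), one alternative is to print while reading instead of collecting everything into arrays. The following is only a minimal sketch under some assumptions: the function name list_section is hypothetical, every marker is assumed to start at the beginning of its line and to be spelled exactly as in the input file, and state names are printed between ** as in the question rather than between ==.

```bash
#!/bin/bash
# Sketch: print each state name followed by the lines of one section.
# list_section is a hypothetical helper name, not part of the solutions above.
list_section() {
    awk -v mode="$1" '
        # "begin_" is 6 characters, so the section name starts at position 7.
        /^begin_/ { section = substr($0, 7); next }
        /^end_/   { section = ""; next }
        # Lines inside a state section become headers.
        section == "state" { print "**" $0 "**"; next }
        # Lines inside the requested section are printed unchanged.
        section == mode    { print }
    '
}

# Usage: list_section cities < states.txt
```

Because nothing is stored, this variant prints a state header even when that state has no entries in the requested section, and an unknown mode simply yields the state names alone.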
Input file used (I note that the one in the question has changed recently):
begin_state
New York
end_state
begin_cities
Albany
Buffalo
Syracuse
end_cities
begin_colleges
Cornell
Columbia
Stony Brook
end_colleges
begin_state
California
end_state
begin_cities
San Francisco
Sacramento
Los Angeles
end_cities
begin_colleges
Berkeley
Stanford
Caltech
end_colleges
Session when running the first solution:
$ bash showinfo.sh cities < states.txt
==New York==
Albany
Buffalo
Syracuse
==California==
San Francisco
Sacramento
Los Angeles
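Since both solutions assume well-formed marker lines, it may help to check the data file before running them. The sketch below is one possible pre-check and not part of the solutions themselves: check_markers is a hypothetical name, and it assumes that sections are never nested and that no data line starts with lowercase "begin_" or "end".

```bash
#!/bin/bash
# Sketch: report begin_/end_ markers that do not pair up.
# check_markers is a hypothetical helper name.
check_markers() {
    awk '
        /^begin_/ {
            if (cur != "") {
                printf "line %d: %s found while %s is still open\n", NR, $0, cur
                bad = 1
            }
            cur = substr($0, 7)   # section name after "begin_"
            next
        }
        /^end/ {
            # Catches misspellings such as "end cities" or "end_ state".
            if ($0 != "end_" cur) {
                printf "line %d: unexpected marker: %s\n", NR, $0
                bad = 1
            }
            cur = ""
            next
        }
        END {
            if (cur != "") {
                printf "end of input: begin_%s is never closed\n", cur
                bad = 1
            }
            exit bad + 0   # non-zero exit status if any problem was found
        }
    '
}

# Usage: check_markers < states.txt && ./showinfo.sh cities < states.txt
```

The function prints one line per problem and returns a non-zero status if it found any, so it can gate the real script as in the usage comment.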