awk - brackets checking
I'm looking for an awk script that checks whether brackets are placed properly. The brackets used are {}, [] and (). Every bracket must be closed, and brackets can't be interleaved. Illegal example: ( [ ) ]
If what you are trying to do applies to a general purpose language, then this is a non-trivial problem.
To start with, you will have to worry about comments and strings. If the language you are checking also has regular expression literals, that makes the task harder still.
So before I can give you any advice I need to know the limits of your problem area. If you can guarantee that there are no strings, no comments and no regular expressions to worry about (or, more generally, that brackets appear nowhere in the code except in the roles you want to check for balance), this will make life a lot simpler.
Knowing the language that you want to check would be helpful.
If I take the hypothesis that there is no noise, i.e. that all brackets are useful brackets, my strategy would be iterative:
I would simply look for and remove all innermost bracket pairs: those that contain no brackets inside. This is best done by collapsing all lines into a single long line (and finding a mechanism to add line references, should you need to get that information out). In this case the search and replace is pretty simple:
It requires an array:
B["("]=")"; B["["]="]"; B["{"]="}"
And a loop through those elements:
for (b in B) {gsub("[" b "][^][(){}]*[" B[b] "]", "", $0)}
My test file is as follows:
#!/bin/awk
($1 == "PID") {
fo (i=1; i<NF; i++)
{
F[$i] = i
}
}
($1 + 0) > 0 {
count("VIRT")
count("RES")
count("SHR")
count("%MEM")
}
END {
pintf "VIRT=\t%12d\nRES=\t%12d\nSHR=\t%12d\n%%MEM=\t%5.1f%%\n", C["VIRT"], C["RES"], C["SHR"], C["%MEM"]
}
function count(c[)
{
f=F[c];
if ($f ~ /m$/)
{
$f = ($f+0) * 1024
}
C[c]+=($f+0)
}
My full script (without line referencing) is as follows:
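cat test-file-for-brackets.txt | \
tr -d '\r\n' | \
awk \
'
BEGIN {
    # Map each opening bracket to its closing bracket.
    B["("]=")";
    B["["]="]";
    B["{"]="}"
}
{
    # Repeat until a full pass removes nothing (m counts the removals).
    m=1;
    while (m>0)
    {
        m=0;
        for (b in B)
        {
            # Remove innermost pairs: opener, non-bracket text, matching closer.
            m+=gsub("[" b "][^][(){}]*[" B[b] "]", "", $0)
        }
    };
    # Whatever is left still contains the unbalanced brackets.
    print
}
'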
The script removes balanced pairs until only the innermost illegal uses of brackets, and everything around them, are left. But beware: 1/ this script will not work with brackets in comments, regular expressions or strings, 2/ it does not report where in the original file the problem is located, 3/ although it removes all balanced pairs, it stops at the innermost error conditions and keeps all the brackets that enclose them.
Point 3/ is probably a usable result, though I'm not sure what reporting mechanism you had in mind.
Point 2/ is relatively easy to implement but takes more than a few minutes' work to produce, so I'll leave it up to you to figure out.
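For point 2/, one possible sketch (not the only way to do it) keeps the newlines in the buffer and, each time an innermost pair is removed, puts back the newlines that pair contained. Line numbers then survive the removal, and any line that still holds a bracket at the end can be reported. The function name find_bracket_lines and the output format are made up for illustration, and the same no-noise hypothesis applies:

```shell
# find_bracket_lines: hypothetical helper name; reads the text to check on stdin.
find_bracket_lines() {
  awk '
    { orig[NR] = $0; buf = buf $0 "\n" }   # keep the newlines as line markers
    END {
      # One innermost pair of any kind: opener, non-bracket text, matching closer.
      re = "[(][^][(){}]*[)]|[[][^][(){}]*[]]|[{][^][(){}]*[}]"
      while (match(buf, re)) {
        seg = substr(buf, RSTART, RLENGTH)
        gsub(/[^\n]/, "", seg)             # keep only the newlines of the pair
        buf = substr(buf, 1, RSTART - 1) seg substr(buf, RSTART + RLENGTH)
      }
      # Any bracket still present sits on a line involved in an error.
      n = split(buf, rest, "\n")
      bad = 0
      for (i = 1; i <= n; i++)
        if (rest[i] ~ /[][(){}]/) { print i ": " orig[i]; bad = 1 }
      exit bad
    }
  '
}
```

Usage would be something like find_bracket_lines < test-file-for-brackets.txt. It rescans the buffer after every removal, so it is slow on large files, but it reuses the same regular expression idea as the script above.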
Point 1/ is the tricky one, because you enter a whole new realm of competing, sometimes nested, beginnings and endings, and of special quoting rules for special characters.
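To give an idea of what point 1/ involves, here is a partial sketch of a pre-filter that could run before a bracket check. It assumes the language uses double-quoted strings with backslash escapes and # comments running to the end of the line, which is an assumption on my part; it does nothing about regular expression literals or other quoting styles. The name strip_noise is hypothetical:

```shell
# strip_noise: hypothetical helper name; filters stdin to stdout.
strip_noise() {
  awk '
    {
      out = ""; n = length($0); i = 1
      while (i <= n) {
        c = substr($0, i, 1)
        if (c == "#") break              # comment: ignore the rest of the line
        if (c == "\"") {
          # Skip to the closing quote, stepping over backslash escapes.
          i++
          while (i <= n && substr($0, i, 1) != "\"") {
            if (substr($0, i, 1) == "\\") i++
            i++
          }
          i++                            # step past the closing quote
          out = out "\"\""               # leave an empty string as a placeholder
          continue
        }
        out = out c
        i++
      }
      print out
    }
  '
}
```

Because it works line by line and prints one output line per input line, line numbers stay the same for whatever check comes after it.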
You'll have to read the file character-by-character. Build a stack of open brackets seen. When you see a close bracket, you can either pop the matching open bracket off the stack, or record an error that the brackets don't match.
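A minimal sketch of that stack approach, written in awk only because that is what the question asked for. It assumes every bracket in the input counts (no strings, comments or regular expressions), and the function name check_brackets and the message format are made up for illustration:

```shell
# check_brackets: hypothetical helper name; reads the text to check on stdin.
check_brackets() {
  awk '
    BEGIN {
      # Closing bracket -> the opening bracket it must match.
      open_for[")"] = "("; open_for["]"] = "["; open_for["}"] = "{"
      depth = 0; errors = 0
    }
    {
      n = length($0)
      for (i = 1; i <= n; i++) {
        c = substr($0, i, 1)
        if (c == "(" || c == "[" || c == "{") {
          # Push the bracket and remember where it was opened.
          depth++
          stack[depth] = c
          where[depth] = NR ":" i
        } else if (c in open_for) {
          if (depth == 0) {
            printf "%d:%d: unexpected %s\n", NR, i, c
            errors++
          } else {
            if (stack[depth] != open_for[c]) {
              printf "%d:%d: %s does not match %s opened at %s\n", NR, i, c, stack[depth], where[depth]
              errors++
            }
            depth--                      # pop, even on a mismatch, so the scan can go on
          }
        }
      }
    }
    END {
      # Anything left on the stack was never closed.
      while (depth > 0) {
        printf "%s: unclosed %s\n", where[depth], stack[depth]
        errors++
        depth--
      }
      if (errors == 0) print "balanced"
      exit (errors > 0)
    }
  '
}
```

Positions are printed as line:column. Popping on a mismatch is just one recovery choice; after the first error the later messages may be knock-on effects rather than separate mistakes.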
awk's not the right tool for this job. I'd use a general purpose scripting language (Perl/Tcl/etc).