Parsing files in "/etc/default" using Java
I'm trying to parse the configuration files usually found in /etc/default
using Java and regular expressions. So far, this is the code I run on every line of each file:
    // remove comments from the line
    int hash = line.indexOf("#");
    if (hash >= 0) {
        line = line.substring(0, hash);
    }
    // create the patterns
    Pattern doubleQuotePattern = Pattern.compile("\\s*([a-zA-Z_][a-zA-Z_0-9]*)\\s*=\\s*\"(.*)\"\\s*");
    Pattern singleQuotePattern = Pattern.compile("\\s*([a-zA-Z_][a-zA-Z_0-9]*)\\s*=\\s*\\'(.*)\\'\\s*");
    Pattern noQuotePattern = Pattern.compile("\\s*([a-zA-Z_][a-zA-Z_0-9]*)\\s*=(.*)");
    // try to match each of the patterns to the line
    Matcher matcher = doubleQuotePattern.matcher(line);
    if (matcher.matches()) {
        System.out.println(matcher.group(1) + " == " + matcher.group(2));
    } else {
        matcher = singleQuotePattern.matcher(line);
        if (matcher.matches()) {
            System.out.println(matcher.group(1) + " == " + matcher.group(2));
        } else {
            matcher = noQuotePattern.matcher(line);
            if (matcher.matches()) {
                System.out.println(matcher.group(1) + " == " + matcher.group(2));
            }
        }
    }
This works as I expect, but I'm pretty sure I could make it much smaller with a better regular expression. I haven't had any luck so far. Does anyone know of a better way to read these types of files?
You can use ANTLR to generate a parser. Basically, you write a grammar for the language you want to work with (or use one of the many grammars already written), and ANTLR will generate a parser for you.
Here is a single Pattern you can use that is equivalent to the three you have above:
    Pattern etcPattern = Pattern.compile(
        "\\s*([a-zA-Z_]\\w*)\\s*=\\s*" +
        "(\"|'|.{0,0})(.*?)\\2" + // QUOTE MATCHING
        "\\s*");
There are three differences between this and yours. First, I replaced the expression [a-zA-Z_0-9] with its predefined character class \w (a word character). Second, the part marked QUOTE MATCHING is a pattern that matches and strips balanced outer quotes, but also allows unbalanced quotes, as your three patterns did.
It begins with the pattern (\"|'|.{0,0}), which matches one of:
- A double quote
- A single quote
- Anything zero times (the empty string)
Next comes your .* pattern, followed by the backreference \2. The backreference says to match whatever was matched by group 2 (the quote pattern). This is where the third alternative above matters: if the value does not begin with a single or double quote, the pattern needs to be able to skip the quote. So it first tries to match one of the quotes. If it can't, it matches the empty string, which in turn lets the backreference match the empty string.
The final change to make it work is to change the internal .* pattern to be reluctant (to .*?) so that it will allow the quotes to be matched by the back reference if possible and be stripped.
So you should be able to run this as:
    Matcher matcher = etcPattern.matcher(line);
    if (matcher.matches()) {
        System.out.println(matcher.group(1) + " == " + matcher.group(3));
    }
equivalently to your example above (note that the value is now in match group 3 instead of 2). As I said, this matches what your patterns did: specifically, it allows unbalanced quotes and any internal quoting in the value.
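To see the single pattern at work on a few lines, here is a minimal, self-contained sketch. The class name EtcDefaultLineDemo, the helper parseLine, and the sample lines are all made up for illustration; the comment stripping is the same naive indexOf("#") approach as in the question, so a '#' inside a quoted value would still be cut off.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EtcDefaultLineDemo {
    // Same pattern as above: group 1 = name, group 2 = opening quote
    // (possibly empty), group 3 = value without the outer quotes.
    private static final Pattern ETC_PATTERN = Pattern.compile(
            "\\s*([a-zA-Z_]\\w*)\\s*=\\s*"
            + "(\"|'|.{0,0})(.*?)\\2"
            + "\\s*");

    /** Returns {name, value}, or null if the line is not an assignment. */
    public static String[] parseLine(String line) {
        // Naive comment stripping, as in the question.
        int hash = line.indexOf('#');
        if (hash >= 0) {
            line = line.substring(0, hash);
        }
        Matcher m = ETC_PATTERN.matcher(line);
        return m.matches() ? new String[] { m.group(1), m.group(3) } : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, not taken from a real /etc/default file.
        String[] samples = {
            "ENABLED=\"true\"",
            "NAME='my daemon'  # trailing comment",
            "LEVEL=3",
            "OPTS=\"-a 'x y'\"",
            "BROKEN=\"abc",
            "# just a comment"
        };
        for (String sample : samples) {
            String[] kv = parseLine(sample);
            System.out.println(kv == null
                    ? "(no match) " + sample
                    : kv[0] + " == " + kv[1]);
        }
    }
}
```

One small behavioural difference from the three original patterns: for unquoted values, this pattern also drops whitespace around the value, whereas noQuotePattern keeps everything after the "=" as is.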
In many cases you can use java.util.Properties
to process shell configuration files.
In fact, as long as you keep these files simple, you can share them this way between shell scripts and Java programs.
The one thing it does not handle well is quoted strings.
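As a rough sketch of the java.util.Properties approach and its quoting limitation: Properties.load keeps the quote characters as part of the value, so something has to strip them afterwards. The class name PropertiesShellConfigDemo, the unquote helper, and the sample text below are hypothetical and only meant as an illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class PropertiesShellConfigDemo {
    /** Loads shell-style KEY=value text with java.util.Properties. */
    public static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Hypothetical helper: strips one pair of matching outer quotes. */
    public static String unquote(String value) {
        if (value.length() >= 2) {
            char first = value.charAt(0);
            char last = value.charAt(value.length() - 1);
            if ((first == '"' || first == '\'') && first == last) {
                return value.substring(1, value.length() - 1);
            }
        }
        return value;
    }

    public static void main(String[] args) {
        // Made-up sample content in the style of an /etc/default file.
        String text = "# sample file\n"
                + "ENABLED=true\n"
                + "NAME=\"my daemon\"\n";
        Properties props = load(text);
        for (String key : props.stringPropertyNames()) {
            String raw = props.getProperty(key);
            System.out.println(key + " raw=[" + raw + "] unquoted=[" + unquote(raw) + "]");
        }
    }
}
```

Note that Properties only treats a line as a comment when it starts with "#" or "!", so a trailing "# comment" after a value would end up inside the value, and backslashes are interpreted as escapes rather than passed through.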