Character sets of the pattern symbols
Is there a way to get sets of the pattern symbols?
For example, I have a regular expression [az]+[A-Z]*.
Then the symbol set of the first symbol is a and z.
Then the symbol set of the second symbol is a and z.
Then the symbol set of the third symbol is a and z.
....
The task is: I have a pattern and a string. I want to know whether the string starts with the same characters as at least one of the strings that match the pattern.
UPDATE:
For example, I have a regular expression [az]\\:[A-Z]*.
Then the symbol set of the first symbol is a and z.
Then the symbol set of the second symbol is :.
Then the symbol set of the third symbol is A-Z.
Then the symbol set of the fourth symbol is A-Z.
....
It sounds like you are asking for a function that takes a regular expression as an argument and returns a set of characters that could match at a given offset into a string to be matched:
Set<Character> getSymbols(String regEx, int offset);
This is non-trivial.
Using your example:
getSymbols("[az]\\:[A-Z]*", 1)
should return ['a', 'z'],
getSymbols("[az]\\:[A-Z]*", 2)
should return [':'],
getSymbols("[az]\\:[A-Z]*", 3)
should return ['A', 'B', 'C', ..... 'Y', 'Z']
But this is a trivial input. What if the input was:
getSymbols("[abc]*FRED[xzy]*", 5)
Now you have to account for the fact that any number of "abc" characters could precede FRED and shift everything else, leading to a result set like this:
1: ['a', 'b', 'c', 'F']
2: ['a', 'b', 'c', 'F', 'R']
3: ['a', 'b', 'c', 'F', 'R', 'E']
4: ['a', 'b', 'c', 'F', 'R', 'E', 'D']
5: ['a', 'b', 'c', 'x', 'y', 'z', 'F', 'R', 'E', 'D']
The code that solves this has to parse regular expressions, which are very expressive once you count all the escape sequences (\s for whitespace, \w for word characters, etc.), and then needs a recursive algorithm to build the output set.
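For small cases, one way to sidestep writing a parser is brute force instead of the parse-and-recurse approach: try every character of a candidate alphabet at each position and keep only the prefixes that java.util.regex still considers viable. The sketch below assumes that Matcher.matches() || Matcher.hitEnd() is a good enough viability test (it holds for simple patterns like the ones here, but lookarounds and anchors can fool it). The class name SymbolSets, the helper isViablePrefix, and the extra alphabet parameter are hypothetical additions, not part of the signature above, and the work can grow exponentially with the offset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SymbolSets {
    // True if s matches the pattern, or the match attempt ran out of input
    // (so appending more characters might still produce a match).
    static boolean isViablePrefix(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.matches() || m.hitEnd();
    }

    // Characters from `alphabet` (an assumed, caller-supplied candidate set)
    // that can appear at the 1-based `offset` of a string matching regEx.
    static Set<Character> getSymbols(String regEx, int offset, String alphabet) {
        Pattern p = Pattern.compile(regEx);
        List<String> prefixes = new ArrayList<>();
        prefixes.add(""); // start from the empty prefix
        Set<Character> result = new TreeSet<>();
        for (int pos = 1; pos <= offset; pos++) {
            List<String> next = new ArrayList<>();
            result = new TreeSet<>(); // only the last position's set is returned
            for (String prefix : prefixes) {
                for (char c : alphabet.toCharArray()) {
                    String candidate = prefix + c;
                    if (isViablePrefix(p, candidate)) {
                        next.add(candidate);
                        result.add(c);
                    }
                }
            }
            prefixes = next;
        }
        return result;
    }
}
```

With the alphabet "abcxyzFRED", getSymbols("[abc]*FRED[xzy]*", 5, "abcxyzFRED") yields the same ten characters as row 5 of the table above, though in sorted order because of the TreeSet.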
If this is what you intend, the next question is, "What problem are you really trying to solve?"
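If the real problem is the one stated in the question (does a given string start like some string that matches the pattern?), the per-position symbol sets may not be needed at all. Here is a minimal sketch, assuming java.util.regex and relying on Matcher.hitEnd(), which reports whether the last match attempt ran into the end of the input. The class and method names are made up for illustration, and hitEnd() can give surprising answers for patterns with lookarounds or anchors, so treat this as a starting point rather than a complete solution.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixCheck {
    // Returns true if `input` is a full match, or if the match attempt
    // failed only because the input ended too early.
    static boolean startsLikeAMatch(String regEx, String input) {
        Matcher m = Pattern.compile(regEx).matcher(input);
        // matches(): the input already matches the whole pattern.
        // hitEnd(): the engine reached the end of the input while matching,
        // so more characters might still complete a match.
        return m.matches() || m.hitEnd();
    }
}
```

For the pattern from the update, startsLikeAMatch("[az]\\:[A-Z]*", "a") is true because a colon could still follow, while the inputs "b" and "a:b" give false because no continuation can repair them.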