
Character sets of the pattern symbols

Is there a way to get sets of the pattern symbols?

For example, I have the regular expression [az]+[A-Z]*. The symbol set of the first symbol is a and z. The symbol set of the second symbol is a and z. The symbol set of the third symbol is a and z, and so on.

The task is: I have a pattern and a string. I want to know whether the string starts with the same characters as one of the strings that match the pattern.

UPDATE:

For example, I have the regular expression [az]\\:[A-Z]*. The symbol set of the first symbol is a and z. The symbol set of the second symbol is :. The symbol set of the third symbol is A-Z. The symbol set of the fourth symbol is A-Z, and so on.


It sounds like you are asking for a function that takes a regular expression as an argument and returns a set of characters that could match at a given offset into a string to be matched:

Set<Character> getSymbols(String regEx, int offset);

This is non-trivial.

Using your example:

getSymbols("[az]\\:[A-Z]*", 1)

should return ['a', 'z'],

getSymbols("[az]\\:[A-Z]*", 2)

should return [':'],

getSymbols("[az]\\:[A-Z]*", 3)

should return ['A', 'B', 'C', ..... 'Y', 'Z']

But this is a trivial input. What if the input was:

getSymbols("[abc]*FRED[xzy]*", 5)

Now you have to factor in the fact that any number of "abc" characters could precede FRED and would shift everything else, leading to a result set like this for offsets 1 through 5:

1: ['a', 'b', 'c', 'F']
2: ['a', 'b', 'c', 'F', 'R']
3: ['a', 'b', 'c', 'F', 'R', 'E']
4: ['a', 'b', 'c', 'F', 'R', 'E', 'D']
5: ['a', 'b', 'c', 'x', 'y', 'z', 'F', 'R', 'E', 'D']

The code that solves this has to parse regular expressions, whose syntax is very expressive (escape sequences such as \s for whitespace, \w for word characters, and so on), and then needs a recursive algorithm to build the output set.
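For small offsets and a small, known alphabet, a brute-force variant can sidestep the parsing altogether. The sketch below is only an illustration, not a general solution. It assumes an extra alphabet parameter that the signature above does not have, and the class name SymbolSets and the helper isViablePrefix are hypothetical. It grows every viable prefix one character at a time and uses Matcher.matches() together with Matcher.hitEnd() to decide whether a candidate could still be extended into a full match. The number of prefixes it tracks can grow exponentially with the offset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SymbolSets {

    // Hypothetical helper: true if s already matches, or if the regex engine
    // ran out of input while matching, meaning more input might still match.
    static boolean isViablePrefix(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.matches() || m.hitEnd();
    }

    // Brute force: returns the characters of `alphabet` that can appear at the
    // 1-based `offset` of some string matching `regEx`.
    // Assumption: the caller supplies the alphabet to try.
    static Set<Character> getSymbols(String regEx, int offset, String alphabet) {
        Pattern p = Pattern.compile(regEx);
        List<String> prefixes = new ArrayList<>();
        prefixes.add(""); // start from the empty prefix
        Set<Character> symbols = new TreeSet<>();

        for (int pos = 1; pos <= offset; pos++) {
            List<String> next = new ArrayList<>();
            symbols.clear(); // keep only the symbols of the current position
            for (String prefix : prefixes) {
                for (char c : alphabet.toCharArray()) {
                    String candidate = prefix + c;
                    if (isViablePrefix(p, candidate)) {
                        next.add(candidate);
                        symbols.add(c);
                    }
                }
            }
            prefixes = next;
        }
        return symbols;
    }
}
```

Called as SymbolSets.getSymbols("[abc]*FRED[xzy]*", 5, "abcxyzFREDq"), this should yield the same ten characters as row 5 of the table above, in sorted order and without the q.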

If this is what you intend, the next question is, "What problem are you really trying to solve?"
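If the real problem is the task stated in the question (does this string start the way some matching string starts?), the per-position symbol sets may not be needed at all. A minimal sketch, assuming java.util.regex is acceptable: Matcher.hitEnd() reports whether the engine reached the end of the input during the last match attempt, in which case more input could change the result. The class name PrefixCheck and the method couldStartMatch are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PrefixCheck {

    // True if `input` is a complete match, or if it could become one by
    // appending more characters (the engine hit the end of input while
    // it was still trying to match).
    static boolean couldStartMatch(String regEx, String input) {
        Matcher m = Pattern.compile(regEx).matcher(input);
        return m.matches() || m.hitEnd();
    }
}
```

With the pattern from the update, couldStartMatch("[az]\\:[A-Z]*", "a") is expected to be true, because ":" could still follow, while couldStartMatch("[az]\\:[A-Z]*", "b") is expected to be false, because no matching string starts with b.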
