Parsing a custom string-generating-pattern syntax
Background: I'm developing a custom regex-like syntax for URL filenames. It will work like this:
- User writes a pattern, something like "[a-z][0-9]{0,2}", and passes it as input
- It is parsed by the program and translated into the set of permutations it represents, i.e. 'a', 'a0', 'a00' ... 'z99'
These patterns will vary in complexity; basically, anything that could appear in a URL filename must be accommodated. The language is either Java or PHP, but examples in any language, or abstract/conceptual help, are more than welcome.
My questions are:
- Where to start with the implementation of a "parser" for the above
and less importantly,
- How to translate parsed complex patterns into strings programmatically
There is a good answer for this here: SO: /generate-all-permutations-of-text-from-a-regex-pattern-in-c
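To make the "where to start" part concrete, here is a minimal sketch in Python (the question allows any language). It assumes a deliberately tiny syntax: literal characters, classes like [a-z0-9], and the quantifiers {n} and {m,n}. The names parse, expand_class, token_strings and expand are made up for this illustration and are not taken from the linked answer. The pattern is first turned into a list of (alphabet, min, max) tokens, then strings are produced lazily so the caller can stop whenever it has enough.

```python
import itertools


def expand_class(body):
    """Expand the inside of a class, e.g. 'a-c9' -> ['a', 'b', 'c', '9']."""
    chars = []
    i = 0
    while i < len(body):
        # A range needs three characters: start, '-', end.
        if i + 2 < len(body) and body[i + 1] == "-":
            chars.extend(chr(c) for c in range(ord(body[i]), ord(body[i + 2]) + 1))
            i += 3
        else:
            chars.append(body[i])
            i += 1
    return chars


def parse(pattern):
    """Turn a pattern into a list of (alphabet, min_repeat, max_repeat) tokens."""
    tokens = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "[":
            end = pattern.index("]", i)  # raises ValueError if the class is unclosed
            alphabet = expand_class(pattern[i + 1:end])
            i = end + 1
        elif pattern[i] in "{}]":
            raise ValueError(f"unexpected {pattern[i]!r} at position {i}")
        else:
            alphabet = [pattern[i]]  # a literal is a one-character alphabet
            i += 1
        lo = hi = 1
        if i < len(pattern) and pattern[i] == "{":
            end = pattern.index("}", i)
            bounds = pattern[i + 1:end].split(",")
            lo, hi = int(bounds[0]), int(bounds[-1])  # {n} gives lo == hi == n
            i = end + 1
        tokens.append((alphabet, lo, hi))
    return tokens


def token_strings(alphabet, lo, hi):
    """Yield every string of lo..hi characters from alphabet, shortest first."""
    for n in range(lo, hi + 1):
        for combo in itertools.product(alphabet, repeat=n):
            yield "".join(combo)


def expand(pattern):
    """Lazily yield every string the pattern stands for."""
    def walk(tokens, prefix):
        if not tokens:
            yield prefix
            return
        alphabet, lo, hi = tokens[0]
        for piece in token_strings(alphabet, lo, hi):
            yield from walk(tokens[1:], prefix + piece)

    return walk(parse(pattern), "")


# Take only the first five results instead of materialising all of them.
print(list(itertools.islice(expand("[a-z][0-9]{0,2}"), 5)))
# ['a', 'a0', 'a1', 'a2', 'a3']
```

Because expand is a generator, wrapping it in itertools.islice (or breaking out of a for loop) is enough to halt early. Alternation, escapes, negated classes and open-ended quantifiers such as * are left out on purpose; a real implementation would probably want a proper tokenizer or a parser generator once the syntax grows.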
The crux of the thing is this: define precisely what you really need, figure out a way to halt once you have it, and narrow your search range as much as possible, because you are flirting with a quickly exploding number of permutations. "Anything that could appear in a URL filename must be accommodated" is not going to cut it. For example, if you limit yourself to English letters (ignoring case) and digits, that is 36 symbols, and a string 6 characters long already gives over 2 billion combinations (36^6 = 2,176,782,336). For each additional character, multiply by 36.
If you go with ISO 8859, the same 6-character string gives over 274 trillion combinations, and with Unicode over 745 trillion-trillion combinations.
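The numbers above are easy to check, and it is worth counting before generating anything. A rough sketch follows; count_strings and count_pattern are hypothetical helper names, and the 255-symbol alphabet is my inference from the 274 trillion figure, not something stated above. The alphabet size behind the Unicode figure is not given, so it is not reproduced here.

```python
def count_strings(alphabet_size, length):
    """Number of distinct strings of exactly `length` symbols."""
    return alphabet_size ** length


def count_pattern(tokens):
    """Count the strings a pattern yields without generating them.

    Each token is (alphabet_size, min_repeat, max_repeat); a token contributes
    the sum of alphabet_size ** n over its allowed lengths, and the tokens
    multiply together. This assumes no two expansions collide.
    """
    total = 1
    for size, lo, hi in tokens:
        total *= sum(size ** n for n in range(lo, hi + 1))
    return total


print(count_strings(36, 6))    # 2176782336, "over 2 billion"
print(count_strings(36, 7) // count_strings(36, 6))  # 36, one more character
print(count_strings(255, 6))   # 274941996890625, assuming 255 symbols

# "[a-z][0-9]{0,2}" is one letter followed by zero to two digits.
print(count_pattern([(26, 1, 1), (10, 0, 2)]))  # 2886
```

If the count comes back in the billions, that is the cue to tighten the pattern or cap the number of results rather than try to enumerate everything.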