Regular Expression Longest Possible Matching
I have an input string which is a directory address:
Example: ProgramFiles/Micro/Telephone
And I want to match it against a list of words very strictly:
Example: Tel|Tele|Telephone
I want to match against Telephone and not Tel. Right now my regex looks like this:
my( $output ) = ( $input =~ m/($list)/o );
The regex above will match against Tel. What can I do to fix it?
If you want a whole word match:
\b(Tel|Tele|Telephone)\b
\b is a zero-width word boundary. A word boundary in this case means the transition from or to a word character. A word character (\w) is [0-9a-zA-Z_].
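As a rough sketch of the whole-word approach in Perl, assuming the words arrive as a plain list (the helper name match_whole_word is made up for illustration), the alternation can be built with quotemeta so that any metacharacters in the words are matched literally:

```perl
use strict;
use warnings;

# Hypothetical helper: return the listed word that occurs as a whole word
# in $input, or undef if none does.
sub match_whole_word {
    my ($input, @words) = @_;
    # quotemeta escapes regex metacharacters so each word matches literally
    my $list = join '|', map { quotemeta($_) } @words;
    # \b on both sides rejects "Tel" and "Tele" inside "Telephone"
    my ($found) = $input =~ /\b($list)\b/;
    return $found;
}

my $output = match_whole_word('ProgramFiles/Micro/Telephone', qw(Tel Tele Telephone));
print defined $output ? "$output\n" : "no match\n";   # prints "Telephone"
```

With the boundaries in place, the order of the words in the list no longer matters for this input, because the shorter alternatives fail at the trailing \b and the engine backtracks to the next one.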
If you simply want the longest alternative in a partial-word match, put the longest first. For example:
\b(Telephone|Tele|Tel)
or
(Telephone|Tele|Tel)
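If the list is not known in advance, one option is to sort it by length before joining it into a pattern. This is only a sketch, and the helper name longest_first_pattern is an assumption made for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: join the words into an alternation, longest first.
sub longest_first_pattern {
    my @words  = @_;
    my @sorted = sort { length($b) <=> length($a) } @words;
    # quotemeta keeps any metacharacters in the words literal
    return join '|', map { quotemeta($_) } @sorted;
}

my $input = 'ProgramFiles/Micro/Telephone';
my $list  = longest_first_pattern(qw(Tel Tele Telephone));   # "Telephone|Tele|Tel"
my ($output) = $input =~ /($list)/;
print "$output\n";   # prints "Telephone"
```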
Change the order from Tel|Tele|Telephone to Telephone|Tele|Tel.
Perl's regex engine tries the alternatives from left to right and stops at the first one that lets the match succeed. It does not go on to look for a longer alternative.
For example, /a|ab|abc/ applied to "abc" matches "a" rather than the longer "abc".
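The left-to-right behaviour is easy to check directly. In this small sketch, first_match is a made-up helper that returns whatever the pattern captured:

```perl
use strict;
use warnings;

# Hypothetical helper: return the text matched by $pattern in $text, or undef.
sub first_match {
    my ($pattern, $text) = @_;
    return $text =~ /($pattern)/ ? $1 : undef;
}

print first_match('a|ab|abc', 'abc'), "\n";   # prints "a"
print first_match('abc|ab|a', 'abc'), "\n";   # prints "abc"
```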
Or factor out the common prefix and use nested optional groups, which are greedy and so take the longest form available:
Tel(?:e(?:phone)?)?
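A short sketch of how the nested form behaves on a few inputs. The sub name match_tel and the inputs other than the question's are invented for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: return the longest of Tel / Tele / Telephone found
# at the first place "Tel" occurs in $input, or undef.
sub match_tel {
    my ($input) = @_;
    # Each (?:...)? is greedy: it extends the match when it can,
    # and is skipped when it cannot.
    my ($found) = $input =~ /(Tel(?:e(?:phone)?)?)/;
    return $found;
}

print match_tel('ProgramFiles/Micro/Telephone'), "\n";   # prints "Telephone"
print match_tel('ProgramFiles/Micro/Telegraph'), "\n";   # prints "Tele"
print match_tel('ProgramFiles/Micro/Telstar'),   "\n";   # prints "Tel"
```

This only works when every word in the list is a prefix of the next one. For an arbitrary list, sorting the alternatives by length is the more general choice.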
How about trying to find a match, as long as the longest match is not anywhere in the input? Something like:
Find telephone, OR find tel or tele where telephone is not anywhere in the input. So, to start making it look like a regex:
(telephone) OR, from the start of an input that does not contain telephone, any characters followed by (tele|tel)
(telephone)|^(?!.*telephone).*?(tele|tel)
Does that make any sense?
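One way to express "telephone is not anywhere in the input" in Perl is a negative lookahead, (?!...). The following is a sketch under that assumption, using the lowercase words from this answer. The sub name prefer_longest is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: prefer "telephone"; fall back to "tele" or "tel"
# only when "telephone" does not occur anywhere in $input.
sub prefer_longest {
    my ($input) = @_;
    # First alternative: the long word itself.
    # Second alternative: anchored at the start, the lookahead rejects any
    # input containing "telephone"; then the first "tele" or "tel" is captured.
    if ($input =~ /(telephone)|^(?!.*telephone).*?(tele|tel)/s) {
        return defined $1 ? $1 : $2;
    }
    return;   # no listed word found
}

print prefer_longest('files/micro/telephone'), "\n";   # prints "telephone"
print prefer_longest('files/micro/telegraph'), "\n";   # prints "tele"
```

For the original question this is more machinery than needed, since ordering the alternatives longest first gives the same result for words that share a prefix.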