Differentiating between slashes in a string using a regular expression
A program that I'm writing (in Java) gets input data made up of three kinds of parts, separated by a slash (/). The parts can be one of the following:
- A name matching the regular expression \w*
- A call matching the expression \w*\(.*\)
- A path matching the expression <.*>|\".*\" (a path can contain slashes)
An example string could look like this:
bar/foo()/foo(bar)/<foo/bar>/bar/"foo/bar"/foo()
which has the following structure
name/call/call/path/name/path/call
I want to split this string into parts, and I'm trying to do this using a regular expression. My current expression captures slashes after calls and paths, but I'm having trouble getting it to capture slashes after names without also including slashes that may exist within paths. My current expression, which only captures slashes after paths and calls, looks like this:
(?<=[\)>\"])/
How can I expand this expression to also capture slashes after names without including slashes within paths?
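To make the problem concrete, here is a minimal sketch of what the current expression does when passed to String.split. The class and method names are made up for illustration; only the pattern and the example string come from the text above.

```java
import java.util.Arrays;
import java.util.List;

class LookbehindSplitDemo {
    // The look-behind pattern from the question: a slash preceded by ')', '>' or '"'.
    static final String CURRENT = "(?<=[\\)>\"])/";

    // Hypothetical helper: split the input with the current pattern.
    static List<String> splitWithCurrentPattern(String input) {
        return Arrays.asList(input.split(CURRENT));
    }

    public static void main(String[] args) {
        String input = "bar/foo()/foo(bar)/<foo/bar>/bar/\"foo/bar\"/foo()";
        // Prints one part per line; names stay glued to whatever follows them.
        for (String part : splitWithCurrentPattern(input)) {
            System.out.println(part);
        }
    }
}
```

For the example string this yields five parts instead of seven: bar/foo() and bar/"foo/bar" are not split, because the slash after a name is not preceded by ), > or ".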
(\w+|\w+\([^/]*\)(?:/\w+\([^/]*\))*|<[^>]*>|"[^"]*")(?=/|$)
captures this from the string 'bar/foo()/foo(bar)/<foo/bar>/bar/"foo/bar"/foo()'
'bar'
'foo()/foo(bar)'
'<foo/bar>'
'bar'
'"foo/bar"'
'foo()'
It does not capture the separating slashes, though (there is no need to: just assume they are there).
The simpler (\w+|\w+\([^/]*\)|<[^>]*>|"[^"]*")(?=/|$)
would capture calls separately:
"foo()"
"foo(bar)"
EDIT: Usually, I do a regex breakdown:
(                # begin group 1 (for alternation)
  \w+            # at least one word character
|                # or...
  \w+            # at least one word character
  \(             # a literal "("
  [^/]*          # anything but a "/", as often as possible
  \)             # a literal ")"
|                # or...
  <              # a "<"
  [^>]*          # anything but a ">", as often as possible
  >              # a ">"
|                # or...
  "              # a '"'
  [^"]*          # anything but a '"', as often as possible
  "              # a '"'
)                # end group 1
(?=/|$)          # look-ahead: ...followed by a slash or the end of string
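In Java, both expressions can be run through a Matcher loop. The following is only a sketch: the class name, constant names and the findParts helper are made up for illustration, while the two patterns are the ones given above with backslashes and quotes escaped for a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PartMatcherDemo {
    // First expression: consecutive calls end up in a single match.
    static final Pattern GROUPED = Pattern.compile(
        "(\\w+|\\w+\\([^/]*\\)(?:/\\w+\\([^/]*\\))*|<[^>]*>|\"[^\"]*\")(?=/|$)");

    // Simpler expression: every call is its own match.
    static final Pattern SEPARATE = Pattern.compile(
        "(\\w+|\\w+\\([^/]*\\)|<[^>]*>|\"[^\"]*\")(?=/|$)");

    // Hypothetical helper: collect group 1 of every match, left to right.
    static List<String> findParts(Pattern pattern, String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            parts.add(m.group(1));
        }
        return parts;
    }

    public static void main(String[] args) {
        String input = "bar/foo()/foo(bar)/<foo/bar>/bar/\"foo/bar\"/foo()";
        System.out.println(findParts(GROUPED, input));
        System.out.println(findParts(SEPARATE, input));
    }
}
```

Note that both patterns use [^/]* inside the parentheses, so they assume that call arguments never contain a slash.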
My first thought was to match slashes with an even number of quotes to the left of them, i.e. using a positive look-behind containing something like (".*")*
but this ends up in an exception saying
Look-behind group does not have an obvious maximum length
Honestly, I think you'd be better off with a Matcher, using an OR-ed together version of your components (something like \w*|\w*\(.*\)|(<.*>|\".*\")) and looping with while (matcher.find()).
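A rough sketch of that Matcher loop follows. It is not the exact expression above: I am assuming a tightened variant with the call alternative placed before the name alternative (otherwise the name branch would win for foo in foo()), \w+ instead of \w* so that empty matches are not reported, and negated character classes instead of .* so that a match cannot run past the closing delimiter. The named groups, the class name and the classify method are all illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PartClassifierDemo {
    // Assumed variant of the OR-ed pattern: call first, then path, then name.
    // Assumes calls contain no nested parentheses and paths no nested delimiters.
    static final Pattern PART = Pattern.compile(
        "(?<call>\\w+\\([^)]*\\))|(?<path><[^>]*>|\"[^\"]*\")|(?<name>\\w+)");

    // Hypothetical helper: report the kind of each part found, in order.
    static List<String> classify(String input) {
        List<String> kinds = new ArrayList<>();
        Matcher m = PART.matcher(input);
        while (m.find()) {
            if (m.group("call") != null) {
                kinds.add("call");
            } else if (m.group("path") != null) {
                kinds.add("path");
            } else {
                kinds.add("name");
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        String input = "bar/foo()/foo(bar)/<foo/bar>/bar/\"foo/bar\"/foo()";
        // prints name/call/call/path/name/path/call
        System.out.println(String.join("/", classify(input)));
    }
}
```

Keep in mind that find() silently skips anything it cannot match, so this sketch does not validate the input.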
Leaving the delimiter of your string unescaped when it is used inside your input might not be the best choice. However, you do have the luxury that the "false" slashes always sit inside a regular pattern. What I suggest:
- Split the whole string on "/"
- Parse each part until you get to the start of the path
- Put the path elements into a list until the end of the path
- Rejoin the path back on "/"
I highly recommend you consider escaping the "/" in your paths to make your life easier.
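The steps above could be sketched roughly like this. The class and method names are invented for the example, and it assumes that a path always starts a piece with < or " and ends a piece with the matching > or ", and that calls never contain slashes.

```java
import java.util.ArrayList;
import java.util.List;

class SplitAndRejoinDemo {
    // Split on every "/", then glue the pieces of a path back together.
    static List<String> splitParts(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder path = null; // non-null while inside an unfinished path
        char closer = 0;           // delimiter that ends the current path
        for (String piece : input.split("/", -1)) {
            if (path != null) {
                // Inside a path: rejoin with the slash that split() removed.
                path.append('/').append(piece);
                if (piece.endsWith(String.valueOf(closer))) {
                    parts.add(path.toString());
                    path = null;
                }
            } else if (opensPath(piece)) {
                closer = piece.charAt(0) == '<' ? '>' : '"';
                path = new StringBuilder(piece);
            } else {
                parts.add(piece);
            }
        }
        if (path != null) {
            parts.add(path.toString()); // unterminated path: keep what we have
        }
        return parts;
    }

    // True if the piece starts a path that is not closed within the same piece.
    static boolean opensPath(String piece) {
        if (piece.isEmpty()) {
            return false;
        }
        char first = piece.charAt(0);
        if (first != '<' && first != '"') {
            return false;
        }
        char closer = first == '<' ? '>' : '"';
        return piece.length() < 2 || piece.charAt(piece.length() - 1) != closer;
    }

    public static void main(String[] args) {
        String input = "bar/foo()/foo(bar)/<foo/bar>/bar/\"foo/bar\"/foo()";
        for (String part : splitParts(input)) {
            System.out.println(part);
        }
    }
}
```

This avoids look-behind entirely, at the cost of a little state while walking the pieces.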
This pattern captures all parts of your example string separately without including the delimiter into the results:
\w+\(.*?\)|<.*>|\".*\"|\w+