Regex to figure out a complex string
I am trying to parse some text files into a database and there is a string that includes 2 pieces of information in it.
There are a few options for what the string can look like.
It can either look like a single word Word
or it can have that first word, followed by a dash, followed by any number of other words like Word - Second.
The key though, is that IF the string ends in a number like Word - Second 4
or two numbers separated by a slash like Word - Second 2/3
then those numbers need to be put into a different variable.
I do NOT know enough about regex to do this one. Help? (with explanations?)
I think you might be looking for something like this:
^([a-zA-Z]+(?: *- *[a-zA-Z]+(?: +[a-zA-Z]+)*)?)(?: +(\d+(?:\/\d+)?))?$
Explanation:
    ^                   Start of line
    (                   First capturing group (for the words)
      [a-zA-Z]+         A word
      (?:...)?          (Omitted for clarity)
    )                   Close first group
    (?:                 Start non-capturing group
      \s+               Some whitespace
      (                 Second capturing group (for the numbers)
        \d+             A number
        (?:\/\d+)?      Optionally a slash followed by another number
      )                 Close capturing group
    )?                  Close optional non-capturing group
    $                   End of line
I omitted an explanation of this part above: (?: *- *[a-zA-Z]+(?: +[a-zA-Z]+)*)?
It optionally matches a dash (with optional spaces on either side) followed by one or more space-separated words. I also wrote \s in the explanation instead of the literal space used in the regex, because a space is invisible. But \s matches any whitespace, including newlines. You may prefer to match only spaces, as the regex itself does.
Rubular
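If it helps to see the two groups land in separate variables, here is a minimal sketch in Python. The question doesn't say which language is in use, so Python is an assumption, and split_label is a hypothetical helper name made up for this illustration. The pattern is the one above, unchanged.

```python
import re

# The regex from the answer, using literal spaces rather than \s.
PATTERN = re.compile(
    r"^([a-zA-Z]+(?: *- *[a-zA-Z]+(?: +[a-zA-Z]+)*)?)(?: +(\d+(?:\/\d+)?))?$"
)


def split_label(text):
    """Hypothetical helper: return (words, numbers), or None if nothing matches.

    numbers is None when the string has no trailing number part.
    """
    match = PATTERN.match(text)
    if match is None:
        return None
    # Group 1 holds the words, group 2 the optional trailing number(s).
    return match.group(1), match.group(2)


for sample in ["Word", "Word - Second", "Word - Second 4", "Word - Second 2/3"]:
    print(repr(sample), "->", split_label(sample))
```

Group 2 comes back as None when the string has no trailing number, so it is worth checking for that before storing it. Because the pattern uses literal spaces, a line break between the words and the number will not match.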