Regex to figure out a complex string

I am trying to parse some text files into a database, and there is a string that includes 2 pieces of information in it. There are a few options for what the string can look like. It can either be a single word like Word, or it can have that first word, followed by a dash, followed by any number of other words, like Word - Second. The key, though, is that IF the string ends in a number like Word - Second 4, or in two numbers separated by a slash like Word - Second 2/3, then those numbers need to be put into a different variable.

I do NOT know enough about regex to do this one. Help? (with explanations?)


I think you might be looking for something like this:

^([a-zA-Z]+(?: *- *[a-zA-Z]+(?: +[a-zA-Z]+)*)?)(?: +(\d+(?:\/\d+)?))?$

Explanation:

^               Start of line
(               First capturing group (for the words)
  [a-zA-Z]+     A word
  (?:...)?      (Omitted for clarity)
)               Close first group
(?:             Start non-capturing group
  \s+           Some whitespace
  (             Second capturing group (for the numbers)
    \d+         A number
    (?:\/\d+)?  Optionally a slash followed by another number
  )             Close capturing group
)?              Close optional non-capturing group
$               End of line

I omitted an explanation of this part above: (?: *- *[a-zA-Z]+(?: +[a-zA-Z]+)*)?. It matches a dash (optionally surrounded by spaces) followed by one or more space-separated words. I also wrote \s in the explanation instead of a literal space because the space is invisible. But \s matches any whitespace, including newlines, so you may prefer to match only spaces, as the actual regex above does.
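
If it helps to see the regex in use, here is a minimal sketch in Python. The question doesn't say which language you are using, so Python is an assumption, and the function name split_label is made up for illustration. It applies the pattern above unchanged and then splits the optional number part on the slash:

```python
import re

# The pattern from the answer, unchanged except that the slash is not
# escaped (Python does not require escaping "/").
PATTERN = re.compile(
    r"^([a-zA-Z]+(?: *- *[a-zA-Z]+(?: +[a-zA-Z]+)*)?)(?: +(\d+(?:/\d+)?))?$"
)


def split_label(text):
    """Return (words, numbers) for a matching string, or None if no match.

    `split_label` is a hypothetical helper name, not part of any library.
    """
    match = PATTERN.match(text)
    if match is None:
        return None
    words, numbers = match.groups()
    # Group 2 is None when the string has no trailing number.
    parsed = [int(n) for n in numbers.split("/")] if numbers else []
    return words, parsed


for sample in ["Word", "Word - Second", "Word - Second 4", "Word - Second 2/3"]:
    print(sample, "->", split_label(sample))
```

The first capturing group holds the words and the second holds the numbers, so the two pieces of information land in separate variables. A string that doesn't fit the pattern at all (for example, one that is only a number) returns None here rather than raising an error.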

Rubular
