Regex to match words and those with an apostrophe

Update: Following comments about the ambiguity of my question, I've added more detail.

(Terminology: by "words" I am referring to any succession of alphanumeric characters.)

I'm looking for a regex to match the following, verbatim:

  • Words.
  • Words with one apostrophe at the beginning.
  • Words with any number of non-contiguous apostrophes in the middle.
  • Words with one apostrophe at the end.

I would also like to match the following, not verbatim but with the apostrophes removed:

  • Words with an apostrophe at the beginning and at the end would be matched to the word, without the apostrophes. So 'foo' would be matched to foo.
  • Words with more than one contiguous apostrophe in the middle would be resolved to two different words: the fragment before the contiguous apostrophes and the fragment after the contiguous apostrophes. So, foo''bar would be matched to foo and bar.
  • Words with more than one contiguous apostrophe at the beginning or at the end would be matched to the word, without the apostrophes. So, ''foo would be matched to foo and ''foo'' to foo.

Examples

These would be matched verbatim:

  • 'bout
  • it's
  • persons'

But these would be ignored:

  • '
  • ''

And, for 'open', open would be matched.
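
Read literally, the rules above could be sketched in Python roughly as follows. This is only one interpretation, not a definitive implementation: the function name extract_words, the two-pass approach (split on runs of two or more apostrophes, then tokenize), and the ASCII-only character class are all assumptions made for illustration.

```python
import re

# A word: alphanumeric runs joined by single apostrophes, with an optional
# single apostrophe on either side (assumption: ASCII letters and digits only).
TOKEN = re.compile(r"'?[0-9A-Za-z]+(?:'[0-9A-Za-z]+)*'?")

def extract_words(text):
    """Hypothetical helper: return the words in text per the rules above."""
    words = []
    # Runs of two or more apostrophes act as separators and are discarded.
    for chunk in re.split(r"'{2,}", text):
        for match in TOKEN.finditer(chunk):
            token = match.group()
            # A word quoted on both sides loses both apostrophes.
            if token.startswith("'") and token.endswith("'"):
                token = token[1:-1]
            words.append(token)
    return words

print(extract_words("'bout it's persons' ' '' 'open' foo''bar ''foo ''foo''"))
```

With this reading, 'bout, it's and persons' come back verbatim, the lone ' and '' produce nothing, 'open' becomes open, and foo''bar yields foo and bar.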


Try using this:

(?=.*\w)^(\w|')+$

'bout     # pass
it's      # pass
persons'  # pass
'         # fail
''        # fail

Regex Explanation

NODE      EXPLANATION
  (?=       look ahead to see if there is:
    .*        any character except \n (0 or more times
              (matching the most amount possible))
    \w        word characters (a-z, A-Z, 0-9, _)
  )         end of look-ahead
  ^         the beginning of the string
  (         group and capture to \1 (1 or more times
            (matching the most amount possible)):
    \w        word characters (a-z, A-Z, 0-9, _)
   |         OR
    '         '\''
  )+        end of \1 (NOTE: because you're using a
            quantifier on this capture, only the LAST
            repetition of the captured pattern will be
            stored in \1)
  $         before an optional \n, and the end of the
            string
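
To check the pass/fail table above, here is a minimal sketch using Python's re module. The helper name passes is made up for illustration, and the last two test strings are taken from the question rather than from this answer.

```python
import re

# Pattern copied from the answer above; it validates one whole string.
PATTERN = re.compile(r"(?=.*\w)^(\w|')+$")

def passes(candidate):
    """Hypothetical helper: True if the pattern accepts the candidate."""
    return PATTERN.search(candidate) is not None

for candidate in ["'bout", "it's", "persons'", "'", "''", "'open'", "foo''bar"]:
    print(f"{candidate!r}: {'pass' if passes(candidate) else 'fail'}")
```

Note that the pattern only accepts or rejects a string as a whole. Inputs such as 'open' and foo''bar pass unchanged, so stripping or splitting on apostrophes would need a separate step.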


/('\w+)|(\w+'\w+)|(\w+')|(\w+)/
  • '\w+ Matches a ' followed by one or more word characters, OR
  • \w+'\w+ Matches one or more word characters followed by a ' followed by one or more word characters, OR
  • \w+' Matches one or more word characters followed by a ', OR
  • \w+ Matches one or more word characters
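
As a rough illustration of how the alternation behaves when scanning a line of text, here is a small Python sketch. The helper name find_tokens is hypothetical, and the pattern is the one above without the surrounding slashes.

```python
import re

# Alternatives are tried left to right at each position; the first that
# matches wins.
PATTERN = re.compile(r"('\w+)|(\w+'\w+)|(\w+')|(\w+)")

def find_tokens(text):
    """Hypothetical helper: return every match found while scanning text."""
    return [m.group(0) for m in PATTERN.finditer(text)]

print(find_tokens("'bout it's persons'"))  # ["'bout", "it's", "persons'"]
print(find_tokens("'open' foo''bar"))      # ["'open", "foo'", "'bar"]
```

The three verbatim examples come out as expected, but 'open' keeps its leading apostrophe and foo''bar is split into foo' and 'bar, so the apostrophe-stripping cases in the question would still need post-processing.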


How about this?

'?\b[0-9A-Za-z']+\b'?

EDIT: the previous version of this pattern didn't include the apostrophes on the sides.


I'm submitting this second answer because the question has changed quite a bit and my previous answer is no longer valid. Anyway, if all the conditions are now listed, try this:

(((?<!')')?\b[0-9A-Za-z]+\b('(?!'))?|\b[0-9A-Za-z]+('[0-9A-Za-z]+)*\b)
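
A quick way to probe this pattern against the question's examples is a short Python sketch like the one below. The helper name find_tokens is an assumption for illustration, and the behaviour described is what Python's re module does; other engines may differ.

```python
import re

# The pattern above, split across two string literals for readability.
PATTERN = re.compile(
    r"(((?<!')')?\b[0-9A-Za-z]+\b('(?!'))?"
    r"|\b[0-9A-Za-z]+('[0-9A-Za-z]+)*\b)"
)

def find_tokens(text):
    """Hypothetical helper: return every match found while scanning text."""
    return [m.group(0) for m in PATTERN.finditer(text)]

for sample in ["'bout", "persons'", "foo''bar", "''foo''", "it's", "'open'"]:
    print(f"{sample!r} -> {find_tokens(sample)}")
```

Under Python's re, the doubled-apostrophe cases behave as the question asks (foo''bar gives foo and bar, ''foo'' gives foo). However, because the first alternative is tried first, it's appears to be split into it' and s, and 'open' is returned with both apostrophes, so it may be worth checking those cases in your own engine.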


This works fine:

 ('*)(?:'')*('?(?:\w+'?)+\w+('\b|'?[^']))(\1)

It handles this data without problems:

    'bou
    it's
    persons'
    'open'
    open
    foo''bar
    ''foo
    bee''
    ''foo''
    '
    ''

On this data you need to strip the results (remove spaces from the matches):

    'bou it's persons' 'open' open foo''bar ''foo ''foo'' ' ''

(Tested in The Regulator; the results are in $2.)
