Regex to match words and those with an apostrophe
Update: Following comments about the ambiguity of my question, I've added more detail.
(Terminology: by words I am referring to any succession of alphanumeric characters.)
I'm looking for a regex to match the following, verbatim:
- Words.
- Words with one apostrophe at the beginning.
- Words with any number of non-contiguous apostrophes throughout the middle.
- Words with one apostrophe at the end.
I would also like to match the following, but not verbatim; the apostrophes should be removed:
- Words with an apostrophe at the beginning and at the end would be matched to the word without the apostrophes. So 'foo' would be matched to foo.
- Words with more than one contiguous apostrophe in the middle would be resolved to two different words: the fragment before the contiguous apostrophes and the fragment after them. So foo''bar would be matched to foo and bar.
- Words with more than one contiguous apostrophe at the beginning or at the end would be matched to the word without the apostrophes. So ''foo would be matched to foo, and ''foo'' to foo.
Examples
These would be matched verbatim:
'bout
it's
persons'
But these would be ignored:
'
''
And for 'open', open would be matched.
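The normalization rules (stripping apostrophes from both sides, splitting on runs of apostrophes) can be easier to handle in two steps than in a single match. Below is a minimal sketch in Python, assuming "word" means ASCII letters and digits as defined at the top. It finds candidate tokens with one regex, then post-processes each token. The function extract_words and the constant names are hypothetical and do not come from any of the answers.

```python
import re

# Assumption: "word" means ASCII letters and digits, per the question's definition.
TOKEN = re.compile(r"[0-9A-Za-z']+")    # candidate: alphanumerics and apostrophes
SEPARATOR = re.compile(r"'{2,}")        # two or more contiguous apostrophes
HAS_ALNUM = re.compile(r"[0-9A-Za-z]")


def extract_words(text):
    """Return the words in text, normalized per the rules above (hypothetical helper)."""
    words = []
    for token in TOKEN.findall(text):
        # Splitting on runs of 2+ apostrophes turns foo''bar into foo and bar,
        # and leaves empty strings where ''foo or foo'' had their runs.
        for piece in SEPARATOR.split(token):
            if not HAS_ALNUM.search(piece):
                continue  # '' and a lone ' contain no word: ignore them
            # A single apostrophe on both sides is stripped: 'open' -> open.
            if piece.startswith("'") and piece.endswith("'"):
                piece = piece[1:-1]
            words.append(piece)
    return words


print(extract_words("'bout it's persons' ' '' 'open' foo''bar ''foo ''foo''"))
# -> ["'bout", "it's", "persons'", 'open', 'foo', 'bar', 'foo', 'foo']
```

Inputs that the rules leave ambiguous, such as ''foo', are not specially handled. This sketch returns foo' for that case.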
Try using this:
(?=.*\w)^(\w|')+$
'bout # pass
it's # pass
persons' # pass
' # fail
'' # fail
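You can reproduce the pass/fail list above by running the pattern over the same samples. This example uses Python's re module, which is an assumption because the answer does not name a language. The extra sample foo''bar shows that this pattern accepts it verbatim rather than splitting it into two words.

```python
import re

# The pattern from the answer: the lookahead (?=.*\w) demands at least one word
# character somewhere, then ^(\w|')+$ requires the whole string to consist of
# word characters and apostrophes.
PATTERN = re.compile(r"(?=.*\w)^(\w|')+$")

samples = ["'bout", "it's", "persons'", "'", "''", "foo''bar"]
results = {s: PATTERN.match(s) is not None for s in samples}

for s, ok in results.items():
    print(f"{s:10} {'pass' if ok else 'fail'}")
```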
Regex Explanation
NODE                     EXPLANATION
(?=                      look ahead to see if there is:
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
  \w                       word characters (a-z, A-Z, 0-9, _)
)                        end of look-ahead
^                        the beginning of the string
(                        group and capture to \1 (1 or more times
                         (matching the most amount possible)):
  \w                       word characters (a-z, A-Z, 0-9, _)
 |                         OR
  '                        the character '
)+                       end of \1 (NOTE: because you're using a
                         quantifier on this capture, only the LAST
                         repetition of the captured pattern will be
                         stored in \1)
$                        before an optional \n, and the end of the
                         string
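You can check the NOTE about \1 directly. This Python sketch (the language is an assumption) compares the whole match with group 1. Group 1 holds only the last character matched by (\w|').

```python
import re

PATTERN = re.compile(r"(?=.*\w)^(\w|')+$")

# group(0) is the whole match; group(1) keeps only the LAST repetition of (\w|').
m = PATTERN.match("it's")
whole, last = m.group(0), m.group(1)
print(whole, last)   # it's s

m2 = PATTERN.match("persons'")
print(m2.group(1))   # '
```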
/('\w+)|(\w+'\w+)|(\w+')|(\w+)/
- '\w+ matches a ' followed by one or more word characters (letters, digits or underscore), OR
- \w+'\w+ matches one or more word characters followed by a ' followed by one or more word characters, OR
- \w+' matches one or more word characters followed by a ', OR
- \w+ matches one or more word characters.
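At each position, the alternatives are tried from left to right, and the first one that fits wins. The small Python sketch below (the language is assumed) runs the pattern over some of the question's samples and shows the text each match covers. The 'open' sample yields 'open with its leading apostrophe kept, and foo''bar yields foo' and 'bar.

```python
import re

# The four alternatives from the answer, tried left to right at each position.
PATTERN = re.compile(r"('\w+)|(\w+'\w+)|(\w+')|(\w+)")

text = "'bout it's persons' 'open' foo''bar"
# The pattern has capture groups, so findall would return tuples;
# finditer + group(0) gives the full matched text instead.
matches = [m.group(0) for m in PATTERN.finditer(text)]
print(matches)
```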
How about this?
'?\b[0-9A-Za-z']+\b'?
EDIT: the previous version didn't include apostrophes on either side.
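Here is how this pattern behaves on the question's samples, using Python's re module (the engine is an assumption). It keeps a single apostrophe on either side, returns foo''bar as one match, and keeps one leading apostrophe from ''foo.

```python
import re

# Optional apostrophe, word boundary, letters/digits/apostrophes,
# word boundary, optional apostrophe.
PATTERN = re.compile(r"'?\b[0-9A-Za-z']+\b'?")

text = "'bout it's persons' 'open' foo''bar ''foo"
# No capture groups here, so findall returns the matched strings directly.
matches = PATTERN.findall(text)
print(matches)
```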
I submitted this second answer because it looks like the question has changed quite a bit and my previous answer is no longer valid. Anyway, if all the conditions are listed, try this:
(((?<!')')?\b[0-9A-Za-z]+\b('(?!'))?|\b[0-9A-Za-z]+('[0-9A-Za-z]+)*\b)
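Running this pattern through Python's re module (again an assumption about the engine) shows that the first alternative takes priority. ''foo and foo''bar come out as bare words. However, it's is split into it' and s, because the first alternative accepts a trailing apostrophe whenever the next character is not another apostrophe.

```python
import re

# Alternative 1: optional lone leading ', a word, optional lone trailing '.
# Alternative 2: a word with internal single apostrophes.
PATTERN = re.compile(r"(((?<!')')?\b[0-9A-Za-z]+\b('(?!'))?|\b[0-9A-Za-z]+('[0-9A-Za-z]+)*\b)")

text = "'bout it's persons' foo''bar ''foo"
# The pattern has capture groups, so take group(0) for the full match.
matches = [m.group(0) for m in PATTERN.finditer(text)]
print(matches)
```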
This works fine:
('*)(?:'')*('?(?:\w+'?)+\w+('\b|'?[^']))(\1)
on this data, with no problems:
'bou
it's
persons'
'open'
open
foo''bar
''foo
bee''
''foo''
'
''
On this data, you should strip the results (remove the spaces from the matches):
'bou it's persons' 'open' open foo''bar ''foo ''foo'' ' ''
(tested in The Regulator; the results are in $2)