Break up a string by words and punctuation
To split up a string, I came up with:
<?php
preg_match_all('/(\w)|(,.!?;)/', "I'm a little teapot, short and stout.", $matches);
print_r($matches[0]);
I thought this would separate each word (\w) and the specified punctuation (,.!?;).
For this input I expected: ["I'm", "a", "little", "teapot", ",", "short", "and", "stout", "."]
Instead I get:
Array
(
    [0] => I
    [1] => m
    [2] => a
    [3] => l
    [4] => i
    [5] => t
    [6] => t
    [7] => l
    [8] => e
    [9] => t
    [10] => e
    [11] => a
    [12] => p
    [13] => o
    etc...
What am I doing wrong here?
Thanks in advance.
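You have two faults:
- \w matches only a single character. To match a whole run of them, use \w+. Furthermore, \w matches only word characters (letters, digits and the underscore). If you want to match other characters such as ' you will need to include them: [\w'].
- (,.!?;) is a group that matches a sequence, not a choice of characters: a comma, any single character (the unescaped .), an optional ! (the ? applies to it), then a semicolon. Instead you want to match any one of these characters with a character class: [,.!?;].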
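Running each fragment on its own makes both faults visible. This is a minimal sketch using the sample string from the question; the variable names are only illustrative.

```php
<?php
$text = "I'm a little teapot, short and stout.";

// \w matches exactly one word character, so each letter is a separate match.
preg_match_all('/\w/', $text, $single);
echo count($single[0]), "\n";        // 28

// \w+ matches a whole run of word characters, but stops at the apostrophe.
preg_match_all('/\w+/', $text, $runs);
echo implode('|', $runs[0]), "\n";   // I|m|a|little|teapot|short|and|stout

// (,.!?;) is a sequence: comma, any character, optional "!", semicolon.
// Nothing in $text has that shape, so there are no matches...
preg_match_all('/(,.!?;)/', $text, $sequence);
echo count($sequence[0]), "\n";      // 0
// ...while a string such as ",x;" does match it.
echo preg_match('/(,.!?;)/', ',x;'), "\n"; // 1

// [,.!?;] is a character class: any single one of those characters.
preg_match_all('/[,.!?;]/', $text, $anyOf);
echo implode('|', $anyOf[0]), "\n";  // ,|.
```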
The correct regex is:
'/[\w\']+|[,.!?;]/'
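Dropped into the code from the question, the corrected pattern should produce the expected list. A minimal sketch:

```php
<?php
$text = "I'm a little teapot, short and stout.";

// Either a run of word characters and apostrophes, or one punctuation mark.
preg_match_all('/[\w\']+|[,.!?;]/', $text, $matches);
print_r($matches[0]); // I'm, a, little, teapot, ",", short, and, stout, "."
```

Since \w also covers digits and the underscore, a token like "tea_pot2" stays in one piece. Characters outside both alternatives (a hyphen, for example) are skipped, so "well-known" would come back as "well" and "known".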
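If you want to be more permissive, use Unicode character classes instead. For words this allows letters (\pL), numbers (\pN), combining marks (\pM), dash punctuation (\p{Pd}; a two-letter property needs the braces) and the apostrophe; for punctuation it allows any Unicode punctuation character (\pP). The u modifier makes PCRE treat the pattern and subject as UTF-8:
'/[\pL\pN\pM\p{Pd}\']+|\pP/u'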
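A minimal sketch of the Unicode variant, assuming the subject string is valid UTF-8 (the u modifier requires it). The sample sentence is made up to exercise accented letters and a hyphenated word, and the dash property is written with braces as \p{Pd}.

```php
<?php
// Hypothetical sample input: accented letters plus a hyphenated word.
$text = "A well-known café, naïve and déjà vu.";

// Words: letters, numbers, combining marks, dashes and the apostrophe.
// Otherwise: any single Unicode punctuation character.
preg_match_all('/[\pL\pN\pM\p{Pd}\']+|\pP/u', $text, $matches);
print_r($matches[0]); // A, well-known, café, ",", naïve, and, déjà, vu, "."
```

Written as \pPd without braces, PCRE reads \pP (all punctuation) followed by a literal d, so the word alternative also swallows commas and periods and "café," comes out as a single token.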
Try this:
([\w]+)|[,.!?;]+
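Running this pattern on the question's sentence shows one remaining gap: the apostrophe is not in [\w], so "I'm" still comes apart. A minimal sketch:

```php
<?php
$text = "I'm a little teapot, short and stout.";

// Runs of word characters, or runs of the listed punctuation.
preg_match_all('/([\w]+)|[,.!?;]+/', $text, $matches);
print_r($matches[0]); // I, m, a, little, teapot, ",", short, and, stout, "."
```

The + after the punctuation class also means a run such as "?!" is returned as one match rather than two.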
I also want to share a very useful service with you: an online regex tester.
You may want to try something like:
/([^,.!?; ]+)|([,.!?;])/
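A sketch of this negated-class approach on the question's sentence. Here the punctuation alternative is written as a character class, [,.!?;], so that it matches any one of those characters rather than a fixed sequence:

```php
<?php
$text = "I'm a little teapot, short and stout.";

// A word is any run of characters that is neither listed punctuation nor a space;
// otherwise match a single punctuation character.
preg_match_all('/([^,.!?; ]+)|([,.!?;])/', $text, $matches);
print_r($matches[0]); // I'm, a, little, teapot, ",", short, and, stout, "."
```

Only the space is excluded from words here, so tabs or newlines would end up inside tokens; [^,.!?;\s]+ avoids that.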