
Splitting text with JavaScript match

For the code below:

var str = "I left the United States with my eyes full of tears! I knew I would miss my American friends very much.All the best to you";
// Inside a string literal the backslash must be doubled, otherwise "\s" is read as a plain "s".
var re = new RegExp("[^.?!]*(?:[.?!]+|\\s$)", "g");
var myArray = str.match(re);

this is what I get as a result:

myArray[0] = "I left the United States with my eyes full of tears!"
myArray[1] = " I knew I would miss my American friends very much."

I want to add one more condition to the regex so that the text breaks only if there is a space after the punctuation mark (?, . or !).

I want this so that the result for the case above is:

myArray[0] = "I left the United States with my eyes full of tears!"
myArray[1] = " I knew I would miss my American friends very much.All the best to you "
myArray[2] = ""


var str = "I left the United States with my eyes full of tears! I knew I would miss my American friends very much.All the best to you";

var re =/[^\.\?!]+[\.?!]( +|[^\.\?!]+)/g;
var myArray = str.match(re);
myArray.join('\n')

/*  returned value: (String)
I left the United States with my eyes full of tears! 
I knew I would miss my American friends very much.All the best to you
*/


.+?([!?.](?= |$)|$)

should work.

It matches any sequence of characters that is either

  • followed by a punctuation sign that is itself followed by a space or end-of-string, or
  • followed by the end of the string.

Because it uses the reluctant (lazy) quantifier +?, it finds the shortest possible sequences, that is, single sentences.

In JavaScript:

result = subject.match(/.+?([!?.](?= |$)|$)/g);
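As a minimal sketch, here is the same regex applied to the string from the question. The function name splitSentences and the trim() step are assumptions added for illustration and are not part of the original answer. Note that match() returns null rather than an empty array when nothing matches, so the sketch guards against that.

```javascript
// Hypothetical helper; only the regex itself comes from the answer above.
function splitSentences(text) {
  // Lazily consume characters up to punctuation followed by a space or end-of-string,
  // or up to the end of the string if no such punctuation is left.
  const matches = text.match(/.+?([!?.](?= |$)|$)/g);
  // match() yields null for no match (for example, an empty string).
  if (matches === null) {
    return [];
  }
  // Assumption: leading spaces on later sentences are unwanted, so trim them.
  return matches.map((sentence) => sentence.trim());
}

const str = "I left the United States with my eyes full of tears! I knew I would miss my American friends very much.All the best to you";
console.log(splitSentences(str));
// [ 'I left the United States with my eyes full of tears!',
//   'I knew I would miss my American friends very much.All the best to you' ]
```

Because "much." is followed directly by "All" with no space, the dot is not treated as a sentence boundary, which is the behaviour the question asks for.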

EDIT:

To keep the regex from splitting after a space followed by a single letter or a number (one or more digits) and a dot, you can use:

result = subject.match(/( \d+\.| [^\W\d_]\.|.)+?([!?.](?= |$)|$)/g);

This will split

I left the United States with my eyes full of tears! 23. I knew I would miss my American friends very much. I. All the best to you.

into

I left the United States with my eyes full of tears!
 23. I knew I would miss my American friends very much.
 I. All the best to you.

Instead of simply matching any character until it finds a dot, at each position it does the following:

  • First try to match a space, a number, and a dot.
  • If that fails, try to match a space, a letter, and a dot.
  • If that fails, match any character.

That way, the dot after a number/letter has already been matched and will not be matched as the delimiting punctuation character that follows next in the regex.
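The three alternatives can be checked with a short, runnable sketch. The variable names sentenceRe, subject and parts are assumptions chosen for this illustration; the regex and the sample sentence are the ones given above.

```javascript
// Regex from the EDIT above: " 23." and " I." are consumed inside the first group,
// so their dots are never considered as sentence delimiters.
const sentenceRe = /( \d+\.| [^\W\d_]\.|.)+?([!?.](?= |$)|$)/g;

const subject = "I left the United States with my eyes full of tears! 23. I knew I would miss my American friends very much. I. All the best to you.";

// Fall back to an empty array because match() returns null when nothing matches.
const parts = subject.match(sentenceRe) || [];

for (const part of parts) {
  // JSON.stringify makes the leading spaces visible.
  console.log(JSON.stringify(part));
}
// "I left the United States with my eyes full of tears!"
// " 23. I knew I would miss my American friends very much."
// " I. All the best to you."
```

The exception only covers a single letter or a run of digits directly after a space. A longer abbreviation such as "Mr." followed by a space is still treated as the end of a sentence, so "Hello Mr. Smith." comes back as "Hello Mr." and " Smith.".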
