JavaScript regular expression for pretty-formatting user text

I am researching the best way to format user text messages.

A sample of what I am trying to achieve:

1) user sends this message:

   Doctor,
I would   like to    have
an appointment tomorrow morning.Please,call me! 

2) my application formats this text outputting this:

Doctor, I would like to have an appointment tomorrow morning. Please, call me!

Notice that:

  • leading and trailing spaces must be removed (as with $.trim())
  • extra spaces between two words must be replaced by one space
  • newlines, line breaks, tabs, and <br> must be replaced by one space
  • dots and commas must be separated from the next word (morning.Please,call -> morning. Please, call)

Here is what I have so far:

 text.replace(/<(.|\n\r)*?>/g, '')
 .replace(/\s/g,' ')
 .replace(/<br>/g,' ')
 .replace(/ +/g,' ');

It would be good to merge all the expressions into just one pattern. Is there a shorter way to do it?


In two regexes (jsFiddle demo):

text.replace(/\s+|([.,])(?=\S)/g, '$1 ').replace(/^\s|\s$/g, '')

Breaking it down, it matches either:

  • One or more whitespace characters (linefeed, tab, space)
  • A period or comma that is followed by a non-whitespace character (we use a positive lookahead, (?=\S), for this)

and replaces it with a single space (ASCII 32), keeping any matched period or comma via $1. The second regex then strips any leading or trailing whitespace character. It is necessary because the first replacement always ends in a space, so it can leave a space at the beginning or end of the string, and we want none there.
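As a minimal sketch of the two-regex approach described above, here it is wrapped in a function and run on the sample message from the question. The name formatMessage is a hypothetical helper chosen for illustration; the two patterns are the ones from this answer.

```javascript
// Hypothetical helper name; the regexes are taken from the answer above.
function formatMessage(text) {
  return text
    // Collapse each whitespace run to one space, and append a space after
    // '.' or ',' when a non-whitespace character follows. For the
    // whitespace branch the group did not participate, so $1 is empty.
    .replace(/\s+|([.,])(?=\S)/g, '$1 ')
    // After the first pass at most one space remains at each end.
    .replace(/^\s|\s$/g, '');
}

const sample =
  '   Doctor,\nI would   like to    have\nan appointment tomorrow morning.Please,call me! ';

console.log(formatMessage(sample));
// Doctor, I would like to have an appointment tomorrow morning. Please, call me!
```

One caveat worth knowing: the pattern treats every period or comma followed by a non-space the same way, so a decimal such as 3.14 comes out as "3. 14". If the messages may contain numbers, the lookahead would need to be narrowed.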

If <br> matters, you are best off replacing it with a space character before using the above pair of regexes (.replace(/<br>/g, ' ')), but if you really want to do so in the same regex: (jsFiddle demo)

text.replace(/(?:<br>|\s)+|([.,])(?=\S)(?!<br>)/g, '$1 ').replace(/^\s|\s$/g, '')
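A short sketch of how the combined pattern behaves when <br> sits directly after punctuation. The function name formatMessageWithBr and the input string are made up for illustration. The negative lookahead (?!<br>) keeps the punctuation branch from firing right before a <br>, which would otherwise produce two spaces in a row.

```javascript
// Hypothetical helper name; the regexes are the ones shown above.
function formatMessageWithBr(text) {
  return text
    // A run of <br> tags and/or whitespace becomes one space. A '.' or ','
    // gets a trailing space only when the next thing is neither whitespace
    // nor a <br> (the <br> is turned into a space by the first branch).
    .replace(/(?:<br>|\s)+|([.,])(?=\S)(?!<br>)/g, '$1 ')
    .replace(/^\s|\s$/g, '');
}

console.log(formatMessageWithBr('Doctor,<br>call me.<br> <br>Thanks'));
// Doctor, call me. Thanks
```

Note that this only recognizes the literal lowercase <br>. Variants such as <br/> or <BR> pass through unchanged unless the pattern is widened.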


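Haven't tested it, but I believe this is equivalent. The <br> alternative comes first so that whitespace around a <br> is consumed in the same match, and the trim runs last so that a leading or trailing <br> does not leave a space behind:

text.replace(/\s*<br>\s*|\s+/g, ' ')
    .replace(/^\s+|\s+$/g, '')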

EDIT

I didn't understand why the first expression replaced < and > so I left it out.


Maybe, but I'm not sure of the benefit of reducing it further. Regular expressions are already somewhat difficult to read, so breaking the work up logically can be useful later when you are debugging.
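To illustrate the point about readability, here is one possible way to keep each replacement as its own named step. This is only a sketch: the step names, the steps array, and the formatInSteps function are all hypothetical, and the patterns are simple variants of the ones discussed in this thread.

```javascript
// Hypothetical step table: each entry does one job and has a readable name.
const steps = [
  { name: 'br to space',         pattern: /<br>/g,          replacement: ' ' },
  { name: 'collapse whitespace', pattern: /\s+/g,           replacement: ' ' },
  { name: 'space after . or ,',  pattern: /([.,])(?=\S)/g,  replacement: '$1 ' },
  { name: 'trim',                pattern: /^\s+|\s+$/g,     replacement: '' },
];

// Applies the steps in order; with trace = true it logs each intermediate
// result, which makes it easy to see which step misbehaves.
function formatInSteps(text, trace = false) {
  return steps.reduce((current, step) => {
    const next = current.replace(step.pattern, step.replacement);
    if (trace) console.log(`${step.name}: ${JSON.stringify(next)}`);
    return next;
  }, text);
}

console.log(formatInSteps('  Doctor,<br>I would   like\tto call.Please,reply ', true));
```

The trade-off is a few extra passes over the string, which is unlikely to matter for short user messages.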


text.replace(/\s/g,' ') replaces any whitespace character (space, line feed, vertical tab, regular tab, and so on) with a space

.replace(/<\s*br\s*\/*\s*>/g,' ') replaces any <br>, < br/ >, <br />, <br //> (etc.) with a space

.replace(/\s{2,}/g,' ') replaces any run of two or more whitespace characters with a single space

.replace(/^\s|\s$/g,'') ltrim+rtrim; the g flag is needed so that both ends are trimmed rather than only the first match (because of the alternation, it could perform better to split this into separate ltrim and rtrim calls, depending on string size)

final:

text = text.replace(/\s/g,' ').replace(/<\s*br\s*\/*\s*>/g,' ').replace(/\s{2,}/g,' ').replace(/^\s|\s$/g,'');

You can't really do "one pattern" because either of the first two replacements can leave two spaces in a row when it is done, so you always need the collapsing step after them.
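A small sketch of why the collapsing step cannot be dropped. The input string is invented for illustration; the patterns are the first three from this answer.

```javascript
// Made-up input: a <br> that is already followed by a space.
const input = 'call me<br> tomorrow';

// First two replacements only: the <br> becomes a space right next to the
// existing space, so the result contains a double space.
const afterFirstTwo = input
  .replace(/\s/g, ' ')
  .replace(/<\s*br\s*\/*\s*>/g, ' ');
console.log(JSON.stringify(afterFirstTwo)); // "call me  tomorrow"

// The collapsing step cleans that up.
const collapsed = afterFirstTwo.replace(/\s{2,}/g, ' ');
console.log(JSON.stringify(collapsed)); // "call me tomorrow"
```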
