How to find a word that is broken up by HTML tags?

I'm programming a spell checker in JavaScript in combination with an OpenOffice dictionary, and I have a serious problem.

I can find whole words using a regex, but a word can also look like prog<b>ram</b>ing. I can find that one if I first remove all HTML tags with jQuery's .text() method, but then how can I replace the word and rebuild the original HTML structure?

Spellchecker.com does it very smartly - the spell check recognizes even words like prog<b>ram</b>ing if they are misspelled!


/([\s>"'])prog(<[^>]+>)ram(<[^>]+>)ing([\s\.,:;"'<])/g 

will match your example, prog<b>ram</b>ing.

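So roughly the following regex will find all instances of the word, even those broken up by HTML:

 // Allow zero or more tags between letters; backslashes are doubled inside string literals.
 // Assumes word contains no regex metacharacters.
 var regExp = new RegExp('([\\s>"\'])' + word.split('').join('(?:<[^>]+>)*') + '([\\s.,:;"\'<])', 'g');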
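
For what it's worth, here is a rough sketch of that idea wrapped in a function that reports where each hit sits in the HTML. The names escapeRegExp and findWordInHtml are made up for illustration, and the delimiter sets are the same guesses as in the regex above. It also accepts the start and end of the string as word boundaries.

```javascript
// Escape regex metacharacters so the word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: finds `word` in `html`, allowing tags between letters.
// Returns the position and the raw HTML (tags included) of every hit.
function findWordInHtml(html, word) {
  if (!word) {
    return [];
  }
  var gap = '(?:<[^>]+>)*'; // zero or more tags between two letters
  var body = word.split('').map(escapeRegExp).join(gap);
  // A lookahead is used for the trailing delimiter so that it is not
  // consumed and can also serve as the leading delimiter of the next word.
  var pattern = new RegExp(
    '(^|[\\s>"\'])(' + body + ')(?=$|[\\s.,:;"\'<])', 'g');
  var hits = [];
  var m;
  while ((m = pattern.exec(html)) !== null) {
    hits.push({ index: m.index + m[1].length, html: m[2] });
  }
  return hits;
}

var hits = findWordInHtml('I like prog<b>ram</b>ing, not programing.', 'programing');
// hits[0] is { index: 7,  html: 'prog<b>ram</b>ing' }
// hits[1] is { index: 30, html: 'programing' }
```

With index and html.length in hand you know exactly which slice of the markup belongs to the misspelled word.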

God knows how that'll help you build a spellchecker though. I suspect the approach used in spellcheckers is more like this: do a spellcheck assuming no HTML, and if there is HTML in a word, strip it out using something like the method below and spellcheck the resulting string as normal:

String.prototype.stripHtml = function() {
  // The g flag removes every tag, not just the first one.
  return this.replace(/<[^>]+>/g, '');
}
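
To get back from the stripped string to the original markup, one option is to remember where each plain-text character came from while stripping. The following is only a sketch under some assumptions: stripWithMap and replaceWord are hypothetical names, the HTML contains no literal "<" in text and no entities that need decoding, and the correction keeps the letters the two spellings share in place and only swaps the part that differs.

```javascript
// Strips tags and records, for each kept character, its index in the HTML.
function stripWithMap(html) {
  var text = '';
  var map = [];
  var inTag = false;
  for (var i = 0; i < html.length; i++) {
    var ch = html.charAt(i);
    if (ch === '<') {
      inTag = true;
    } else if (ch === '>' && inTag) {
      inTag = false;
    } else if (!inTag) {
      text += ch;
      map.push(i);
    }
  }
  return { text: text, map: map };
}

// Replaces the plain-text range [start, end) with `replacement`,
// leaving every tag inside the word where it was.
function replaceWord(html, map, start, end, replacement) {
  var original = '';
  for (var i = start; i < end; i++) {
    original += html.charAt(map[i]);
  }
  // Length of the common prefix of both spellings.
  var p = 0;
  while (p < original.length && p < replacement.length &&
         original.charAt(p) === replacement.charAt(p)) {
    p++;
  }
  // Length of the common suffix (not overlapping the prefix).
  var s = 0;
  while (s < original.length - p && s < replacement.length - p &&
         original.charAt(original.length - 1 - s) ===
         replacement.charAt(replacement.length - 1 - s)) {
    s++;
  }
  var newMiddle = replacement.slice(p, replacement.length - s);
  var midStart = start + p;
  var midEnd = end - s;
  if (midStart === midEnd) {
    // Pure insertion: put the new letters right after the common prefix.
    var at = p > 0 ? map[midStart - 1] + 1 : map[start];
    return html.slice(0, at) + newMiddle + html.slice(at);
  }
  // Substitution: drop the differing letters but keep tags between them.
  var out = html.slice(0, map[midStart]) + newMiddle;
  for (var j = midStart; j < midEnd - 1; j++) {
    out += html.slice(map[j] + 1, map[j + 1]);
  }
  return out + html.slice(map[midEnd - 1] + 1);
}

var html = 'I like prog<b>ram</b>ing.';
var stripped = stripWithMap(html);            // stripped.text: 'I like programing.'
var start = stripped.text.indexOf('programing');
var fixed = replaceWord(html, stripped.map, start,
                        start + 'programing'.length, 'programming');
// fixed: 'I like prog<b>ramm</b>ing.'
```

Where the extra letter ends up (inside or outside the bold tag) is an arbitrary choice here; a real spellchecker might pick differently.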


I would use something to pull out any HTML so that you are dealing with plain text. I cannot speak for any tools like this in JavaScript, but I'm sure they exist. If you can find something to 'scrub' the HTML out of your .text() output, you can run a search this way.

Try something like this: http://metacpan.org/pod/HTML::Scrubber
