Highlighting long sentences using jQuery

2022-12-21 03:28 Q&A author:

I'd like to highlight long sentences (say, 50 words or more) contained in an array of paragraph objects on a page, i.e. $("#content p"). I'm not sure how to tackle this.

I originally tried to highlight all sentences, but ran into trouble when they contained HTML tags (example highlighting code on the net seems to be for individual words only, so it doesn't take child nodes into account). I'm aware that splitting sentences is difficult; I'd like to split on ., ! or ? followed either by a space and then a capital letter, or by nothing at all (i.e. the end of the paragraph).

Thanks in advance for any help/advice.

As you've said, it's going to be tricky to get right. Given that you're not going to catch them all, I'd stick with something simple like:

// matches 50 or more non-terminator characters (characters, not words), then a terminator
var regex = /[^.!?]{50,}[.!?]/;

Get too clever and you will end up spending more time coding for edge cases than you would reasonably want to.
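Note that the {50,} quantifier in the pattern above counts characters, not words. If you want a word threshold, a rough sketch along the same simple lines could look like the following. It assumes plain text (for example the result of $(p).text()), so it does not deal with child tags, and findLongSentences and minWords are names made up for this illustration.

```javascript
// Sketch only: works on plain text, not on HTML.
// findLongSentences and minWords are hypothetical names, not part of jQuery.
function findLongSentences(text, minWords) {
  // A "sentence" is a run of non-terminators followed by any .!? characters.
  var sentences = text.match(/[^.!?]+[.!?]*/g) || [];
  return sentences
    .map(function (s) { return s.trim(); })
    .filter(function (s) {
      // Count words by splitting on runs of whitespace.
      return s.length > 0 && s.split(/\s+/).length >= minWords;
    });
}

// Usage with jQuery (needs a DOM, so it is commented out here):
// $("#content p").each(function () {
//   console.log(findLongSentences($(this).text(), 50));
// });
```

This only finds the long sentences; wrapping them in the page still requires working on the HTML rather than the text.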

I'm not sure the client side is the best place to do this. I would consider sending the paragraphs back to the server to do the work, but the work is the same either way.

First, take all the content of a paragraph, making sure to get all of it, since it could be spread across several nodes in the DOM. (Read This) Then you will need to write a parser that looks for your split characters while ignoring them when they appear inside HTML tags or entities.

As an example, the . in an href attribute should be ignored and not split on. While parsing you can keep a word count as well, breaking words on the spaces. Make each sentence an object that contains the whole sentence and its word count, and push those objects into an array that represents the paragraph. Once done, you can iterate through the array and wrap any sentence whose word count reaches your threshold in a span for highlighting with CSS.

The major problem is tags that may be part of two sentences, such as the following.

I'm typing <b> in bold. NOW!</b>

What I've described doesn't deal with that, but you could make the parser more complex later to support it.

So, a quick overview of my rambling: parse through all the characters with a state machine that counts words and splits in the correct spot. On each split, add the data you collected to an array. When done, iterate through the array, outputting the newly wrapped sentences.
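A minimal sketch of that state machine might look like this. The names parseSentences and wrapLongSentences and the long-sentence class are invented for the example, and it assumes well-formed markup with no < or > inside attribute values. As noted above, it does not handle a tag that spans two sentences, so the output can be mis-nested in that case.

```javascript
// Sketch only: parseSentences and wrapLongSentences are hypothetical helpers.
function parseSentences(html) {
  var sentences = [];
  var current = "";    // raw HTML of the sentence being collected
  var wordCount = 0;
  var inTag = false;   // true between "<" and ">"
  var inWord = false;  // true while inside a run of non-space text

  for (var i = 0; i < html.length; i++) {
    var ch = html.charAt(i);
    current += ch;

    if (inTag) {                       // ignore everything inside a tag
      if (ch === ">") inTag = false;
      continue;
    }
    if (ch === "<") { inTag = true; continue; }

    if (/\s/.test(ch)) {
      inWord = false;
    } else if (!inWord) {              // first character of a new word
      inWord = true;
      wordCount++;
    }

    // Split after a terminator that ends the text or is followed by whitespace.
    if (/[.!?]/.test(ch)) {
      var next = html.charAt(i + 1);
      if (next === "" || /\s/.test(next)) {
        sentences.push({ html: current, wordCount: wordCount });
        current = "";
        wordCount = 0;
        inWord = false;
      }
    }
  }
  if (current.trim() !== "") {
    sentences.push({ html: current, wordCount: wordCount });
  }
  return sentences;
}

function wrapLongSentences(html, minWords) {
  return parseSentences(html).map(function (s) {
    if (s.wordCount < minWords) return s.html;
    var lead = s.html.match(/^\s*/)[0];  // keep leading whitespace outside the span
    return lead + '<span class="long-sentence">' + s.html.slice(lead.length) + "</span>";
  }).join("");
}

// Usage with jQuery (needs a DOM, so it is commented out here):
// $("#content p").each(function () {
//   $(this).html(wrapLongSentences($(this).html(), 50));
// });
```

Keeping the raw HTML and the word count together in one object means the second pass only has to decide whether to wrap, not re-parse anything.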

This is probably a rather slow solution, and ugly too, but it should be pretty simple to code:

Read all the text into a string, and then parse through it, counting characters and finding every ., ! or ? character. In the parsing loop you also look for < and >, where < means "ignore all .!? until the next >". Every time you find a .!? character, you check the length since the last one, and if it's long enough you save the start and end indices into an array or something.

When the whole thing is done, make another loop that moves substrings from the first string into a new string, prepending a highlight tag to every "long sentence" and appending an end-highlight tag to it, before moving on.

When finished, put the new string back where you got it from...
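The two loops described above could be sketched roughly as follows. This version measures length in visible characters, as the description does, rather than in words. The function name highlightByLength, the minChars parameter and the choice of a mark tag are assumptions made for the example.

```javascript
// Sketch only: highlightByLength and minChars are hypothetical names.
function highlightByLength(html, minChars) {
  var ranges = [];    // [start, end) index pairs of long sentences
  var start = 0;      // index where the current sentence begins
  var length = 0;     // visible characters seen since the last terminator
  var inTag = false;

  // First loop: find the long sentences and remember their positions.
  for (var i = 0; i < html.length; i++) {
    var ch = html.charAt(i);
    if (inTag) {                       // ignore .!? until the closing ">"
      if (ch === ">") inTag = false;
      continue;
    }
    if (ch === "<") { inTag = true; continue; }
    if (length === 0 && /\s/.test(ch)) {
      start = i + 1;                   // skip whitespace between sentences
      continue;
    }
    length++;
    if (ch === "." || ch === "!" || ch === "?") {
      if (length >= minChars) ranges.push([start, i + 1]);
      start = i + 1;
      length = 0;
    }
  }

  // Second loop: copy substrings into a new string, wrapping the long ones.
  var out = "";
  var pos = 0;
  ranges.forEach(function (r) {
    out += html.slice(pos, r[0]) + "<mark>" + html.slice(r[0], r[1]) + "</mark>";
    pos = r[1];
  });
  return out + html.slice(pos);
}
```

Because the ranges are collected before anything is inserted, the indices stay valid for the whole rebuild.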

To do this you need to get the HTML of each paragraph (node.html()) and then replace each HTML tag with the same number of spaces. This should be fairly straightforward, as you can just look for an opening angle bracket and the first closing bracket after it. You need to do this firstly to prevent any full stops and words inside a tag from confusing the rest of the algorithm, but also to prevent a tag itself being counted as a word.

Split the text based on a full stop followed by nothing or any amount of whitespace to get your sentences. You need to perform this split manually using a matching regular expression so you can keep track of the start and end positions of the sentence in the original string.

Next, split each sentence on whitespace and remove any 'words' from the array which consist only of whitespace. The number of words left is the length of the sentence. If it's over your limit, insert the appropriate HTML at the start and end positions of the sentence in your original HTML string. You'll need to keep track of how much extra HTML you've added so you can find the right start and end positions of subsequent long sentences.
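Put together, the steps above might be sketched like this. It splits on full stops only, as described; swapping the \. in the pattern for [.!?] would cover the other terminators. The function name highlightLongSentences, the minWords parameter and the long-sentence class are hypothetical, and a tag that spans two sentences can still end up mis-nested.

```javascript
// Sketch only: highlightLongSentences and minWords are hypothetical names.
function highlightLongSentences(html, minWords) {
  // Replace each tag with the same number of spaces so indices still line up.
  var masked = html.replace(/<[^>]*>/g, function (tag) {
    return " ".repeat(tag.length);
  });

  var open = '<span class="long-sentence">';
  var close = "</span>";
  // A sentence runs from a non-space character to a full stop that is
  // followed by whitespace or the end of the text (or to the end itself).
  var sentenceRe = /\S[\s\S]*?(?:\.(?=\s|$)|$)/g;

  var out = html;
  var offset = 0;  // how many characters of extra HTML have been added so far
  var m;
  while ((m = sentenceRe.exec(masked)) !== null) {
    var text = m[0].replace(/\s+$/, "");  // drop masked trailing tags/spaces
    var words = text.split(/\s+/).filter(function (w) { return w !== ""; });
    if (words.length < minWords) continue;

    var start = m.index + offset;
    var end = m.index + text.length + offset;
    out = out.slice(0, start) + open + out.slice(start, end) + close + out.slice(end);
    offset += open.length + close.length;
  }
  return out;
}

// Usage with jQuery (needs a DOM, so it is commented out here):
// $("#content p").each(function () {
//   $(this).html(highlightLongSentences($(this).html(), 50));
// });
```

Using exec in a loop rather than split is what gives the start position of each sentence (m.index) in the original string.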
