Regex for everything *after* the first complete sentence (period and space) *after* N characters

I'd like to get smarter excerpts of sections of text. Since I'll be using Movable Type's regex_replace function, I'm trying to match everything after the first few sentences.

While \..* gets everything after the first period, that often leaves a too-short excerpt. How might I do the same thing (everything after the first period) but skipping the first 100 characters?

Alternatively, how would I just grab everything after, say, the second or third period?


I'm not familiar with regex_replace, so I'll use the PHP preg_replace function and you can adapt accordingly:

$truncated = preg_replace('/^(.{100}.*?\.).*$/s', '$1', $long);

And another version, which will try to be smart about not breaking up numbers with a decimal point (or other places a period might occur somewhere other than the end of a sentence):

$truncated = preg_replace('/^(.{100}.*?\.(?![a-z0-9])).*$/s', '$1', $long);

Explanation:

  1. The part you want to keep is grouped with parentheses.
  2. You'll keep at least 100 characters: .{100}
  3. You'll then keep any following characters up to the first period: .*?\.
  4. In the second version, I used a negative lookahead, (?![a-z0-9]), which makes the match continue on to the next period whenever the period is immediately followed by a digit or a lowercase letter.
  5. Dot matches new-line (the s modifier at the end of the pattern). If Movable Type's regex_replace function takes a pattern without delimiters (the leading slash and the trailing /s in my pattern), you can use (?s) at the beginning of the pattern instead.
  6. Use $1 in the replacement to keep the first captured group.
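
To see how the two patterns differ, here is a minimal sketch in Python rather than PHP or Movable Type. It assumes Python's re module behaves closely enough for these patterns, and it uses the inline (?s) flag from point 5 in place of delimiters. The sample text and the helper name truncate are made up for illustration.

```python
import re

# Python translations of the two patterns above. The inline (?s) flag
# stands in for the trailing /s modifier of the PHP versions.
BASIC = re.compile(r'(?s)^(.{100}.*?\.).*$')
GUARDED = re.compile(r'(?s)^(.{100}.*?\.(?![a-z0-9])).*$')

def truncate(pattern, text):
    # \1 keeps only the captured excerpt; text that does not match
    # (for example, text shorter than 100 characters) comes back unchanged.
    return pattern.sub(r'\1', text)

# Hypothetical sample: 100 filler characters, then a sentence with a decimal number.
sample = 'x' * 100 + ' costs 3.50 today. The rest follows.'

basic_excerpt = truncate(BASIC, sample)      # stops at the period inside "3.50"
guarded_excerpt = truncate(GUARDED, sample)  # stops at the period after "today"
```

The guarded version only skips periods followed by a digit or lowercase letter, so abbreviations such as "e.g. " followed by a space would still end the excerpt early.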


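"Complete sentence" is vague, since different languages mark the end of a sentence differently. Let's assume that a period followed by whitespace is the end of a sentence: /^.{N}.*?\.\s+(.*)/s Replace N with the number of characters to skip. The pattern skips N characters first, then runs to the next sentence end, and the captured group is everything after it.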
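
A sketch of the period-plus-whitespace assumption in Python, skipping the first N characters before looking for the sentence end. It also includes a variant for the "second or third period" part of the question. Both function names are hypothetical, and Python's re module is a stand-in for Movable Type's regex_replace, whose exact syntax may differ.

```python
import re

def rest_after_sentence(text, n):
    # Skip n characters, run lazily to the next period followed by
    # whitespace, and return whatever follows. If no such sentence end
    # exists, return an empty string.
    match = re.match(r'(?s).{%d}.*?\.\s+(.*)' % n, text)
    return match.group(1) if match else ''

def rest_after_nth_period(text, count):
    # Variant: consume `count` period-terminated chunks, skip any
    # whitespace, then capture the remainder.
    match = re.match(r'(?s)(?:.*?\.){%d}\s*(.*)' % count, text)
    return match.group(1) if match else ''

sample = 'One. Two is longer. Three ends here. Four.'

after_ten_chars = rest_after_sentence(sample, 10)
after_two_periods = rest_after_nth_period(sample, 2)
```

With this sample, both calls return the text starting at "Three ends here.", since the tenth character falls inside the second sentence.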
