
Where did I go wrong in my regex lookaround?

I'm trying to pull the first paragraph out of Markdown-formatted documents:

This is the first paragraph.

This is the second paragraph.

The answer here gives me a solution that matches the first string ending in a double line break.

Perfect, except some of the texts begin with Markdown-style headers:

### This is an h3 header.

This is the first paragraph.

So I need to:

  • Skip any line that begins with one or more # symbols.
  • Match the first string ending in a double line break.

In other words, return 'This is the first paragraph' in both of the examples above.
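If a regex isn't strictly required, the two rules above can also be applied line by line. Here is a minimal sketch in Python rather than PHP, chosen only so the example is self-contained. The helper name first_paragraph is hypothetical, and the sketch assumes paragraphs are separated by truly empty lines (lines holding only spaces are not treated as blank).

```python
def first_paragraph(text: str) -> str:
    """Hypothetical helper: first paragraph of a Markdown text, skipping '#' header lines."""
    # Normalise Windows (\r\n) and old Mac (\r) line endings to \n.
    normalised = text.replace("\r\n", "\n").replace("\r", "\n")
    # Rule 1: drop every line that begins with one or more '#' characters.
    kept = [line for line in normalised.split("\n") if not line.startswith("#")]
    # Rule 2: paragraphs are separated by an empty line; return the first non-empty one.
    for block in "\n".join(kept).split("\n\n"):
        if block.strip():
            return block.strip()
    return ""  # no paragraph found


print(first_paragraph("### This is an h3 header.\n\nThis is the first paragraph.\n\nThis is the second paragraph."))
# This is the first paragraph.
```

Dropping header lines before splitting also covers a header that sits directly on top of the paragraph with no blank line in between.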

So far, I've tried many variations on:

"/(?s)(?:(?!\#))((?!(\r?\n){2}).)*+/"

But I can't get it to return the proper match.

Where did I go wrong in my lookaround?

I'm doing this in PHP (preg_match()), if that makes a difference.
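One way to see what goes wrong is to run the pattern and check where the match starts. This sketch uses Python's re module in place of preg_match, which is an assumption made only to keep it self-contained (lookaheads behave the same way in PCRE). The possessive *+ is relaxed to * because Python's re only accepts possessive quantifiers from 3.11 on. That change does not affect the result, because nothing after the loop forces backtracking.

```python
import re

# The question's pattern, ported to a Python raw string (*+ relaxed to *).
asker_pattern = re.compile(r"(?s)(?:(?!\#))((?!(\r?\n){2}).)*")

doc = "### This is an h3 header.\n\nThis is the first paragraph.\n\nThis is the second paragraph."

match = asker_pattern.search(doc)
# (?!\#) only vetoes the position where a match *starts*. The engine rejects
# offsets 0, 1 and 2 (each followed by '#'), then starts at offset 3, just
# after the hashes, and consumes the rest of the header line.
print(match.start(), repr(match.group(0)))  # 3 ' This is an h3 header.'
```

The lookahead is zero-width. It only rules out start positions whose next character is #, so the engine steps past the hashes and begins matching inside the header line. Nothing in the pattern tells it to skip the rest of that line.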

Thanks!


You could try

"/(?sm)^[^#](?:(?!(?:\r\n|\r|\n){2}).)*/"
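A quick check is to run this pattern against both sample texts. The sketch below uses Python's re module rather than preg_match so it is self-contained. That is an assumption, but ^ under multiline mode, negated classes and lookaheads work the same way in PCRE.

```python
import re

# The answer's pattern, unchanged apart from using a Python raw string
# instead of PHP's "/.../" delimiters.
answer_pattern = re.compile(r"(?sm)^[^#](?:(?!(?:\r\n|\r|\n){2}).)*")

plain = "This is the first paragraph.\n\nThis is the second paragraph."
with_header = "### This is an h3 header.\n\nThis is the first paragraph.\n\nThis is the second paragraph."

for doc in (plain, with_header):
    print(repr(answer_pattern.search(doc).group(0)))
# 'This is the first paragraph.'
# '\nThis is the first paragraph.'
```

In the header case, ^ also matches at the start of the empty line after the header, and [^#] matches that line's newline character. The result therefore carries a leading line break. Trimming it afterwards (trim() in PHP, str.strip() in Python) is one simple workaround.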

I enabled multiline mode by using (?sm) instead of (?s), so ^ matches at the start of every line, and each match has to begin at the start of a line whose first character is not a #. I also used \r\n|\r|\n instead of \r?\n because my testing environment had funny line breaks =)
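A possible refinement, offered as a hypothetical tweak rather than as part of the answer, is to exclude line breaks from the first character as well. Writing [^#\r\n] stops the match from starting on the empty line that follows a header. The sketch is again in Python and was only exercised with \n and \r\n line endings. Python's ^ recognises only \n as a line break, so \r-only text is not covered here.

```python
import re

# Hypothetical refinement: the first character of the match may be neither
# '#' nor a line break, so the match cannot begin on a blank line.
refined = re.compile(r"(?sm)^[^#\r\n](?:(?!(?:\r\n|\r|\n){2}).)*")

samples = [
    "This is the first paragraph.\n\nThis is the second paragraph.",
    "### This is an h3 header.\n\nThis is the first paragraph.\n\nThis is the second paragraph.",
    # Same document with Windows line endings.
    "### This is an h3 header.\r\n\r\nThis is the first paragraph.\r\n\r\nThis is the second paragraph.",
]

for doc in samples:
    print(repr(refined.search(doc).group(0)))  # 'This is the first paragraph.' each time
```

In PHP, the same pattern should work as the first argument to preg_match() inside single quotes, with the matched text in $matches[0].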
