Regex match for text
I am trying to create a regex to match the content between the markers of a numbered list, e.g. with the following content:
1) Text for part 1 2) Text for part 2 3) Text for part 3
The following PCRE should work, assuming you don't have anything formatted like "1)" inside the sections:
\d+\)\s*(.*?)\s*(?=\d+\)|$)
Explanation:
\d+\)
matches a number followed by a ")".
\s*
matches the leading whitespace.
(.*?)
captures the contents non-greedily.
\s*
matches the trailing whitespace.
(?=\d+\)|$)
ensures that the match is followed by either the start of a new section or the end of the text.
Note, it doesn't enforce that they must be ascending or anything like that, so it'd match the following text as well:
4) Hello there 1) How are you? 5) Good.
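As a quick check, here is a minimal sketch of the pattern above run with Python's re module (an assumption on my part; the answer targets PCRE, but this pattern uses only syntax both engines share). The function name split_sections is made up for illustration.

```python
import re

# Pattern from the answer above. Python's re module is assumed here instead of PCRE.
SECTION = re.compile(r"\d+\)\s*(.*?)\s*(?=\d+\)|$)")

def split_sections(text):
    """Return the text of each numbered section, without the number or surrounding whitespace."""
    # With a single capturing group, findall returns the group's text for every match.
    return SECTION.findall(text)

print(split_sections("1) Text for part 1 2) Text for part 2 3) Text for part 3"))
# ['Text for part 1', 'Text for part 2', 'Text for part 3']

# The numbers are not checked for order, so this input is split the same way.
print(split_sections("4) Hello there 1) How are you? 5) Good."))
# ['Hello there', 'How are you?', 'Good.']
```

Note that "." does not match a newline by default, so multi-line sections would need the DOTALL flag.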
I'd suggest the following (PCRE):
(?:\d+\)\s*(.*?))*$
The inner part
\d+\)\s*
matches the list number and the closing parenthesis, followed by optional whitespace.
(.*?)
matches the list text, but in a non-greedy manner (otherwise, it would also match the next list item).
The enclosing
(?: )*$
then matches the above zero or more times, until the end of the input.
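One thing worth checking before relying on the captured group: in typical backtracking engines, a capturing group inside a repeated group keeps only the text from its last iteration. A minimal sketch in Python's re module (assumed here in place of PCRE; the helper name last_item is hypothetical) shows the effect:

```python
import re

# Pattern from the answer above. Python's re module is assumed here instead of PCRE.
WHOLE_LIST = re.compile(r"(?:\d+\)\s*(.*?))*$")

def last_item(text):
    """Return (whole match, group 1) for the first match found in text."""
    m = WHOLE_LIST.search(text)
    return m.group(0), m.group(1)

whole, captured = last_item("1) Text for part 1 2) Text for part 2 3) Text for part 3")
print(whole)     # the entire list is consumed by one match
print(captured)  # only the text of the final item: 'Text for part 3'
```

So this pattern is a reasonable way to validate that the input is one list, but pulling out every item probably needs a per-item pattern applied repeatedly.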
Keep in mind that the text after the number and parenthesis can be anything. This pattern finds your substrings:
\d\).+?(?=\d\)|$)
EDIT:
To get rid of the leading whitespace and return only the text without the number, take group 1 from the following match (note \s* for whitespace; \w* would match word characters instead):
\d\)\s*(.+?)(?=\d\)|$)
To get the number in group 1 and the text in group 2, use this:
(\d)\)\s*(.+?)(?=\d\)|$)
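A minimal sketch of the number-plus-text variant in Python's re module (an assumption; the answer does not name an engine). It uses \s* to skip the whitespace after the parenthesis and strips the trailing whitespace in Python, since the lazy group still captures the space before the next marker. The name numbered_sections is made up for illustration.

```python
import re

# Group 1: the list number; group 2: the section text.
# \s* skips whitespace after ")"; the lazy (.+?) stops at the next "N)" or at the end.
# Like the original, \d matches a single digit only; \d+ would be needed for 10 and above.
NUMBERED = re.compile(r"(\d)\)\s*(.+?)(?=\d\)|$)")

def numbered_sections(text):
    """Return (number, text) pairs, with trailing whitespace stripped from the text."""
    return [(int(m.group(1)), m.group(2).strip()) for m in NUMBERED.finditer(text)]

print(numbered_sections("1) Text for part 1 2) Text for part 2 3) Text for part 3"))
# [(1, 'Text for part 1'), (2, 'Text for part 2'), (3, 'Text for part 3')]
```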