How to search for consecutive list elements prefixed by a number and a dot in plain text

The text looks like this:

"Beginning. 1. The container is 1.5 meters long 2. It can hold up to 2lt of fluid. 3. It has 4 holes."

There may not be a dot at the end of each list element.

How can I split this text into a list as shown below?

"Beginning."
"The container is 1.5 meters long"
"It can hold up to 2lt of fluid."
"It has 4 holes."

In other words, I need to match (\d+)\. such that all (\d+) are consecutive integers, so that I can split and trim the text between them. Is it possible with regex? How far do I have to venture into the realm of computer science?


Use

\d+\.(?!\d)

as the splitting regex, e.g. in PHP:

$result = preg_split('/\d+\.(?!\d)/', $subject);

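The negative lookahead (?!\d) ensures that the dot is not followed by a digit, so a decimal such as 1.5 is not treated as a list marker. The resulting pieces keep their surrounding spaces, so trim them afterwards.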

Or make the surrounding whitespace mandatory, if that's an option:

$result = preg_split('/\s+\d+\.\s+/', $subject);
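For anyone who wants to try both patterns outside PHP, here is a minimal sketch in Python (my choice of language, not part of the answer above). The variable names are made up for illustration. It only shows how the two regexes behave on the sample text and does not check that the numbers are consecutive.

```python
import re

text = ("Beginning. 1. The container is 1.5 meters long 2. It can hold "
        "up to 2lt of fluid. 3. It has 4 holes.")

# Split on "digits + dot" unless the dot is directly followed by a digit,
# so the "1." inside "1.5" is left alone. Pieces are trimmed afterwards.
parts = [p.strip() for p in re.split(r"\d+\.(?!\d)", text)]

# Variant with mandatory whitespace on both sides of the marker;
# the whitespace is consumed by the match, so no trimming is needed here.
parts_ws = re.split(r"\s+\d+\.\s+", text)

for p in parts:
    print(p)
```

Note that both patterns would also split after a sentence that happens to end in a number followed by a dot and a space.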


This is working C# code:

// Requires: using System; using System.Text.RegularExpressions;
string s = "Beginning. 1. The container is 1.5 meters long 2. It can hold up to 2lt of fluid. 3. It has 4 holes.";
string[] res = Regex.Split(s, @"\s*\d+\.\s+");

foreach (var r in res)
{
    Console.WriteLine(r);
}

Console.ReadLine();

I split on \s*\d+\.\s+, which means: optional whitespace, followed by at least one digit, followed by a dot, then at least one whitespace character.
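None of the patterns in this thread check that the numbers are consecutive, which a plain regex cannot really do on its own. One possible way to add that, sketched in Python with a hypothetical helper name split_numbered, is to find every candidate marker with the regex and accept it only if it is the next number expected, assuming the list starts at 1:

```python
import re

def split_numbered(text):
    """Split on markers "1.", "2.", ... only while they count up from 1."""
    items = []
    start = 0       # index where the current piece begins
    expected = 1    # the only list number accepted next (assumed to start at 1)
    for m in re.finditer(r"\b(\d+)\.(?!\d)", text):
        if int(m.group(1)) != expected:
            continue  # not the next number in sequence: treat it as ordinary text
        items.append(text[start:m.start()].strip())
        start = m.end()
        expected += 1
    items.append(text[start:].strip())  # text after the last accepted marker
    return items

print(split_numbered("Beginning. 1. The box holds 12. 2. It has 4 holes."))
```

Here the "12." at the end of the first item is left in place because 12 is not the expected next number. If the text begins directly with "1.", the first element of the result is an empty string.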
