How to search for consecutive list elements prefixed by a number and a dot in plain text
The text looks like this:
"Beginning. 1. The container is 1.5 meters long 2. It can hold up to 2lt of fluid. 3. It has 4 holes."
There may not be a dot at the end of each list element.
How can I split this text into a list as shown below?
"Beginning."
"The container is 1.5 meters long"
"It can hold up to 2lt of fluid."
"It has 4 holes."
In other words, I need to match (\d+)\. such that all (\d+) are consecutive integers so that I can split and trim the text between them. Is it possible with regex? How far do I have to venture into the realm of computer science?
Use
\d+\.(?!\d)
as the splitting regex, e.g. in PHP:
$result = preg_split('/\d+\.(?!\d)/', $subject);
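The negative lookahead (?!\d)
ensures that no digit follows the dot, so a decimal such as 1.5 is not treated as a list marker. The resulting pieces still carry their surrounding whitespace, so trim them afterwards.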
Or make the spaces mandatory - if that's an option:
$result = preg_split('/\s+\d+\.\s+/', $subject);
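To see what the two patterns produce on the sample text, here is a minimal sketch in Python (chosen only for illustration; it assumes `re.split` treats these two patterns the same way `preg_split` does). The first pattern leaves whitespace around each piece, so the pieces are stripped; the second consumes the whitespace itself.

```python
import re

text = ("Beginning. 1. The container is 1.5 meters long "
        "2. It can hold up to 2lt of fluid. 3. It has 4 holes.")

# Pattern 1: digits and a dot, not followed by another digit ("1.5" is skipped).
# The separators do not include the spaces, so each piece is stripped.
loose = [part.strip() for part in re.split(r"\d+\.(?!\d)", text)]

# Pattern 2: the marker must have whitespace on both sides,
# and that whitespace is consumed as part of the separator.
strict = re.split(r"\s+\d+\.\s+", text)

print(loose)
print(strict)
```

Both lists contain the four strings from the question. Note that the second pattern would not match a marker at the very start of the text, because it requires whitespace before the number.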
This is working C# code:
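using System;
using System.Text.RegularExpressions;

string s = "Beginning. 1. The container is 1.5 meters long 2. It can hold up to 2lt of fluid. 3. It has 4 holes.";
// Split on each list marker together with its surrounding whitespace.
string[] res = Regex.Split(s, @"\s*\d+\.\s+");
foreach (var r in res)
{
    Console.WriteLine(r);
}
Console.ReadLine(); // keep the console window open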
I split on \s*\d+\.\s+
that means optional whitespace, followed by at least one digit, followed by a dot, then at least one whitespace character. Requiring whitespace after the dot is what keeps 1.5 from being split.
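None of the patterns above checks that the numbers are actually consecutive, which is what the question asks for. One way to get that is to let the regex find candidate markers and do the counting in ordinary code. The following is only a sketch in Python under stated assumptions: the function name `split_numbered_list` is hypothetical, the list is assumed to start at 1, and a marker is assumed to be preceded by whitespace (or the start of the text) and followed by whitespace.

```python
import re

def split_numbered_list(text):
    """Split text on markers "1.", "2.", ... accepted only in consecutive order."""
    pieces = []
    start = 0        # index where the current piece begins
    expected = 1     # assumption: numbering starts at 1
    # Candidate marker: digits plus a dot, standing alone between whitespace.
    for m in re.finditer(r"(?<!\S)(\d+)\.(?=\s)", text):
        if int(m.group(1)) != expected:
            continue  # out-of-sequence number, treat it as ordinary text
        pieces.append(text[start:m.start()].strip())
        start = m.end()
        expected += 1
    pieces.append(text[start:].strip())
    return pieces

sample = ("Beginning. 1. The container is 1.5 meters long "
          "2. It can hold up to 2lt of fluid. 3. It has 4 holes.")
print(split_numbered_list(sample))
```

With this approach a stray number such as the 7. in "1. See page 7. for details 2. Done" stays inside its list element. It still cannot tell a sentence that happens to end in the next expected number apart from a real marker, and if the text starts directly with "1." the first returned piece is an empty string.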