Regex get content between two pipes AND return a space where two pipes are next to each other with no spaces

How can I get all content between pipes and return a space where it comes across two pipes next to each other?

An example string and desired output is:

|test1| test2|test3 || test 4 |

Result1: "test1"
Result2: "test2"
Result3: "test3"
Result4: " "
Result5: "test4"

The closest I've got so far is:

  • /([^\|]+)/ which gets all data between pipes but does not detect ||.
  • /\|([^\|]*)/ which gets all data between pipes and detects ||, but returns an extra empty result at the end.
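
To make the behaviour of the two attempts concrete, here is a small sketch in Python. Python is chosen only for illustration, since the question does not name a language, and the variable names are hypothetical. It runs both patterns over the example string with the standard re module.

```python
import re

s = "|test1| test2|test3 || test 4 |"

# Attempt 1: runs of one or more non-pipe characters. The field between
# the two adjacent pipes contains no characters, so it yields no match.
attempt1 = re.findall(r"[^|]+", s)

# Attempt 2: a pipe followed by zero or more non-pipe characters.
# The empty field is found, but the final pipe matches as well and
# contributes one more empty capture at the end.
attempt2 = re.findall(r"\|([^|]*)", s)

print(attempt1)  # ['test1', ' test2', 'test3 ', ' test 4 ']
print(attempt2)  # ['test1', ' test2', 'test3 ', '', ' test 4 ', '']
```

Neither pattern trims the surrounding whitespace, and neither can turn the empty field into a space.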


This is not possible with a regular expression alone - regexes can only return text they have matched, not create new text.

So you'll have to detect programmatically whether there was an empty match and change the result to a single space. What language are you using?

As an example, in C# you could do this:

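using System.Text.RegularExpressions;

Regex regexObj = new Regex(@"(?<=\|\s*).*?(?=\s*\|)", RegexOptions.Multiline);
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success) {
    // Trim: the \s* in the lookbehind may match zero characters,
    // which leaves leading whitespace inside the match.
    string text = matchResults.Value.Trim();
    if (text == "") {
        text = " ";  // empty field between two adjacent pipes
    }
    // now do whatever you want with it
    matchResults = matchResults.NextMatch();
}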

Ruby has no lookbehind support before version 1.9, so there you need a different approach. First remove the leading and trailing delimiters:

temp = subject.gsub(/^\s*\|\s*|\s*\|\s*$/, '')

Then split along the remaining delimiters:

result = temp.split(/\s*\|\s*/)

and then iterate over the array you get, replacing empty strings with spaces.
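
The three steps above (strip the outer delimiters, split, replace empty strings) can be sketched as follows. This is a Python illustration rather than Ruby, and the function name split_fields is hypothetical; the two patterns are the ones from the Ruby snippets.

```python
import re

def split_fields(subject):
    # Step 1: remove the leading and trailing delimiter together with
    # any whitespace around them.
    temp = re.sub(r"^\s*\|\s*|\s*\|\s*$", "", subject)
    # Step 2: split along the remaining delimiters, swallowing the
    # whitespace on either side of each pipe.
    parts = re.split(r"\s*\|\s*", temp)
    # Step 3: an empty field (two adjacent pipes) becomes a single space.
    return [p if p else " " for p in parts]

fields = split_fields("|test1| test2|test3 || test 4 |")
print(fields)  # ['test1', 'test2', 'test3', ' ', 'test 4']
```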


In Ruby I'd not bother with a regex:

str = '|test1| test2|test3 || test 4 |'
str.split('|')[1 .. -1].map{ |s| (s.strip.empty?) ? ' ': s.strip } #=> ["test1", "test2", "test3", " ", "test 4"]


You can split the string with \s*\|\s* and get an array with each of the pieces. Without knowing what language you are using, I can't say what the specific API would be for doing a regular-expression split on a string.
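
As one possible illustration, assuming Python and its re.split function, splitting the untouched string looks like the sketch below. Because the string starts and ends with a pipe, the split also produces an empty piece at each end, which has to be discarded before the empty inner field is replaced.

```python
import re

s = "|test1| test2|test3 || test 4 |"

pieces = re.split(r"\s*\|\s*", s)
print(pieces)  # ['', 'test1', 'test2', 'test3', '', 'test 4', '']

# The outer pipes have nothing on their far side, so the first and last
# pieces are empty strings; slice them off.
inner = pieces[1:-1]

# Replace the remaining empty field with a single space.
result = [p if p else " " for p in inner]
print(result)  # ['test1', 'test2', 'test3', ' ', 'test 4']
```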


As already mentioned by Tim it is not possible using just a regex.

One way to do it is:

  1. Remove the leading and trailing pipe.
  2. Split the string on each pipe together with any whitespace around it.
  3. If you find any piece to be empty, make it " ".

In Perl:

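$str = '|test1| test2|test3 || test 4 |';
# /g removes both outer pipes; the \s* also drops the whitespace next to them
$str =~ s/^\s*\|\s*|\s*\|\s*$//g;
@pieces = split /\s*\|\s*/, $str;
foreach (@pieces) {
        $_ = ' ' if ($_ eq '');   # empty field between two adjacent pipes
        print $_, "\n";
}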


(?<=\|)([^\|]*)(?=\|) should do what you want. It uses a positive lookbehind and a positive lookahead, so the pipes are not consumed and stay available to the neighbouring matches.

This will give you the results: "test1", " test2", "test3 ", "", and " test 4 ".

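If you want to trim your results using regex, use (?<=\|)\s*([^\|]*?)\s*(?=\|) and read capture group 1, giving you "test1", "test2", "test3", "", and "test 4". The quantifier inside the group has to be lazy; a greedy [^\|]* would swallow the trailing whitespace before the final \s* could match it.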

Test 4 is tougher, because you cannot remove the internal space. And, as mentioned, regular expressions cannot create text, so it is impossible to return " " between tests 3 and 4. Of course, you can test for empty strings and replace them later, using whatever other language you are using.
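
A minimal sketch of that last step, assuming Python's re module: it uses a trimming lookaround pattern with the pipe in the lookahead escaped and a lazy quantifier inside the group, then replaces the empty result afterwards. The variable names are hypothetical.

```python
import re

s = "|test1| test2|test3 || test 4 |"

# The lookbehind and lookahead assert the pipes without consuming them,
# so one pipe can close a field and open the next. The lazy group leaves
# trailing whitespace to the \s* that follows it.
pattern = re.compile(r"(?<=\|)\s*([^|]*?)\s*(?=\|)")

# With a single capture group, findall returns group 1 of every match.
trimmed = pattern.findall(s)
print(trimmed)  # ['test1', 'test2', 'test3', '', 'test 4']

# The regex cannot invent the space, so substitute it afterwards.
result = [t if t else " " for t in trimmed]
print(result)   # ['test1', 'test2', 'test3', ' ', 'test 4']
```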
