
Regexp split by space preserving string in curly braces

I have a string that looks like this:

arg1 {0 1} arg2 {5 87} string {with space} ar3 1

It is split on spaces, but a string argument may contain spaces as well, which causes problems for strings with spaces. I still need to split this string, but I'd like not to split a string that is enclosed in curly braces and prefixed by the string keyword. That means the string above should be split like this:

arg1
{0
1}
arg2
{5
87}
string
{with space}
ar3
1

I can't implement this; I really need to read up on regular expressions. Could you please help me?


Step 1: split on spaces as usual to get an array.

Step 2: go through the array; if an element matches {[a-zA-Z]+, join the next element to it with a space and remove the next element.

Then you have what you want. The following awk command shows an example:

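echo "arg1 {0 1} arg2 {5 87} string {with space} ar3 1" | awk '{
  n = split($0, a)                          # a[1..n] = whitespace-separated tokens
  for (i = 1; i <= n; i++)
    if (a[i] ~ /^[{][a-zA-Z]+/ && i < n) {  # token opens a {word ...} group:
      print a[i] " " a[i+1]; i++            # glue the next token on and skip it
    } else print a[i]
}'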

arg1
{0
1}
arg2
{5
87}
string
{with space}
ar3
1

==update==

OK, based on your comment, this works too:

Step 1: find the strings you don't want to split and replace each of them with a special placeholder string. Importantly, save the found strings in another array. Here is the pattern, as a grep example:

echo "arg1 {0 1} arg2 {5 87} string {with space} ar3 1 {abc def} {xyz zyx}"|grep -E -o '\{([a-zA-Z]+\s*)*\}'

        {with space}
        {abc def}
        {xyz zyx}

After replacing, using xxxxxxxxx as the special string:

echo "arg1 {0 1} arg2 {5 87} string {with space} ar3 1 {abc def} {xyz zyx}"|sed -r 's#\{([a-zA-Z]+\s*)*\}#xxxxxxxxx#g'

arg1 {0 1} arg2 {5 87} string xxxxxxxxx ar3 1 xxxxxxxxx xxxxxxxxx

Step 2: split on spaces as usual.

Step 3: replace each placeholder with the saved string at the matching index.
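Only step 1 is shown as code above, so here is a minimal sketch of all three steps in one awk program. It is not the answerer's code, and it assumes that the placeholder xxxxxxxxx never occurs in the input and that every protected {...} group is its own whitespace-separated token, as in the question. It reuses the grep/sed pattern but writes [ \t] instead of \s, since \s is a GNU extension; the function name split_protected is hypothetical.

```shell
# Sketch of the placeholder approach (steps 1-3) in portable awk.
# Assumptions: "xxxxxxxxx" never appears in the input, and every protected
# {...} group is its own whitespace-separated token.
split_protected() {
  printf '%s\n' "$1" | awk '{
    s = $0; masked = ""; n = 0
    # Step 1: swap each {letters ...} group for the placeholder, saving it
    while (match(s, /[{]([a-zA-Z]+[ \t]*)*[}]/)) {
      saved[++n] = substr(s, RSTART, RLENGTH)
      masked = masked substr(s, 1, RSTART - 1) "xxxxxxxxx"
      s = substr(s, RSTART + RLENGTH)
    }
    masked = masked s
    # Step 2: ordinary whitespace split
    m = split(masked, tok)
    # Step 3: restore the saved groups in order of appearance
    k = 0
    for (i = 1; i <= m; i++) {
      if (tok[i] == "xxxxxxxxx") tok[i] = saved[++k]
      print tok[i]
    }
  }'
}

split_protected "arg1 {0 1} arg2 {5 87} string {with space} ar3 1 {abc def} {xyz zyx}"
```

It prints one token per line, with {with space}, {abc def} and {xyz zyx} each kept whole, while {0 1} and {5 87} are still split because they contain digits.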


I'm not familiar with QRegExp, so I don't know whether it supports lookaround. If it does, you could try splitting on something like this:

(?<!(^|})[^{]*\bstring\s{[^}]*)\s

That should split on any whitespace character except those inside a pair of braces immediately preceded by the word string. It will ignore the string keyword if it's already inside a set of braces.

You can also use a simplified version, (?<!\bstring\s{[^}]*)\s, although it will be tripped up by unusual input such as foo {string {bar qux}}.
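Both patterns rely on a variable-width lookbehind ([^}]* inside (?<!...)), and many engines, Python's re for example, only accept fixed-width lookbehind. If yours rejects it, the same rule can be applied without lookbehind by splitting normally and then re-joining tokens. This is a swapped-in technique, not the regex above: a sketch in awk that assumes groups are not nested and that runs of blanks inside a group may be collapsed to one space. The function name split_keyword is hypothetical.

```shell
# Sketch: split on whitespace, but keep a {...} group together when the
# token right before it is exactly the keyword "string".
# Assumptions: groups are not nested, and runs of blanks inside a group
# may be collapsed to a single space.
split_keyword() {
  printf '%s\n' "$1" | awk '{
    n = split($0, tok)
    buf = ""                                  # non-empty while inside a group
    for (i = 1; i <= n; i++) {
      if (buf != "") {
        buf = buf " " tok[i]                  # keep collecting the group
        if (tok[i] ~ /[}]$/) { print buf; buf = "" }
      } else if (i > 1 && tok[i-1] == "string" && tok[i] ~ /^[{]/ && tok[i] !~ /[}]$/) {
        buf = tok[i]                          # a group opens after "string"
      } else
        print tok[i]                          # ordinary token
    }
    if (buf != "") print buf                  # unterminated group: flush it
  }'
}

split_keyword "arg1 {0 1} arg2 {5 87} string {with space} ar3 1"
```

Because the keyword check compares whole tokens, foo {string {bar qux}} is left fully split, since the token before {bar is {string rather than string.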

