Regexp: split by space, preserving strings in curly braces
I have a string that looks like this:
arg1 {0 1} arg2 {5 87} string {with space} ar3 1
It is split on spaces, but string values may contain spaces as well, which causes problems for strings with spaces. I still need to split this string, but I don't want to split a value that is enclosed in curly braces and prefixed by the string keyword. That means the string above should be split like this:
arg1
{0
1}
arg2
{5
87}
string
{with space}
ar3
1
I couldn't implement this myself; I clearly need to read a lot more about regular expressions. Could you please help me?
Step 1: split on spaces as usual to get an array.
Step 2: go through the array; if an element matches {[a-zA-Z]+ (an opening brace followed by letters), join it with the next element, separated by a space, and remove that next element.
Then you have what you want. The following awk command shows an example:
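echo "arg1 {0 1} arg2 {5 87} string {with space} ar3 1" | awk '{
  n = split($0, a)                          # n is the number of tokens
  for (i = 1; i <= n; i++) {
    # token opens a brace group that starts with letters: join it with the next token
    if (a[i] ~ /^\{[a-zA-Z]+$/) { a[i] = a[i] " " a[i+1]; delete a[i+1] }
    if (a[i] != "") print a[i]              # skip the slot emptied by the join
  }
}'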
arg1
{0
1}
arg2
{5
87}
string
{with space}
ar3
1
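For anyone not using awk, the same split-then-rejoin idea can be sketched in Python. This is only a minimal sketch under a few assumptions of mine, not part of the awk answer: the function name split_keep_braced and its keyword parameter are made up for illustration, it keys on the string keyword from the question rather than on letters after the brace, and it keeps joining tokens until one ends with }. Rejoining with a single space also collapses longer runs of whitespace inside the braces.

```python
def split_keep_braced(text, keyword="string"):
    """Split on whitespace, but keep a {...} group whole when it follows `keyword`."""
    tokens = text.split()
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        # A brace group directly after the keyword: glue tokens until the closing brace.
        if tok.startswith("{") and out and out[-1] == keyword:
            group = [tok]
            while not group[-1].endswith("}") and i + 1 < len(tokens):
                i += 1
                group.append(tokens[i])
            out.append(" ".join(group))
        else:
            out.append(tok)
        i += 1
    return out


if __name__ == "__main__":
    for token in split_keep_braced("arg1 {0 1} arg2 {5 87} string {with space} ar3 1"):
        print(token)
```

Unlike the awk one-liner, this version handles groups of more than two words, such as string {a b c}.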
==update==
OK, based on your comment, this works too:
Step 1: find the substrings that you don't want to split and replace each of them with a special placeholder string. Importantly, save the substrings you found in a separate array. The pattern, shown with grep:
echo "arg1 {0 1} arg2 {5 87} string {with space} ar3 1 {abc def} {xyz zyx}"|grep -E -o '\{([a-zA-Z]+\s*)*\}'
{with space}
{abc def}
{xyz zyx}
After the replacement, with xxxxxxxxx as the special string:
echo "arg1 {0 1} arg2 {5 87} string {with space} ar3 1 {abc def} {xyz zyx}"|sed -r 's#\{([a-zA-Z]+\s*)*\}#xxxxxxxxx#g'
arg1 {0 1} arg2 {5 87} string xxxxxxxxx ar3 1 xxxxxxxxx xxxxxxxxx
Step 2: do the split.
Step 3: replace each special string with the saved substring at the matching index.
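The three steps can be put together roughly like this in Python. Treat it as a sketch with assumptions of mine: the names PROTECTED, PLACEHOLDER and split_with_placeholders are hypothetical, the placeholder is a NUL character that I assume never occurs in the input, and the pattern is a slightly stricter variant of the one above (letter-only words separated by whitespace, so {} and {abc } are not protected).

```python
import re

# Stricter variant of the answer's pattern (assumption): letter-only words
# separated by whitespace inside one pair of braces.
PROTECTED = re.compile(r"\{[a-zA-Z]+(?:\s+[a-zA-Z]+)*\}")
PLACEHOLDER = "\x00"  # assumption: NUL never occurs in the input


def split_with_placeholders(text):
    saved = []

    def stash(match):
        # Step 1: remember the protected substring and mask it.
        saved.append(match.group(0))
        return PLACEHOLDER

    masked = PROTECTED.sub(stash, text)
    restored = iter(saved)
    # Step 2: split on whitespace. Step 3: put the saved substrings back in order.
    return [
        re.sub(re.escape(PLACEHOLDER), lambda _m: next(restored), tok)
        for tok in masked.split()
    ]


if __name__ == "__main__":
    line = "arg1 {0 1} arg2 {5 87} string {with space} ar3 1 {abc def} {xyz zyx}"
    for token in split_with_placeholders(line):
        print(token)
```

Because the placeholders are restored left to right, the saved substrings come back in the order they were found, which is what "the right index" amounts to here.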
I don't know QRegExp, so I don't know if it has lookaround capabilities. If it does, you could try splitting on something like this:
(?<!(^|})[^{]*\bstring\s{[^}]*)\s
That should split on any whitespace character except those inside a pair of braces immediately preceded by the word string. It will ignore the string keyword if it's already inside a set of braces.
You can also use a simplified version: (?<!\bstring\s{[^}]*)\s, although this will be affected by weird stuff like foo {string {bar qux}}.
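Whether those patterns work depends on the engine accepting variable-length lookbehind. Python's built-in re module, for one, only accepts fixed-width lookbehind, so they can't be used there as written. A related idea that does run in Python is to match tokens instead of splitting on separators. The sketch below is my own swap-in, not the pattern above: the names TOKEN and tokenize are hypothetical, and it assumes exactly one whitespace character between the string keyword and the opening brace.

```python
import re

# Either a brace group directly preceded by "string" plus one whitespace
# character (a fixed-width lookbehind), or any run of non-whitespace characters.
TOKEN = re.compile(r"(?<=\bstring\s)\{[^}]*\}|\S+")


def tokenize(text):
    """Return the tokens of `text`, keeping string {...} groups whole."""
    return TOKEN.findall(text)


if __name__ == "__main__":
    for token in tokenize("arg1 {0 1} arg2 {5 87} string {with space} ar3 1"):
        print(token)
```

Like the simplified lookbehind version, this does not track nesting, so input such as foo {string {bar qux}} is still split in an odd way.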