How do you "debug" a regular expression with sed?

2023-01-22 19:11 Q&A author:

I'm trying to use a regexp with sed. I've tested my regex with kiki, a GNOME application for testing regexps, and it works in kiki.

date: 2010-10-29 14:46:33 -0200;  author: 00000000000;  state: Exp;  lines: +5 -2;  commitid: bvEcb00aPyqal6Uu;

I want to replace author: 00000000000; with nothing. So I created this regexp, which works when I test it in kiki:

author:\s[0-9]{11};

But it doesn't work when I test it in sed:

sed -i "s/author:\s[0-9]{11};//g" /tmp/test_regex.txt

I know regex implementations differ, and this could be the issue. My question is: how can I at least try to "debug" what's happening with sed? Why is it not working?

My version of sed doesn't like the {11} bit. Processing the line with:

sed 's/author: [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9];//g'

works fine.

And the way I debug it is exactly what I did here. I just constructed a command:

echo 'X author: 00000000000; X' | sed ...

and removed the more advanced regex things one at a time:

Using <space> instead of \s didn't fix it.
Replacing [0-9]{11} with 11 copies of [0-9] worked.
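The remove-one-feature-at-a-time approach can be scripted so that every variant is tried against the same sample line. This is a minimal sketch assuming GNU sed (because of \s); the helper name `try` and the sample line are illustrative, not part of the answer above.

```shell
#!/bin/sh
# Hypothetical helper: run one candidate pattern against a fixed sample line
# and report whether sed changed it. Assumes GNU sed, since \s is a GNU extension.
sample='X author: 00000000000; X'

try() {
  out=$(printf '%s\n' "$sample" | sed "s/$1//")
  if [ "$out" != "$sample" ]; then
    printf 'MATCH     %s  ->  %s\n' "$1" "$out"
  else
    printf 'NO MATCH  %s\n' "$1"
  fi
}

try 'author:\s[0-9]{11};'    # original: in a basic regex, {11} is literal text
try 'author: [0-9]{11};'     # <space> instead of \s: still no match
try 'author: [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9];'  # matches
try 'author:\s[0-9]\{11\};'  # escaped braces form an interval: matches
```

Because every candidate runs against identical input, the first "NO MATCH" to "MATCH" transition points straight at the construct sed reads differently from kiki.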

It pretty much had to be one of those since I've used every other feature of your regex before with sed successfully.

But, in fact, this will actually work without the hideous 11 copies of [0-9], you just have to escape the braces [0-9]\{11\}. I have to admit I didn't get around to trying that since it worked okay with the multiples and I generally don't concern myself too much with brevity in sed since I tend to use it more for quick'n'dirty jobs :-)

But the brace method is a lot more concise and adaptable and it's good to know how to do it.

In sed you need to escape the curly braces. "s/author:\s[0-9]\{11\};//g" should work.

Classic sed has no debugging facility (GNU sed 4.6 and later do add a --debug option that annotates execution). To test, you simplify at the command line iteratively until you get something that works, and then build back up.

command line input:

$ echo 'xx a: 00123 b: 5432' | sed -e 's/a:\s[0-9]\{5\}//'

command line output:

xx  b: 5432
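When a pattern matches but removes the wrong text, it helps to see exactly what it consumed. A small sed-only trick, sketched here assuming GNU sed (for \s): replace the match with itself wrapped in brackets using &, and print only the lines that changed with -n and the p flag.

```shell
#!/bin/sh
# & in the replacement stands for the whole matched text, so [&] brackets it.
# -n suppresses normal output; the p flag prints a line only if s/// succeeded.
line='xx a: 00123 b: 5432'

printf '%s\n' "$line" | sed -n 's/a:\s[0-9]\{5\}/[&]/p'
# prints: xx [a: 00123] b: 5432

printf '%s\n' "$line" | sed -n 's/a:\s[0-9]\{3\}/[&]/p'
# prints: xx [a: 001]23 b: 5432

printf '%s\n' "$line" | sed -n 's/a:\s[0-9]{5}/[&]/p'
# prints nothing: an unescaped {5} is literal in a basic regex, so no match
```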

There is a Python script called sedsed by Aurelio Jargas which will show the stepwise execution of a sed script. A debugger like this isn't going to help much in the case of characters being taken literally (e.g. {) versus having special meaning (e.g. \{), especially for a simple substitution, but it will help when a more complex script is being debugged.

Disclaimer: I am a minor contributor to sedsed.


Another sed debugger is sd by Brian Hiles, written as a Bourne shell script (I haven't used this one).
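If neither tool is at hand, sed's own l command gives a crude step-by-step view. It prints the pattern space unambiguously, with $ marking the end of the line, so placing it between commands shows the effect of each step. This is a minimal sketch assuming GNU sed (for \s); the sample line is a shortened version of the one in the question.

```shell
#!/bin/sh
# l prints the current pattern space (non-printables escaped, end marked by $).
# -n turns off automatic printing, so only the l traces and the final p appear.
line='date: 2010-10-29;  author: 00000000000;  state: Exp;'

printf '%s\n' "$line" | sed -n '
l
s/author:\s[0-9]\{11\};//
l
s/  */ /g
l
p
'
# date: 2010-10-29;  author: 00000000000;  state: Exp;$
# date: 2010-10-29;    state: Exp;$
# date: 2010-10-29; state: Exp;$
# date: 2010-10-29; state: Exp;
```

The trailing $ on each traced line makes leftover or doubled spaces easy to spot, which a plain print would hide.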

You have to use the -r flag for extended regex (GNU sed also accepts -E, which is the flag BSD/macOS sed uses):

sed -r 's/author:\s[0-9]{11};//g'

or you have to escape the {} characters:

sed 's/author:\s[0-9]\{11\};//g'
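To see the two fixes side by side, and what happens when the syntaxes are mixed up, here is a sketch that runs all three variants on the sample line from the question. It assumes GNU sed: \s is a GNU extension, and GNU sed accepts -E as well as -r.

```shell
#!/bin/sh
line='date: 2010-10-29 14:46:33 -0200;  author: 00000000000;  state: Exp;'

# Basic regex (the default): an interval is written \{11\}.
bre=$(printf '%s\n' "$line" | sed 's/author:\s[0-9]\{11\};//g')

# Extended regex (-E or -r): a plain {11} is an interval.
ere=$(printf '%s\n' "$line" | sed -E 's/author:\s[0-9]{11};//g')

# Mixing the two up: in ERE, \{ is a literal brace, so nothing is removed.
mixed=$(printf '%s\n' "$line" | sed -E 's/author:\s[0-9]\{11\};//g')

printf 'BRE:   %s\nERE:   %s\nmixed: %s\n' "$bre" "$ere" "$mixed"
```

The "mixed" case is the mirror image of the original problem: the escaping rule for braces flips between basic and extended mode.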

If you want to debug a sed command, you can use the w (write) command to dump which lines sed has matched to a file.

From the sed man page:

Commands which accept address ranges

(...)

w filename

Write the current pattern space to filename.

Applying to your question

Let's use a file named sed_dump.txt as the sed dump file.

1) Generate the sed dump:

sed "/author:\s[0-9]{11};/w sed_dump.txt" /tmp/test_regex.txt

2) Check file sed_dump.txt contents:

cat sed_dump.txt

Output:

It's empty...

3) Try escaping the '{' regex control character:

sed "/author:\s[0-9]\{11\};/w sed_dump.txt" /tmp/test_regex.txt

4) Check file sed_dump.txt contents:

cat sed_dump.txt

Output:

date: 2010-10-29 14:46:33 -0200; author: 00000000000; state: Exp; lines: +5 -2; commitid: bvEcb00aPyqal6Uu;

Conclusion

In step 4) a line was written to the dump file, which means sed matched your pattern on that line. This doesn't guarantee that the substitution does what you want, but it is a way of debugging with sed itself.
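A related trick: the s command itself accepts a w flag, which writes the pattern space to a file only when that substitution succeeded. The dump then lists the lines that were actually changed (in their edited form), not just the lines an address matched. A minimal sketch, assuming GNU sed; the sample lines and file names are hypothetical.

```shell
#!/bin/sh
# Work in a scratch directory; the file names below are made up for the demo.
tmp=$(mktemp -d)
printf '%s\n' \
  'date: 2010-10-29;  author: 00000000000;  state: Exp;' \
  'date: 2010-10-30;  author: 123;  state: Exp;' > "$tmp/test_regex.txt"

# The w flag must come last: everything after it up to the end of the script
# is taken as the file name. Normal output is discarded; only the dump matters.
sed "s/author:\s[0-9]\{11\};//w $tmp/changed.txt" "$tmp/test_regex.txt" > /dev/null

# Only the first line has an 11-digit id, so only its edited form is dumped.
cat "$tmp/changed.txt"
```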

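You may be using the -i flag incorrectly: BSD/macOS sed requires a backup-suffix argument for -i, while GNU sed treats it as optional. Giving it a suffix works on both and keeps a backup of the original file. You also need to escape your curly braces.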

sed -ibak -e "s/author:\s[0-9]\{11\};//g" /tmp/test_regex.txt
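The backup that -i leaves behind is itself a debugging aid: diff it against the edited file to see exactly what the substitution changed. Note that the suffix is appended directly to the file name, so -ibak as above produces test_regex.txtbak; the sketch below uses -i.bak instead. It assumes GNU sed (for \s) and works on a scratch copy whose name is hypothetical.

```shell
#!/bin/sh
# Scratch copy of the input; the directory comes from mktemp, the name is made up.
tmp=$(mktemp -d)
printf '%s\n' 'date: 2010-10-29;  author: 00000000000;  state: Exp;' > "$tmp/test_regex.txt"

# Edit in place, keeping the original as test_regex.txt.bak.
sed -i.bak 's/author:\s[0-9]\{11\};//g' "$tmp/test_regex.txt"

# Show what changed. diff exits 1 when the files differ, which is expected here.
diff "$tmp/test_regex.txt.bak" "$tmp/test_regex.txt" || true
```

If diff prints nothing, the pattern never matched, which is exactly the symptom in the question.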

I usually debug my statement by starting with a regex I know will work (like 's/author//g' in this case). When that works I know that I have the right arguments. Then I expand the regex incrementally.

That looks more like a perl regex than it does a sed regex. Perhaps you would prefer using

perl -pi.orig -e 's/author:\s[0-9]{11};//g' file1 file2 file3

At least that way you could always add -Mre=debug to debug the regex.
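A sketch of what that looks like on the sample line, reading from standard input instead of editing files in place. The compile and match trace from the re debugging pragma goes to stderr, while the edited line goes to stdout, so the two can be separated with redirection. It requires perl on the PATH.

```shell
#!/bin/sh
# -Mre=debug loads "use re 'debug'", which traces how the regex is compiled
# and executed (on stderr). -p prints each line after the substitution.
printf '%s\n' 'date: 2010-10-29;  author: 00000000000;  state: Exp;' |
  perl -Mre=debug -pe 's/author:\s[0-9]{11};//g'
```

Append 2>/dev/null to see only the result, or 2>&1 >/dev/null to see only the trace.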

The s before the first / already tells sed that you are substituting author: 00000000000.
