modify text file

2023-02-28 14:19 Q&A author:

I need to modify all files that have a ".txt" extension within a directory in the following way:

Remove every line from the line that starts with "xxx" through the line that ends with "xxx", inclusive.

I know how to do this in Java or C++, but can someone show me a simple script that can get this done?

Thanks!

I assume that the markers are the words start and end, that each appears by itself on its line, and that you want those two lines removed along with everything between them.

perl -ni.bak -e 'print unless /^start$/../^end$/' *.txt

Note that I made a backup of the modified files so that you can inspect the change and fix it if you want.
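
For readers who want to see what the range operator is doing, here is a minimal Python sketch of the same line-by-line logic. The function name drop_range and its parameters are made up for illustration; it assumes the two-dot behaviour, where the end pattern is tested on the same line that opens the range.

```python
import re

def drop_range(lines, start_pattern, end_pattern):
    """Return the lines that fall outside every start..end range (inclusive)."""
    start_re = re.compile(start_pattern)
    end_re = re.compile(end_pattern)
    kept = []
    in_range = False
    for line in lines:
        text = line.rstrip("\n")
        # Open a range when the start pattern matches outside of one.
        if not in_range and start_re.search(text):
            in_range = True
        if in_range:
            # The end pattern is also tested on the line that opened the
            # range, so a single line may both open and close it.
            if end_re.search(text):
                in_range = False
            continue  # drop every line inside the range, markers included
        kept.append(line)
    return kept
```

If the end marker never shows up, everything from the start marker to the end of the input is dropped, which is also what the one-liner does.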

Not that there’s anything wrong with @btilly’s answer — in fact, I would do it his way myself — but just to show you that There’s More Than One Way To Do It, you could also use a substitution:

% perl -i.save -0777 -pe 's/^start.*?end$//gsm' *.txt

That will leave you an extra newline sequence at the end, but it works if the end is at EOF and there’s no newline. You could also take that into account this way:

% perl -i.save -0777 -pe 's/^start.*?end$\R?//gsm' *.txt

You said a line that starts with "xxx" but you didn’t say that was all that was on the line, and you said the line that ends with "xxx", but you didn’t say that was all that was on its line either. And you didn’t mention what happens if those are the same line. I believe you’ll find that my solution handles those cases.

It doesn’t, however, handle the case of the start and the end strings overlapping. If you really want that, too, tell me and I’ll fiddle with it so it works.
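
The whole-file substitution can be sketched in Python as well. This is only an illustration under a few assumptions: the markers are the literal words start and end, the name remove_blocks is invented here, a plain \n stands in for \R (Python's re module has no \R), and the match is lazy so that each block stops at the first line ending in the end marker.

```python
import re

# Hypothetical marker strings; substitute whatever "xxx" stands for.
# MULTILINE makes ^ and $ match at line boundaries, DOTALL lets . cross newlines.
BLOCK = re.compile(r"^start.*?end$\n?", re.MULTILINE | re.DOTALL)

def remove_blocks(text):
    """Delete every block from a line starting with 'start' to a line ending with 'end'."""
    return BLOCK.sub("", text)
```

Like the Perl version, this reads the whole file into one string, so it suits small files better than large ones.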

Another nice thing about using Perl for this is that it very easily works with UTF-8 datafiles, too:

bash-3.2$ cat /tmp/data
     1  fee 
     2  commencé
     3  fie foo
     4  fum
     5  terminé
     6  beat on 
     7  the drum

bash-3.2$ perl -Mutf8 -CSD -nle 'print unless /commencé/ .. /terminé/' /tmp/data
     1  fee 
     6  beat on 
     7  the drum

bash-3.2$ perl -i.guardé -Mutf8 -CSD -nle 'print unless /commencé/ .. /terminé/' /tmp/data

bash-3.2$ cat /tmp/data
     1  fee 
     6  beat on 
     7  the drum

bash-3.2$ cat /tmp/data.guardé 
     1  fee 
     2  commencé
     3  fie foo
     4  fum
     5  terminé
     6  beat on 
     7  the drum

Et voilà! :)
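
For comparison, here is a rough Python sketch of the same in-place edit over every .txt file in a directory, reading and writing UTF-8. The function name strip_ranges_in_place and the default backup suffix are assumptions for illustration. Two differences from the session above: it matches plain substrings rather than regexes, and it writes a backup only for files it actually changes.

```python
import shutil
from pathlib import Path

def strip_ranges_in_place(directory, start, end, backup_suffix=".bak"):
    """Remove start..end line ranges from each *.txt file; return changed file names."""
    changed = []
    for path in sorted(Path(directory).glob("*.txt")):
        original = path.read_text(encoding="utf-8")
        kept = []
        in_range = False
        for line in original.splitlines(keepends=True):
            # Open a range on a line containing the start marker.
            if not in_range and start in line:
                in_range = True
            if in_range:
                # Close it on a line containing the end marker (same line allowed).
                if end in line:
                    in_range = False
                continue
            kept.append(line)
        result = "".join(kept)
        if result != original:
            # Keep a copy of the untouched file next to it before overwriting.
            shutil.copy2(path, path.with_name(path.name + backup_suffix))
            path.write_text(result, encoding="utf-8")
            changed.append(path.name)
    return changed
```

Calling strip_ranges_in_place("/tmp", "commencé", "terminé") would be the counterpart of the session above, assuming the data lived in a file ending in .txt.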

This is one of those problem domains where Perl especially lends itself to extremely short, simple, readable, and maintainable answers. It really is the ultimate Unix Power Tool.

Obviously you’ll never approach this sort of power-tool operation from Java or C++. Ruby, I suspect, might be able to do something similar, but I think Python is too far from the Unix style to provide as succinct and simple an answer.

Plus it runs quite quickly, too: not quite as fast as C, but certainly much, much faster than some ponderously slow shell script. Well, at least if you do the linewise processing, that is. Reading everything into memory is never going to scale, but it’s ok for little things. Also, shell tools tend to bomb out on files with binary data in them, or very long lines, so you can’t always rely on them for such things, especially in a portable, cross-platform fashion. And almost none of them work reliably with Unicode, which is a real must these days.

ruby -i.bak -ne 'print unless /^start/.../^end/' *.txt
