modify text file

I need to modify all files that have a ".txt" extension within a directory in the following way:

remove all text lines beginning with the line that starts with "xxx" and ending with the line that ends with "xxx", inclusive.

I know how to do this in Java or C++, but can someone show me a simple script that can get this done?

Thanks!


I assume that you want to remove everything from start through end, and that each of those words appears by itself on its line.

perl -ni.bak -e 'print unless /^start$/../^end$/' *.txt

Note that I made a backup of the modified files so that you can inspect the change and fix it if you want.


Not that there’s anything wrong with @btilly’s answer — in fact, I would do it his way myself — but just to show you that There’s More Than One Way To Do It, you could also use a substitution:

% perl -i.save -0777 -pe 's/^start.*?end$//gsm' *.txt

That leaves behind the newline that followed the end line, so an empty line remains where the block was, but it works even if the end is at EOF and there’s no newline. You can also consume that newline this way:

% perl -i.save -0777 -pe 's/^start.*?end(?:\R|\z)//gsm' *.txt

You said a line that starts with "xxx" but you didn’t say that was all that was on the line, and you said the line that ends with "xxx", but you didn’t say that was all that was on its line either. And you didn’t mention what happens if those are the same line. I believe you’ll find that my solution handles those cases.

It doesn’t, however, handle the case of the start and the end strings overlapping. If you really want that, too, tell me and I’ll fiddle with it so it works.

Another nice thing about using Perl for this is that it very easily works with UTF-8 datafiles, too:

bash-3.2$ cat /tmp/data
     1  fee 
     2  commencé
     3  fie foo
     4  fum
     5  terminé
     6  beat on 
     7  the drum

bash-3.2$ perl -Mutf8 -CSD -nle 'print unless /commencé/ .. /terminé/' /tmp/data
     1  fee 
     6  beat on 
     7  the drum

bash-3.2$ perl -i.guardé -Mutf8 -CSD -nle 'print unless /commencé/ .. /terminé/' /tmp/data

bash-3.2$ cat /tmp/data
     1  fee 
     6  beat on 
     7  the drum

bash-3.2$ cat /tmp/data.guardé 
     1  fee 
     2  commencé
     3  fie foo
     4  fum
     5  terminé
     6  beat on 
     7  the drum

Et voilà! :)

This is one of those problem domains where Perl especially lends itself to extremely short, simple, readable, and maintainable answers. It really is the ultimate Unix Power Tool.

Obviously you’ll never approach this sort of power-tool operation from Java or C++. Ruby, I suspect, might be able to do something similar, but I think Python is too far from the Unix style to provide as succinct and simple an answer.

Plus it runs quite quickly, too: not quite as fast as C, but certainly much, much faster than some ponderously slow shell script. Well, at least if you do the linewise processing, that is. Reading everything into memory is never going to scale, but it’s ok for little things. Also, shell tools tend to bomb out on files with binary data in them, or very long lines, so you can’t always rely on them for such things, especially in a portable, cross-platform fashion. And almost none of them work reliably with Unicode, which is a real must these days.


ruby -i.bak -ne 'print unless /^start/.../^end/' *.txt