
In BASH, delete everything between two sets of characters

I've combined a bunch of email files into one large text file, and now I'm trying to delete all the header lines from the emails out of this new text file. I have a set of unique characters I can use as markers to delete between them, but I'm coming up short finding a regex that will strip out the headers. An example set is below (including the two asterisks at the top and the double equals at the bottom).


**


 w54cs6547wem;         Sat, 30 Oct 2010 00:06:43 -0700 (PDT)
 s10mr13764658ybi.218.1288422402631;         Sat, 30 Oct 2010 00:06:42 -0700 (PDT)


p13si451872ybk.2.2010. .36;         Sat, 30 Oct 2010 00:06:42 -0700 (PDT)

  Sat, 30 Oct 2010 02:01:23 -0500 

Date: Sat, 30 Oct 2010 02:01:22 -0500 Subject: 
Message-ID:  
Thread-Index: Act4ABHi0HfIPTIzRwe9oy8ojziTig==


sed -i '/\*\*/,/==/d' FILE 

changes your file in place (-i; this is GNU sed syntax, BSD/macOS sed needs -i '' instead),

sed '/\*\*/,/==/d' FILE > MODIFIED

saves the modification to a newly created file.
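To try the range delete before touching real data, a small sketch like the following may help. The file names sample.txt and cleaned.txt and the sample header lines are made up for illustration; only the sed expression comes from the answer above.

```shell
# Hypothetical sample: two lines to keep around one header block.
tmpdir=$(mktemp -d)
printf '%s\n' 'keep me' '**' 'Received: junk' 'Thread-Index: abc==' 'keep me too' \
    > "$tmpdir/sample.txt"

# Delete every line from one matching /\*\*/ through the next one matching /==/.
sed '/\*\*/,/==/d' "$tmpdir/sample.txt" > "$tmpdir/cleaned.txt"

cat "$tmpdir/cleaned.txt"
```

Because neither pattern is anchored, any line containing ** starts a range and any line containing == ends it. If the message bodies can contain those characters, anchoring the patterns as /^\*\*/ and /==$/ is the stricter choice.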


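I don't know bash replacement syntax, but the regex you want is:

/\*\*.*?==/s

In PHP, the code would be:

$str = preg_replace('/\*\*.*?==/s', '', $str);

The s modifier lets . match newlines, which is needed here because each header block spans several lines. Hopefully you can translate that into bash without any trouble.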

Explanation:

The trick here is the .*?. The ? makes the .* lazy, so it will start at ** and match everything until the first == it finds. Without the ?, the .* would be greedy and grab everything between the first ** and the last == in the document. So if you have something like this:

**foo==bar **baz==quux **abc==xyz

...using /\*\*.*?==/ as your regex would give you bar quux xyz, while /\*\*.*==/ would give only xyz.


A regex that spans lines, as suggested above, most likely means reading the entire file into memory. Here's a line-by-line approach instead.

$> cat  file
some words
here that i want
**


 w54cs6547wem;         Sat, 30 Oct 2010 00:06:43 -0700 (PDT)
 s10mr13764658ybi.218.1288422402631;         Sat, 30 Oct 2010 00:06:42 -0700 (PDT)


p13si451872ybk.2.2010. .36;         Sat, 30 Oct 2010 00:06:42 -0700 (PDT)

  Sat, 30 Oct 2010 02:01:23 -0500

Date: Sat, 30 Oct 2010 02:01:22 -0500 Subject:
Message-ID:
Thread-Index: Act4ABHi0HfIPTIzRwe9oy8ojziTig==

other words
here that i also want

$> awk '/^\*\*/{f=1;next} f&&/==$/{f=0;next} f{next} !f' file
some words
here that i want

other words
here that i also want

The idea is to set a flag when a line starting with ** is found, then skip lines until one ending in == is found.
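The same flag idea can be sketched in plain shell without awk. The function name strip_headers and the variable skipping are made up for illustration, and the loop will be slower than awk on large files, so treat it as a way to see the logic rather than a replacement.

```shell
# strip_headers FILE: print FILE without the blocks that start at a line
# beginning with ** and end at a line ending with ==.
# The body runs in a subshell ( ... ) so its variables stay local.
strip_headers() (
    skipping=0
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            '**'*) skipping=1; continue ;;      # start marker: begin skipping
        esac
        if [ "$skipping" -eq 1 ]; then
            case $line in
                *'==') skipping=0 ;;            # end marker: resume after this line
            esac
            continue
        fi
        printf '%s\n' "$line"
    done < "$1"
)
```

Usage would be strip_headers file > cleaned. If the combined file has Windows line endings, the trailing carriage return would keep the *== test from matching, so the endings would need to be normalized first.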


It is easily expressible in perl: perl -ne 'print unless /^\*\*/ .. /==$/' file (add -i to edit the file in place). Same for sed: sed -e '/^\*\*/,/==$/d' file.
