In BASH, delete everything between two sets of characters
I've combined a bunch of email files into one large text file, and now I'm trying to delete all the header lines from the emails in this new text file. I have a set of unique characters I can use as markers to delete between, but I'm coming up short finding a regex that will strip out the headers. An example set is below (including the two asterisks at the top and the double equals at the bottom).
**
w54cs6547wem; Sat, 30 Oct 2010 00:06:43 -0700 (PDT)
s10mr13764658ybi.218.1288422402631; Sat, 30 Oct 2010 00:06:42 -0700 (PDT)
p13si451872ybk.2.2010. .36; Sat, 30 Oct 2010 00:06:42 -0700 (PDT)
Sat, 30 Oct 2010 02:01:23 -0500
Date: Sat, 30 Oct 2010 02:01:22 -0500 Subject:
Message-ID:
Thread-Index: Act4ABHi0HfIPTIzRwe9oy8ojziTig==
sed -i '/\*\*/,/==/d' FILE
changes your file in place (-i),
sed '/\*\*/,/==/d' FILE > MODIFIED
saves the modification to a newly created file.
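To see what the range address does before running it on real mail, here is a minimal sketch on a throwaway file. The sample contents and the variable names are made up for illustration; only the sed expression comes from the answer above.

```shell
# Hypothetical sample file; its contents are invented for this demo.
sample=$(mktemp)
printf '%s\n' \
  'keep this line' \
  '**' \
  'Date: Sat, 30 Oct 2010 02:01:22 -0500' \
  'Thread-Index: Act4ABHi0HfIPTIzRwe9oy8ojziTig==' \
  'keep this line too' > "$sample"

# Delete every line from one containing ** through the next one containing ==.
result=$(sed '/\*\*/,/==/d' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Two things to keep in mind: sed deletes whole lines here, so any text sharing a line with a marker goes too, and -i without an argument is GNU sed syntax (BSD/macOS sed wants -i '').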
I don't know bash replacement syntax, but the regex you want is:
/\*\*.*?==/
In PHP, the code would be:
$str = preg_replace('/\*\*.*?==/', '', $str);
Hopefully you can translate that into bash without any trouble.
Explanation:
The trick here is the .*?. The ? makes the .* lazy, so it will start at ** and match everything until the first == it finds. Without the ?, the .* would be greedy and grab everything between the first ** and the last == in the document. Note that . does not match a newline by default, so if the header spans several lines you also need the s modifier: /\*\*.*?==/s. So if you have something like this:
**foo==bar **baz==quux **abc==xyz
...using /\*\*.*?==/ as your regex would give you bar quux xyz, while /\*\*.*==/ would give only xyz.
A regex that spans the whole header most likely means reading the entire file into memory. Here's a line-by-line approach instead.
$> cat file
some words
here that i want
**
w54cs6547wem; Sat, 30 Oct 2010 00:06:43 -0700 (PDT)
s10mr13764658ybi.218.1288422402631; Sat, 30 Oct 2010 00:06:42 -0700 (PDT)
p13si451872ybk.2.2010. .36; Sat, 30 Oct 2010 00:06:42 -0700 (PDT)
Sat, 30 Oct 2010 02:01:23 -0500
Date: Sat, 30 Oct 2010 02:01:22 -0500 Subject:
Message-ID:
Thread-Index: Act4ABHi0HfIPTIzRwe9oy8ojziTig==
other words
here that i also want
$> awk '/^\*\*/{f=1;next} f&&/==$/{f=0;next} f{next} !f' file
some words
here that i want
other words
here that i also want
The idea is to set a flag when a line starting with ** is found, then skip every line until one ending in == is found, and resume printing after that.
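Spelled out with one rule per line, the same flag logic might look like the sketch below. strip_headers is a hypothetical helper name and the sample input is made up; the redundant f{next} rule from the one-liner is dropped because !skip already suppresses lines inside a header.

```shell
# strip_headers: hypothetical wrapper around the awk flag technique.
# Reads the named files, or standard input when none are given.
strip_headers() {
  awk '
    /^\*\*/       { skip = 1; next }  # opening marker: start skipping, drop this line
    skip && /==$/ { skip = 0; next }  # closing marker: stop skipping, drop this line
    !skip                             # outside a header: print the line
  ' "$@"
}

result=$(printf '%s\n' \
  'some words' \
  '**' \
  'Date: Sat, 30 Oct 2010 02:01:22 -0500' \
  'Thread-Index: Act4ABHi0HfIPTIzRwe9oy8ojziTig==' \
  'other words' | strip_headers)
printf '%s\n' "$result"
```

Because the markers are anchored (^ and $), a ** or == in the middle of a body line will not start or end a skip.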
It is easily expressible in perl: perl -ne 'print unless /^\*\*/ .. /==$/' file (add -i to edit the file in place; it has no use when the input is piped in). Same for sed: sed -e '/^\*\*/,/==$/d' file.