How can I remove a variable number of lines from the top and bottom of multiple HTML documents?

I have a large number of HTML documents that need a variable number of lines removed from the top and bottom. The part I want always starts with <div class="someclass"> and the bottom section always starts with <div class="bottomouter">. Something like this:

<html>
[...]
<div class="someclass"><!-- stuff i want to keep --></div>
<div class="bottomouter">[...]</div>
[...]
</html>

How could this be accomplished?

I'm working on a Linux box so I have access to Perl, Sed, Awk, &c. However, I don't know how to approach this (or if this is the right place to ask).

Edit: To clarify, I'm moving a bunch of static documents into a template system and they need the headers and footers removed.


How about a perl script like this:

#!/usr/bin/perl -n
$output_enabled = 1 if (/^<div class="someclass">/);
$output_enabled = 0 if (/^<div class="bottomouter">/);
print if ($output_enabled);

The -n option tells perl to apply the script to each line of input, putting the line in the $_ variable (which is used implicitly in a lot of places in Perl; think of it like the word "it"). I set the $output_enabled variable (which persists across lines since it's a global variable, not declared with my) to 1 (true) if the current line matches the regex /^<div class="someclass">/, that is, if it starts with <div class="someclass">. Similarly, I set $output_enabled to 0 (false) if the line starts with <div class="bottomouter">. Finally, I print out the line if $output_enabled is true (it's initially false because it's undefined).


sed -n '/begPattern/,/endPattern/p'
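With the question's patterns filled in, that range form prints everything from the <div class="someclass"> line through the <div class="bottomouter"> line, including the latter. A sketch of one way to drop the closing line while keeping the opening one follows; the function name keep_section is made up for illustration.

```shell
#!/bin/sh
# keep_section FILE
# Hypothetical helper: prints the lines from the one matching
# <div class="someclass"> up to, but not including, the one matching
# <div class="bottomouter">.
keep_section() {
    # -n suppresses default output; inside the range, print every line
    # except the one that matches the closing pattern.
    sed -n '/<div class="someclass">/,/<div class="bottomouter">/{
        /<div class="bottomouter">/!p
    }' "$1"
}
```

For example, keep_section page.html > page.trimmed.html. This assumes the two tags never appear on the same line.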


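To output the part of the file between the delimiting lines without including them (note that this also drops the <div class="someclass"> line itself, so it only fits when the content to keep starts on the following line):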

sed '1,/<div class="someclass">/d;/<div class="bottomouter">/,$d' inputfile