Extract contents of paragraph tag using a Perl one liner

2023-02-11 21:38

I would like to match the contents of a paragraph tag using a Perl regex one-liner. The paragraph is something like this:

<p style="font-family: Calibri,Helvetica,serif;">Text I want to extract</p>

so I have been using something like this:

perl -nle 'm/<p>($.)<\/p>/ig; print $1' file.html

Any ideas appreciated

thanks

Mandatory link to what happens when you try to parse HTML with regular expressions.

David Dorward's comment, to use HTML::TreeBuilder, is a good one. Another good way to do this is to use HTML::DOM:

perl -MHTML::DOM -e 'my $dom = HTML::DOM->new(); $dom->parse_file("file.html"); my @p = $dom->getElementsByTagName("p"); print $p[0]->innerText();'

In the matching part, $ means end-of-string, so ($.) will not capture the paragraph text. You also need to allow for the attributes in the opening p tag, matched in a non-greedy way:

perl -nle 'print $1 if m/<p.*?>(.+?)<\/p>/i' test.html
