
awk return parent HTML tag value if its child tag content is matched - possible?

I've been searching for a solution to this problem for quite some time, but I can't figure it out on my own.

I have a bunch of HTML blocks, and I want to search for a specific string contained in one of the inner tags. If there's a match, I want to return the value of an attribute on its parent tag. Here's an example:

<li rel="Returns this value">
    <some other tags and elements here />
    <a class="link"><span>This match</span></a>
</li>

Searching for the string "This match" should return "Returns this value". Is this possible in awk? If not, what is the easiest way to accomplish it? I don't mind what the solution is, but awk or a similar command-line tool would be preferred. I'm running on an Ubuntu server with root access, so if needed I could rely on other languages such as Ruby, Python, Perl, or PHP.

So far I've been able to search for the string between the span tags and return its contents. That could be done much more easily with a simple sed command, so it isn't much use yet. Still, it may be a useful starting point that can be improved to do what I need, so here goes:

awk 'BEGIN{RS="";FS="</span>"}
/li/{
    for(i=1;i<=NF;i++){
        if($i ~ /span/){
            gsub(/.*span>/,"",$i)
            print $i
        }
    }
}'

When run on the example above, it prints "This match". Thanks a lot for any suggestions.


In general, you can't parse HTML with regular expressions.

That doesn't mean you can't parse HTML in awk, though it would be a big job and I've never heard of anyone doing it.

If your targets are well defined, the input is pretty uniform, and you can guarantee certain things about how tags are nested in your input, you might be able to manage it.

For the most part, however, awk is the wrong tool for the job. It's better to choose a language that has an HTML parsing engine available and use that. Perl, Python, PHP, Ruby... there are lots of choices.
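
As a rough illustration of the parser-based approach, here is a minimal sketch in Python using only the standard library's html.parser module. The names ParentRelFinder and find_parent_rel are made up for this example, and it assumes the value you want is the rel attribute of the nearest enclosing li, as in the question's sample. It also assumes the search string sits inside a single text node and that every li has a closing tag.

```python
from html.parser import HTMLParser


class ParentRelFinder(HTMLParser):
    """Collect the rel attribute of the nearest enclosing <li> for each
    text node that contains the search string (hypothetical helper)."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.li_stack = []  # rel values of the <li> elements currently open
        self.matches = []   # rel values of <li> elements that matched

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            # attrs is a list of (name, value) pairs; rel may be absent (None)
            self.li_stack.append(dict(attrs).get("rel"))

    def handle_endtag(self, tag):
        if tag == "li" and self.li_stack:
            self.li_stack.pop()

    def handle_data(self, data):
        # Only text that appears somewhere inside an open <li> is considered
        if self.li_stack and self.needle in data:
            self.matches.append(self.li_stack[-1])


def find_parent_rel(html, needle):
    """Return the rel values of all <li> elements whose text contains needle."""
    parser = ParentRelFinder(needle)
    parser.feed(html)
    parser.close()
    return parser.matches


SAMPLE = """
<li rel="Returns this value">
    <a class="link"><span>This match</span></a>
</li>
<li rel="Some other value">
    <a class="link"><span>Something else</span></a>
</li>
"""

if __name__ == "__main__":
    print(find_parent_rel(SAMPLE, "This match"))
```

Because the parser keeps track of which li elements are open, this keeps working when the markup is spread across lines differently or contains extra nested tags, which is where a regex-based awk script tends to break. A third-party library with a query interface would likely make this shorter, but that is a matter of taste.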
