Getting text from inside an HTML tag within a local file with grep [duplicate]

2023-01-13 06:59

Closed 12 years ago.

Possible Duplicate:
RegEx match open tags except XHTML self-contained tags

Excerpt From Input File

<TD class="clsTDLabelWeb" width="28%">Municipality:&nbsp;</TD>
<TD style="WIDTH: 394px" class="clsTDLabelSm" colSpan="5">
<span id="DInfo1_Municipality">JUPITER</span></TD>

My Regular Expression

(?<=<span id="DInfo1_Municipality">)([^</span>]*)

I have an HTML file saved to disk. I would like to use grep to search through the file and output the contents of a specific span, though I don't know if this is a proper use of grep. When I run grep on the file with the expression read from another file (so I don't mess up escaping any special characters), it doesn't output anything. I have tested the expression in RegExr and it matches "JUPITER", which is exactly what I want returned. Thank you so much for your help!

Desired Output

JUPITER
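A likely reason for the empty output: without -P, GNU grep reads the pattern as a POSIX basic regular expression, in which "(?<=" has no special meaning, so grep searches for those characters literally. The sketch below reproduces the setup from the question (pattern read from a file) and assumes GNU grep built with PCRE support. The file names pattern.txt and page.html are made up for the example.

```shell
#!/bin/sh
# Sketch: run the question's pattern from a file, first in grep's default
# (basic regex) mode and then with -P (Perl-compatible regex).
# Assumes GNU grep with PCRE support; pattern.txt and page.html are made-up names.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# The excerpt from the question.
cat > "$tmp/page.html" <<'EOF'
<TD class="clsTDLabelWeb" width="28%">Municipality:&nbsp;</TD>
<TD style="WIDTH: 394px" class="clsTDLabelSm" colSpan="5">
<span id="DInfo1_Municipality">JUPITER</span></TD>
EOF

# The question's pattern, unchanged. The quoted 'EOF' delimiter stops the shell
# from expanding anything in the here-document, so no escaping is needed.
cat > "$tmp/pattern.txt" <<'EOF'
(?<=<span id="DInfo1_Municipality">)([^</span>]*)
EOF

# Basic regex mode: "(?<=" is taken literally, so no line matches.
basic=$(grep -o -f "$tmp/pattern.txt" "$tmp/page.html")

# -P: "(?<=...)" is a lookbehind; -o prints only the matched text.
pcre=$(grep -oP -f "$tmp/pattern.txt" "$tmp/page.html")

printf 'basic: [%s]\n' "$basic"
printf 'pcre:  [%s]\n' "$pcre"
```

With -P the same pattern file yields JUPITER for this excerpt, so the missing piece was the matching mode rather than the way the pattern was stored.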

Give this a try:

sed -n 's|^<span id="DInfo1_Municipality">\([^<]*\)</span></TD>$|\1|p' file

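or with GNU grep and your regex, using [^<]* in place of [^</span>]* (a bracket expression is a set of single characters, not the string "</span>"):

grep -Po '(?<=<span id="DInfo1_Municipality">)[^<]*' file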
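The set [^</span>] excludes the single characters <, /, s, p, a, n and >, so it only behaves like "stop at the closing tag" when the value contains none of them. A minimal sketch, assuming GNU grep with -P; the mixed-case value Jupiter is made-up test data. It also shows \K, a PCRE feature that drops the text matched so far from what -o prints, as an alternative to the lookbehind.

```shell
#!/bin/sh
# Sketch: compare [^</span>]* with [^<]* on a mixed-case value.
# Assumes GNU grep with -P (PCRE); "Jupiter" is made-up test data.
line='<span id="DInfo1_Municipality">Jupiter</span></TD>'

# The bracket expression stops at the first lowercase "p".
class=$(printf '%s\n' "$line" | grep -oP '(?<=<span id="DInfo1_Municipality">)[^</span>]*')

# [^<]* stops at the next "<", whatever letters the value contains.
fixed=$(printf '%s\n' "$line" | grep -oP '(?<=<span id="DInfo1_Municipality">)[^<]*')

# \K resets the start of the reported match, so -o prints only what follows it.
keep=$(printf '%s\n' "$line" | grep -oP 'id="DInfo1_Municipality">\K[^<]*')

printf '%s\n' "$class" "$fixed" "$keep"
```

This prints Ju, then Jupiter twice.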

Grep without -P doesn't support that type of regex (lookbehind assertions), and it's a very poor tool for this. For the example given it is workable, but it will break in many situations.

grep -io '<span id="DInfo1_Municipality">.*</span>' file.html | grep -io '>[^<]*' | grep -io '[^>]*'

Something crazy like that; not a good idea.
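One situation where the three-grep pipeline breaks: the greedy .* in the first grep runs to the last </span> on the line, so text from a following span leaks into the result. A sketch with the patterns quoted; the second span (id="x", text OTHER) is hypothetical test data.

```shell
#!/bin/sh
# Sketch: the three-grep pipeline on a line holding two spans.
# The second span (id="x", text OTHER) is made-up test data.
line='<span id="DInfo1_Municipality">JUPITER</span><span id="x">OTHER</span>'

out=$(printf '%s\n' "$line" \
    | grep -io '<span id="DInfo1_Municipality">.*</span>' \
    | grep -io '>[^<]*' \
    | grep -io '[^>]*')

# Greedy .* extends to the LAST </span>, so the unrelated span's text survives.
printf '%s\n' "$out"
```

The output is JUPITER followed by OTHER on a second line.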

sed -n '/DInfo1_Municipality/s/<\/span.*//p' file | sed 's/.*>//'
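This version cuts at the first "</span" on the line, so a second span later on the same line does not leak into the result. A sketch that feeds two lines through the same two sed commands via a helper function; the name extract and the second span (id="x", text OTHER) are made up for the demonstration.

```shell
#!/bin/sh
# Sketch: the two-stage sed pipeline, reading from stdin instead of a file.
# "extract" is a hypothetical helper name; the second span is made-up test data.
extract() {
    # Stage 1: on lines mentioning the id, delete from the first "</span" to the
    #          end of the line; -n plus the p flag prints only lines where that happened.
    # Stage 2: delete everything up to the last remaining ">".
    sed -n '/DInfo1_Municipality/s/<\/span.*//p' | sed 's/.*>//'
}

one=$(printf '%s\n' '<span id="DInfo1_Municipality">JUPITER</span></TD>' | extract)
two=$(printf '%s\n' '<span id="DInfo1_Municipality">JUPITER</span><span id="x">OTHER</span>' | extract)

printf '%s\n' "$one" "$two"
```

Both lines yield JUPITER. Like the other line-based answers, it still assumes the opening tag, the text and "</span" sit on one line.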
