Getting text from inside an HTML tag within a local file with grep [duplicate]
Possible Duplicate:
RegEx match open tags except XHTML self-contained tags
Excerpt From Input File
<TD class="clsTDLabelWeb" width="28%">Municipality: </TD>
<TD style="WIDTH: 394px" class="clsTDLabelSm" colSpan="5">
<span id="DInfo1_Municipality">JUPITER</span></TD>
My Regular Expression
(?<=<span id="DInfo1_Municipality">)([^</span>]*)
I have an HTML file saved to disk. I would like to use grep to search through the file and output the contents of a specific span, though I don't know if this is a proper use of grep. When I run grep on the file with the expression read from another file (so I don't mess up escaping any special characters), it doesn't output anything. I have tested the expression in RegExr, and it matches "JUPITER", which is exactly what I want returned. Thank you so much for your help!
Desired Output
JUPITER
Give this a try:
sed -n 's|^<span id="DInfo1_Municipality">\([^<]*\)</span></TD>$|\1|p' file
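Note that the `^` and `$` anchors make this match only when the span starts in column 1 and `</TD>` ends the line, as in the excerpt. If the real file indents the line (an assumption; the question does not say), a looser variant may be easier to live with. A rough sketch, where `indented.html` is a made-up sample file:

```shell
# Hypothetical sample: the span line is indented, unlike the excerpt in the question.
printf '%s\n' \
  '<TD style="WIDTH: 394px" class="clsTDLabelSm" colSpan="5">' \
  '    <span id="DInfo1_Municipality">JUPITER</span></TD>' > indented.html

# Anchored version: prints nothing here, because ^ requires the tag at column 1.
strict=$(sed -n 's|^<span id="DInfo1_Municipality">\([^<]*\)</span></TD>$|\1|p' indented.html)

# Unanchored version: the leading and trailing .* absorb whatever surrounds the span.
loose=$(sed -n 's|.*<span id="DInfo1_Municipality">\([^<]*\)</span>.*|\1|p' indented.html)

echo "strict: [$strict]"
echo "loose:  [$loose]"
```

The unanchored form still assumes the opening and closing tags sit on the same line.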
or, with GNU grep and your regex:
grep -Po '(?<=<span id="DInfo1_Municipality">)([^</span>]*)' file
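One caveat about the regex itself: `[^</span>]` is a character class, so it excludes the individual characters `<`, `/`, `s`, `p`, `a`, `n` and `>` rather than the string `</span>`. It happens to work for "JUPITER" but would cut off a value containing any of those lowercase letters. A small sketch, assuming GNU grep built with `-P` support and using "Palm Beach" as a made-up value:

```shell
# Hypothetical input line; "Palm Beach" is an invented value for illustration.
line='<span id="DInfo1_Municipality">Palm Beach</span></TD>'

# Original class: stops at the first of < / s p a n >, so it stops at the "a".
bad=$(printf '%s\n' "$line" | grep -Po '(?<=<span id="DInfo1_Municipality">)([^</span>]*)')

# Simpler class: stops only at the "<" that opens the closing tag.
good=$(printf '%s\n' "$line" | grep -Po '(?<=<span id="DInfo1_Municipality">)[^<]*')

echo "original class: $bad"
echo "[^<]* class:    $good"
```

Using `[^<]*` is enough here because the span holds plain text with no nested tags.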
Grep's default regex syntax doesn't support that type of regex (lookbehind assertions), and it's a very poor tool for this. For the example given it is workable, but it will break in many situations.
grep -io "<span id=\"DInfo1_Municipality\">.*</span>" file.html | grep -io ">[^<]*" | grep -io '[^>]*'
something crazy like that, not a good idea.
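To illustrate one way this kind of line-oriented matching breaks: grep looks at one line at a time, so if the span's opening tag and its text end up on different lines, the pattern never matches. A quick sketch, where `multiline.html` is a made-up file and not the asker's input:

```shell
# Hypothetical file: same markup as the question, but split across two lines.
printf '%s\n' \
  '<span id="DInfo1_Municipality">' \
  'JUPITER</span></TD>' > multiline.html

# Count lines containing the whole span; grep exits non-zero on no match,
# so "|| true" keeps the script going under "set -e".
count=$(grep -c '<span id="DInfo1_Municipality">.*</span>' multiline.html || true)

echo "matching lines: $count"
```

For markup that may be laid out like this, an actual HTML parser is the safer route.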
sed -n '/DInfo1_Municipality/s/<\/span.*//p' file | sed 's/.*>//'