Getting text from html page, shell

I am trying to get text from an HTML page in the shell, as part of a script that shows me the temperature in my local area.

However, I can't get my head around how to use grep properly.

Excerpt from web page

</div><div id="yw-forecast" class="night" style="height:auto"><em>Current conditions as of 8:18 PM GMT</em><div id="yw-cond">Light Rain Shower</div><dl><dt>Feels Like:</dt><dd>6 &deg;C</dd><dt>Barometer:</dt><dd style="position:relative;">1,015.92 mb and steady</dd><dt>Humidity:</dt><dd>87 %</dd><dt>Visibility:</dt><dd>9.99 km</dd><dt>Dewpoint

Or, cut down further:

<dt>Feels Like:</dt><dd>6 &deg;C</dd>

I am trying to grab the 6 °C.

I have tried a variety of different tactics, including grep and awk. Can a shell wizard help me out?


Try

grep -o -e "<dd>.*deg;C</dd>" the_html.txt

From the man page:

-e PATTERN, --regexp=PATTERN
      Use PATTERN as  the  pattern.   This  can  be  used  to  specify
      multiple search patterns, or to protect a pattern beginning with
      a hyphen (-).  (-e is specified by POSIX.)

...

-o, --only-matching
      Print only the matched (non-empty) parts  of  a  matching  line,
      with each such part on a separate output line.

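If you want to get rid of <dd> and </dd> too, just append | cut -b 5-12. Note that this byte range fits a single-digit temperature such as 6 &deg;C; a longer value needs a wider range.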
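If the fixed byte range is too fragile, the tags can be stripped with sed instead. The following is only a sketch: the html variable is a hypothetical stand-in for the_html.txt, filled with part of the excerpt from the question, and -o is assumed to be supported by your grep (GNU and BSD grep both have it, but it is not in POSIX).

```shell
# Hypothetical sample input, copied from the excerpt in the question.
html='<dt>Feels Like:</dt><dd>6 &deg;C</dd><dt>Humidity:</dt><dd>87 %</dd>'

# [^<]* cannot cross a tag boundary, so the match stays inside one <dd> element
# even when several <dd> elements sit on the same line.
# The two sed expressions then delete the opening and closing tag.
temp_stripped=$(printf '%s\n' "$html" \
    | grep -o '<dd>[^<]*deg;C</dd>' \
    | sed -e 's#<dd>##' -e 's#</dd>##')

echo "$temp_stripped"
```

Because nothing here depends on character positions, the same pipeline should also cope with two-digit or negative temperatures.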


Give this a try:

grep -Po '(?<=Feels Like:</dt><dd>).*?(?=</dd>)' | sed 's/ &deg;/°/'

Result:

6°C
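The -P option (Perl-compatible regular expressions) is a GNU grep extension, so it may be missing on other systems. As a rough fallback, the same label-anchored extraction can be sketched with sed alone. The html variable below is a hypothetical stand-in for the real page, built from the excerpt in the question.

```shell
# Hypothetical sample input, copied from the excerpt in the question.
html='<dl><dt>Feels Like:</dt><dd>6 &deg;C</dd><dt>Humidity:</dt><dd>87 %</dd>'

# -n plus the p flag prints only lines where the substitution succeeded.
# The capture group takes everything between the label's <dd> and the next tag.
# The second sed turns " &deg;" into a real degree sign, as in the command above.
temp_label=$(printf '%s\n' "$html" \
    | sed -n 's#.*Feels Like:</dt><dd>\([^<]*\)</dd>.*#\1#p' \
    | sed 's/ &deg;/°/')

echo "$temp_label"
```

Anchoring on the "Feels Like:" label rather than on the degree entity means other temperature fields on the page, if any, are ignored.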


If x is your input file and the HTML source is as regularly formatted as you write, this should work:

grep deg x | sed -e "s#^.*>\([0-9]\{1,2\} &deg;[CF]\)<.*#\1#"
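Since the question also mentions awk, here is a minimal awk sketch of the same idea, using plain string functions instead of a regular expression. It assumes the label text "Feels Like:</dt><dd>" appears exactly as in the excerpt; the html variable is a hypothetical stand-in for the file x.

```shell
# Hypothetical sample input, copied from the excerpt in the question.
html='<dl><dt>Feels Like:</dt><dd>6 &deg;C</dd><dt>Humidity:</dt><dd>87 %</dd>'

temp_awk=$(printf '%s\n' "$html" | awk '{
    label = "Feels Like:</dt><dd>"
    start = index($0, label)            # position of the label, 0 if absent
    if (start == 0) next                # skip lines without the label
    rest = substr($0, start + length(label))
    stop = index(rest, "<")             # the value ends at the next tag
    if (stop > 0) print substr(rest, 1, stop - 1)
}')

echo "$temp_awk"
```

This prints the value with the &deg; entity still in place; it can be piped through sed 's/ &deg;/°/' if a real degree sign is wanted.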

Seth
