Getting text from an HTML page in the shell
I am trying to get text from an HTML page in the shell, as part of a script to show me the temperature in my local area.
However, I can't get my head around how to use grep properly.
Excerpt from the web page:
</div><div id="yw-forecast" class="night" style="height:auto"><em>Current conditions as of 8:18 PM GMT</em><div id="yw-cond">Light Rain Shower</div><dl><dt>Feels Like:</dt><dd>6 &deg;C</dd><dt>Barometer:</dt><dd style="position:relative;">1,015.92 mb and steady</dd><dt>Humidity:</dt><dd>87 %</dd><dt>Visibility:</dt><dd>9.99 km</dd><dt>Dewpoint
The same excerpt, cut down further:
<dt>Feels Like:</dt><dd>6 &deg;C</dd>
I am trying to grab the 6 &deg;C (that is, 6 °C).
I have tried a variety of different tactics, including grep and awk. Can a shell wizard help me out?
Try:
grep -o -e '<dd>[^<]*deg;C</dd>' the_html.txt
From the man page:
-e PATTERN, --regexp=PATTERN
Use PATTERN as the pattern. This can be used to specify
multiple search patterns, or to protect a pattern beginning with
a hyphen (-). (-e is specified by POSIX.)
...
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.
If you want to get rid of <dd> and </dd> too, just append | cut -b 5-12 (this fixed byte range assumes a single-digit temperature such as 6 &deg;C).
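If the temperature can have more than one digit, a fixed byte range will clip it. A rough alternative, assuming the page contains the raw &deg; entity, is to strip the tags with sed instead of cut. The sample line below is made up for illustration and stands in for the_html.txt:

```shell
# Made-up sample line standing in for the contents of the_html.txt.
html='<dt>Feels Like:</dt><dd>12 &deg;C</dd><dt>Humidity:</dt><dd>87 %</dd>'

# Keep only the <dd> element that ends in deg;C, then delete the
# opening and closing dd tags, whatever the length of the value.
temp=$(printf '%s\n' "$html" \
    | grep -o '<dd>[^<]*deg;C</dd>' \
    | sed -e 's#</\{0,1\}dd>##g')

printf '%s\n' "$temp"
```

Because the tags are removed by pattern rather than by position, this should keep working for values like -3 or 12.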
Give this a try:
grep -Po '(?<=Feels Like:</dt><dd>).*?(?=</dd>)' | sed 's/ &deg;/°/'
Result:
6°C
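Note that -P (Perl-style lookarounds) needs GNU grep. Where that is not available, something along these lines might do the same job with plain sed. This is only a sketch: feels_like is a hypothetical helper name, the sample line is made up, and it assumes the page contains the raw &deg; entity:

```shell
# feels_like: hypothetical helper; reads HTML on stdin and prints
# the "Feels Like" value, e.g. 6°C.
feels_like() {
    # First sed: print only the text between <dd> and </dd> that
    # follows "Feels Like:". Second sed: turn " &deg;" into "°".
    sed -n 's#.*Feels Like:</dt><dd>\([^<]*\)</dd>.*#\1#p' \
        | sed 's/ &deg;/°/'
}

# Made-up sample input for illustration.
sample='<dl><dt>Feels Like:</dt><dd>6 &deg;C</dd><dt>Humidity:</dt><dd>87 %</dd></dl>'
result=$(printf '%s\n' "$sample" | feels_like)
printf '%s\n' "$result"
```

In a real script you would pipe the downloaded page into feels_like instead of the sample string.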
If x is your input file and the HTML source is as regularly formatted as you write, this should work:
grep deg x | sed -e "s#^.*>\([0-9]\{1,2\} &deg;[CF]\)<.*#\1#"
Seth