Getting text from html page, shell

2023-02-07 19:43

I am trying to get text from an HTML page in a shell script, as part of a script that shows me the temperature in my local area.

However, I can't get my head around how to use grep properly.

Excerpt from web page

</div><div id="yw-forecast" class="night" style="height:auto"><em>Current conditions as of 8:18 PM GMT</em><div id="yw-cond">Light Rain Shower</div><dl><dt>Feels Like:</dt><dd>6 &deg;C</dd><dt>Barometer:</dt><dd style="position:relative;">1,015.92 mb and steady</dd><dt>Humidity:</dt><dd>87 %</dd><dt>Visibility:</dt><dd>9.99 km</dd><dt>Dewpoint

The same excerpt, cut down further

<dt>Feels Like:</dt><dd>6 &deg;C</dd>

Trying to grab the 6 °C

I have tried a variety of different tactics, including grep and awk. Can a shell wizard help me out?

Try

grep -o -e "<dd>.*deg;C</dd>" the_html.txt

From the man page:

-e PATTERN, --regexp=PATTERN
      Use PATTERN as  the  pattern.   This  can  be  used  to  specify
      multiple search patterns, or to protect a pattern beginning with
      a hyphen (-).  (-e is specified by POSIX.)

...

-o, --only-matching
      Print only the matched (non-empty) parts  of  a  matching  line,
      with each such part on a separate output line.

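If you want to get rid of <dd> and </dd> too, just append | cut -b 5-12. Note that these byte offsets fit only a single-digit temperature such as 6 &deg;C; a longer value would be cut off.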
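A variant that does not depend on byte offsets might look like the sketch below. It assumes the markup from the question; the `html` variable is only a stand-in for the contents of the_html.txt. Using `[^<]*` instead of `.*` keeps each match inside a single <dd> element, which matters when the whole page sits on one line.

```shell
# Sample line from the question, used here in place of the_html.txt
html='<dt>Feels Like:</dt><dd>6 &deg;C</dd><dt>Humidity:</dt><dd>87 %</dd>'

# [^<]* cannot cross a tag boundary, so the match stays inside one <dd> element.
# sed then strips the opening and closing dd tags, whatever the value's length.
feels_like=$(printf '%s\n' "$html" |
  grep -o '<dd>[^<]*deg;C</dd>' |
  sed -e 's#</\{0,1\}dd>##g')

printf '%s\n' "$feels_like"
```

If several values on the page end in deg;C, grep -o prints each on its own line, so a further filter such as head -n 1 would be needed to pick one.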

Give this a try:

grep -Po '(?<=Feels Like:</dt><dd>).*?(?=</dd>)' | sed 's/ &deg;/°/'

Result:

6°C
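The -P option is specific to GNU grep builds with PCRE support. Where it is unavailable, roughly the same extraction can be sketched with sed alone. This assumes the value sits in the <dd> directly after the Feels Like label, as in the excerpt; the `html` variable is a made-up stand-in for the page source.

```shell
# Sample line from the question, standing in for the downloaded page
html='<dt>Feels Like:</dt><dd>6 &deg;C</dd><dt>Humidity:</dt><dd>87 %</dd>'

# -n with the p flag prints only lines where the substitution succeeded.
# \([^<]*\) captures the text between <dd> and the next tag.
temp=$(printf '%s\n' "$html" |
  sed -n 's#.*Feels Like:</dt><dd>\([^<]*\)</dd>.*#\1#p' |
  sed 's/ &deg;/°/')

printf '%s\n' "$temp"
```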

If x is your input file and the HTML source is as regularly formatted as you write, this should work:

grep deg x | sed -E 's#^.*>([0-9]{1,2} &deg;[CF])<.*#\1#'

Seth
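For the script mentioned in the question, the extraction could be wrapped in a small function that reads HTML on standard input and fails visibly when the page layout changes. This is only a sketch: the function name `feels_like` is hypothetical, and it assumes the value directly follows the Feels Like label as in the excerpt. It also accepts negative and multi-digit values.

```shell
# Hypothetical helper: print the "Feels Like" value found in HTML on stdin
feels_like() {
  # Capture an optional minus sign, the digits, and the unit letter (C or F)
  value=$(sed -n -E 's#.*Feels Like:</dt><dd>(-?[0-9]+) &deg;([CF])</dd>.*#\1°\2#p')
  # An empty result means no line matched the expected markup
  [ -n "$value" ] || return 1
  printf '%s\n' "$value"
}

printf '%s\n' '<dt>Feels Like:</dt><dd>6 &deg;C</dd>' | feels_like
```

The non-zero return status lets the calling script print a fallback message instead of an empty temperature.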
