How to grep part of the content from a string in bash

Posted 2023-01-31 08:56

For example, when filtering an HTML file in which every line follows this pattern:

<a href="xxxxxx" style="xxxx"><i>some text</i></a>

How can I get the value of href, and how can I get the text between <i> and </i>?

To get the href value, split each line on double quotes and take the second field:

cut -d'"' -f2 file
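The cut one-liner only covers the href value. For the text between <i> and </i>, a sed substitution is a minimal sketch under the same assumption as the question: each line holds exactly one <i>...</i> pair. The function name extract_i_text is hypothetical.

```bash
# Hypothetical helper: print the text between <i> and </i> on each matching line.
# Assumes one <i>...</i> pair per line; reads the given files, or stdin if none.
extract_i_text() {
  # -n suppresses default output; the p flag prints only lines where the substitution matched.
  sed -n 's/.*<i>\(.*\)<\/i>.*/\1/p' "$@"
}

printf '%s\n' '<a href="xxxxxx" style="xxxx"><i>some text</i></a>' | extract_i_text
# prints: some text
```

Lines without an <i> element produce no output, because the substitution never matches on them.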

FYI: Just about every other HTML/regexp post on Stack Overflow explains why extracting values from HTML with anything other than an HTML parser is a bad idea. You may want to read some of those; this one, for example.

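If href is always the second space-separated token in a line, you can try the following (the trailing tr strips the surrounding double quotes, and -f2- keeps any '=' that appears inside the URL):

grep 'href' file | cut -d' ' -f2 | cut -d'=' -f2- | tr -d '"'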
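If href is not always the second token, grep -o (available in GNU and BSD grep, though not required by POSIX) can print only the matching part of each line, which a sed then trims. This is a sketch that assumes double-quoted attribute values; the function name extract_href is hypothetical.

```bash
# Hypothetical helper: print every double-quoted href value in the input, one per line.
# grep -o outputs only the matched text; sed removes the href=" prefix and the closing quote.
extract_href() {
  grep -o 'href="[^"]*"' "$@" | sed 's/^href="//; s/"$//'
}

printf '%s\n' '<a href="xxxxxx" style="xxxx"><i>some text</i></a>' | extract_href
# prints: xxxxxx
```

Because the pattern stops at the first closing quote, it also handles several links on one line, printing each href on its own line.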

Here's how to do it using xmlstarlet (optionally with tidy):

# extract content of href and <i>...</i>
echo '<a href="xxxxxx" style="xxxx"><i>some text</i></a>' |
xmlstarlet sel -T -t -m "//a" -v @href -n -v i -n

# using tidy & xmlstarlet
echo '<a href="xxxxxx" style="xxxx"><i>some text</i></a>' |
tidy -q -c -wrap 0 -numeric -asxml -utf8 --merge-divs yes --merge-spans yes 2>/dev/null | 
xmlstarlet sel -N x="http://www.w3.org/1999/xhtml" -T -t -m "//x:a" -v @href -n -v . -n
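Since the question is tagged bash and regex, bash's built-in =~ operator can also pull out both values without external tools, using the capture groups in BASH_REMATCH. This is a sketch with the same caveats as the other regex approaches (one link per line, double-quoted href, no nested tags inside <i>); the function name parse_links is hypothetical, and it requires bash rather than plain sh.

```bash
#!/usr/bin/env bash
# Hypothetical helper: for each input line, print "<href>\t<italic text>".
# Assumes one <a ...><i>...</i></a> per line with a double-quoted href.
parse_links() {
  # Store the regex in a variable so [[ ]] treats it as a pattern, not literal text.
  local re='href="([^"]*)".*<i>(.*)</i>'
  local line
  # The || clause also processes a final line that lacks a trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line =~ $re ]]; then
      printf '%s\t%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    fi
  done
}

printf '%s\n' '<a href="xxxxxx" style="xxxx"><i>some text</i></a>' | parse_links
# prints: xxxxxx<TAB>some text
```

Usage on a file would be parse_links < file. For anything beyond this one-pattern-per-line input, the parser-based xmlstarlet approach is the safer choice.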
