Remove links from text file

2023-01-06 05:49

How can I remove links from raw HTML text? I've got:

Foo bar <a href="http://www.foo.com">blah</a> bar foo

and want to get:

Foo bar blah bar foo

afterwards.

You're trying to parse HTML with regexps, which works only in the simplest cases because HTML isn't a regular language. A much more reliable solution is to use an HTML parser; there are many of them, for many different languages.
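As a sketch of the parser approach, the shell function below pipes its input through Python's standard-library html.parser and writes everything back out except the <a> start and end tags, so the link text stays. It assumes python3 is on the PATH. The names strip_links and LinkStripper are hypothetical. Because the parser, not a regex, decides where tags begin and end, markup nested inside the link text (such as <b>) is kept.

```shell
#!/bin/sh
# strip_links (hypothetical name): drop <a ...> and </a> tags but keep
# the text and markup between them. Assumes python3 is on PATH and
# uses only the standard-library html.parser module.
strip_links() {
  python3 -c '
import sys
from html.parser import HTMLParser

class LinkStripper(HTMLParser):
    """Re-emits the input unchanged except for <a> start/end tags."""
    def __init__(self):
        # Keep entity/char references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
    def handle_starttag(self, tag, attrs):
        if tag != "a":
            self.out.append(self.get_starttag_text())  # raw original tag
    def handle_startendtag(self, tag, attrs):
        if tag != "a":
            self.out.append(self.get_starttag_text())
    def handle_endtag(self, tag):
        if tag != "a":
            self.out.append("</%s>" % tag)
    def handle_data(self, data):
        self.out.append(data)
    def handle_entityref(self, name):
        self.out.append("&%s;" % name)
    def handle_charref(self, name):
        self.out.append("&#%s;" % name)
    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)
    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

p = LinkStripper()
p.feed(sys.stdin.read())
p.close()  # flush any buffered trailing text
sys.stdout.write("".join(p.out))
'
}

echo 'Foo bar <a href="http://www.foo.com">blah</a> bar foo' | strip_links
```

End tags are written back in lowercase because html.parser normalizes tag names; everything else is copied through as written.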

sed -re 's|<a [^>]*>([^<]*)</a>|\1|g'

But Brian's answer is right: this should only be used in very simple cases.
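To show how the back-reference works and where it stops, the commands below run that pattern on the question's input and on a hypothetical link whose text contains a <b> tag. They use -E, which both GNU and BSD sed accept for extended regexes (GNU's -r is equivalent). Because ([^<]*) cannot cross the inner <b>, the second link is left untouched.

```shell
#!/bin/sh
# The pattern from the answer, run with -E instead of the GNU-only -r.
pattern='s|<a [^>]*>([^<]*)</a>|\1|g'

# Question input: the captured link text (\1) replaces the whole <a> element.
echo 'Foo bar <a href="http://www.foo.com">blah</a> bar foo' | sed -E "$pattern"
# prints: Foo bar blah bar foo

# Hypothetical input whose link text holds another tag: [^<]* stops at <b>,
# so </a> never follows the capture and the line comes out unchanged.
echo 'Foo <a href="x"><b>bold</b></a> foo' | sed -E "$pattern"
# prints: Foo <a href="x"><b>bold</b></a> foo
```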

Try:

sed -e 's/<a [^>]*>//g' -e 's/<\/a>//g' test.txt

This deletes the opening and closing tags separately, so the link text is kept.

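$ echo 'Foo bar <a href="http://www.foo.com">blah</a> bar foo' | awk 'BEGIN{RS="</a>"; ORS=""} /<a href/{gsub(/<a href=\042[^\042]*\042>/, "")} 1'

Foo bar blah bar foo

This relies on a multi-character RS, which gawk and mawk support; POSIX leaves its behavior unspecified.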
