Delete strings from an html file containing a pattern using unix commands

2023-01-08 16:04

I have a messy html that looks like this:

<div id=":0.page.0" class="page-element" style="width: 1620px;">
 <div>
  <img src="viewer_files/viewer_004.png" class="page-image" style="width: 800px; height: 1131px; display: none;">
  <img src="viewer_files/viewer_005.png" class="page-image" style="width: 1600px;">
 </div>
</div>// this repeats 100+ times with different 'src' attributes

This is actually all on one line (I have split it across multiple lines for readability). I am trying to remove all <img> tags that have display: none; set in the inline CSS. Is it possible to use sed/awk or some other unix command to achieve this? I think it would have been easy if it were a well-indented html document.

HTML and regexes are a notoriously bad match, so you probably want something that is HTML-aware. I'd probably go for something like TagSoup, but there are no doubt other options that are more shell-friendly, or suitable for any favourite scripting language you may have.

I would use either Twig or XMLStarlet for this kind of processing; they are a lot more reliable than sed/awk/grep. That said, since your pattern is regular and repeating, sed/awk/grep would work too.

sed 's/<img[^>]*display: none;[^>]*>//g' file

sed -e "s/<img[^>]*display: none;[^>]*>//g" filein

A quick explanation of sed:

s stands for substitution, and the / characters are delimiters.

The first field after s is the pattern to search for, and the second field is its replacement. The last field holds options: g means global (replace every match on the line, not just the first).

To edit the file in place: sed -i -e "..."
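To see why [^>]* is used rather than .* on a one-line file, here is a small sketch. The sample string and the variable names (html, cleaned, greedy) are made up for illustration; it assumes a POSIX shell and sed.

```shell
# Hypothetical one-line sample modelled on the question's markup:
# hidden image, visible image, hidden image.
html='<div><img src="a.png" style="width: 800px; display: none;"><img src="b.png" style="width: 1600px;"><img src="c.png" style="display: none;"></div>'

# [^>]* cannot cross a closing '>', so each match stays inside one <img> tag.
cleaned=$(printf '%s\n' "$html" | sed -e 's/<img[^>]*display: none;[^>]*>//g')

# For contrast: a greedy .* runs from the first <img to the last
# "display: none;" on the line and takes the visible image with it.
greedy=$(printf '%s\n' "$html" | sed -e 's/<img.*display: none;[^>]*>//g')

printf 'bounded: %s\n' "$cleaned"
printf 'greedy:  %s\n' "$greedy"
```

With this sample, the bounded pattern keeps only the b.png image inside the div, while the greedy one leaves an empty div.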

That would do it

sed -e "s@<img[^>]*display: none;[^>]*>@@g" FILENAME

Sed has several commands, but most people only learn the substitute command, "s". Another useful command is "d", which deletes every line that matches the pattern.

sed -e "/<img[^>]*display: none;[^>]*>/d" File

Be careful: it deletes the entire line. Since the file in the question is all on one line, that would delete the whole content.
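If you still want to use "d" on a one-line file, one possible workaround is to put every tag on its own line first. This is only a sketch: the sample string and variable names are invented, and it changes the layout of the output by adding line breaks between adjacent tags.

```shell
# Hypothetical one-line sample modelled on the question's markup.
one_line='<div><img src="a.png" style="display: none;"><img src="b.png" style="width: 1600px;"></div>'

# "d" alone: the only line matches, so nothing is left.
deleted_all=$(printf '%s\n' "$one_line" | sed -e '/<img[^>]*display: none;[^>]*>/d')

# Break the line between adjacent tags first (backslash-newline in the
# replacement inserts a newline), then delete only the matching lines.
kept=$(printf '%s\n' "$one_line" \
  | sed -e 's/></>\
</g' \
  | sed -e '/<img[^>]*display: none;[^>]*>/d')

printf '%s\n' "$kept"
```

Here deleted_all ends up empty, while kept holds three lines: the opening div, the b.png image, and the closing div.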
