
.NET regular expression

I have HTML page source containing img tags like this:

<p>xyz </p>< img ....... 1 . gif >........<p>xyz</p>
           < img ........ 2 . jpg >..............<p>xyz</p>    
           < img ........ 3 . jpg ><p>xyz</p>
           < img ....... 4 . gif >......<span>xyz</span>

The img tags can reference JPG images as well as other formats, and can appear in any order in the page source. I want a .NET regular expression that gives me the first img tag with a JPG image, like

< img ... 2. jpg >

or, more generally, the first img tag that is not a GIF image. Basically, I want my regular expression to skip the smiley GIF images.

Please suggest a regular expression.


Do not parse HTML with RegEx. See here for compelling reasons.

HTML is not a regular language and as such not suitable for parsing with a regular expression.

Use the HTML Agility Pack to parse HTML. It exposes the parsed HTML similarly to XmlDocument and can be queried using XPath.
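To illustrate the parser-based approach, here is a minimal sketch in Python using the standard library's html.parser as a stand-in for the HTML Agility Pack (whose API is not shown here). The names ImgCollector and first_non_gif_src, and the sample file names, are made up for this example.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collects the src attribute of every img tag, in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; self-closing <img ... /> also lands here.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def first_non_gif_src(html):
    """Return the src of the first img that is not a GIF, or None."""
    collector = ImgCollector()
    collector.feed(html)
    for src in collector.sources:
        if not src.lower().endswith(".gif"):
            return src
    return None


# Hypothetical sample input modeled on the question.
page = '<p>xyz</p><img src="smiley1.gif"><p>xyz</p><img src="photo2.jpg">'
print(first_non_gif_src(page))
```

Because the parser handles attribute quoting and tag boundaries, the GIF check is reduced to a plain string comparison on the src value.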


<\s*img[^>]*\.[^>]*jpg[^>]*>


Using regular expressions for parsing or modifying HTML documents is frowned upon. For a one-off operation, you could use

<img\s+[^>]*2\.jpg[^>]*>(</img>)?

to identify image tags containing "2.jpg". If you need to do this more than once, you'd do yourself a favor by using an HTML parser like the HTML Agility Pack. Parsers are much less fragile when confronted with real-world HTML.
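For the "first img tag that is not a GIF" variant of the question, a negative lookahead is one possible pattern. The sketch below is in Python for brevity; the lookahead, \b and \s constructs it relies on are also supported by the .NET Regex class, so the pattern string should carry over, though that has only been checked here against Python's re module. The function name and sample file names are hypothetical.

```python
import re

# Match an img tag only if no ".gif" appears before the closing ">".
IMG_NOT_GIF = re.compile(
    r"<\s*img\b(?![^>]*\.\s*gif\b)[^>]*>",
    re.IGNORECASE,
)


def first_non_gif_img(html):
    """Return the first img tag that does not mention a .gif file, or None."""
    match = IMG_NOT_GIF.search(html)
    return match.group(0) if match else None


# Hypothetical sample input modeled on the question.
page = '<img src="smiley1.gif"><p>xyz</p><img src="photo2.jpg"><img src="3.jpg">'
print(first_non_gif_img(page))
```

This shares the fragility of any regex-on-HTML approach: a ">" inside an attribute value, or ".gif" in an unrelated attribute, will produce wrong results.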


If the HTML is valid XHTML, you can also use XPath or XSLT.

The XPath should look something like this (sorry, not tested; note that ends-with is an XPath 2.0 function):

//img[not(ends-with(@src, ".gif"))]
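As a rough sketch of the same idea for well-formed XHTML, the Python snippet below parses the document as XML and does the suffix check in code instead of in XPath. It assumes the document has no XML namespace on its elements (a namespaced XHTML document would need qualified tag names), and the function name and sample file names are invented for the example.

```python
import xml.etree.ElementTree as ET


def first_non_gif_src_xhtml(xhtml):
    """Return the src of the first non-GIF img in well-formed XHTML, or None."""
    root = ET.fromstring(xhtml)
    # iter() walks the tree in document order.
    for img in root.iter("img"):
        src = img.get("src", "")
        if src and not src.lower().endswith(".gif"):
            return src
    return None


# Hypothetical sample input; it must be well-formed XML.
doc = '<div><img src="smiley1.gif"/><p>xyz</p><img src="photo2.jpg"/></div>'
print(first_non_gif_src_xhtml(doc))
```

ET.fromstring raises a ParseError on markup that is not well-formed, which is why this route only suits valid XHTML.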


How about jQuery?

It makes it easy to find parts of the HTML DOM and change them: $('img[src$=".gif"]').hide();
