.NET regular expression
I have HTML page source containing img tags like this:
<p>xyz </p>< img ....... 1 . gif >........<p>xyz</p>
< img ........ 2 . jpg >..............<p>xyz</p>
< img ........ 3 . jpg ><p>xyz</p>
< img ....... 4 . gif >......<span>xyz</span>
The img tags can contain JPG as well as other image formats, and they can appear in any order in the page source. I want a .NET regular expression that gives me the first img tag with a JPG image, like
< img ... 2. jpg >
or, more generally, the first img tag that is not a GIF image. Basically, I want my regular expression to skip the smiley GIF images.
Please suggest a regular expression.
Do not parse HTML with RegEx. See here for compelling reasons.
HTML is not a regular language and as such not suitable for parsing with a regular expression.
Use the HTML Agility Pack to parse HTML. It exposes the parsed HTML similarly to XmlDocument and can be queried using XPath.
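To illustrate the parser-based approach, here is a minimal sketch. It is not the HTML Agility Pack itself (that is a .NET library); it swaps in Python's standard-library `html.parser` purely to show the idea of walking real tags instead of pattern-matching text. The class name `FirstNonGifImage`, the helper `first_non_gif_src`, and the sample markup are all made up for this example.

```python
from html.parser import HTMLParser


class FirstNonGifImage(HTMLParser):
    """Records the src of the first <img> whose src does not end in .gif."""

    def __init__(self):
        super().__init__()
        self.first_src = None

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag != "img" or self.first_src is not None:
            return
        src = dict(attrs).get("src") or ""
        if src and not src.lower().endswith(".gif"):
            self.first_src = src


def first_non_gif_src(html):
    """Return the src of the first non-GIF image, or None if there is none."""
    parser = FirstNonGifImage()
    parser.feed(html)
    parser.close()
    return parser.first_src


# Hypothetical sample page, loosely modelled on the question's markup.
sample = (
    '<p>xyz</p><img src="1.gif">'
    '<img src="2.jpg"><p>xyz</p>'
    '<img src="3.jpg"><img src="4.gif">'
)
print(first_non_gif_src(sample))  # prints 2.jpg
```

Because the check runs on the parsed `src` attribute, attribute order, quoting style, and upper-case tag names do not matter, which is where a regex tends to break.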
<\s*img[^>]*\.[^>]*jpg[^>]*>
Using regular expressions to parse or modify HTML documents is frowned upon. For a one-shot operation, you could use
<img\s+[^>]*2\.jpg[^>]*>(</img>)?
to identify image tags containing "2.jpg". If you want to do this more than once, you'd do yourself a favor by using an HTML parser like the HTML Agility Pack. Parsers are much less fragile when confronted with real-world HTML code.
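For the "first img tag that is not a GIF" variant of the question, a negative lookahead is one way to do it in a single pattern. The sketch below is in Python, but it only uses constructs (`\b`, `(?!...)`, negated character classes, case-insensitive matching) that .NET's `Regex` class also supports, so the pattern should carry over. The names `NON_GIF_IMG` and `first_non_gif_tag` and the sample string are invented for the example, and the pattern assumes no `>` appears inside an attribute value.

```python
import re

# <img, then refuse the tag if ".gif" occurs anywhere before its closing ">",
# then consume the rest of the tag.
NON_GIF_IMG = re.compile(r"<img\b(?![^>]*\.gif\b)[^>]*>", re.IGNORECASE)


def first_non_gif_tag(html):
    """Return the first <img ...> tag that does not mention .gif, or None."""
    match = NON_GIF_IMG.search(html)
    return match.group(0) if match else None


# Hypothetical sample, loosely modelled on the question's markup.
sample = '<p>xyz</p><img src="1.gif"><img src="2.jpg"><p>xyz</p><img src="4.gif">'
print(first_non_gif_tag(sample))  # prints <img src="2.jpg">
```

This is still subject to the fragility mentioned above: it looks at the raw tag text, so a `.gif` in an `alt` or `title` attribute would also cause the tag to be skipped.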
If the HTML is valid XHTML, you can also use XPath or XSLT.
The XPath should look something like this (sorry, not tested):
//img[not(fn:ends-with(@src, ".gif"))]
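One caveat: `ends-with` is an XPath 2.0 function, so XPath 1.0 engines (the built-in .NET XPath support is 1.0) will not accept it. A rough workaround is to select every img with a plain path and do the suffix test in code. The sketch below shows that with Python's standard-library `xml.etree.ElementTree`, whose XPath subset also lacks `ends-with`. It assumes well-formed XHTML without a default namespace; `non_gif_sources` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def non_gif_sources(xhtml):
    """Return the src of every <img> that does not end in .gif, in document order."""
    root = ET.fromstring(xhtml)
    sources = []
    # ".//img" selects all descendant img elements; the filter runs in Python.
    for img in root.findall(".//img"):
        src = img.get("src", "")
        if src and not src.lower().endswith(".gif"):
            sources.append(src)
    return sources


# Hypothetical well-formed fragment (note the self-closing img tags).
sample = (
    '<div><p>xyz</p><img src="1.gif"/>'
    '<img src="2.jpg"/><p>xyz</p>'
    '<span><img src="3.jpg"/></span><img src="4.gif"/></div>'
)
print(non_gif_sources(sample))  # prints ['2.jpg', '3.jpg']
```

Taking the first element of the returned list gives the first non-GIF image the question asks for.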
How about jQuery?
It makes it easy to find parts of the HTML DOM and change them:
$('img[src$=".gif"]').hide();