Looking for regex to erase href text
If I have a bunch of URLs like this:
<li><a href="http://www.xyz.com/sometext/someothertext/123/sometext/">Xyz 123</a></li>
<li><a href="http://www.xyz.com/345/sometext/someothertext/">Xyz 345</a></li>
What would a regex look like to erase the URLs inside the hrefs so that they become:
<li><a href="">Xyz 123</a></li>
<li><a href="">Xyz 345</a></li>
The following should do what you like:
/href=\"([^\"]*)\"/
Basically, this matches href="<any text but a '"'>". Replace each match with href="" to empty the URL.
Search for <a href="[^"]*" and replace with <a href="".
If you add more details about which language you're using, I can be more specific. Be aware also that regular expressions are usually not the tool of choice when dealing with HTML.
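Since the question does not name a language, here is a minimal sketch in Python (my assumption, not something the asker specified) that applies the search-and-replace pair above with the standard re module. The names HREF_PATTERN and blank_hrefs are made up for illustration.

```python
import re

# Pattern from the answer above: an opening <a href=" followed by
# everything up to (and including) the closing double quote.
HREF_PATTERN = re.compile(r'<a href="[^"]*"')


def blank_hrefs(html: str) -> str:
    """Replace the value of every <a href="..."> with an empty string."""
    return HREF_PATTERN.sub('<a href=""', html)


sample = (
    '<li><a href="http://www.xyz.com/sometext/someothertext/123/sometext/">Xyz 123</a></li>\n'
    '<li><a href="http://www.xyz.com/345/sometext/someothertext/">Xyz 345</a></li>'
)
print(blank_hrefs(sample))
```

Note that this pattern only matches when href is the first attribute, sits directly after "<a ", and uses double quotes, which is true for the sample links in the question.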
First of all, do not use regex to parse HTML. Why? Have a look here or here.
Process the HTML using an XML reader / XML document processing engine. Then use XPath to find the nodes matching your criteria and alter their href attributes in the DOM.
Note: For HTML which is not well-formed XML, a more general HTML (SGML) parser is required.
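As a rough illustration of the parser-based approach, here is a sketch in Python (an assumed language, since the question names none) using the standard xml.etree.ElementTree module, which supports a limited subset of XPath. It assumes the markup is well-formed XML; the wrapping <ul> element is added here only so the fragment has a single root.

```python
import xml.etree.ElementTree as ET

# Wrap the list items in a root element so the fragment is well-formed XML.
fragment = (
    '<ul>'
    '<li><a href="http://www.xyz.com/sometext/someothertext/123/sometext/">Xyz 123</a></li>'
    '<li><a href="http://www.xyz.com/345/sometext/someothertext/">Xyz 345</a></li>'
    '</ul>'
)

root = ET.fromstring(fragment)

# ".//a[@href]" selects every <a> descendant that has an href attribute.
for anchor in root.findall(".//a[@href]"):
    anchor.set("href", "")

result = ET.tostring(root, encoding="unicode")
print(result)
```

For real-world HTML that is not well-formed, ET.fromstring would raise a ParseError, which is where a dedicated HTML parser comes in.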
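I partially agree with the others, but a more complete version, which also copes with other attributes before href, whitespace around the = sign, and uppercase tags, would be
s/(<a[^>]+href\s*=\s*\")(.*?)(\"[^>]*>)/$1$3/gi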
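The substitution above keeps the text before and after the URL (groups 1 and 3) and drops the URL itself (group 2). A rough Python equivalent might look like the sketch below; the choice of Python and the function name empty_hrefs are assumptions for illustration, not part of the original answer.

```python
import re

# Group 1: everything from "<a" up to and including the opening quote of href.
# Group 2: the URL itself (matched lazily, then discarded).
# Group 3: the closing quote and the rest of the tag.
LINK_PATTERN = re.compile(r'(<a[^>]+href\s*=\s*")(.*?)("[^>]*>)', re.IGNORECASE)


def empty_hrefs(html: str) -> str:
    """Blank out href values while keeping the rest of each <a> tag intact."""
    return LINK_PATTERN.sub(r"\1\3", html)


print(empty_hrefs('<A class="x" HREF = "http://a.b/c" target="_blank">T</A>'))
```

Because the pattern starts with "<a" followed by any non-">" characters, it would also match other tags whose names begin with "a" and carry an href, so treat it as a starting point and not a full HTML-aware solution.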