How to strip EVERYTHING from a html string including texts but leave all <a> tags and their data intact using regex?

2023-03-20 11:35, Q&A author:

Firstly, to those more experienced than me: this has to be done with a regex. I have no access to a DOM parser because of an unusual situation.

So I have a full HTML/XHTML string and would like to strip everything from it except the links. Basically, only the <a> tags matter. The tags must keep all of their information (href, target, class, etc.), and it should work whether the tag is self-terminating or has a separate end tag, i.e. <a /> or <a></a>.

Thanks for any help, guys!

Of course you can parse HTML in a Firefox extension. Have a look at HTML to DOM, especially the second and third approaches.

It might seem more complex, but it is less error-prone than a regular expression.

Once you have a reference to the parsed content, all you have to do is call ref.getElementsByTagName('a') and you are done.
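A minimal sketch of the parser approach, assuming a browser or extension context where the standard DOMParser API exists (it is not available in plain Node.js). The function name extractLinks is hypothetical; it parses the string, collects the <a> elements with getElementsByTagName, and serializes each one back to markup with its attributes intact.

```javascript
// Sketch only: assumes the standard DOMParser API (browser/extension context).
// extractLinks is a hypothetical helper name, not part of any library.
function extractLinks(html) {
  // Parse the string into a separate document without touching the page.
  const doc = new DOMParser().parseFromString(html, "text/html");
  // getElementsByTagName returns a live HTMLCollection; copy it into an array
  // and serialize each <a> element (attributes and contents) via outerHTML.
  return Array.from(doc.getElementsByTagName("a"), (a) => a.outerHTML);
}
```

Note that with "text/html" the parser ignores the self-closing slash on <a />, so such a tag stays open and absorbs following content. For well-formed XHTML input, parsing with "application/xhtml+xml" honors <a />, but that mode rejects markup that is not well-formed.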

result = subject.match(/<a\b[^<>]*?(?:\/>|>(?:(?!<\/a>)[\s\S])*<\/a>)/ig);

This gets you an array of all <a> tags in the HTML source (including self-closed tags, which are not valid HTML but which you specifically asked for), or null if there are none. Is that sufficient?

Explanation:

<a\b        # Match <a as a complete tag name (not <abbr>, <area>, <aside>...)
[^<>]*?     # Match any characters besides angle brackets, as few as possible
(?:         # Now either match
 />         # /> (self-closed tag)
|           # or
 >          # a closing angle bracket
 (?:        # followed by...
  (?!</a>)  # (if we're not at the closing tag)
  [\s\S]    # any character, including newlines
 )*         # any number of times
 </a>       # until the closing tag
)
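To get the stripped string the question asks for, the matches can simply be joined. This sketch uses a slightly hardened version of the regex: \b after <a so that tags like <abbr> or <area> are not matched, and [\s\S] instead of . so link text may span lines. The helper name stripAllButLinks and the newline separator are illustrative choices, not from the original answer. It remains a heuristic: an attribute value containing < or > (e.g. title="a > b"), or links inside comments or scripts, will still confuse it.

```javascript
// Hypothetical helper: keeps only the <a> elements of an HTML string.
// \b stops <abbr>/<area> from matching; [\s\S] lets link text span lines.
const LINK_RE = /<a\b[^<>]*?(?:\/>|>(?:(?!<\/a>)[\s\S])*<\/a>)/gi;

function stripAllButLinks(html, separator = "\n") {
  // With the g flag, String.prototype.match returns all matches,
  // or null when there are none, so fall back to an empty array.
  const links = html.match(LINK_RE) || [];
  return links.join(separator);
}

// Sample covering a paired tag, a self-closed tag, a multi-line tag,
// and an <abbr> that must not be mistaken for a link.
const sample =
  '<div><abbr title="x">X</abbr> text <a href="/one" target="_blank">One</a>\n' +
  '<a name="top" /><a class="multi"\nhref="/two">Two\nlines</a></div>';

console.log(stripAllButLinks(sample));
```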

The regex will look something like this:

/<a\b[^>]*?(?:\/>|>[\s\S]*?<\/a>)/gi
