How to find a matching closing tag in an HTML string?
Imagine the following HTML:
<div>
<b></b>
<div>
<table>...</table>
</div>
</div> <!-- this one -->
...
How could I find the matching closing tag for the first opening div tag? Is there a regex that could find it? I guess this is quite a common requirement, but I'm struggling to find anything straightforward, just full-blown HTML parsers.
No.
Use a full-blown HTML parser. There's a reason they exist.
Use Html Agility Pack.
I'm assuming that you have tokenized the HTML tags. Now create a stack: every time you see an opening tag, push it, and every time you see a closing tag, pop and check that the tag you popped matches the closing tag. The closing tag that empties the stack is the one that matches the first opening tag.
But there are already HTML parsers for this, so search for one on CodePlex.
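For what it's worth, here is a minimal sketch of that stack idea in Python. The names `TAG_RE`, `VOID_TAGS` and `find_matching_close` are made up for illustration, and the regex-based tokenizer is an assumption of mine: it only copes with simple markup (no tags inside comments or scripts, no `>` inside attribute values), so treat it as a demonstration of the push/pop logic rather than a replacement for a real parser.

```python
import re

# Hypothetical tokenizer: matches <name ...>, </name> and <name ... />.
# Assumes no tags hidden in comments/scripts and no '>' in attribute values.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*?(/?)>")

# Elements that never have a closing tag, so they must not be pushed.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


def find_matching_close(html, start=0):
    """Return (start, end) offsets of the closing tag that matches the
    first opening tag found at or after `start`, or None if there is none."""
    stack = []
    for m in TAG_RE.finditer(html, start):
        is_closing = bool(m.group(1))
        name = m.group(2).lower()
        is_self_closing = bool(m.group(3))

        if name in VOID_TAGS or is_self_closing:
            continue  # nothing to balance
        if not is_closing:
            stack.append(name)
            continue
        # Closing tag: it must match the most recently opened tag.
        if not stack or stack[-1] != name:
            return None  # malformed input; a real parser would recover
        stack.pop()
        if not stack:
            return m.start(), m.end()
    return None


html = "<div>\n<b></b>\n<div>\n<table>...</table>\n</div>\n</div> <!-- this one -->\n..."
span = find_matching_close(html)
if span is not None:
    print(span, html[span[0]:span[1]])
```

On the HTML from the question this should land on the last `</div>`. It gives up on the first mismatched tag, which is exactly the kind of case where a proper parser earns its keep.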
Well, you need to have a clear view of the syntax! However, regexes are very limited in scope, and I wouldn't recommend using them for multi-line or nested tag syntax.
Rather, you need to track each tag (open/close) and use a handler to deal with your request. You could use Lex/Yacc tools, but this may be overkill. Depending on the language you use, you may already have modules for this purpose (like HTMLParser in Python).
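As a rough illustration of the handler approach, here is a sketch using `html.parser.HTMLParser` from the Python 3 standard library. The class `MatchingCloseFinder` and the helper `find_close_position` are names I made up; the only library pieces used are `HTMLParser`, its `handle_starttag` / `handle_endtag` / `handle_startendtag` hooks, and `getpos()`. It assumes the first tag in the input is the one you care about.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they do not affect depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class MatchingCloseFinder(HTMLParser):
    """Tracks nesting depth and records where the first element closes."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.close_pos = None  # (line, column) of the matching closing tag

    def handle_starttag(self, tag, attrs):
        if self.close_pos is None and tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.close_pos is not None or tag in VOID_TAGS:
            return
        self.depth -= 1
        if self.depth == 0:
            # getpos() reports the position of the tag being handled:
            # line is 1-based, column is 0-based.
            self.close_pos = self.getpos()

    def handle_startendtag(self, tag, attrs):
        pass  # <tag/> opens and closes at once, so depth is unchanged


def find_close_position(html):
    finder = MatchingCloseFinder()
    finder.feed(html)
    finder.close()
    return finder.close_pos


html = "<div>\n<b></b>\n<div>\n<table>...</table>\n</div>\n</div> <!-- this one -->\n..."
print(find_close_position(html))
```

Unlike a hand-rolled regex tokenizer, the parser already knows how to skip comments and quoted attribute values, so the handler only has to count depth.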
There's always LINQ to XML if you want to parse HTML and don't need every little detail, provided the markup is well-formed XML.