MySQL: find all records with empty anchor tags in a field (regex) and remove them
I have a table which has a column containing html code. I need to find out all records that have empty anchor tags in them, ignoring child tags. Afterwards I want to remove all those empty tags.
To clarify things - these anchors should be found, for example:
<a href="http://www.example.com" target="_blank"></a>
<a href="http://www.example.com"> </a>
<a href="http://www.example.com"><span><strong></strong></span></a>
<a href="http://www.example.com"><span></span></a>
Whereas these should not be found:
<a href="http://www.example.com">something&开发者_C百科lt;/a>
<a href="http://www.example.com"><span>some text</span></a>
I managed to get all empty anchors (like in the first example) with this query:
select * from table_a where html regexp '<a.*href=".*".*></a>'
But this does not find the 2nd and 3rd example. Can someone help?
UPDATE: I want to remove those empty tags from the database. I fiddled around with MySQLs UpdateXML but have not found a solution yet.
I think the best way is to use xml functions in order to extract needed text nodes from xml document.
update table_a
set html = UpdateXML(html, '//a', '')
where ExtractValue(html, '//a//text()') regexp '^[[:space:]]*$'
Note: if the table has a field with several a
tags where some of them are empty and others are non-empty then this field will not be updated.
an extra .* at the end:
<a.*href=".*".*>.*</a>
精彩评论