Inserting Ending Tags for Missing Tags in HTML

How can I insert ending HTML tags where they are missing?

For example:

 <tr>
 <td>Index No.</td><td>Name</td>

 <tr>
 <td>1</td><td>Harikrishna</td>

Here two ending tags are missing, namely "</tr>". In this case, how can I find where the ending tags are missing and insert the appropriate ending tag, such as "</tr>", at those positions?


This seems like a very tough task if you want to handle all possible cases. HTML is not a regular language. IMHO you should try to solve the problem at the source, that is, find out how you ended up with invalid HTML in the first place.


You might take a look at HTML Tidy and see if it works for what you need.


I cannot comment on the above, so I'll note it here. You can also use HTML Tidy to clean HTML fragments. See examples here:
http://www.php.net/manual/en/tidy.examples.basic.php
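
As a rough illustration of the fragment case, here is a minimal sketch that assumes the PHP tidy extension is installed. The helper name repair_fragment() is made up for this example; tidy_repair_string() and the show-body-only option come from the tidy extension itself. Treat it as a starting point, not a finished solution.

```php
<?php
// Sketch only: requires the PHP tidy extension (ext-tidy).
// repair_fragment() is a hypothetical helper name, not a library function.
function repair_fragment(string $fragment): ?string
{
    if (!function_exists('tidy_repair_string')) {
        return null; // tidy extension is not available
    }
    $config = [
        'show-body-only' => true, // return only the repaired fragment, not a whole document
        'wrap'           => 0,    // do not wrap long lines
    ];
    $fixed = tidy_repair_string($fragment, $config, 'utf8');
    return $fixed === false ? null : $fixed;
}

// The table from the question: both rows lack their closing </tr>.
$html = "<table>
<tr><td>Index No.</td><td>Name</td>
<tr><td>1</td><td>Harikrishna</td>
</table>";

$fixed = repair_fragment($html);
echo $fixed ?? "tidy extension is not loaded\n";
```

Because Tidy parses the markup instead of pattern-matching it, the same call should also cope with other unclosed elements, not just table rows.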

An alternative to HTML Tidy is to clean your output code with regular expressions; I provide an example below. Please note, however, that even though this might be faster in terms of processing, it is neither as universal nor as robust (maintenance-wise) as HTML Tidy.

Code

<?php

$html = "
<table>
<tr class=\"lorem\">
<td>Index No.</td>
<td>Name</td>

<tr>
<td>0</td>
<td>FooBaz</td>

<tr>
<td>1</td>
<td>Harikrishna</td>

<tr class=\"ipsum\">
<td>2</td>
<td>Foo</td>
</tr>

<tr>
<td>3</td>
<td>Bar</td>


</table>
";

// regex magic
$start_cond = "<tr(?:\s[^>]*)?>";
$end_cond = "(?:{$start_cond}|<\/table>)";
$row_contents = "(?:(?!{$end_cond}).)*";

// first remove all </tr> tags
$xhtml = preg_replace( "/<\/tr>/ism", "", $html );

// now re-add </tr> tags where appropriate
$xhtml = preg_replace( "/({$start_cond})({$row_contents})/ism", "$1$2</tr>\n", $xhtml );

// ignore: just for writing comparison output
echo "<h2>Before:</h2>"; show_count( $html );
echo "<h2>After</h2>"; show_count( $xhtml );

function cmp($patt,$html) {
    $count = preg_match_all( "/{$patt}/ism", $html, $matches);
    return htmlentities("\n{$count} x {$patt}");
}
function show_count($html) {
    echo "<pre>"
        . cmp("<tr(\s[^>]*)?>",$html)
        . cmp("<\/tr>",$html)
        . "</pre>";
}
?>

Output


Before:
5 x <tr(\s[^>]*)?>
1 x <\/tr>

After
5 x <tr(\s[^>]*)?>
5 x <\/tr>
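
To make the robustness caveat concrete, the following sketch wraps the same two preg_replace() calls in a function (close_rows() is a name invented for this example) and feeds it a nested table that is already valid. The tag counts still match afterwards, but the outer row gets closed in the wrong place.

```php
<?php
// Same patterns as in the answer above; close_rows() is a hypothetical wrapper name.
function close_rows(string $html): string
{
    $start_cond   = "<tr(?:\s[^>]*)?>";
    $end_cond     = "(?:{$start_cond}|<\/table>)";
    $row_contents = "(?:(?!{$end_cond}).)*";

    // first remove all </tr> tags
    $out = preg_replace("/<\/tr>/ism", "", $html);

    // then re-add </tr> before the next <tr ...> or </table>
    return preg_replace("/({$start_cond})({$row_contents})/ism", "$1$2</tr>\n", $out);
}

// A valid nested table: no ending tag is missing here.
$nested = "<table><tr><td><table><tr><td>x</td></tr></table></td></tr></table>";

$result = close_rows($nested);

// The outer row now ends right after the inner <table> opening tag,
// because the pattern stops at the first following <tr>, whatever its nesting level.
echo htmlentities($result);
```

This is the kind of case where a parser-based tool such as HTML Tidy is the safer choice.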