Regular Expression for Replacing Content Not Inside HTML Tags

I've got a function that helps to interlink pages within my site by scanning blog entries, news, and other items for certain core keywords. It then replaces those keywords with a link to the corresponding page.

I'm running into a problem where some words that should not be replaced with links are being replaced anyway. For example, a few of my HTML tables have a summary attribute that contains a short summary of the table content. A tag might look like this:

<table width="500" cellspacing="0" cellpadding="4" border="0" summary="This table contains a list of all car parts in inventory along with their corresponding prices">
...
</table>

My function incorrectly replaces a keyword or phrase like "car parts" with a link. How can I structure my replacement regular expression so that it does NOT replace the phrase in cases like this, but DOES replace it when it appears within a paragraph or within a cell in an HTML table?

Thanks in advance for any help and guidance!

EDIT: Just to clarify, I'm using PHP to render my pages. I'm using a str_replace() before the content is output as HTML to the page. I want to be able to replace that with an ereg_replace() so that I replace the content only if it meets certain conditions (i.e. as explained above). Sorry if this caused any confusion!


Don't use regexes to parse HTML. Use the PHP DOM:

$DOM = new DOMDocument();
$DOM->loadHTML($str); // $str holds your HTML

// Get all table cells
$cells = $DOM->getElementsByTagName('td');

// Do stuff to the cells

// Get all paragraphs
$paragraphs = $DOM->getElementsByTagName('p');

// Do stuff to the paragraphs

// Etc...
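
To make the "do stuff" step a bit more concrete, here is a rough sketch of one way to do the replacement on text nodes only, so attribute values such as summary="..." are never touched. The function name link_keywords(), the $links map (lowercase ASCII keyword => URL), and the wrapper id "link-root" are all made up for illustration, and it assumes the content is UTF-8. It selects text nodes with DOMXPath instead of going tag by tag, which is a variation on the approach above rather than the only way to do it:

```php
<?php
// Hypothetical helper (name and parameters are illustrative, not from the question).
// $links maps lowercase ASCII keywords to URLs, e.g. ['car parts' => '/car-parts'].
function link_keywords(string $html, array $links): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings about sloppy markup quiet
    // The XML encoding hint makes loadHTML() read the input as UTF-8; the wrapper
    // <div> lets us return only the original fragment afterwards.
    $dom->loadHTML('<?xml encoding="UTF-8"><div id="link-root">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Text nodes only, so attribute values are never matched. Text that is
    // already inside a link, script, or style element is skipped as well.
    $nodes = $xpath->query(
        '//text()[not(ancestor::a) and not(ancestor::script) and not(ancestor::style)]'
    );

    // One case-insensitive pattern covering every keyword.
    $quoted = array_map(function ($k) { return preg_quote($k, '/'); }, array_keys($links));
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/iu';

    // Copy the node list first, because the loop modifies the tree.
    foreach (iterator_to_array($nodes) as $node) {
        // With PREG_SPLIT_DELIM_CAPTURE, odd indexes hold the matched keywords.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if ($parts === false || count($parts) < 2) {
            continue; // no keyword in this text node
        }
        $fragment = $dom->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $a = $dom->createElement('a');
                $a->setAttribute('href', $links[strtolower($part)]);
                $a->appendChild($dom->createTextNode($part));
                $fragment->appendChild($a);
            } elseif ($part !== '') {
                $fragment->appendChild($dom->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of the wrapper <div>.
    $root = $xpath->query('//div[@id="link-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

You would call it in place of the str_replace(), e.g. echo link_keywords($content, ['car parts' => '/car-parts']);. Because the phrase inside the summary attribute is not a text node, it is left alone, while the same phrase in a td or p gets wrapped in a link.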