Regular Expression for Replacing Content Not Inside HTML Tags
I've got a function that helps to interlink pages within my site by scanning blog entries, news, and other items for certain core keywords. It then replaces those keywords with a link to the corresponding page.
I'm running into a problem where some words are being replaced with links when they shouldn't be. For example, a few of my HTML tables have a summary attribute that contains a short description of the table content. I might have a tag that looks like this:
<table width="500" cellspacing="0" cellpadding="4" border="0" summary="This table contains a list of all car parts in inventory along with their corresponding prices">
...
</table>
My function incorrectly replaces a keyword or phrase like "car parts" inside that attribute with a link. How can I structure my replacement regular expression so that it does NOT replace the keyword in cases like this, but DOES replace it when it appears within a paragraph or within a cell in an HTML table?
Thanks in advance for any help and guidance!
EDIT: Just to clarify, I'm using PHP to render my pages. I'm using a str_replace() before the content is output as HTML to the page. I want to be able to replace that with an ereg_replace() so that I replace the content only if it meets certain conditions (i.e. as explained above). Sorry if this caused any confusion!
Don't use regexes to parse HTML. Use PHP's DOM extension instead:
$DOM = new DOMDocument;
$DOM->loadHTML($str); // $str holds your HTML
// Get all table cells
$cells = $DOM->getElementsByTagName('td');
// Replace keywords in the text of each cell here
// Get all paragraphs
$paragraphs = $DOM->getElementsByTagName('p');
// Replace keywords in the text of each paragraph here
// Etc...
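To make the "do stuff" step concrete, here is a minimal sketch of one way to do the replacement with the DOM. It is not the only approach: instead of walking td and p elements separately, it uses DOMXPath to visit every text node, so attribute values such as summary are never touched. The function name linkKeywordInText, the wrapper id linker-root, and the URL in the usage note are hypothetical names chosen for illustration, and the sketch assumes the input is a UTF-8 HTML fragment.

```php
<?php
// Hypothetical helper: wraps every occurrence of $keyword that appears in
// element text (never in attribute values) in a link to $url.
function linkKeywordInText(string $html, string $keyword, string $url): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    // Wrap the fragment in a known container so it can be found again after
    // parsing; the XML encoding hint asks libxml to read the input as UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8"><div id="linker-root">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Text nodes only, skipping text that is already inside a link, script or style.
    $query = '//text()[not(ancestor::a) and not(ancestor::script) and not(ancestor::style)]';
    $pattern = '/(' . preg_quote($keyword, '/') . ')/i';

    // Copy the node list into an array first, because the loop changes the tree.
    foreach (iterator_to_array($xpath->query($query)) as $textNode) {
        // With PREG_SPLIT_DELIM_CAPTURE the odd-indexed parts are the matches.
        $parts = preg_split($pattern, $textNode->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // keyword not present in this text node
        }
        $fragment = $dom->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $link = $dom->createElement('a');
                $link->setAttribute('href', $url);
                $link->appendChild($dom->createTextNode($part));
                $fragment->appendChild($link);
            } elseif ($part !== '') {
                $fragment->appendChild($dom->createTextNode($part));
            }
        }
        $textNode->parentNode->replaceChild($fragment, $textNode);
    }

    // Serialize only the children of the wrapper, dropping the html/body
    // elements that loadHTML() adds around the fragment.
    $root = $xpath->query('//div[@id="linker-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

Called as linkKeywordInText($content, 'car parts', '/parts') in place of the str_replace() call, this should link the phrase inside paragraphs and table cells while leaving the summary attribute as it was. Note that the markup is re-serialized by libxml, so whitespace and entity encoding in the output may differ slightly from the input.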