Match only backticks not inside a <code> block with Regex
First things first. I know how to parse XML/HTML with simplexml, and I know all the arguments against using RegEx to parse it. This question is for the sake of knowledge.
What needs to happen
Let's say a block of text contains the following lines:
The query you need to use is
<code>SELECT `post_name` FROM table WHERE id= $id</code>
where `$id` is the `user_ID` we got earlier.
How do you match the following:
`$id`
`user_ID`
without also matching
`post_name`?
Requirements
This needs to be a regex-only solution. I understand and know how to use things like preg_replace_callback etc. to remove <code> blocks from the string first, but I'm looking for a regex-only solution. Also, it needs to be able to handle possible attributes like <code lang="php">.
The regex needs to match pairs of backticks that are not between <code> and </code>. The matches must not contain either <code> or </code>, so that lone backticks in other contexts are handled.
The content in the backticks will never span multiple lines.
Reasoning
I'm working on a personal project where this was a possible edge case. This is not a Markdown-type project where it is possible to change the order of the calls. The <code> tags are in the source text and not going anywhere.
Also, part of the reason I don't want "use simpleXML" answers is because the backticks are not inside actual <code> blocks. It is just a handy way to explain the problem and the solution for <code> blocks will work with slight changes.
I don't think regular expressions are a good tool for this, but it can be done if you assume that the code tags aren't nested:
`(?:(?!</?code>)[^`])*`(?!(?:(?!<code>).)*</code>)
This means:
`(?:(?!</?code>)[^`])*` : Match something in backticks unless it contains <code> or </code> or a backtick...
(?!(?:(?!<code>).)*</code>) : unless it is followed by a </code> without a <code> first.
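To try the pattern outside of an online tester, here is a minimal sketch in Python (my choice for illustration; the question itself is about PHP, and the names `BACKTICKS_OUTSIDE_CODE`, `SAMPLE` and `find_backticks` are made up for this example). The pattern is copied unchanged from above and only uses lookaheads, so it should carry over to PCRE as-is.

```python
import re

# Pattern copied verbatim from the answer above.
BACKTICKS_OUTSIDE_CODE = re.compile(
    r"`(?:(?!</?code>)[^`])*`(?!(?:(?!<code>).)*</code>)"
)

# Sample text from the question.
SAMPLE = (
    "The query you need to use is\n"
    "<code>SELECT `post_name` FROM table WHERE id= $id</code>\n"
    "where `$id` is the `user_ID` we got earlier."
)


def find_backticks(text):
    """Return every backtick pair that the pattern accepts."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return BACKTICKS_OUTSIDE_CODE.findall(text)


print(find_backticks(SAMPLE))  # ['`$id`', '`user_ID`']
```

Note that `.` does not match a newline here, so the trailing lookahead only scans to the end of the current line. If a <code> block can span several lines, a dot-matches-newline flag (`re.DOTALL` in Python, `/s` in PCRE) would probably be needed.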
See the regular expression in action at rubular.
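The question also asks for opening tags with attributes, such as <code lang="php">. As written, the pattern only recognises a bare <code>, so the lookahead can run straight past an opening tag that carries attributes. One possible adjustment, which is my own assumption and not part of the pattern above, is to replace `<code>` with `<code\b[^>]*>`. A minimal Python sketch comparing the two (all names are hypothetical, and attribute values containing `>` are not handled):

```python
import re

# Pattern as given above: only a bare <code> counts as an opening tag.
ORIGINAL = re.compile(
    r"`(?:(?!</?code>)[^`])*`(?!(?:(?!<code>).)*</code>)"
)

# Assumed variant: `\b[^>]*` lets the tag carry attributes.
WITH_ATTRS = re.compile(
    r"`(?:(?!</?code\b[^>]*>)[^`])*`(?!(?:(?!<code\b[^>]*>).)*</code>)"
)

# Made-up test line with an attribute on the opening tag.
TEXT = 'Use `x` then <code lang="php">SELECT `y`</code> and `z`.'


def find_with_attrs(text):
    """Return backtick pairs outside <code ...> blocks (assumed variant)."""
    return WITH_ATTRS.findall(text)


print(ORIGINAL.findall(TEXT))  # ['`z`']  (`x` is wrongly skipped)
print(find_with_attrs(TEXT))   # ['`x`', '`z`']
```

The same assumption about non-nested code tags still applies to the variant.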