Regex for finding elements without a certain attribute (e.g., "id")

I'm scrubbing through a large number of XML-based files in a JSF project, and would like to find certain components that are missing an id attribute. For example, let's say I want to find all of the <h:inputText /> elements that do not have an id attribute specified.

I've tried the following in RAD (Eclipse), but something's not quite right because I still get some components that do have a valid ID.

<([hf]|ig):(?!output)\w+\s+(?!\bid\b)[^>]*?\s+(?!\bid\b)[^>]*?>

Not sure if my negative-lookahead is correct or not?

The desired result would be that I would find the following (or similar) in any JSP in the project:

<h:inputText value="test" />

... but not:

<h:inputText id="good_id" value="test" />

I'm just using <h:inputText/> as an example. I was trying to be broader than that, but definitely excluding <h:outputText/>.


Disclaimer:

As others correctly point out, it is best to use a dedicated parser when working with non-regular markup languages such as XML/HTML. There are many ways for a regex solution to fail with either false positives or missed matches.
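For comparison, here is a minimal sketch of the parser route in Python, using the standard library's xml.etree.ElementTree. It assumes the file is well-formed XML and declares the h: prefix with the usual JSF HTML namespace URI; the sample markup and the helper name elements_missing_id are made up for illustration. Real JSP files containing <%@ ... %> directives are not well-formed XML and would fail to parse, which is one reason a regex can be the more practical tool here.

```python
import xml.etree.ElementTree as ET

# Assumption: the h: prefix is bound to the standard JSF HTML namespace.
JSF_HTML_NS = "http://java.sun.com/jsf/html"

# Hypothetical sample document, for illustration only.
sample = """<root xmlns:h="http://java.sun.com/jsf/html">
  <h:inputText value="test" />
  <h:inputText id="good_id" value="test" />
  <h:outputText value="label" />
</root>"""


def elements_missing_id(xml_text, local_names):
    """Return the elements with one of the given local names and no id attribute."""
    root = ET.fromstring(xml_text)
    # ElementTree stores namespaced tags as "{namespace-uri}localname".
    wanted = {"{%s}%s" % (JSF_HTML_NS, name) for name in local_names}
    return [el for el in root.iter()
            if el.tag in wanted and "id" not in el.attrib]


missing = elements_missing_id(sample, ["inputText"])
for el in missing:
    print(el.tag, el.attrib)
```

Only the first h:inputText is reported; the one with an id and the h:outputText are skipped.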

That said...

This particular problem is a one-shot editing problem and the target text (an open tag) is not a nested structure. Although there are ways for the following regex solution to fail, it should still do a pretty good job.

I don't know Eclipse's regex syntax, but if it provides negative lookahead, the following regex will match any of a list of specific target elements that have no id attribute. First, here it is in PHP/PCRE free-spacing mode with comments, for readability:

$re_open_tags_with_no_id_attrib = '%
    # Match specific element open tags having no "id" attribute.
    <                    # Literal "<" start of open tag.
    (?:                  # Group of target element names.
      h:inputText        # Either h:inputText element,
    | h:otherTag         # or h:otherTag element,
    | h:anotherTag       # or h:anotherTag element.
    )                    # End group of target element names.
    (?:                  # Zero or more open tag attributes.
      \s+                # Whitespace required before each attribute.
      (?!id\b)           # Assert this attribute not named "id".
      [\w\-.:]+          # Non-"id" attribute name.
      (?:                # Group for optional attribute value.
        \s*=\s*          # Value separated by =, optional ws.
        (?:              # Group of attrib value alternatives.
          "[^"]*"        # Either double quoted value,
        | \'[^\']*\'     # or single quoted value,
        | [\w\-.:]+      # or unquoted value.
        )                # End group of value alternatives.
      )?                 # Attribute value is optional.
    )*                   # Zero or more open tag attributes.
    \s*                  # Optional whitespace before close.
    /?                   # Optional empty tag slash before >.
    >                    # Literal ">" end of open tag.
    %x';

And here is the same regex in bare-bones native format which may be suitable for copy and paste into an Eclipse search box:

<(?:h:inputText|h:otherTag|h:anotherTag)(?:\s+(?!id\b)[\w\-.:]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[\w\-.:]+))?)*\s*/?>
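As a quick sanity check, the one-line pattern can be run against the sample tags from the question. The sketch below uses Python's re module rather than Eclipse, on the assumption that both engines treat this particular syntax (non-capturing groups, negative lookahead, \b) the same way; the sample text, including the tag with a trailing id, is made up for the test.

```python
import re

# The one-line pattern from the answer, unchanged.
NO_ID_TAG = re.compile(
    r"""<(?:h:inputText|h:otherTag|h:anotherTag)(?:\s+(?!id\b)[\w\-.:]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[\w\-.:]+))?)*\s*/?>"""
)

# Hypothetical sample input: one tag without an id, two with an id
# (first and last position), and one element that is not in the target list.
sample = '''
<h:inputText value="test" />
<h:inputText id="good_id" value="test" />
<h:inputText value="test" id="late_id" />
<h:outputText value="test" />
'''

# The pattern has no capturing groups, so findall returns whole matches.
missing_id = NO_ID_TAG.findall(sample)
print(missing_id)
```

The lookahead is re-checked before every attribute, so an id is rejected wherever it appears in the tag, not just in the first position or two.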

Note the group of target element names at the beginning of this expression. You can add or remove target elements in this ORed list. Note also that the expression is designed to work pretty well for HTML as well as XML (HTML may have value-less attributes, unquoted attribute values, and quoted attribute values containing <> angle brackets).
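If the list of target elements changes often, the alternation can be generated instead of edited by hand. This is a rough sketch in Python; the helper name build_no_id_pattern and the HTML sample tags are hypothetical, and the attribute part of the pattern is copied from the expression above.

```python
import re

# Attribute part of the expression, copied unchanged from the answer.
ATTRS_WITHOUT_ID = (
    r"""(?:\s+(?!id\b)[\w\-.:]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[\w\-.:]+))?)*\s*/?>"""
)


def build_no_id_pattern(tag_names):
    """Compile the no-id pattern for an arbitrary list of element names."""
    names = "|".join(re.escape(name) for name in tag_names)
    return re.compile(r"<(?:" + names + r")" + ATTRS_WITHOUT_ID)


pattern = build_no_id_pattern(["h:inputText", "input"])

# HTML-style cases: value-less attribute, unquoted value,
# and a quoted value that contains ">".
valueless = pattern.search('<input disabled type=text>')
bracketed = pattern.search('<input title="a > b" disabled>')
with_id = pattern.search('<input disabled id=x>')

print(valueless.group(0))
print(bracketed.group(0))
print(with_id)
```

The first two tags match in full, including the one whose quoted value contains ">", and the tag carrying an unquoted id is not matched.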
