regex splitting tags in the string
I have the following regex:
(<.*?>.*?</.*?>|[\w[-]]+)\p{Punct}*
It works perfectly for most strings with tags, but if a tag is not preceded by a space, it breaks the tag while finding a match.
Please help me modify this regex so that it doesn't break tags. All I want is to split on spaces, but not when the space is inside a tag.
For example:
BIRD-<abc attr="co_1">ab</span> @apos;<abc attr="co_12">cd</span>FEE DEF
should split into:
BIRD-<abc attr="co_1">ab</span>
@apos;<abc attr="co_12">cd</span>FEE
DEF
I am currently using a Matcher to match this pattern and get the tokens:
Matcher matcher = REGEX.matcher(newString);
while (matcher.find()) {
    // each successful find() yields one token
    token = matcher.group();
}
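Try this:
.*?<.*?>.*?</.*?>[^\s]*
On your example it keeps each tagged chunk in one piece. Note that every match after the first starts with the separating space, so trim the tokens, and that a trailing token with no tag in it (DEF in your example) is not matched at all.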
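As a rough sketch of the splitting the question asks for, here is a small Java example built on a different pattern from the one above: a token is any run of complete `<...>` tags and characters that are neither whitespace nor `<`. The class name `TagAwareSplitter` and the method `tokenize` are made up for illustration. It assumes attribute values never contain `>`, and it still splits on spaces in the text between an opening and a closing tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original question or answer.
class TagAwareSplitter {
    // One token = one or more of: a whole <...> tag, or a single
    // character that is neither whitespace nor '<'.
    // Assumption: no '>' appears inside an attribute value.
    private static final Pattern TOKEN = Pattern.compile("(?:<[^>]*>|[^\\s<])+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input =
            "BIRD-<abc attr=\"co_1\">ab</span> @apos;<abc attr=\"co_12\">cd</span>FEE DEF";
        // Prints three lines:
        //   BIRD-<abc attr="co_1">ab</span>
        //   @apos;<abc attr="co_12">cd</span>FEE
        //   DEF
        for (String token : tokenize(input)) {
            System.out.println(token);
        }
    }
}
```

Because the two alternatives can never start on the same character, the space inside `<abc attr="co_1">` is consumed as part of the tag, while a space outside a tag ends the token.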
I would be wary of performing this type of parsing with a regex. The pattern you are suggesting, as well as various adaptations of it, may start behaving unexpectedly if attributes contain the > and/or < characters. For example, the following would throw your pattern off:
<element attr="></>">value</element>
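To make the failure concrete, here is a minimal Java sketch that runs the element part of the question's pattern, `<.*?>.*?</.*?>`, against that input. The class name `BrokenTagMatch` and the method `firstMatch` are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern comes from the question.
class BrokenTagMatch {
    // The element alternative of the question's regex, without the word branch.
    private static final Pattern ELEMENT = Pattern.compile("<.*?>.*?</.*?>");

    // Returns the first match, or null when nothing matches.
    static String firstMatch(String input) {
        Matcher matcher = ELEMENT.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String input = "<element attr=\"></>\">value</element>";
        // Prints: <element attr="></>
        // The match ends inside the attribute value instead of at </element>.
        System.out.println(firstMatch(input));
    }
}
```

The lazy `<.*?>` stops at the first `>` it sees, which is the one inside the quotes, and the `</>` that follows in the attribute value is then taken for the closing tag.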
Any time you need to parse or process an XML file, I would advise you to consider using a proper XML parser. Please see this answer for a longer explanation.
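As a hedged illustration of the difference, the sketch below uses the JDK's built-in DOM parser (`javax.xml.parsers.DocumentBuilderFactory`) next to the `<.*?>` sub-pattern. The class `ParserVsRegex` and its method names are invented for this example. A raw `<` inside an attribute value is not well-formed XML, so the sample input only puts a `>` there; the parser also requires matching open and close tags, which the strings in the question do not have.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class comparing a regex with a real XML parser.
class ParserVsRegex {
    private static final Pattern TAG = Pattern.compile("<.*?>");

    // What the lazy tag pattern considers the first tag (null if none).
    static String firstTagByRegex(String xml) {
        Matcher matcher = TAG.matcher(xml);
        return matcher.find() ? matcher.group() : null;
    }

    // Parses the string as XML and returns the root element.
    // Wraps checked parser exceptions so callers need no throws clause.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<element attr=\"a>b\">value</element>";
        System.out.println(firstTagByRegex(xml));   // prints: <element attr="a>
        Element root = parseRoot(xml);
        System.out.println(root.getAttribute("attr")); // prints: a>b
        System.out.println(root.getTextContent());     // prints: value
    }
}
```

The parser knows that the `>` sits inside a quoted attribute value, so it returns the attribute and the text content intact, whereas the regex cuts the opening tag short.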