Checking if a string is a valid tag/attribute name for an XML document
Scenario
I need to write a function that validates XML tag names (or attribute names).
E.g.:
"div" is valid
"d<iv" is not valid
"d\iv" is not valid
If a string is not valid, I should find the characters that make it invalid and replace them with some arbitrary character (or remove them).
E.g.:
"d<iv" is not valid -> I replace it with "div".
These functions will be called heavily, so I need to take efficiency into account.
My problem(s)
- What are the rules that describe a valid XML tag/attribute name? Is it safe to assume that a valid XML tag/attribute name follows the same rules as a Java variable name, or are those rules too restrictive?
- Should I use the Java regex package, or should I write my own specialized method? (As I said, speed is important.)
- Do you have any suggestions?
Thank you!
The rules are defined in the XML specification (look at the definition of Name).
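On the Java variable name question: the two rule sets do not line up, so reusing the Java rules would be wrong in both directions. XML names may contain '-', '.' and ':', which Java identifiers reject, while Java identifiers accept '$', which is not an XML name character. A small sketch to see this, where the class and method names are made up for illustration and only the java.lang.Character calls are real JDK API:

```java
// Hypothetical demo class; only the java.lang.Character methods are JDK API.
final class JavaIdentifierVsXmlName {

    // True when every character of s would be accepted in a Java identifier
    // (keywords are not considered here, only the character rules).
    static boolean isJavaIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Legal XML names that the Java rules reject ('-' and ':' are XML name characters).
        System.out.println(isJavaIdentifier("data-id"));
        System.out.println(isJavaIdentifier("xml:lang"));
        // Not a legal XML name, yet the Java rules accept it ('$' is not an XML name character).
        System.out.println(isJavaIdentifier("$div"));
    }
}
```

So the Java rules are too restrictive for some names and too permissive for others.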
If speed matters, then don't use regular expressions. Do it more like this:
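// Returns a copy of name with every character that is not allowed in an XML name removed.
public static String correctName(String name) {
    StringBuilder nameBuilder = new StringBuilder(name.length());
    for (char nameChar : name.toCharArray()) {
        // isValidXml is left to implement (some magic left to do ;)):
        // it should return true for characters allowed in an XML name.
        if (isValidXml(nameChar)) {
            nameBuilder.append(nameChar);
        }
    }
    return nameBuilder.toString();
}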
Note: the code above is only a simple guideline. It does not cover the little annoyance that the first character of an XML name has a narrower value range than the following ones. If you want to correct illegal tags like $%&div, it gets a bit more complicated (more magic needed).
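One way the missing magic could look, as a sketch rather than a definitive implementation: the character ranges follow the NameStartChar and NameChar productions of XML 1.0 (Fifth Edition), while the class name, the method names other than correctName, and the choice of '_' as a replacement prefix are assumptions made for this example. It works on code points instead of chars so that characters outside the BMP are handled.

```java
// Hypothetical helper class; the '_' prefix is an arbitrary illustrative choice.
final class XmlNames {

    // NameStartChar production of XML 1.0 (Fifth Edition), on Unicode code points.
    static boolean isNameStartChar(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || c == ':'
            || (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6)
            || (c >= 0xF8 && c <= 0x2FF) || (c >= 0x370 && c <= 0x37D)
            || (c >= 0x37F && c <= 0x1FFF) || (c >= 0x200C && c <= 0x200D)
            || (c >= 0x2070 && c <= 0x218F) || (c >= 0x2C00 && c <= 0x2FEF)
            || (c >= 0x3001 && c <= 0xD7FF) || (c >= 0xF900 && c <= 0xFDCF)
            || (c >= 0xFDF0 && c <= 0xFFFD) || (c >= 0x10000 && c <= 0xEFFFF);
    }

    // NameChar production: NameStartChar plus characters that may not come first.
    static boolean isNameChar(int c) {
        return isNameStartChar(c) || c == '-' || c == '.' || (c >= '0' && c <= '9')
            || c == 0xB7 || (c >= 0x300 && c <= 0x36F) || (c >= 0x203F && c <= 0x2040);
    }

    // True when name matches NameStartChar (NameChar)*.
    static boolean isValidName(String name) {
        if (name.isEmpty()) return false;
        int first = name.codePointAt(0);
        if (!isNameStartChar(first)) return false;
        for (int i = Character.charCount(first); i < name.length(); ) {
            int c = name.codePointAt(i);
            if (!isNameChar(c)) return false;
            i += Character.charCount(c);
        }
        return true;
    }

    // Drops illegal characters; if the first surviving character may not start
    // a name (for example a digit), '_' is put in front of it.
    static String correctName(String name) {
        StringBuilder sb = new StringBuilder(name.length() + 1);
        for (int i = 0; i < name.length(); ) {
            int c = name.codePointAt(i);
            i += Character.charCount(c);
            if (!isNameChar(c)) continue;
            if (sb.length() == 0 && !isNameStartChar(c)) sb.append('_');
            sb.appendCodePoint(c);
        }
        // Nothing usable left: fall back to a single '_' so the result is still a name.
        return sb.length() == 0 ? "_" : sb.toString();
    }
}
```

With this sketch, correctName("d<iv") and correctName("$%&div") both give "div", and correctName("1div") gives "_1div" because a digit is allowed inside a name but not at its start. Calling isValidName first and only building a new string when it returns false avoids the allocation in the common case where the name is already fine.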