How to remove empty tags in input XML
My Java module gets a huge input XML from a mainframe. Unfortunately, the mainframe is unable to skip optional elements, with the result that I get a LOT of empty tags in my input:
So,
<SSN>111111111</SSN>
<Employment>
<Current>
<Address>
<line1/>
<line2/>
<line3/>
<city/>
<state/>
<country/>
</Address>
<Phone>
<phonenumber/>
<countryCode/>
</Phone>
</Current>
<Previous>
<Address>
<line1/>
<line2/>
<line3/>
<city/>
<state/>
<country/>
</Address>
<Phone>
<phonenumber/>
<countryCode/>
</Phone>
</Previous>
</Employment>
<MaritalStatus>Single</MaritalStatus>
should be:
<SSN>111111111</SSN>
<MaritalStatus>SINGLE</MaritalStatus>
I use JAXB to unmarshal the input XML string that the mainframe sends. Is there a clean, easy way to remove all the empty group tags, or do I have to do this manually in the code for each element? I have over 350 elements in my input XML, so I would love it if JAXB itself had a way of doing this automatically.
Thanks, SGB
You could preprocess using XSLT. I know it's considered a bit "Disco" nowadays, but it is fast and easy to apply.
From this tek-tips discussion, you could transform with XSLT to remove empty elements.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|node()">
<xsl:if test="normalize-space(.) != '' or ./@* != ''">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
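To run a stylesheet like this before JAXB sees the input, the JDK's built-in javax.xml.transform API is enough. The following is a minimal sketch, not a definitive implementation: the class name XsltEmptyTagFilter, the method removeEmptyElements and the <Person> wrapper element are hypothetical (the fragment in the question shows no root element), and the embedded stylesheet tests normalize-space(.) so that elements containing only whitespace also count as empty.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltEmptyTagFilter {
    // Copies a node only if it has non-blank text content or a non-empty attribute.
    // Assumption: whitespace-only content is treated as empty (normalize-space).
    private static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"@*|node()\">"
      + "<xsl:if test=\"normalize-space(.) != '' or ./@* != ''\">"
      + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
      + "</xsl:if>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Returns the XML with empty elements stripped; the result can then be unmarshalled.
    static String removeEmptyElements(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT preprocessing failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample: <Person> stands in for whatever root the mainframe sends.
        String input = "<Person><SSN>111111111</SSN>"
            + "<Employment><Current><Address><line1/><city/></Address></Current></Employment>"
            + "<MaritalStatus>Single</MaritalStatus></Person>";
        System.out.println(removeEmptyElements(input));
    }
}
```

The cleaned string can then be handed to the unmarshaller in place of the raw input, so none of the 350 element mappings need to change.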
I think you'd have to edit your mainframe code for the best solution. When your mainframe generates the XML, you'll have to tell it not to output a tag if it's empty.
I don't think there's much you can do on the client side. If the XML that you get is filled with empty tags, then you have no choice but to parse them all--after all, how can you tell if a tag is empty without parsing it in some way!
But maybe you could do a regex string replace on the XML text before JAXB gets to it:
String xml = readInputXml(); // hypothetical helper: get the XML text however you receive it
// [^<>]* keeps the match inside one tag; "<.*?/>" can start at an earlier "<" and swallow real content
xml = xml.replaceAll("<[^<>]*/>", "");
This will remove empty tags like "<city/>" but not "<Address></Address>".
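If a parser-based pre-pass is acceptable instead of string replacement, empty elements can also be pruned on a DOM tree before unmarshalling, which covers both forms. This is a rough sketch using only JDK classes, swapped in as an alternative to the regex rather than anything JAXB provides; the class and method names and the <Person> root are hypothetical, and "empty" is assumed to mean no attributes, no remaining child elements and only whitespace text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomEmptyElementPruner {

    // Removes empty descendants bottom-up and reports whether 'element' ended up empty.
    static boolean prune(Element element) {
        Node child = element.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember the sibling before a possible removal
            if (child instanceof Element && prune((Element) child)) {
                element.removeChild(child);
            }
            child = next;
        }
        return isEmpty(element);
    }

    // Assumed definition of "empty": no attributes, no child elements, blank text.
    static boolean isEmpty(Element element) {
        if (element.hasAttributes()) {
            return false;
        }
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                return false;
            }
        }
        return element.getTextContent().trim().isEmpty();
    }

    // Parses the XML, prunes below the root (the root itself is always kept), and serializes it back.
    static String removeEmptyElements(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            prune(doc.getDocumentElement());
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not prune XML", e);
        }
    }

    public static void main(String[] args) {
        String input = "<Person><SSN>111111111</SSN><Employment><Current>"
            + "<Address><line1/><city></city></Address><Phone> </Phone>"
            + "</Current></Employment><MaritalStatus>Single</MaritalStatus></Person>";
        System.out.println(removeEmptyElements(input));
    }
}
```

Because the tree is pruned from the leaves upward, a group such as Employment disappears once all of its children have been removed, with no need for repeated passes over the text.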
The only technique I'm aware of in JAXB to do this is writing a custom XmlAdapter that collapses your empty strings to nulls.
The downside is that you'd have to add this as an annotation to every single element in your code, and if you have 350 of them, that's going to be tedious.
OK, I stumbled in here by accident. A simple working solution with JAXB (at least for JDK 1.6.x):
Set the unwanted attribute or element to null, e.g. ...setEmployment(null); then the whole Employment structure is gone.
Cheers Masi
import java.util.regex.Pattern;

public class RemoveEmptyTags {
    public static void main(String[] args) {
        // Self-closing tags such as <city/>
        final Pattern selfClosing = Pattern.compile("<([a-zA-Z][a-zA-Z0-9]*)[^>]*/>");
        // Open/close pairs that contain only whitespace, such as <Address> </Address>
        final Pattern emptyPair = Pattern.compile("<([a-zA-Z][a-zA-Z0-9]*)[^>]*>\\s*</\\1>");
        String xmlString = "<SSN>111111111</SSN><Employment><Current><Address><line1/><line2/><line3/><city/><state/><country/></Address><Phone><phonenumber/><countryCode/></Phone></Current><Previous><Address><line1/><line2/><line3/><city/><state/><country/> </Address><Phone><phonenumber/><countryCode/></Phone></Previous></Employment><MaritalStatus>Single</MaritalStatus>";
        System.out.println(xmlString);
        // Removing the innermost empty tags can leave their parents empty,
        // so repeat until a pass changes nothing.
        String previous;
        do {
            previous = xmlString;
            xmlString = selfClosing.matcher(xmlString).replaceAll("");
            xmlString = emptyPair.matcher(xmlString).replaceAll("");
        } while (!xmlString.equals(previous));
        System.out.println(xmlString);
    }
}
Console:
<SSN>111111111</SSN>
<Employment>
<Current>
<Address>
<line1/>
<line2/>
<line3/>
<city/>
<state/>
<country/>
</Address>
<Phone>
<phonenumber/>
<countryCode/>
</Phone>
</Current>
<Previous>
<Address>
<line1/>
<line2/>
<line3/>
<city/>
<state/>
<country/>
</Address>
<Phone>
<phonenumber/>
<countryCode/>
</Phone>
</Previous>
</Employment>
<MaritalStatus>Single</MaritalStatus>
<SSN>111111111</SSN>
<MaritalStatus>Single</MaritalStatus>
Online demo here