开发者

Java : Convert formatted xml file to one line string

I have a formatted XML file, and I want to convert it to one line string, how can I do that.

Sample xml:

<?xml version="1.0" encoding="UTF-8"?>
<books>
   <book>
       <title>Basic XML</title>
       <price>100</price>
       <qty>5</qty>
   </book>
   <book>
     <title>Basic Java</title>
    开发者_JAVA技巧 <price>200</price>
     <qty>15</qty>
   </book>
</books>

Expected output

<?xml version="1.0" encoding="UTF-8"?><books><book> <title>Basic XML</title><price>100</price><qty>5</qty></book><book><title>Basic Java</title><price>200</price><qty>15</qty></book></books>


//filename is filepath string
BufferedReader br = new BufferedReader(new FileReader(new File(filename)));
String line;
StringBuilder sb = new StringBuilder();

while((line=br.readLine())!= null){
    sb.append(line.trim());
}

using StringBuilder is more efficient then concat http://kaioa.com/node/59


Run it through an XSLT identity transform with <xsl:output indent="no"> and <xsl:strip-space elements="*"/>

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="no" />
    <xsl:strip-space elements="*"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

It will remove any of the non-significant whitespace and produce the expected output that you posted.


// 1. Read xml from file to StringBuilder (StringBuffer)
// 2. call s = stringBuffer.toString()
// 3. remove all "\n" and "\t": 
s.replaceAll("\n",""); 
s.replaceAll("\t","");

edited:

I made a small mistake, it is better to use StringBuilder in your case (I suppose you don't need thread-safe StringBuffer)


In java 1.8 and above

BufferedReader br = new BufferedReader(new FileReader(filePath));
String content = br.lines().collect(Collectors.joining("\n"));


Open and read the file.

Reader r = new BufferedReader(filename);
String ret = "";
while((String s = r.nextLine()!=null)) 
{
  ret+=s;
}
return ret;


Using this answer which provides the code to use Dom4j to do pretty-printing, change the line that sets the output format from: createPrettyPrint() to: createCompactFormat()

public String unPrettyPrint(final String xml){  

    if (StringUtils.isBlank(xml)) {
        throw new RuntimeException("xml was null or blank in unPrettyPrint()");
    }

    final StringWriter sw;

    try {
        final OutputFormat format = OutputFormat.createCompactFormat();
        final org.dom4j.Document document = DocumentHelper.parseText(xml);
        sw = new StringWriter();
        final XMLWriter writer = new XMLWriter(sw, format);
        writer.write(document);
    }
    catch (Exception e) {
        throw new RuntimeException("Error un-pretty printing xml:\n" + xml, e);
    }
    return sw.toString();
}


Underscore-java library has static method U.formatXml(xmlstring). Live example

import com.github.underscore.U;
import com.github.underscore.Xml;

public class MyClass {
    public static void main(String[] args) {
        System.out.println(U.formatXml("<a>\n  <b></b>\n  <b></b>\n</a>",
        Xml.XmlStringBuilder.Step.COMPACT));
    }
}

// output: <a><b></b><b></b></a>


I guess you want to read in, ignore the white space, and write it out again. Most XML packages have an option to ignore white space. For example, the DocumentBuilderFactory has setIgnoringElementContentWhitespace for this purpose.

Similarly if you are generating the XML by marshaling an object then JAXB has JAXB_FORMATTED_OUTPUT


The above solutions work if you are compressing all white space in the XML document. Other quick options are JDOM (using Format.getCompactFormat()) and dom4j (using OutputFormat.createCompactFormat()) when outputting the XML document.

However, I had a unique requirement to preserve the white space contained within the element's text value and these solutions did not work as I needed. All I needed was to remove the 'pretty-print' formatting added to the XML document.

The solution that I came up with can be explained in the following 3-step/regex process ... for the sake of understanding the algorithm for the solution.

String regex, updatedXml;

// 1. remove all white space preceding a begin element tag:
regex = "[\\n\\s]+(\\<[^/])";
updatedXml = originalXmlStr.replaceAll( regex, "$1" );

// 2. remove all white space following an end element tag:
regex = "(\\</[a-zA-Z0-9-_\\.:]+\\>)[\\s]+";
updatedXml = updatedXml.replaceAll( regex, "$1" );

// 3. remove all white space following an empty element tag
// (<some-element xmlns:attr1="some-value".... />):
regex = "(/\\>)[\\s]+";
updatedXml = updatedXml.replaceAll( regex, "$1" );

NOTE: The pseudo-code is in Java ... the '$1' is the replacement string which is the 1st capture group.

This will simply remove the white space used when adding the 'pretty-print' format to an XML document, yet preserve all other white space when it is part of the element text value.


Below I present the prepared solution. Only the standard library of Java 1.8 was used.

XSLT:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="no"/>
    <xsl:strip-space elements="*"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

Java:

public static String convertXmlToOneLine(String xml) throws TransformerException {
    final String xslt =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n" +
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">\n" +
        "    <xsl:output indent=\"no\"/>\n" +
        "    <xsl:strip-space elements=\"*\"/>\n" +
        "    <xsl:template match=\"@*|node()\">\n" +
        "        <xsl:copy>\n" +
        "            <xsl:apply-templates select=\"@*|node()\"/>\n" +
        "        </xsl:copy>\n" +
        "    </xsl:template>\n" +
        "</xsl:stylesheet>";

    /* prepare XSLT transformer from String */
    Source xsltSource = new StreamSource(new StringReader(xslt));
    TransformerFactory factory = TransformerFactory.newInstance();
    Transformer transformer = factory.newTransformer(xsltSource);

    /* where to read the XML? */
    Source source = new StreamSource(new StringReader(xml));

    /* where to write the XML? */
    StringWriter stringWriter = new StringWriter();
    Result result = new StreamResult(stringWriter);

    /* transform XML to one line */
    transformer.transform(source, result);

    return stringWriter.toString();
}

Sample output:

<?xml version="1.0" encoding="UTF-8"?><xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"><xsl:output indent="no"/><xsl:strip-space elements="*"/><xsl:template match="@*|node()"><xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy></xsl:template></xsl:stylesheet>

License: The MIT License


FileUtils.readFileToString(fileName);

link

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜