Is there a stylesheet or Windows commandline tool for controllable XML formatting, specifically putting attributes one-per-line?

I am searching for an XSLT or command-line tool (or C# code that can be made into a command-line tool, etc) for Windows that will do XML pretty-printing. Specifically, I want one that has the ability to put attributes one-to-a-line, something like:

<Node>
   <ChildNode 
      value1='5'
      value2='6'
      value3='happy' />
</Node>

It doesn't have to be EXACTLY like that, but I want to use it for an XML file that has nodes with dozens of attributes, and spreading them across multiple lines makes them easier to read, edit, and text-diff.

NOTE: I think my preferred solution is an XSLT sheet I can pass through a C# method, though a Windows command-line tool is good too.


Here's a PowerShell script to do it. It takes the following input:

<?xml version="1.0" encoding="utf-8"?>
<Node>
    <ChildNode value1="5" value2="6" value3="happy" />
</Node>

...and produces this as output:

<?xml version="1.0" encoding="utf-8"?>
<Node>
  <ChildNode
    value1="5"
    value2="6"
    value3="happy" />
</Node>

Here you go:

param(
    [string] $inputFile = $(throw "Please enter an input file name"),
    [string] $outputFile = $(throw "Please supply an output file name")
)

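$data = [xml](Get-Content $inputFile)

# Indent elements and start each attribute on its own line
$xws = New-Object System.Xml.XmlWriterSettings
$xws.Indent = $true
$xws.IndentChars = "  "
$xws.NewLineOnAttributes = $true

# Close the writer explicitly so the output is flushed and the file handle is released
$writer = [System.Xml.XmlWriter]::Create($outputFile, $xws)
try { $data.Save($writer) } finally { $writer.Close() }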

Take that script, save it as C:\formatxml.ps1. Then, from a PowerShell prompt type the following:

C:\formatxml.ps1 C:\Path\To\UglyFile.xml C:\Path\To\NeatAndTidyFile.xml

This script is basically just using the .NET framework so you could very easily migrate this into a C# application.

NOTE: If you have not run scripts from PowerShell before, you will have to execute the following command at an elevated PowerShell prompt before you will be able to execute the script:

Set-ExecutionPolicy RemoteSigned

You only have to do this one time though.

I hope that's useful to you.


Here's a small C# sample, which can be used directly by your code, or built into an exe and called at the command line as "myexe from.xml to.xml":

    using System;
    using System.Xml;

    class Program
    {
        static void Main(string[] args)
        {
            // Indent elements and put each attribute on its own line
            XmlWriterSettings settings = new XmlWriterSettings {
                NewLineHandling = NewLineHandling.Entitize,
                NewLineOnAttributes = true, Indent = true, IndentChars = "  ",
                NewLineChars = Environment.NewLine
            };

            // args[0] = input file, args[1] = output file
            using (XmlReader reader = XmlReader.Create(args[0]))
            using (XmlWriter writer = XmlWriter.Create(args[1], settings)) {
                // Copy the whole document; the using block closes the writer
                writer.WriteNode(reader, false);
            }
        }
    }

Sample input:

<Node><ChildNode value1='5' value2='6' value3='happy' /></Node>

Sample output (note you can remove the <?xml ... with settings.OmitXmlDeclaration):

<?xml version="1.0" encoding="utf-8"?>
<Node>
  <ChildNode
    value1="5"
    value2="6"
    value3="happy" />
</Node>

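Note that if you want a string rather than writing to a file, just swap in a StringBuilder (this needs System.IO and System.Text as well; the XML declaration will then say utf-16, because .NET strings are UTF-16):

StringBuilder sb = new StringBuilder();
using (XmlReader reader = XmlReader.Create(new StringReader(oldXml)))
using (XmlWriter writer = XmlWriter.Create(sb, settings)) {
    writer.WriteNode(reader, false);
}
// The writer is disposed (and flushed) here, so sb now holds the full document
string newXml = sb.ToString();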


Try Tidy over on SourceForge. Although it's often used on [X]HTML, I've used it successfully on XML before; just make sure you use the -xml option.

http://tidy.sourceforge.net/#docs

Tidy reads HTML, XHTML and XML files and writes cleaned up markup. ... For generic XML files, Tidy is limited to correcting basic well-formedness errors and pretty printing.

It has been ported to several platforms and is available as both an executable and a callable library.

Tidy has a heap of options including:

http://api.html-tidy.org/tidy/quickref_5.0.0.html#indent

indent-attributes
Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0
This option specifies if Tidy should begin each attribute on a new line.

One caveat:

Limited support for XML

XML processors compliant with W3C's XML 1.0 recommendation are very picky about which files they will accept. Tidy can help you to fix errors that cause your XML files to be rejected. Tidy doesn't yet recognize all XML features though, e.g. it doesn't understand CDATA sections or DTD subsets.

But I suspect unless your XML is really advanced, the tool should work fine.
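
To make the options above concrete, here is a minimal sketch of calling Tidy from a script. It assumes a tidy executable is on your PATH; the helper names tidy_command and run_tidy are made up for this illustration. The flags used are -xml (treat the input as generic XML), -i (indent), --indent-attributes yes and -o (output file).

```python
import shutil
import subprocess


def tidy_command(src, dst, exe="tidy"):
    """Build a Tidy command line that pretty-prints XML, one attribute per line."""
    return [exe, "-xml", "-i", "--indent-attributes", "yes", "-o", dst, src]


def run_tidy(src, dst, exe="tidy"):
    """Run Tidy and return its exit status (hypothetical helper, not part of Tidy)."""
    if shutil.which(exe) is None:
        raise FileNotFoundError("%s was not found on PATH" % exe)
    # Tidy signals warnings and errors through a non-zero exit status,
    # so the status is returned rather than raising on non-zero.
    return subprocess.run(tidy_command(src, dst, exe)).returncode


if __name__ == "__main__":
    print(" ".join(tidy_command("UglyFile.xml", "NeatAndTidyFile.xml")))
```

The same command line can of course be typed directly at a Windows prompt; the wrapper is only useful if you want to batch-process files.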


There is a tool that can split attributes one per line: xmlpp. It's a Perl script, so you'll have to install Perl. Usage:

perl xmlpp.pl -t input.xml

You can also control the ordering of attributes by creating a file called attributeOrdering.txt and calling perl xmlpp.pl -s -t input.xml. For more options, use perl xmlpp.pl -h

It may still have a few bugs, but it has worked for me so far.


XML Notepad 2007 can do so manually ... let me see if it can be scripted.

Nope ... the most you can do from the command line is launch it with a file, like so:

XmlNotepad.exe a.xml

The rest is just clicking the save button. PowerShell or other tools can automate that.


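Just use this XSLT. Note that it controls element indentation only: attributes are copied with xsl:copy-of, so how they are laid out is left to the XSLT processor's serializer.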

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="ISO-8859-1"/>
  <xsl:param name="indent-increment" select="'   '"/>

  <xsl:template name="newline">
    <xsl:text disable-output-escaping="yes">
</xsl:text>
  </xsl:template>

  <xsl:template match="comment() | processing-instruction()">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>    
    <xsl:value-of select="$indent"/>
    <xsl:copy />
  </xsl:template>

  <xsl:template match="text()">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>    
    <xsl:value-of select="$indent"/>
    <xsl:value-of select="normalize-space(.)"/>
  </xsl:template>

  <xsl:template match="text()[normalize-space(.)='']"/>

  <xsl:template match="*">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>    
    <xsl:value-of select="$indent"/>
      <xsl:choose>
       <xsl:when test="count(child::*) > 0">
        <xsl:copy>
         <xsl:copy-of select="@*"/>
         <xsl:apply-templates select="*|text()">
           <xsl:with-param name="indent" select="concat ($indent, $indent-increment)"/>
         </xsl:apply-templates>
         <xsl:call-template name="newline"/>
         <xsl:value-of select="$indent"/>
        </xsl:copy>
       </xsl:when>       
       <xsl:otherwise>
        <xsl:copy-of select="."/>
       </xsl:otherwise>
     </xsl:choose>
  </xsl:template>    
</xsl:stylesheet>

Or, as another option, here is a perl script: http://software.decisionsoft.com/index.html


You can implement a simple SAX application that will copy everything as is and indent attributes how you like.

UPD:

SAX stands for Simple API for XML. It is a push model of XML parsing (a classic example of the Builder design pattern): the parser calls your handler for each element, attribute set and text run as it reads them. The API is available on most current development platforms (though the native .NET class library lacks one, offering XmlReader instead).
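
To illustrate the push model before getting to the formatter itself, here is a minimal sketch that only records the callbacks the parser makes. The EventLogger class and sax_events function are names invented for this example; the xml.sax calls are from the Python standard library.

```python
from xml.sax import parseString
from xml.sax.handler import ContentHandler


class EventLogger(ContentHandler):
    """Records each SAX callback in the order the parser pushes it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # attrs behaves like a read-only mapping of attribute names to values
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Skip whitespace-only runs between elements
        if content.strip():
            self.events.append(("text", content))


def sax_events(xml_text):
    """Parse an XML string and return the list of recorded events."""
    handler = EventLogger()
    parseString(xml_text.encode("utf-8"), handler)
    return handler.events


if __name__ == "__main__":
    for event in sax_events("<Node><ChildNode value1='5' value2='6' /></Node>"):
        print(event)
```

A formatter is then just a handler that writes text in each callback instead of recording it, which is where the one-attribute-per-line layout can be applied.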

Here is a rough implementation of such a formatter in Python. It is rather cryptic, but you should get the main idea.

from sys import stdout
from xml.sax import parse
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import escape

class MyHandler(ContentHandler):

    def __init__(self, file_, encoding):
        self.level = 0
        self.elem_indent = '    '

        # should the next block make a line break
        self._allow_N = False
        # whether the opening tag was closed with > (to allow />)
        self._tag_open = False

        self._file = file_
        self._encoding = encoding

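    def _write(self, string_):
        # Text streams such as sys.stdout take str directly (Python 3);
        # the stream applies its own encoding, so no manual encode is needed
        self._file.write(string_)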

    def startElement(self, name, attrs):
        if self._tag_open:
            self._write('>')
            self._tag_open = False

        if self._allow_N:
            self._write('\n')
            indent = self.elem_indent * self.level
        else:
            indent = ''
        self._write('%s<%s' % (indent, name))

        # attr indent equals the element indent plus '  '
        attr_indent = self.elem_indent * self.level + '  '
        for attr_name in attrs.getNames():
            # write indented attributes one per line; double quotes are
            # escaped too, since the value is wrapped in double quotes
            value = escape(attrs.getValue(attr_name), {'"': '&quot;'})
            self._write('\n%s%s="%s"' % (attr_indent, attr_name, value))

        self._tag_open = True

        self.level += 1
        self._allow_N = True

    def endElement(self, name):
        self.level -= 1
        if self._tag_open:
            self._write(' />')
            self._tag_open = False
            return

        if self._allow_N:
            self._write('\n')
            indent = self.elem_indent * self.level
        else:
            indent = ''
        self._write('%s</%s>' % (indent, name))
        self._allow_N = True

    def characters(self, content):
        if self._tag_open:
            self._write('>')
            self._tag_open = False

        if content.strip():
            self._allow_N = False
            self._write(escape(content))
        else:
            self._allow_N = True


if __name__ == '__main__':
    # parse() returns nothing; the handler writes the formatted XML to stdout
    parse('test.xsl', MyHandler(stdout, stdout.encoding))