Java DOM Parser XML

I need to extract the text content of every <Item Name="CanonicalSmiles"> element from the following XML file (only part of it is shown).

I tried getElementsByTagName("Item").item(12).getTextContent(), but the index is different for different <DocSum>s (it is not always 12).

How do I do this?

  <?xml version="1.0"?>
    <!DOCTYPE eSummaryResult PUBLIC "-//NLM//DTD eSummaryResult, 29 October 2004//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSummary_041029.dtd">
    <eSummaryResult>
    <DocSum>
        <Id>53359352</Id>
        <Item Name="CID" Type="Integer">53359352</Item>
        <Item Name="SourceNameList" Type="List"></Item>
        <Item Name="SourceIDList" Type="List"></Item>
        <Item Name="SourceCategoryList" Type="List">
            <Item Name="string" Type="String">Journal Publishers</Item>
        </Item>
        <Item Name="CreateDate" Type="Date">2011/09/19 00:00</Item>
        <Item Name="SynonymList" Type="List"></Item>
        <Item Name="MeSHHeadingList" Type="List"></Item>
        <Item Name="MeSHTermList" Type="List"></Item>
        <Item Name="PharmActionList" Type="List"></Item>
        <Item Name="CommentList" Type="List"></Item>
        <Item Name="IUPACName" Type="String">2-hydroxy-6-[2-(4-hydroxyphenyl)-2-oxoethyl]benzoic acid</Item>
        <Item Name="CanonicalSmiles" Type="String">C1=CC(=C(C(=C1)O)C(=O)O)CC(=O)C2=CC=C(C=C2)O</Item>
        <Item Name="RotatableBondCount" Type="Integer">4</Item>
        <Item Name="MolecularFormula" Type="String">C15H12O5</Item>
        <Item Name="MolecularWeight" Type="String">272.252780</Item>
        <Item Name="TotalFormalCharge" Type="Integer">0</Item>
        <Item Name="XLogP" Type="String"></Item>
        <Item Name="HydrogenBondDonorCount" Type="Integer">3</Item>
        <Item Name="HydrogenBondAcceptorCount" Type="Integer">5</Item>
        <Item Name="Complexity" Type="String">359.000000</Item>
        <Item Name="HeavyAtomCount" Type="Integer">20</Item>
        <Item Name="AtomChiralCount" Type="Integer">0</Item>
        <Item Name="AtomChiralDefCount" Type="Integer">0</Item>
        <Item Name="AtomChiralUndefCount" Type="Integer">0</Item>
        <Item Name="BondChiralCount" Type="Integer">0</Item>
        <Item Name="BondChiralDefCount" Type="Integer">0</Item>
        <Item Name="BondChiralUndefCount" Type="Integer">0</Item>
        <Item Name="IsotopeAtomCount" Type="Integer">0</Item>
        <Item Name="CovalentUnitCount" Type="Integer">1</Item>
        <Item Name="TautomerCount" Type="Integer">67</Item>
        <Item Name="SubstanceIDList" Type="List"></Item>
        <Item Name="TPSA" Type="String">94.8</Item>
        <Item Name="AssaySourceNameList" Type="List"></Item>
        <Item Name="MinAC" Type="String"></Item>
        <Item Name="MaxAC" Type="String"></Item>
        <Item Name="MinTC" Type="String"></Item>
        <Item Name="MaxTC" Type="String"></Item>
        <Item Name="ActiveAidCount" Type="Integer">0</Item>
        <Item Name="InactiveAidCount" Type="Integer">0</Item>
        <Item Name="TotalAidCount" Type="Integer">0</Item>
        <Item Name="InChIKey" Type="String">YIGHIFUVVSYMFG-UHFFFAOYSA-N</Item>
        <Item Name="InChI" Type="String">InChI=1S/C15H12O5/c16-11-6-4-9(5-7-11)13(18)8-10-2-1-3-12(17)14(10)15(19)20/h1-7,16-17H,8H2,(H,19,20)</Item>
    </DocSum>

    <DocSum>
        <Id>53346823</Id>
        <Item Name="CID" Type="Integer">53346823</Item>
        <Item Name="SourceNameList" Type="List"></Item>
        <Item Name="SourceIDList" Type="List"></Item>
        <Item Name="SourceCategoryList" Type="List">
            <Item Name="string" Type="String">Biological Properties</Item>
        </Item>
        <Item Name="CreateDate" Type="Date">2011/09/01 00:00</Item>
        <Item Name="SynonymList" Type="List">
            <Item Name="string" Type="String">HMS2478O14</Item>
        </Item>
        <Item Name="MeSHHeadingList" Type="List"></Item>
        <Item Name="MeSHTermList" Type="List"></Item>
        <Item Name="PharmActionList" Type="List"></Item>
        <Item Name="CommentList" Type="List">
            <Item Name="string" Type="String">Asinex Ltd.:BAS 02768155</Item>
        </Item>
        <Item Name="IUPACName" Type="String">ethyl 3-amino-3-(1,3-benzodioxol-5-yl)propanoate chloride</Item>
        <Item Name="CanonicalSmiles" Type="String">CCOC(=O)CC(C1=CC2=C(C=C1)OCO2)N.[Cl-]</Item>
        <Item Name="RotatableBondCount" Type="Integer">5</Item>
        <Item Name="MolecularFormula" Type="String">C12H15ClNO4-</Item>
        <Item Name="MolecularWeight" Type="String">272.704800</Item>
        <Item Name="TotalFormalCharge" Type="Integer">-1</Item>
        <Item Name="XLogP" Type="String"></Item>
        <Item Name="HydrogenBondDonorCount" Type="Integer">1</Item>
        <Item Name="HydrogenBondAcceptorCount" Type="Integer">6</Item>
        <Item Name="Complexity" Type="String">271.000000</Item>
        <Item Name="HeavyAtomCount" Type="Integer">18</Item>
        <Item Name="AtomChiralCount" Type="Integer">1</Item>
        <Item Name="AtomChiralDefCount" Type="Integer">0</Item>
        <Item Name="AtomChiralUndefCount" Type="Integer">1</Item>
        <Item Name="BondChiralCount" Type="Integer">0</Item>
        <Item Name="BondChiralDefCount" Type="Integer">0</Item>
        <Item Name="BondChiralUndefCount" Type="Integer">0</Item>
        <Item Name="IsotopeAtomCount" Type="Integer">0</Item>
        <Item Name="CovalentUnitCount" Type="Integer">2</Item>
        <Item Name="TautomerCount" Type="Integer">1</Item>
        <Item Name="SubstanceIDList" Type="List"></Item>
        <Item Name="TPSA" Type="String">70.8</Item>
        <Item Name="AssaySourceNameList" Type="List"></Item>
        <Item Name="MinAC" Type="String"></Item>
        <Item Name="MaxAC" Type="String"></Item>
        <Item Name="MinTC" Type="String"></Item>
        <Item Name="MaxTC" Type="String"></Item>
        <Item Name="ActiveAidCount" Type="Integer">0</Item>
        <Item Name="InactiveAidCount" Type="Integer">0</Item>
        <Item Name="TotalAidCount" Type="Integer">0</Item>
        <Item Name="InChIKey" Type="String">NKQHQIJWIYNEIX-UHFFFAOYSA-M</Item>
        <Item Name="InChI" Type="String">InChI=1S/C12H15NO4.ClH/c1-2-15-12(14)6-9(13)8-3-4-10-11(5-8)17-7-16-10;/h3-5,9H,2,6-7,13H2,1H3;1H/p-1</Item>
    </DocSum>


For what you're doing, XPath is likely easier than DOM. See this Java XPath tutorial.


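    // Needs: import javax.xml.xpath.*; import org.w3c.dom.NodeList;
    XPathFactory xpf = XPathFactory.newInstance();
    XPath xp = xpf.newXPath();
    // Select the text node of each <Item Name="CanonicalSmiles"> that is a direct child of a <DocSum>
    XPathExpression xe = xp.compile("//DocSum/Item[@Name='CanonicalSmiles']/text()");
    // yourdom is the already-parsed org.w3c.dom.Document
    NodeList nodes = (NodeList) xe.evaluate(yourdom, XPathConstants.NODESET);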
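
To show how the snippet above fits into a complete program, here is a minimal sketch. The class name SmilesXPath, the method canonicalSmiles, and the shortened inline XML are assumptions made for illustration; only the XPath expression comes from the answer. The sample deliberately leaves out the DOCTYPE line, since with the real file the parser may try to fetch the referenced DTD.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SmilesXPath {

    // Returns the text of every <Item Name="CanonicalSmiles"> that is a direct child of a <DocSum>.
    static List<String> canonicalSmiles(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        XPath xp = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xp.evaluate(
                "//DocSum/Item[@Name='CanonicalSmiles']", doc, XPathConstants.NODESET);

        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Shortened, hypothetical version of the question's XML (single-quoted attributes are valid XML).
        String xml =
              "<eSummaryResult>"
            + "<DocSum><Id>53359352</Id>"
            + "<Item Name='SynonymList' Type='List'></Item>"
            + "<Item Name='CanonicalSmiles' Type='String'>C1=CC(=C(C(=C1)O)C(=O)O)CC(=O)C2=CC=C(C=C2)O</Item>"
            + "</DocSum>"
            + "<DocSum><Id>53346823</Id>"
            + "<Item Name='SynonymList' Type='List'><Item Name='string' Type='String'>HMS2478O14</Item></Item>"
            + "<Item Name='CanonicalSmiles' Type='String'>CCOC(=O)CC(C1=CC2=C(C=C1)OCO2)N.[Cl-]</Item>"
            + "</DocSum>"
            + "</eSummaryResult>";

        for (String smiles : canonicalSmiles(xml)) {
            System.out.println(smiles);
        }
    }
}
```

Because the expression matches on the Name attribute, it does not matter how many nested Items come before CanonicalSmiles in a given DocSum.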


As others have pointed out, XPath is the standard way to go. If you're using a tool like jOOX, writing XPath is even simpler:

    String text = $(document).xpath("//DocSum/Item[@Name='CanonicalSmiles']").text();

With jOOX, you don't need to use XPath, however. You could also use jOOX's jQuery-like API directly, for instance using filters:

    String text = $(document).find("Item")
                             .filter(attr("Name", "CanonicalSmiles"))
                             .text();

Or by using CSS-style selectors:

    String text = $(document).find("Item[Name='CanonicalSmiles']").text();


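As far as I can see, nobody has yet explained why the element's index differs from one <DocSum> to the next.

The parser does keep elements in document order, but a positional index is not a stable way to identify one of them. getElementsByTagName("Item") returns every descendant Item, including the ones nested inside list Items such as SynonymList, so the position of CanonicalSmiles shifts whenever a list has more or fewer entries. You can't expect the element that is number 12 today to be number 12 tomorrow. If you control the XML and really need numbering, give the elements explicit numbers:

<Item Name="TotalFormalCharge" Type="Integer">-1</Item>

will become:

<Item Name="TotalFormalCharge" Num="6" Type="Integer">-1</Item>

And you can then get it by the attribute value.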
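
Selecting by attribute value also works with plain DOM and no XPath. The sketch below filters on the Name attribute the file already has, so no extra numbering attribute is needed. The class name SmilesDom and the helpers parse and itemsByName are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SmilesDom {

    // Parses an XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Collects the text of every <Item> whose Name attribute equals the given name.
    static List<String> itemsByName(Document doc, String name) {
        List<String> values = new ArrayList<>();
        // getElementsByTagName returns all descendant Items, nested ones included,
        // which is why a fixed index is unreliable and the attribute check is needed.
        NodeList items = doc.getElementsByTagName("Item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            if (name.equals(item.getAttribute("Name"))) {
                values.add(item.getTextContent());
            }
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Shortened, hypothetical sample: the second DocSum has one more nested Item than the first.
        String xml =
              "<eSummaryResult>"
            + "<DocSum><Item Name='SynonymList' Type='List'></Item>"
            + "<Item Name='CanonicalSmiles' Type='String'>C1=CC(=C(C(=C1)O)C(=O)O)CC(=O)C2=CC=C(C=C2)O</Item></DocSum>"
            + "<DocSum><Item Name='SynonymList' Type='List'><Item Name='string' Type='String'>HMS2478O14</Item></Item>"
            + "<Item Name='CanonicalSmiles' Type='String'>CCOC(=O)CC(C1=CC2=C(C=C1)OCO2)N.[Cl-]</Item></DocSum>"
            + "</eSummaryResult>";

        for (String smiles : itemsByName(parse(xml), "CanonicalSmiles")) {
            System.out.println(smiles);
        }
    }
}
```

Unlike the XPath expression //DocSum/Item[...], this loop does not check that the Item is a direct child of a DocSum; for this file that makes no difference, since nested Items are all named "string".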
