
Talend tExtractXMLField

I have this job in Talend that is supposed to retrieve a field and loop through it.

My big problem is that the code is looping through the XML fields but it's returning null. Here is a sample of the XML:

<?xml version="1.0" encoding="ISO-8859-1"?>
<empresas>
    <empresa>
        <imoveis>
            <imovel>
                [-- some fields --  ]

                <fotos>
                    <nome id="" order="">photo1</nome>
                    <nome id="" order=""></nome>
                    <nome id="" order=""></nome>
                    <nome id="" order=""></nome>
                </fotos>
            </imovel>
            [ -- other entries here -- ]
        </imoveis>
    </empresa>
</empresas>

Now using the tExtractXMLField component I am trying to get the "fotos" element. Here is what I have in the component:

[Screenshot: tExtractXMLField component settings]

I have tried changing both the XPath query and the XPath loop query, but either the job does not loop through the field at all, or I get null in the value field in the tMap.

Here is an image of the job:

[Screenshot: job design]

You can see that I have retrieved 4 items from the XML, but what I get is null in the "nome" field. There must be something wrong with the XPath, but I can't seem to find the problem :(

Hope someone can help me out. Thanks.

Note: I am using Talend v4.1.2 on Ubuntu 10.10 64-bit.


If you want to loop on <nome> nodes your Loop XPath Query has to be

"/empresas/empresa/imoveis/imovel/fotos/nome"

and foto_nome XPath Query something like

"text()"

Take care: I also corrected an error in your XML that could cause problems (</imoveis> was missing the "s").
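
To see what that loop path and "text()" select, here is a minimal sketch outside Talend, using Python's standard xml.etree.ElementTree. It is only an illustration of the XPath logic, not what Talend runs internally; the function name and the trimmed sample are made up for the example. ElementTree paths are relative to the root element, so the leading "/empresas" is dropped, and node.text plays the role of "text()".

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample from the question (XML declaration omitted).
SAMPLE = """<empresas>
  <empresa>
    <imoveis>
      <imovel>
        <fotos>
          <nome id="" order="">photo1</nome>
          <nome id="" order=""></nome>
        </fotos>
      </imovel>
    </imoveis>
  </empresa>
</empresas>"""


def extract_foto_names(xml_text):
    """Loop on every <nome> node and return its text content."""
    root = ET.fromstring(xml_text)  # root is the <empresas> element
    # Same nodes as /empresas/empresa/imoveis/imovel/fotos/nome,
    # written relative to the root element.
    nodes = root.findall("empresa/imoveis/imovel/fotos/nome")
    return [node.text for node in nodes]


print(extract_foto_names(SAMPLE))
```

Note that in ElementTree an empty <nome></nome> comes back as None, so empty photo slots in the file show up as null-like values here even when the path is right.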


There are two ways to go about it. One way is to use XMLinput directly, following the instructions that bluish gave.

The other way is to continue on the path that you chose. In the XMLinput, make sure that your Loop XPath query is set to "/empresas/empresa/imoveis/imovel/fotos" and that you pass the fotos element through with the Get Nodes option checked. The XPath Query of your fotos element should be "../fotos" or ".".

Your tExtractXMLField component looks well configured. Also, I don't know what tSetGlobalVar does in your design, but make sure it doesn't affect the fotos element that you're trying to pass through.
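
The two-step idea (pass the whole fotos node along as XML, then loop on nome inside it) can be sketched outside Talend like this. This is only an analogy in plain Python: serializing the element stands in for Get Nodes, and re-parsing the fragment stands in for tExtractXMLField. The function names and the sample values photo1 to photo3 are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input with two <imovel> entries.
DOC = """<empresas>
  <empresa>
    <imoveis>
      <imovel>
        <fotos><nome>photo1</nome><nome>photo2</nome></fotos>
      </imovel>
      <imovel>
        <fotos><nome>photo3</nome></fotos>
      </imovel>
    </imoveis>
  </empresa>
</empresas>"""


def fotos_fragments(xml_text):
    """Step 1: loop on <fotos> and keep each node as an XML string."""
    root = ET.fromstring(xml_text)
    return [
        ET.tostring(fotos, encoding="unicode")
        for fotos in root.findall("empresa/imoveis/imovel/fotos")
    ]


def names_from_fragment(fragment):
    """Step 2: parse one fragment on its own; its root is now <fotos>."""
    fotos = ET.fromstring(fragment)
    return [nome.text for nome in fotos.findall("nome")]


for fragment in fotos_fragments(DOC):
    print(names_from_fragment(fragment))
```

The point to notice is that in step 2 the fragment is its own document, so the loop path starts at fotos and no longer at empresas.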




I have made a test job that should help you. If I'm not wrong, you want to get all the "nome" elements under the "fotos" tag.


Try changing your loop XPath to the top level of the file, "empresas". Sometimes that works for me. I have also seen the <?xml version="1.0" encoding="ISO-8859-1"?> declaration cause problems before, so you could try removing it.

Also make sure that the encoding is set correctly in the tFileInputXML.


I think you are confusing reading XML and extracting XML from XML.

Reading XML: If the XML you have provided is the file read by your tFileInputXML, you don't need tExtractXMLField. Just configure the tFileInputXML like this:

  • set the xpath loop to the <nome> elements, like this "//nome"
  • add 3 columns in the tFileInputXML component: id, order and content
  • get content column with xpath query "."
  • get id value with xpath query "@id"
  • get order value with xpath query "@order"
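
As a rough picture of what those settings produce, here is a small standalone sketch in Python (standard library only, not Talend code). The loop ".//nome" corresponds to "//nome", and each row gets the three columns from ".", "@id" and "@order". The attribute values 7, 8, 1 and 2 are invented, since the sample in the question has empty attributes, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: attribute values are made up for the illustration.
SAMPLE = """<empresas>
  <empresa>
    <imoveis>
      <imovel>
        <fotos>
          <nome id="7" order="1">photo1</nome>
          <nome id="8" order="2">photo2</nome>
        </fotos>
      </imovel>
    </imoveis>
  </empresa>
</empresas>"""


def read_nome_rows(xml_text):
    """Loop on //nome and build one row per element."""
    root = ET.fromstring(xml_text)
    rows = []
    for nome in root.findall(".//nome"):  # loop query: //nome
        rows.append({
            "id": nome.get("id"),        # column query: @id
            "order": nome.get("order"),  # column query: @order
            "content": nome.text,        # column query: .
        })
    return rows


for row in read_nome_rows(SAMPLE):
    print(row)
```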


Extracting XML from XML: That is the goal of the tExtractXMLField component. It lets you parse XML data contained in a database column or in another XML document as if it were itself a data flow.

In a nutshell, tExtractXMLField creates a flow of data from a column that contains XML. It is very useful when parsing a SOAP query result: the server reply is usually provided as XML, like this one:

<arg2> 
  <![CDATA[
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <exportInscriptionEnLigneType>
      <date>2015-04-10</date>
      <nbDossiers>2</nbDossiers>
      <reference>20150410100</reference>
      <listeDossiers>
        <dossier>
          <numOrdre>1</numOrdre>
          <identifiantDossier>AAAAA</identifiantDossier>
        </dossier>
        <dossier>
          <numOrdre>2</numOrdre>
          <identifiantDossier>BBBBB</identifiantDossier>
        </dossier>
      </listeDossiers>
    </exportInscriptionEnLigneType>
]]>
</arg2> 

In the XML above, the <arg2> element contains an XML document that you may need to parse.

tExtractXMLField was created for this purpose. I've written a tutorial on how to do this; please have a look here: "how to extract xml from xml". It is in French, but the screenshots may help you follow the few comments provided.
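
For readers who want to see the "XML inside XML" idea without opening Talend, here is a minimal Python sketch of the same two passes, assuming the reply has the shape shown above (shortened here). It is an illustration, not what tExtractXMLField does internally, and the function name is made up. The first parse reads the CDATA content of <arg2> as plain text; the second parse treats that text as a document of its own.

```python
import xml.etree.ElementTree as ET

# Shortened version of the reply shown above.
RESPONSE = """<arg2><![CDATA[
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<exportInscriptionEnLigneType>
  <listeDossiers>
    <dossier>
      <numOrdre>1</numOrdre>
      <identifiantDossier>AAAAA</identifiantDossier>
    </dossier>
    <dossier>
      <numOrdre>2</numOrdre>
      <identifiantDossier>BBBBB</identifiantDossier>
    </dossier>
  </listeDossiers>
</exportInscriptionEnLigneType>
]]></arg2>"""


def extract_dossiers(response_text):
    """Parse the outer reply, then parse the XML text carried in <arg2>."""
    outer = ET.fromstring(response_text)
    # The CDATA content arrives as ordinary text. Strip it so the inner
    # XML declaration sits at the very start, as the parser requires.
    inner_text = outer.text.strip()
    inner = ET.fromstring(inner_text.encode("utf-8"))
    return [
        (d.findtext("numOrdre"), d.findtext("identifiantDossier"))
        for d in inner.findall("listeDossiers/dossier")
    ]


print(extract_dossiers(RESPONSE))
```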

Hope it will help.

Best regards,
