Using XPath to extract multiple relative nodes

The XML shown is a simplified version of what I'm working with. I'm using PHP, and DOMDocument and DOMXPath.

I have a number of similar nodes that are adjacent to each other but have slightly different children. Given that I can locate one of these nodes based on the content of its children, how can I use XPath to also grab the preceding node, the originally selected node, the following node, and the node two positions after that?

Here's the sample XML:

<w:p>        
    <w:r>
        <w:rPr>...</w:rPr>
        <w:t>Text</w:t>
    </w:r>
    <w:r>
        <w:rPr>...</w:rPr>
        <w:fldChar w:fldCharType="begin" />
    </w:r>
    <w:r>
        <w:rPr>...</w:rPr>
        <w:instrText> MERGEFIELD  [PatName]  \* MERGEFORMAT  </w:instrText>
    </w:r>
    <w:r>
        <w:rPr>...</w:rPr>
        <w:fldChar w:fldCharType="separate" />
    </w:r>
    <w:r>
        <w:rPr>...</w:rPr>
        <w:t>[PatName]</w:t>
    </w:r>
    <w:r>
        <w:rPr>...</w:rPr>
        <w:fldChar w:fldCharType="end" />
    </w:r>
</w:p>

The starting node to work with is the w:r node containing w:instrText, with XPath looking like:

//w:r[contains(w:instrText,'MERGEFIELD  [PatFirstName]')]

Then I can use the preceding-sibling axis to locate the previous item. The XPath looks like:

//w:r[contains(w:instrText,'MERGEFIELD  [PatFirstName]')]/preceding-sibling::w:r[1]

Then I'd like to grab the original w:r containing w:instrText, and the two remaining w:r nodes containing w:fldChar, leaving the w:r node containing w:t out of the selection. But my attempts to write XPath for this come unravelled:

//w:r[contains(w:instrText,'MERGEFIELD  [PatFirstName]')]/preceding-sibling::w:r[1]/following-sibling::w:r[1 and 2] 

This grabs too many nodes, probably because the original contains() condition does not apply to the following-sibling step.
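
One likely contributor to the over-selection: in XPath 1.0 a predicate whose value is not a number is treated as a boolean, so [1 and 2] evaluates to true for every candidate node instead of meaning "positions 1 and 2". Here is a minimal sketch of the difference. It assumes a trimmed copy of the sample (w:rPr omitted), uses [PatName] from the sample as the field name, and binds w: to a made-up placeholder URI (urn:example:wordml); substitute whatever URI your document actually declares.

```php
<?php
// Trimmed copy of the sample. The namespace URI is a placeholder (assumption).
$xml = <<<'XML'
<w:p xmlns:w="urn:example:wordml">
  <w:r><w:t>Text</w:t></w:r>
  <w:r><w:fldChar w:fldCharType="begin"/></w:r>
  <w:r><w:instrText> MERGEFIELD  [PatName]  \* MERGEFORMAT  </w:instrText></w:r>
  <w:r><w:fldChar w:fldCharType="separate"/></w:r>
  <w:r><w:t>[PatName]</w:t></w:r>
  <w:r><w:fldChar w:fldCharType="end"/></w:r>
</w:p>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('w', 'urn:example:wordml');

// The w:r holding the "begin" fldChar, located relative to the instrText run.
$begin = "//w:r[contains(w:instrText,'MERGEFIELD  [PatName]')]/preceding-sibling::w:r[1]";

// "1 and 2" is a boolean expression (true), so every following sibling matches.
$all = $xpath->query($begin . '/following-sibling::w:r[1 and 2]');

// A positional test has to be spelled out with position().
$firstTwo = $xpath->query($begin . '/following-sibling::w:r[position() <= 2]');

echo $all->length, "\n";      // 4
echo $firstTwo->length, "\n"; // 2
```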

Ultimately, the following entries would be extracted from that snippet.

    <w:r>
        <w:rPr>...</w:rPr>
        <w:fldChar w:fldCharType="begin" />
    </w:r>
    <w:r>
        <w:rPr>...</w:rPr>
        <w:instrText> MERGEFIELD  [PatName]  \* MERGEFORMAT  </w:instrText>
    </w:r>
    <w:r>
        <w:rPr>...</w:rPr>
        <w:fldChar w:fldCharType="separate" />
    </w:r>
    <w:r>
        <w:rPr>...</w:rPr>
        <w:fldChar w:fldCharType="end" />
    </w:r>

It's important that relative nodes are used for the search, since there may be other similar looking node combinations in the XML.

Some of you may recognise this XML as the Word 2003 XML format for a mergefield, with much of the cruft removed. I'm trying to isolate the w:r node containing the w:t, so I can update that, and delete the surrounding nodes used to identify it as a mergefield.


I've come to the conclusion that what I'm asking is too ambitious for XPath alone. The following-sibling and preceding-sibling axes seem to be one-or-all deals (unless someone can show me otherwise).
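
For what it's worth, position() predicates combined with the union operator (|) may get closer than the one-or-all impression suggests. The sketch below is only an illustration under assumptions: a trimmed copy of the sample (w:rPr omitted), [PatName] as the field name, and a placeholder namespace URI (urn:example:wordml) that you would replace with the one your document declares. It also assumes each field is laid out exactly as begin, instrText, separate, result, end.

```php
<?php
// Trimmed copy of the sample. The namespace URI is a placeholder (assumption).
$xml = <<<'XML'
<w:p xmlns:w="urn:example:wordml">
  <w:r><w:t>Text</w:t></w:r>
  <w:r><w:fldChar w:fldCharType="begin"/></w:r>
  <w:r><w:instrText> MERGEFIELD  [PatName]  \* MERGEFORMAT  </w:instrText></w:r>
  <w:r><w:fldChar w:fldCharType="separate"/></w:r>
  <w:r><w:t>[PatName]</w:t></w:r>
  <w:r><w:fldChar w:fldCharType="end"/></w:r>
</w:p>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('w', 'urn:example:wordml');

// The run that carries the MERGEFIELD instruction.
$field = "//w:r[contains(w:instrText,'MERGEFIELD  [PatName]')]";

// Union of: nearest preceding run, the run itself,
// and the 1st and 3rd following runs (skipping the w:t run at position 2).
$query = $field . '/preceding-sibling::w:r[1]'
       . ' | ' . $field
       . ' | ' . $field . '/following-sibling::w:r[position() = 1 or position() = 3]';

// Label each selected run by its fldCharType, or "instrText" when it has none.
$labels = [];
foreach ($xpath->query($query) as $run) {
    $type = $xpath->evaluate('string(w:fldChar/@w:fldCharType)', $run);
    $labels[] = $type !== '' ? $type : 'instrText';
}

echo implode(', ', $labels), "\n"; // begin, instrText, separate, end
```

Because every branch of the union is anchored on the same contains() test, other similar-looking field sequences elsewhere in the document should be left alone.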

I've ended up using XPath to get the w:t node I'm interested in replacing, based on the MERGEFIELD, and then I walk the DOM, using DOMDocument in PHP to remove the other nodes.

Here's the XPath I ended up using, expressed as an assignment to a variable in PHP.

$query = '//w:r[preceding-sibling::w:r[2][contains(w:instrText,\'MERGEFIELD  '.$mergeField.'\')]]/w:t';
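
The DOM walk described above might look roughly like the following. This is a sketch, not the exact code I ran: the namespace URI (urn:example:wordml) is a placeholder, the values of $mergeField and $replacement are hypothetical, the sample is trimmed (w:rPr omitted), and it assumes every field is laid out as begin, instrText, separate, result, end.

```php
<?php
// Trimmed copy of the sample. The namespace URI is a placeholder (assumption).
$xml = <<<'XML'
<w:p xmlns:w="urn:example:wordml">
  <w:r><w:t>Text</w:t></w:r>
  <w:r><w:fldChar w:fldCharType="begin"/></w:r>
  <w:r><w:instrText> MERGEFIELD  [PatName]  \* MERGEFORMAT  </w:instrText></w:r>
  <w:r><w:fldChar w:fldCharType="separate"/></w:r>
  <w:r><w:t>[PatName]</w:t></w:r>
  <w:r><w:fldChar w:fldCharType="end"/></w:r>
</w:p>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('w', 'urn:example:wordml');

$mergeField  = '[PatName]'; // hypothetical field name, taken from the sample
$replacement = 'Jane Doe';  // hypothetical replacement value

$query = '//w:r[preceding-sibling::w:r[2][contains(w:instrText,\'MERGEFIELD  '.$mergeField.'\')]]/w:t';

foreach ($xpath->query($query) as $textNode) {
    $textNode->nodeValue = $replacement;
    $run = $textNode->parentNode;
    $paragraph = $run->parentNode;

    // Remove three element siblings before the result run (separate, instrText,
    // begin) and one after it (end), skipping whitespace text nodes.
    foreach (['previousSibling' => 3, 'nextSibling' => 1] as $direction => $count) {
        $node = $run->$direction;
        while ($node !== null && $count > 0) {
            $neighbour = $node->$direction; // read before removal detaches $node
            if ($node->nodeType === XML_ELEMENT_NODE) {
                $paragraph->removeChild($node);
                $count--;
            }
            $node = $neighbour;
        }
    }
}

echo $doc->saveXML($doc->documentElement), "\n";
```

After this runs on the sample, only the plain "Text" run and the updated result run remain in the paragraph.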