Using XPath to extract multiple relative nodes

2023-03-21 00:36 Q&A author:

The XML shown is a simplified version of what I'm working with. I'm using PHP with DOMDocument and DOMXPath.

I have a number of similar nodes that are adjacent to each other but have slightly different children. Given that I can locate one of these nodes based on the content of its children, how can I use XPath to also grab the preceding node, the originally selected node, the following node, and the node two positions after that?

Here's the sample XML:

<w:p>        
    <w:r>
        <w:rPr>...</w:rPr>
        <w:t>Text</w:t>
    </w:r>
    <w:r>
        <w:rPr>...</w:rPr>
        <w:fldChar w:fldCharType="begin" />
    </w:r>
    <w:r>
        <w:rPr>...</w:rPr>
        <w:instrText> MERGEFIELD  [PatName]  \* MERGEFORMAT  </w:instrText>
    </w:r>
    <w:r>
        <w:rPr>...</w:rPr>
        <w:fldChar w:fldCharType="separate" />
    </w:r>
    <w:r>
        <w:rPr>...</w:rPr>
        <w:t>[PatName]</w:t>
    </w:r>
    <w:r>
        <w:rPr>...</w:rPr>
        <w:fldChar w:fldCharType="end" />
    </w:r>
</w:p>

The starting point is the w:r that contains the w:instrText node, which I select with:

//w:r[contains(w:instrText,'MERGEFIELD  [PatName]')]
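A minimal PHP sketch of running that starting query with DOMXPath, with the field text spelled exactly as in the sample (MERGEFIELD, two spaces, [PatName]). Two assumptions: the sample is trimmed (the w:rPr elements are dropped), and the w: prefix is bound to the Word 2003 XML namespace URI, because the original snippet has no xmlns:w declaration. Any URI works as long as the document and registerNamespace() agree.

```php
<?php
// Assumption: the Word 2003 XML namespace URI; the original snippet has no xmlns:w.
const W_NS = 'http://schemas.microsoft.com/office/word/2003/wordml';

// Trimmed copy of the sample paragraph (w:rPr elements dropped).
$xml = <<<'XML'
<w:p xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:r><w:t>Text</w:t></w:r>
  <w:r><w:fldChar w:fldCharType="begin"/></w:r>
  <w:r><w:instrText> MERGEFIELD  [PatName]  \* MERGEFORMAT  </w:instrText></w:r>
  <w:r><w:fldChar w:fldCharType="separate"/></w:r>
  <w:r><w:t>[PatName]</w:t></w:r>
  <w:r><w:fldChar w:fldCharType="end"/></w:r>
</w:p>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$xp = new DOMXPath($doc);
// Bind the w: prefix used in the queries to the document's namespace.
$xp->registerNamespace('w', W_NS);

// Two spaces after MERGEFIELD, exactly as in the sample's instrText.
$start = "//w:r[contains(w:instrText, 'MERGEFIELD  [PatName]')]";
$runs = $xp->query($start);

echo $runs->length, "\n";                     // 1
echo trim($runs->item(0)->textContent), "\n"; // MERGEFIELD  [PatName]  \* MERGEFORMAT
```

With a single space after MERGEFIELD, contains() would not match this sample at all, since the comparison is a plain substring test.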

I can then use the preceding-sibling axis to locate the previous w:r:

//w:r[contains(w:instrText,'MERGEFIELD  [PatName]')]/preceding-sibling::w:r[1]
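A small sketch to confirm what the preceding-sibling step returns. preceding-sibling is a reverse axis, so [1] is the nearest w:r before the match (the begin run) and [2] is the one before that. As assumptions, the sample is trimmed and the w: prefix is bound to the Word 2003 XML namespace URI, which the original snippet does not declare.

```php
<?php
// Assumption: the Word 2003 XML namespace URI; the original snippet has no xmlns:w.
const W_NS = 'http://schemas.microsoft.com/office/word/2003/wordml';

// Trimmed copy of the sample paragraph (w:rPr elements dropped).
$xml = <<<'XML'
<w:p xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:r><w:t>Text</w:t></w:r>
  <w:r><w:fldChar w:fldCharType="begin"/></w:r>
  <w:r><w:instrText> MERGEFIELD  [PatName]  \* MERGEFORMAT  </w:instrText></w:r>
  <w:r><w:fldChar w:fldCharType="separate"/></w:r>
  <w:r><w:t>[PatName]</w:t></w:r>
  <w:r><w:fldChar w:fldCharType="end"/></w:r>
</w:p>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xp = new DOMXPath($doc);
$xp->registerNamespace('w', W_NS);

$start = "//w:r[contains(w:instrText, 'MERGEFIELD  [PatName]')]";

// Reverse axis: [1] is the closest preceding w:r, [2] the one before it.
$prev1 = $xp->query($start . '/preceding-sibling::w:r[1]')->item(0);
$prev2 = $xp->query($start . '/preceding-sibling::w:r[2]')->item(0);

// Read the fldCharType attribute of the w:fldChar inside the nearest run.
echo $xp->evaluate('string(w:fldChar/@w:fldCharType)', $prev1), "\n"; // begin
echo $prev2->textContent, "\n";                                        // Text
```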

Next, I'd like to grab the original w:r containing w:instrText and the two remaining w:r nodes containing w:fldChar, leaving the w:r that holds w:t out of the selection. But my attempts to write XPath for this fall apart:

//w:r[contains(w:instrText,'MERGEFIELD  [PatName]')]/preceding-sibling::w:r[1]/following-sibling::w:r[1 and 2]

This grabs too many nodes. The predicate [1 and 2] is not a list of positions: XPath converts each operand of "and" to a boolean, both numbers become true, and the predicate therefore keeps every following sibling of the begin run.
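A quick way to see what [1 and 2] actually does is to evaluate it next to a real positional test such as [position() <= 2]. In this sketch the namespace URI and the trimmed sample are assumptions, and label()/labels() are hypothetical helpers that only print which runs were selected.

```php
<?php
// Assumption: the Word 2003 XML namespace URI; the original snippet has no xmlns:w.
const W_NS = 'http://schemas.microsoft.com/office/word/2003/wordml';

// Trimmed copy of the sample paragraph (w:rPr elements dropped).
$xml = <<<'XML'
<w:p xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:r><w:t>Text</w:t></w:r>
  <w:r><w:fldChar w:fldCharType="begin"/></w:r>
  <w:r><w:instrText> MERGEFIELD  [PatName]  \* MERGEFORMAT  </w:instrText></w:r>
  <w:r><w:fldChar w:fldCharType="separate"/></w:r>
  <w:r><w:t>[PatName]</w:t></w:r>
  <w:r><w:fldChar w:fldCharType="end"/></w:r>
</w:p>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xp = new DOMXPath($doc);
$xp->registerNamespace('w', W_NS);

// Hypothetical display helper: a run's fldCharType, "instrText", or "t".
function label(DOMXPath $xp, DOMNode $run): string
{
    $type = $xp->evaluate('string(w:fldChar/@w:fldCharType)', $run);
    if ($type !== '') {
        return $type;
    }
    return $xp->evaluate('count(w:instrText)', $run) > 0 ? 'instrText' : 't';
}

// Run a query and return the label of every w:r it selects.
function labels(DOMXPath $xp, string $query): array
{
    $out = [];
    foreach ($xp->query($query) as $run) {
        $out[] = label($xp, $run);
    }
    return $out;
}

// The begin run, found relative to the matched instrText run.
$begin = "//w:r[contains(w:instrText, 'MERGEFIELD  [PatName]')]/preceding-sibling::w:r[1]";

// "and" converts both numbers to booleans (non-zero means true).
var_dump($xp->evaluate('1 and 2')); // bool(true)

$all = labels($xp, $begin . '/following-sibling::w:r[1 and 2]');
$two = labels($xp, $begin . '/following-sibling::w:r[position() <= 2]');
echo implode(', ', $all), "\n"; // instrText, separate, t, end
echo implode(', ', $two), "\n"; // instrText, separate
```

A bare number in a predicate is shorthand for position() = n, but once it is combined with "and" or "or" it is treated as a boolean instead.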

Ultimately, these are the nodes I'd like to extract from that snippet:

    <w:r>
        <w:rPr>...</w:rPr>
        <w:fldChar w:fldCharType="begin" />
    </w:r>
    <w:r>
        <w:rPr>...</w:rPr>
        <w:instrText> MERGEFIELD  [PatName]  \* MERGEFORMAT  </w:instrText>
    </w:r>
    <w:r>
        <w:rPr>...</w:rPr>
        <w:fldChar w:fldCharType="separate" />
    </w:r>
    <w:r>
        <w:rPr>...</w:rPr>
        <w:fldChar w:fldCharType="end" />
    </w:r>

The search has to stay relative to the matched node, since there may be other, similar-looking node combinations in the XML.

Some of you may recognise this XML as the Word 2003 XML format for a mergefield, with much of the cruft removed. I'm trying to isolate the w:r node containing the w:t so that I can update it, and delete the surrounding nodes that mark it as a mergefield.

I've come to the conclusion that what I'm asking is too ambitious for XPath alone: the following-sibling and preceding-sibling axes seem to be one-or-all deals (unless someone can show me otherwise).
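If a pure-XPath answer is still wanted, one possible sketch is a union (|) of four paths that all start from the same matched w:r, so every branch stays relative to that mergefield; DOMXPath (libxml2) returns the combined node-set in document order. The namespace URI and the trimmed sample are assumptions, and label() is a hypothetical helper used only to print what was selected.

```php
<?php
// Assumption: the Word 2003 XML namespace URI; the original snippet has no xmlns:w.
const W_NS = 'http://schemas.microsoft.com/office/word/2003/wordml';

// Trimmed copy of the sample paragraph (w:rPr elements dropped).
$xml = <<<'XML'
<w:p xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:r><w:t>Text</w:t></w:r>
  <w:r><w:fldChar w:fldCharType="begin"/></w:r>
  <w:r><w:instrText> MERGEFIELD  [PatName]  \* MERGEFORMAT  </w:instrText></w:r>
  <w:r><w:fldChar w:fldCharType="separate"/></w:r>
  <w:r><w:t>[PatName]</w:t></w:r>
  <w:r><w:fldChar w:fldCharType="end"/></w:r>
</w:p>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xp = new DOMXPath($doc);
$xp->registerNamespace('w', W_NS);

// Hypothetical display helper: a run's fldCharType, "instrText", or "t".
function label(DOMXPath $xp, DOMNode $run): string
{
    $type = $xp->evaluate('string(w:fldChar/@w:fldCharType)', $run);
    if ($type !== '') {
        return $type;
    }
    return $xp->evaluate('count(w:instrText)', $run) > 0 ? 'instrText' : 't';
}

$s = "//w:r[contains(w:instrText, 'MERGEFIELD  [PatName]')]";

// Four branches, all anchored on the matched instrText run $s: the run
// before it, the run itself, the run after it, and the third run after it.
$query = implode(' | ', [
    $s . '/preceding-sibling::w:r[1]',
    $s,
    $s . '/following-sibling::w:r[1]',
    $s . '/following-sibling::w:r[3]',
]);

$found = [];
foreach ($xp->query($query) as $run) {
    $found[] = label($xp, $run);
}
echo implode(', ', $found), "\n"; // begin, instrText, separate, end
```

The offsets (one run before, one and three runs after) are hard-coded to this field layout, so a field whose runs are split differently would need a different query.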

I ended up using XPath to get the w:t node I want to replace, based on the MERGEFIELD, and then walking the DOM with PHP's DOMDocument to remove the other nodes.

Here's the XPath I ended up using, expressed as an assignment to a PHP variable:

$query = '//w:r[preceding-sibling::w:r[2][contains(w:instrText,\'MERGEFIELD  '.$mergeField.'\')]]/w:t';
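The DOM-walking code isn't shown, so what follows is only one possible version, not the code actually used: it runs that $query, replaces the text of the matched w:t, then removes the three runs before its w:r (separate, instrText, begin) and the end run after it. The namespace URI, the trimmed sample, and the replacement text 'Jane Smith' are illustrative assumptions.

```php
<?php
// Assumption: the Word 2003 XML namespace URI; the original snippet has no xmlns:w.
const W_NS = 'http://schemas.microsoft.com/office/word/2003/wordml';

// Trimmed copy of the sample paragraph (w:rPr elements dropped).
$xml = <<<'XML'
<w:p xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:r><w:t>Text</w:t></w:r>
  <w:r><w:fldChar w:fldCharType="begin"/></w:r>
  <w:r><w:instrText> MERGEFIELD  [PatName]  \* MERGEFORMAT  </w:instrText></w:r>
  <w:r><w:fldChar w:fldCharType="separate"/></w:r>
  <w:r><w:t>[PatName]</w:t></w:r>
  <w:r><w:fldChar w:fldCharType="end"/></w:r>
</w:p>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xp = new DOMXPath($doc);
$xp->registerNamespace('w', W_NS);

$mergeField = '[PatName]';
// The query from the question, unchanged.
$query = '//w:r[preceding-sibling::w:r[2][contains(w:instrText,\'MERGEFIELD  '.$mergeField.'\')]]/w:t';

$t = $xp->query($query)->item(0);
$run = $t->parentNode;

// Replace the result text with a hypothetical value.
while ($t->firstChild !== null) {
    $t->removeChild($t->firstChild);
}
$t->appendChild($doc->createTextNode('Jane Smith'));

// Field-code runs around the result run: the three before it
// (separate, instrText, begin) and the one after it (end).
$fieldRuns = [];
foreach ($xp->query('preceding-sibling::w:r[position() <= 3] | following-sibling::w:r[1]', $run) as $n) {
    $fieldRuns[] = $n;
}
foreach ($fieldRuns as $n) {
    $n->parentNode->removeChild($n);
}

echo $xp->query('//w:r')->length, "\n"; // 2
echo $t->textContent, "\n";             // Jane Smith
```

Because $mergeField is concatenated straight into the expression, a field name containing a single quote would produce an invalid query.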
