How to delete nodes found with $xpath->query() from a string containing an HTML document in PHP

2023-01-03 20:42

The use case is quite simple. I would like to find nodes via an XPath expression in a string(!) that contains an HTML document, and delete them.

I know how to find the nodes with PHP. It is basically this: create a new DOMDocument, call loadHTML() (or loadXML()), create a new DOMXPath, and then call its query() or evaluate() method. Done.
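For reference, the finding step can be sketched roughly like this. The sample HTML and the XPath expression are made up for illustration; the DOMDocument and DOMXPath calls are the standard ones from PHP's DOM extension.

```php
<?php
// Minimal sketch: find nodes in an HTML string with DOMDocument + DOMXPath.
// The HTML and the expression '//div[@class="ad"]' are illustrative assumptions.
$html = '<html><body><div class="ad">Buy now</div><p>Keep me</p></body></html>';

// Collect libxml parse warnings instead of printing them (real-world HTML is rarely valid).
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// query() always returns a DOMNodeList (or false on an invalid expression);
// evaluate() can also return scalar results such as count(...).
$nodes = $xpath->query('//div[@class="ad"]');
$count = $xpath->evaluate('count(//div[@class="ad"])');

foreach ($nodes as $node) {
    echo $node->textContent, "\n";
}
```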

However, deleting is the tricky part. One would think that you just delete the nodes with a few statements (ending with $node->parentNode->removeChild($node)) and save the result back into the string with saveHTML(). Unfortunately, this operation almost always changes "too many things" in the original HTML string.

So my question is: how could I delete the nodes returned by $xpath->query($query) without using saveHTML() or saveXML(), and without writing my own parser?

Hope it was clear enough :-)

Thanks for looking at this!

First of all, make sure you remove the found nodes from the bottom up, i.e. starting with the last match in document order. This ensures you remove child nodes before their parent nodes.
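A minimal sketch of the bottom-up removal, assuming the expression matches element nodes (not attributes). The markup and the class name "x" are invented so that one match is nested inside another.

```php
<?php
// Sketch of the "bottom-up" advice: remove matched nodes in reverse document order,
// so that descendants are detached before their ancestors.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML('<html><body><div class="x">outer<span class="x">inner</span></div><p>keep</p></body></html>');
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// This expression matches both the outer <div> and the nested <span>.
$nodes = $xpath->query('//*[@class="x"]');

// The DOMNodeList is in document order; walk it from the last item to the first.
for ($i = $nodes->length - 1; $i >= 0; $i--) {
    $node = $nodes->item($i);
    $node->parentNode->removeChild($node);
}

// Print just the <body> element to inspect the result.
echo $doc->saveHTML($doc->getElementsByTagName('body')->item(0)), "\n";
```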

Second, what do you mean by "transforms too many things"? PHP's DOM extension parses the document into a tree of DOM nodes. You then work on that tree, and when you are done it converts the tree back into XML/HTML. You may well lose indentation, attributes may be reordered, and so on. The important thing is that the output means exactly the same thing, i.e. it is an exact XML/HTML representation of the DOM tree.

Emil, thanks for your quick answer.

Yes, you are right. This is how I removed the nodes, and it worked:

Convert the HTML string to a DOM with loadHTML()/loadXML() -> identify the nodes with an XPath query -> remove the nodes from the DOM (as you described) -> convert the DOM back to an HTML string with saveHTML()/saveXML()
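The pipeline above could be wrapped in a single string-to-string function along these lines. This is only a sketch: removeNodesByXPath() is a hypothetical helper name, not part of PHP, and it assumes the expression selects element or text nodes (attribute nodes have no parentNode to remove them from).

```php
<?php
// Sketch of the string -> DOM -> XPath -> remove -> string pipeline.
// removeNodesByXPath() is a hypothetical helper, not a built-in.
function removeNodesByXPath(string $html, string $expression): string
{
    $doc = new DOMDocument();

    // Suppress warnings about malformed or HTML5 markup while parsing,
    // then restore the previous error-handling mode.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query($expression);
    if ($nodes === false) {
        throw new InvalidArgumentException("Invalid XPath expression: $expression");
    }

    // Remove from the last match to the first (bottom-up).
    for ($i = $nodes->length - 1; $i >= 0; $i--) {
        $node = $nodes->item($i);
        $node->parentNode->removeChild($node);
    }

    // Serialize the whole modified document back to a string.
    return $doc->saveHTML();
}

$out = removeNodesByXPath(
    '<html><body><script>track();</script><p>Hello</p></body></html>',
    '//script'
);
echo $out;
```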

That works. However, the problem is that the output of saveHTML() is usually significantly different from the input (apart from the deleted nodes). I don't care about attribute order or whitespace, but sometimes pages don't even render correctly in a browser after saveHTML(). I suspect that browsers cope better with less-than-perfect HTML than the DOM round trip does...

Is there another way I could try, besides saveHTML()?

Maybe it is not possible (or at least not without significant effort)? What do you think?
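To make the "significantly different output" concrete, here is a small, hedged illustration of a loadHTML()/saveHTML() round trip with no nodes removed at all. The fragment is made up. By default libxml repairs the markup (closing unclosed tags) and adds a DOCTYPE plus implied <html><body> wrappers; the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD options (available with libxml 2.7.8 or later) suppress the wrappers and the default DOCTYPE, but the markup is still normalized.

```php
<?php
// Illustration: a loadHTML()/saveHTML() round trip can differ from the input
// even when nothing is removed. The fragment below is an illustrative assumption.
$fragment = '<div><p>One<p>Two</div>';

libxml_use_internal_errors(true);

// Default parsing: unclosed <p> tags are closed, and a DOCTYPE plus
// <html><body> wrappers are added around the fragment.
$doc = new DOMDocument();
$doc->loadHTML($fragment);
$default = $doc->saveHTML();

// With these options, libxml skips the implied wrappers and the default DOCTYPE;
// the <p> elements are still closed during serialization.
$doc2 = new DOMDocument();
$doc2->loadHTML($fragment, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$trimmed = $doc2->saveHTML();

libxml_clear_errors();

echo $default, "\n", $trimmed, "\n";
```

Another frequent source of differences is character encoding: without a charset declaration in the document, loadHTML() treats the input as ISO-8859-1, which can mangle UTF-8 text.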
