Can I look at the actual line that was the source of an element parsed from an HTML document using lxml?

2023-01-12 06:50

I have been having fun manipulating HTML with lxml. Now I want to do some manipulation of the actual file: after finding a particular element that meets my needs, I want to know if it is possible to retrieve the original source of that element.

I jumped up and down in my chair after seeing sourceline as an attribute of my element, but it did not give me what I wanted.

some_element.sourceline

As near as I can figure, sourceline can only be used when the HTML source is read in as a list of lines, so that you get the line number.

I should add that I generated my elements with

from lxml import html

theTree = html.fromstring(open(myFileRef).read())

the_elements = [e for e in theTree.iter()]

To be clear, I am getting None as the value for some_element.sourceline. I tested this for all 27,000 elements in my tree.

One thing I am imagining doing is using the html source in an expression to find that particular place in the document, maybe to snip something out. I can't rely on the text of an element because the text is not necessarily unique.

One solution that was posted, but then taken down, was to use sourceline. Even after reading in my file as a list, though, I was not able to get any value other than None for sourceline. I am going to post another question to see if someone has an example using sourceline.

I just tried and discarded html.tostring(myelement), as it automatically rewrites at least some character entities (I am probably not phrasing that correctly). Here is an example:

Snip of the html source

<b>  KEY 1A.&nbsp;&nbsp;&nbsp;&nbsp;REGIONAL PRODUCTION    <br>    </b>

html.tostring(the_element, method='html')

'<b>  KEY 1A.&#160;&#160;&#160;&#160;REGIONAL PRODUCTION    <br></b>'

Clearly I am not getting the original, unvarnished source: the &nbsp; entities come back as &#160; and the whitespace after <br> is gone.
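For what it's worth, the &#160; in that output is most likely because the parser decodes each &nbsp; to the single character U+00A0, and html.tostring, which defaults to ASCII output, then has to write that character back as a numeric reference. The sketch below only illustrates that assumed round trip with the Python standard library; it does not call lxml, and the shortened sample string is made up for the illustration.

```python
from html import unescape  # stdlib html module, not lxml.html

# Shortened stand-in for the original source text (hypothetical sample).
raw = "KEY 1A.&nbsp;&nbsp;REGIONAL"

# Parsing decodes the named entity to the actual character U+00A0.
decoded = unescape(raw)

# Serializing to ASCII cannot represent U+00A0 directly, so it is
# written as a numeric character reference instead.
reserialized = decoded.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(repr(decoded))
print(reserialized)
```

Both spellings mean the same character, but the serialized text no longer matches the file byte for byte, which is why it cannot be used to search the original source.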

I think I found the issue, as I was having the same problem.

I believe element.sourceline is lost if you apply any kind of XSLT transform to the document after you parse it.

When I do not transform the document I get sourceline just fine; however, when I use etree.XSLT, I lose all sourceline data.
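Here is a minimal sketch of how sourceline could be used to get back the raw line from the file, assuming an untransformed tree and an lxml version that records line numbers when parsing from a string (the question reports None, so this may not hold for every version). The helper source_line_of and the SAMPLE document are made up for illustration.

```python
from lxml import html


def source_line_of(element, source_text):
    """Return the raw source line recorded for element, or None if unknown."""
    line_no = element.sourceline  # 1-based line number, or None
    if line_no is None:
        return None
    # Split on "\n" only, so the numbering matches the parser's line count.
    lines = source_text.split("\n")
    if 1 <= line_no <= len(lines):
        return lines[line_no - 1]
    return None


# Hypothetical sample document; the <b> element sits on line 4.
SAMPLE = (
    "<html>\n"
    "<body>\n"
    "<p>intro</p>\n"
    "<b>  KEY 1A.&nbsp;&nbsp;&nbsp;&nbsp;REGIONAL PRODUCTION    <br>    </b>\n"
    "</body>\n"
    "</html>\n"
)

tree = html.fromstring(SAMPLE)
bold = tree.find(".//b")
raw = source_line_of(bold, SAMPLE)
print(bold.sourceline, repr(raw))
```

Because the line is taken from the original text rather than re-serialized, the &nbsp; entities and the whitespace are untouched. An element whose start tag spans several lines, or several elements on one line, would need extra handling.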
