HtmlAgilityPack with XPath - retrieve nodes that don't contain &nbsp;

2023-01-22 09:30

I'm trying to retrieve a select number of elements that don't contain the value &nbsp; (a non-breaking space) using the HtmlAgilityPack in C#. Here's my XPath expression:

"(td)[(position() >= 10 and position() <= last()) and not(.='&nbsp;')]"

but it still returns these nodes. I've tried using a literal space, a character reference for the non-breaking space, and ALT + 1060; nothing seems to work. Here is what I'm parsing:

 <tr height=20 style='mso-height-source:userset;height:15.0pt'>
  <td height=20 class=xl96 style='height:15.0pt'>&nbsp;</td>
  <td class=xl97>&nbsp;</td>
  <td class=xl106 style='border-top:none'>JIM COCKS</td>
  <td class=xl107 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl107 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl107 style='border-top:none;border-left:none'>HOL</td>
  <td class=xl76>&nbsp;</td>
  <td class=xl103 style='border-left:none'>&nbsp;</td>
  <td class=xl97>&nbsp;</td>
  <td class=xl104 style='border-top:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>09:30</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td> 
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>17:00</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
  <td class=xl76>&nbsp;</td>
 </tr>

The items with the class 'xl104' are what I want to grab (I've done this with position tests because their classes change), but I only want nodes that contain something other than &nbsp;, e.g. the 09:30 and 17:00 you see above.

"(td)[(position() >= 10 and position() <= last()) and not(.='&nbsp;')]"

not(.='&nbsp;')

tests that the node's whole string value is not exactly the string '&nbsp;'.

You want to use the XPath contains() function:

not(contains(., '&#xA0;'))

I'm trying to retrieve a select amount of elements that doesn't contain the value &nbsp;

I believe @Dimitre has answered for that specification of the task.

I only want nodes that contain something other than &nbsp;

A slightly different specification. Does this work? (Edited; thanks to Alejandro.)

"td[position() >= 10 and translate(., '&#xA0;', '') != '']"

This is equivalent and shorter, but less readable:

"td[position() >= 10 and translate(., '&#xA0;', '')]"

Anyway, you found the problem, so we won't go further with this.
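The difference between the contains() test and the translate() test can be seen on a small well-formed row. This is only a sketch using System.Xml.Linq rather than the HtmlAgilityPack; the class and method names are made up for illustration, and the sample row is not the one from the question. Because a C# string is not parsed as XML, the non-breaking space is written with the C# escape \u00A0 instead of '&#xA0;'.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper class, for illustration only.
public static class NbspXPathDemo
{
    // U+00A0 as a C# escape: XPath string literals have no entity syntax of their own.
    private const string Nbsp = "\u00A0";

    // Cells that contain no non-breaking space at all (the contains() test).
    public static List<string> WithoutNbsp(XElement row, int firstPosition) =>
        Query(row, $"td[position() >= {firstPosition} and not(contains(., '{Nbsp}'))]");

    // Cells that still have text once non-breaking spaces are removed (the translate() test).
    public static List<string> WithOtherText(XElement row, int firstPosition) =>
        Query(row, $"td[position() >= {firstPosition} and translate(., '{Nbsp}', '') != '']");

    // Evaluates the XPath relative to the row and returns each matching cell's text.
    private static List<string> Query(XElement row, string xpath) =>
        row.XPathSelectElements(xpath).Select(td => td.Value).ToList();

    public static void Main()
    {
        // The XML parser decodes &#xA0; to U+00A0 while loading the row.
        XElement row = XElement.Parse(
            "<tr><td>&#xA0;</td><td>09:30</td><td>&#xA0;</td><td>17:00&#xA0;</td></tr>");

        Console.WriteLine(WithoutNbsp(row, 2).Count);   // prints 1 (only "09:30")
        Console.WriteLine(WithOtherText(row, 2).Count); // prints 2 ("09:30" and the "17:00" cell)
    }
}
```

The last cell mixes text with a non-breaking space, so contains() rejects it while translate() keeps it. That is the "slightly different specification" mentioned above.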

Do note, though, that using &nbsp; literally in XPath won't normally work unless you define it. This character entity is predefined in HTML but not in XML. That's why &#160; or &#xA0; is more reliable. However, it's possible that the HtmlAgilityPack defines &nbsp; for you.
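If it is unclear how the entity reaches the XPath engine, one way to sidestep the question is to select the cells by position only and do the blank test in C#. The sketch below assumes the HtmlAgilityPack NuGet package is referenced; HapNbspDemo and NonBlankCells are hypothetical names, and the sample markup is a shortened stand-in for the row in the question. It decodes each cell with HtmlEntity.DeEntitize, so it should not matter whether InnerText still holds the raw &nbsp; or an already decoded U+00A0.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper class, for illustration only.
public static class HapNbspDemo
{
    // Returns the trimmed text of every <td> at or after firstPosition
    // that holds something other than whitespace or non-breaking spaces.
    public static List<string> NonBlankCells(string html, int firstPosition)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        var cells = doc.DocumentNode.SelectNodes($"//tr/td[position() >= {firstPosition}]");
        if (cells == null)
        {
            return new List<string>();
        }

        return cells
            .Select(td => HtmlEntity.DeEntitize(td.InnerText))     // "&nbsp;" becomes U+00A0
            .Where(text => !string.IsNullOrWhiteSpace(text))       // U+00A0 counts as white space in .NET
            .Select(text => text.Trim())
            .ToList();
    }

    public static void Main()
    {
        string html =
            "<table><tr><td>&nbsp;</td><td>JIM COCKS</td><td>&nbsp;</td>" +
            "<td>09:30</td><td>&nbsp;</td><td>17:00</td></tr></table>";

        // Start at the third cell so the name column is skipped, as position() >= 10 does in the question.
        Console.WriteLine(string.Join(", ", NonBlankCells(html, 3)));
    }
}
```

The position test stays in XPath, as in the question, while the character handling moves to code where the non-breaking space can be named unambiguously.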
