HtmlAgilityPack with XPath - retrieve nodes that don't contain &nbsp;
I'm trying to retrieve a selection of elements that don't contain the value &nbsp;
(a non-breaking space) using the HtmlAgilityPack in C#. Here's my XPath expression:
"(td)[(position() >= 10 and position() <= last()) and not(.=' ')]"
but it still returns these nodes. I've tried a literal space and
ALT + 1060; nothing seems to work. Here is what I'm parsing:
<tr height=20 style='mso-height-source:userset;height:15.0pt'>
<td height=20 class=xl96 style='height:15.0pt'> </td>
<td class=xl97> </td>
<td class=xl106 style='border-top:none'>JIM COCKS</td>
<td class=xl107 style='border-top:none;border-left:none'> </td>
<td class=xl107 style='border-top:none;border-left:none'> </td>
<td class=xl107 style='border-top:none;border-left:none'>HOL</td>
<td class=xl76> </td>
<td class=xl103 style='border-left:none'> </td>
<td class=xl97> </td>
<td class=xl104 style='border-top:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'>&nbsp;</td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'>09:30</td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'>17:00</td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl104 style='border-top:none;border-left:none'> </td>
<td class=xl76> </td>
</tr>
The items with the class 'xl104' are what I want to grab (I select them by position because their classes change), but I only want nodes that contain something other than
&nbsp;, e.g. the 09:30 and 17:00 you see above.
"(td)[(position() >= 10 and position() <= last()) and not(.=' ')]"
not(.=' ')
tests that the whole text() node is not the string ' '.
You want to use the XPath contains() function:
not(contains(., ' '))
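The difference between the equality test and contains() can be checked on a small sample. This is only a sketch: it uses the XPath engine in System.Xml rather than the HtmlAgilityPack, and it assumes the cells hold the actual no-break space character (U+00A0), which may not match what the HtmlAgilityPack exposes for &nbsp;. The class name, the helper CountCells and the three-cell row are made up for illustration.

```csharp
using System;
using System.Xml;

public static class NbspFilterSketch
{
    // Hypothetical helper: counts the <td> cells that the given predicate keeps.
    public static int CountCells(string predicate)
    {
        var doc = new XmlDocument();
        // \u00A0 is the no-break space character that &nbsp; stands for in HTML.
        doc.LoadXml("<tr><td>\u00A0</td><td>09:30</td><td>\u00A017:00</td></tr>");
        return doc.SelectNodes("/tr/td[" + predicate + "]").Count;
    }

    public static void Main()
    {
        // Equality only rejects a cell whose entire string value is the no-break space.
        Console.WriteLine(CountCells("not(.='\u00A0')"));            // 2
        // contains() rejects any cell with a no-break space anywhere in its text.
        Console.WriteLine(CountCells("not(contains(., '\u00A0'))")); // 1
    }
}
```

Note that contains() also drops the third cell, which mixes a no-break space with real content, so it is stricter than "the cell is only a space".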
"I'm trying to retrieve a select amount of elements that doesn't contain the value &nbsp;"
I believe @Dimitre has answered for that specification of the task.
"I only want nodes that contain something other than &nbsp;"
That is a slightly different specification. Does this work? (Edited; thanks to Alejandro.)
"td[position() >= 10 and translate(., ' ', '') != '']"
This is equivalent and shorter, but less readable:
"td[position() >= 10 and translate(., ' ', '')]"
Anyway, you found the problem, so we won't go further with this.
Do note, though, that using &nbsp; literally in XPath won't normally work unless you define it. This character entity is predefined in HTML but not in XML. That's why &#160; or &#xA0; is more reliable. However, it's possible that the HtmlAgilityPack defines &nbsp; for you.
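For reference, the two translate() forms can be compared side by side. This is a sketch under assumptions: it runs on System.Xml instead of the HtmlAgilityPack, the row is a shortened made-up sample with U+00A0 as the filler character, and position() >= 2 stands in for position() >= 10. The second form relies on XPath 1.0 treating a non-empty string as true in a boolean context.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class TranslateFilterSketch
{
    // Hypothetical sample row: one label cell, then cells padded with U+00A0.
    const string Row =
        "<tr><td>JIM</td><td>\u00A0</td><td>09:30</td><td>\u00A0</td><td>17:00</td></tr>";

    // Returns the text of every cell selected by the XPath, relative to <tr>.
    public static List<string> Select(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Row);
        var texts = new List<string>();
        foreach (XmlNode td in doc.DocumentElement.SelectNodes(xpath))
            texts.Add(td.InnerText);
        return texts;
    }

    public static void Main()
    {
        // Explicit form: strip no-break spaces, then compare with the empty string.
        var a = Select("td[position() >= 2 and translate(., '\u00A0', '') != '']");
        // Short form: the stripped string itself is used as the boolean.
        var b = Select("td[position() >= 2 and translate(., '\u00A0', '')]");
        Console.WriteLine(string.Join(",", a)); // 09:30,17:00
        Console.WriteLine(string.Join(",", b)); // 09:30,17:00
    }
}
```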
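The XML side of this can be illustrated with the standard .NET parser. This sketch only shows how System.Xml treats the entity and the numeric character references; it says nothing about what the HtmlAgilityPack does internally, and the class and helper names are invented for the example.

```csharp
using System;
using System.Xml;

public static class EntitySketch
{
    // Hypothetical helper: true if the string loads as well-formed XML.
    public static bool Parses(string xml)
    {
        try { new XmlDocument().LoadXml(xml); return true; }
        catch (XmlException) { return false; }
    }

    // Hypothetical helper: text content of the root element.
    public static string CellText(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.InnerText;
    }

    public static void Main()
    {
        // &nbsp; is not declared in plain XML, so the parser rejects it.
        Console.WriteLine(Parses("<td>&nbsp;</td>"));                // False
        // Numeric character references need no declaration and yield U+00A0.
        Console.WriteLine(CellText("<td>&#160;</td>") == "\u00A0");  // True
        Console.WriteLine(CellText("<td>&#xA0;</td>") == "\u00A0");  // True
    }
}
```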