XPath choking on entities in Firefox / GreaseMonkey

I am writing a fairly basic GreaseMonkey script that locates text in a specific element and then uses that text to do things later. The relevant bits of code are as follows:

In the HTML there is a span with the class 'someclass', which contains a small string of text:

<span class="someclass">some text</span>

Then in the JavaScript I am trying to find this class and pull its contents (the 'some text') into a variable using the standard XPath jazz:

document.evaluate("//span[@class='someclass']/text()", document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);

Here's the problem: when I run this on pages where 'some text' is a basic string with basic characters, everything works fine, but when I run it on pages where 'some text' contains entities, it fails. For example, these are all fine and XPath returns the text I want:

<span class="someclass">some text</span>
<span class="someclass">some other text</span>
<span class="someclass">sometext</span>
<span class="someclass">some text 12345</span>

However, this gives me an error:

<span class="someclass">some text&#39;s text</span>

The error returned is:

Error: The expression is not a legal expression.
Source File: file:///blahblahblah.user.js
Line: (the JS line I gave above)

I found a few results on here and on Google about XPath having trouble with entities, but they were all doing things like [text() = 'blah &raquo; blah']; in other words, their entities are in the XPath query itself. Mine aren't: they're in the text that I'm trying to return from the XPath query.

Is this the same problem? Is there any easy way around it?

Thanks!


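The problem is presumably that the extracted text is later put into an XPath expression as a string literal. Such a literal must be surrounded by either quotes or apostrophes and cannot contain the surrounding character, because XPath 1.0 has no escape syntax for string literals.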

A literal string that contains both quotes and apostrophes needs to be transformed (in your case, by your JavaScript program) into one that doesn't contain both kinds of characters.

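The simplest way to do this is to replace each instance of one of these characters with its character reference, say replace every ' with &#39; and use ' as the delimiter of the literal string. Note that this only works where the expression is embedded in XML (for example, in an XSLT attribute), because it is the XML parser that decodes &#39; before XPath ever sees the expression. In a string passed to document.evaluate() from JavaScript nothing decodes the reference, so the expression would match the literal characters &#39; rather than an apostrophe.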
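In a JavaScript program such as a GreaseMonkey script, a string that contains only one kind of quote needs no transformation at all: the other kind can serve as the delimiter. Below is a minimal sketch of that check. The function name simpleXPathLiteral is hypothetical, and the sketch assumes the text has already been extracted as a plain JavaScript string (by then the browser has decoded &#39; into an apostrophe).

```javascript
// Hypothetical helper: wrap `text` in whichever XPath 1.0 delimiter it does not contain.
// Returns null when `text` contains both ' and ", because XPath 1.0 string
// literals have no escape syntax; that case needs concat() instead.
function simpleXPathLiteral(text) {
  if (!text.includes("'")) {
    return "'" + text + "'"; // no apostrophes: use ' as the delimiter
  }
  if (!text.includes('"')) {
    return '"' + text + '"'; // apostrophes but no double quotes: use "
  }
  return null; // both kinds present: cannot be a single literal
}

// The span text from the question, as the browser delivers it:
const text = "some text's text";
const expr = "//span[@class='someclass'][. = " + simpleXPathLiteral(text) + "]";
console.log(expr); // //span[@class='someclass'][. = "some text's text"]
```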

A second way is to replace

some text&#39;s text

with the XPath expression:

concat('some text', "'", 's text')
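The concat() approach can be generated mechanically for any string, including one that contains both kinds of quotes. The sketch below (the function name xpathConcatLiteral is hypothetical) splits the text at each apostrophe, quotes the pieces with apostrophes, and emits each apostrophe as the literal "'". It assumes XPath 1.0, where concat() requires at least two arguments, so a result with a single part is returned without concat().

```javascript
// Hypothetical helper: build an XPath 1.0 expression whose value is exactly `text`.
// XPath 1.0 string literals have no escape syntax, so the text is split at every
// apostrophe; the pieces are quoted with '...' and each apostrophe becomes "'".
function xpathConcatLiteral(text) {
  const pieces = text.split("'");
  if (pieces.length === 1) {
    return "'" + text + "'"; // no apostrophes: a plain literal is enough
  }
  const parts = [];
  pieces.forEach((piece, i) => {
    if (i > 0) parts.push(`"'"`);                    // the apostrophe that was split out
    if (piece !== "") parts.push("'" + piece + "'"); // pieces contain no apostrophes
  });
  // concat() needs at least two arguments in XPath 1.0
  return parts.length === 1 ? parts[0] : "concat(" + parts.join(", ") + ")";
}

console.log(xpathConcatLiteral("some text's text"));
// concat('some text', "'", 's text')
```

The result can be placed wherever a string literal would go, for example in a predicate such as [. = ...].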

Warning: it is not a good idea to build an XPath expression from untrusted data, as this may allow XPath injection. To avoid XPath injection, if your programming language and function libraries allow it, always compile your XPath expression and run it, passing the data as parameter(s).
