Crawler runs without errors but lists no jobs in its output
I have been writing an XML template for a specific company in order to crawl that company's job listings. I am using XPath in the templates, but at run time the crawler runs without reporting any error and also without listing any jobs.
E.g., the template for Sopra technologies (the URL is given in the code):
<?xml version="1.0" encoding="UTF-8"?>
<site>
<request-type>link</request-type>
<base-url><![CDATA[http://www.in.sopragroup.com/index.htm]]></base-url>
<start-url><![CDATA[http://www.in.sopragroup.com/careers/JobListing.aspx]]>
</start-url>
<data>
<intermediate>
<navigation-request>
<navigation-type>link</navigation-type>
<url>
<xpath></xpath>
<sub-xpath></sub-xpath>
</url>
</navigation-request>
<xpath><![CDATA[//table[@class='bg_lgrey']/tbody/tr[position>2]]></xpath>
<apply-url>
<sub-xpath><![CDATA[td/@href]]></sub-xpath>
</apply-url>
<title>
<sub-xpath><![CDATA[td/a/text()]]></sub-xpath>
</title>
</intermediate>
<detail>
<xpath><![CDATA[//table[@id='tbl']/tbody]]></xpath>
<experience>
<sub-xpath><![CDATA[tr[8]/td[2]/text()]]></sub-xpath>
</experience>
<location>
<sub-xpath><![CDATA[tr[10]/td[2]/text()]]></sub-xpath>
</location>
<description>
<sub-xpath><![CDATA[tr[2]/td[2]/text()]]></sub-xpath>
</description>
</detail>
</data>
</site>
//table[@class='bg_lgrey']/tbody/tr[position>2]
This is one of the problems in the code. This XPath expression can select a tr element only if that tr has a child element named position whose string value converts to a number greater than 2.
You want:
//table[@class='bg_lgrey']/tbody/tr[position() >2]
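The difference between the two predicates can be checked with a small sketch. It uses the JDK's built-in javax.xml.xpath API, not the crawler's own engine, and the sample table (two header rows followed by two job rows) and the class name PositionPredicateDemo are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PositionPredicateDemo {
    // Hypothetical table, not the real page: two header rows, then two job rows.
    static final String SAMPLE =
        "<table class='bg_lgrey'><tbody>"
        + "<tr><td>Header 1</td></tr>"
        + "<tr><td>Header 2</td></tr>"
        + "<tr><td><a href='job1.aspx'>Java Developer</a></td></tr>"
        + "<tr><td><a href='job2.aspx'>Test Engineer</a></td></tr>"
        + "</tbody></table>";

    // Returns how many nodes the expression selects in SAMPLE.
    static int countMatches(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression,
                          new InputSource(new StringReader(SAMPLE)),
                          XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "position" is read as a child element name; no tr has such a child.
        System.out.println(countMatches("//table[@class='bg_lgrey']/tbody/tr[position>2]"));
        // "position()" is the XPath function; it keeps the third and fourth rows.
        System.out.println(countMatches("//table[@class='bg_lgrey']/tbody/tr[position() > 2]"));
    }
}
```

On this sample the first expression selects nothing and the second selects the two job rows, which matches the symptom of a crawler that runs without errors but lists no jobs.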
A second problem:
The string "bg_lgrey" does not appear anywhere in the source of the pages that the two URLs point to, so no table can match @class='bg_lgrey'.
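One way to catch this kind of mismatch is to list the class values that the tables in the page actually carry before writing the template. The sketch below does that with the JDK's javax.xml.xpath API. The markup, the class values menu and joblist, and the helper tableClasses are all hypothetical, and a real page would have to be well-formed XML (or be run through an HTML-tolerant parser first) for this to work.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TableClassCheck {
    // Hypothetical markup standing in for the real page source.
    static final String SAMPLE =
        "<html><body>"
        + "<table class='menu'><tr><td>Home</td></tr></table>"
        + "<table class='joblist'><tr><td>Java Developer</td></tr></table>"
        + "</body></html>";

    // Collects the value of every class attribute found on a table element.
    static List<String> tableClasses(String xml) {
        try {
            NodeList attrs = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//table/@class",
                          new InputSource(new StringReader(xml)),
                          XPathConstants.NODESET);
            List<String> classes = new ArrayList<>();
            for (int i = 0; i < attrs.getLength(); i++) {
                classes.add(attrs.item(i).getNodeValue());
            }
            return classes;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> classes = tableClasses(SAMPLE);
        System.out.println(classes);
        // If this prints false, a template keyed on bg_lgrey cannot match anything.
        System.out.println(classes.contains("bg_lgrey"));
    }
}
```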