开发者

Getting portion of href attribute using hpricot

I think I need a combo of hpricot and regex here. I need to search for 'a' tags with an 'href' attribute that starts with 'abc/', and returns the text following that until the next forward slash '/'.

So, given:

<a href="/abc/12345/xyz123/">One</a>
<a href="/abc/67890/xyzabc/">Two</a>

I need to get back: '12345' and '67开发者_开发知识库890'

Can anyone lend a hand? I've been struggling with this.


You don't need regex but you can use it. Here's two examples, one with regex and the other without, using Nokogiri, which should be compatible with Hpricot for your use, and uses CSS accessors:

require 'nokogiri'

html = %q[
  <a href="/abc/12345/xyz123/">One</a>
  <a href="/abc/67890/xyzabc/">Two</a>
]

doc = Nokogiri::HTML(html)
doc.css('a[@href]').map{ |h| h['href'][/(\d+)/, 1] } # => ["12345", "67890"]
doc.css('a[@href]').map{ |h| h['href'].split('/')[2] } # => ["12345", "67890"]


or use regex:

s = '<a href="/abc/12345/xyz123/">One</a>'
s =~ /abc\/([^\/]*)/
return $1


What about splitting the string by /?

(I don't know Hpricot, but according to the docs):

doc.search("a[@href]").each do |a|
    return a.somemethodtogettheattribute("href").split("/")[2]; // 2, because the string starts with '/'
end
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜