Get all elements by partial match of class attribute

2023-03-07 06:36 Q&A author:

I'm trying to use Nokogiri to display results from a URL (essentially scraping a URL).

I have some HTML which is similar to:

<p class="mattFacer">Matty</p>
<p class="mattSmith">Matthew</p>
<p class="suzieSmith">Suzie</p>

So I need to find all the elements whose class begins with "matt". I then need to save the value of each element and the element name so I can reference it next time, so I need to capture:

"Matty" and "<p class='mattFacer'>"
"Matthew" and "<p class='mattSmith'>"

I haven't worked out how to capture the element HTML, but here's what I have so far for the element's value (it doesn't work!):

doc = Nokogiri::HTML(open(url))
tmp = ""
doc.xpath("[class*=matt").each do |item|
    tmp += item.text
end

@testy2 = tmp

This should get you started:

doc.xpath('//p[starts-with(@class, "matt")]').each do |el|
  p [el.attributes['class'].value, el.children[0].text]
end
["mattFacer", "Matty"]
["mattSmith", "Matthew"]

Use:

/*/p[starts-with(@class, 'matt')] | /*/p[starts-with(@class, 'matt')]/text()

This selects any p element that is a child of the top element of the XML document and whose class attribute value starts with "matt", plus any text-node child of any such p element.

When evaluated against this XML document (none was provided!):

<html>
    <p class="mattFacer">Matty</p>
    <p class="mattSmith">Matthew</p>
    <p class="suzieSmith">Suzie</p>
</html>

the following nodes are selected (each on a separate line) and can be accessed by position:

<p class="mattFacer">Matty</p>
Matty
<p class="mattSmith">Matthew</p>
Matthew

Here is a quick XSLT verification:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:for-each select=
  "/*/p[starts-with(@class, 'matt')]
  |
   /*/p[starts-with(@class, 'matt')]/text()
  ">
  <xsl:copy-of select="."/>
  <xsl:text>&#xA;</xsl:text>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

The result of this transformation, when applied to the same XML document (above), is the expected, correct sequence of selected nodes:

<p class="mattFacer">Matty</p>
Matty
<p class="mattSmith">Matthew</p>
Matthew

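require 'nokogiri'
require 'open-uri'

# URI.open (from open-uri) fetches the page; a bare open(url) no longer works on Ruby 3.
doc = Nokogiri::HTML(URI.open(url))
items = doc.css("p[class*=matt]").map(&:text).join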

The accepted answer is great, but another approach is to use Nikkou, which lets you match with regular expressions (without needing to be familiar with XPath functions):

doc.attr_matches('class', /^matt/).collect do |item|
  [item.attributes['class'].value, item.text]
end
