capturing specific text between tags

2023-03-07 11:18

The explanation is in the comment. I put it there because the HTML tags (such as <b>) are interpreted as bold or other formatting, which screws up the post.

# I need to capture text that is
# enclosed in tags that are both <b> and
# <i>, but if there is more than one
# text enclosed in <i> in the same <b>
# block, then I only want the text
# enclosed in the first <i> tag. For
# example, for the following line:
# 
# <b> <i> Important text here </i>
# irrelevant text everywhere else <i>
# irrelevant text here </i> </b>  <b>
# <i> Also Important </i> not important
# <i> not important </i> </b> 
# 
# I want to retrieve only: 
# - Important text here 
# - Also Important
# 
# I also must not retrieve text inside an
# <h2> block. I have been trying to
# delete the block with nodes.delete(nodes.search('h2')),
# but it doesn't actually delete the h2 block.


require "rubygems"
require "nokogiri"

html = <<EOT
  <b><i> Important text here </i> more text <i> not important text here </i> </b>
  <b> <i> Also Important </i> more text <i> not important </i> </b> 

  <h2><b> <i> I don't want this text either</i></b></h2> 
EOT


doc = Nokogiri::HTML(html)

nodes = doc.search('b i')

nodes.each { |e| puts e }

# Expected output:
# Important text here
# Also Important

Answer:

require "nokogiri"
html = <<EOT
  <b><i>Important text here</i>more text<i>not important text here</i></b>
  <b><i>Also Important</i>more text<i>not important</i></b> 

  <h2><b><i>I don't want this text either</i></b></h2> 
EOT


doc = Nokogiri::HTML(html)
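nodes = doc.search('b')
# e.children.children gathers the grandchildren of each <b>; with this markup the first
# of them is the text inside its first <i>. A <b> whose parent is an <h2> is skipped.
nodes.each { |e| puts e.children.children.first unless e.parent.name == "h2" }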

Or, with XPath:

nodes = doc.xpath("//../*[local-name() != 'h2']/b/i[1]")
nodes.each { |e| puts e.children.first }
