
How do I cut phrases out of a string in Ruby?

I wasn't sure what to name my question. I have an HTML page that I fetched using Nokogiri, and now I want to cut some tags out of that page. I tried Ruby's delete method after converting the HTML to a string, but it deletes every occurrence of each letter I entered. The best result I got was with .gsub('<stuff>', ''), though it still leaves some whitespace behind. Is it possible to actually cut things out of a string, such as specific phrases? Another question: can I remove spaces?

What I have done so far:

require 'nokogiri'
require 'open-uri'  # provides URI.open for fetching URLs
doc = Nokogiri::HTML(URI.open("http://www.example.com/"))
tester = doc.css(".example").to_s.gsub('<div class="example">', '')


I'd suggest doing this at the XML tree level rather than by editing strings. I think the Nokogiri API gives you some tools for this.

Another approach might be to select the data you want, with CSS or XPath, rather than deleting the parts you don't want.

There's also an XPath function for normalising space in strings (normalize-space); there's an example in this question.

Some nokogiri help:

  • Intro article on Engine Yard
  • Railscasts/ASCIIcasts
  • Official tutorials


Check out Nokogiri's tutorials. In particular, you want to read "Modifying an HTML / XML Document", specifically the part on changing text contents.

Nokogiri's XML accessors are very friendly because you don't have to use XPath. You can use CSS accessors too, which help a lot if you aren't in XML all day long.

In that particular example, they're using the at_css method, which returns the first occurrence of the target. There are several alternative methods: at, %, at_css and at_xpath handle the "find the first one" cases, while search, css, xpath and / similarly handle "find all occurrences".

For instance:

require 'nokogiri'

html = '<h1>Snap, Crackle and Pop</h1>'

doc = Nokogiri::HTML(html)
h1 = doc.at('h1') 
h1.content = h1.content[0, h1.content.length - 3] + '...'

puts doc.to_html

>> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
>> <html><body><h1>Snap, Crackle and ...</h1></body></html>

That creates a new HTML document in Nokogiri, searches for the first H1, and trims the trailing three characters in its contents, replacing them with an ellipsis.
