Word Count with Ruby
I am trying to figure out a way to count the words in a particular string that contains HTML.
Example String:
<p>Hello World</p>
Is there a way in Ruby to count the words in between the p tags? Or any tag for that matter?
Examples:
<p>Hello World</p>
<h2>Hello World</h2>
<li>Hello World</li>
Thanks in advance!
Edit (here is my working code)
Controller:
class DashboardController < ApplicationController
  def index
    @pages = Page.find(:all)
    @word_count = []
  end
end
View:
<% @pages.each do |page| %>
  <% page.current_state.elements.each do |el| %>
    <% @count = Hpricot(el.description).inner_text.split.uniq.size %>
    <% @word_count << @count %>
  <% end %>
  <li><strong>Page Name: <%= page.slug %> (Word Count: <%= @word_count.inject(0){|sum,n| sum+n } %>)</strong></li>
<% end %>
Here's how you can do it:
require 'hpricot'
content = "<p>Hello World...."
doc = Hpricot(content)
doc.inner_text.split.uniq
Will give you:
[
[0] "Hello",
[1] "World"
]
(Side note: the output is formatted with awesome_print, which I warmly recommend.)
Sure
- Use Nokogiri to parse the HTML/XML and XPath to find the element and its text value.
- Split on whitespace to count the words
You'll want to use something like Hpricot to remove the HTML, then it's just a case of counting words in plain text.
Here is an example of stripping the HTML: http://underpantsgnome.com/2007/01/20/hpricot-scrub/
Start with something that can parse HTML, like Hpricot, then use a simple regular expression to do what you want (for example, you can simply split on spaces and count the pieces).
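Once the tags are gone, the counting itself needs nothing beyond core Ruby. A minimal sketch, assuming the text has already been extracted by a parser (total_words and distinct_words are illustrative names):

```ruby
# Counts every whitespace-separated token, repeats included.
def total_words(text)
  text.split.size
end

# Counts each distinct token once (case-sensitive), which is what
# split.uniq.size computes.
def distinct_words(text)
  text.split.uniq.size
end

text = "Hello World hello   World"
total    = total_words(text)     # => 4
distinct = distinct_words(text)  # => 3
```

Note the difference: adding uniq gives the number of distinct words, so drop it if you want a plain word count.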