Word Count with Ruby

I am trying to figure out a way to count the words in a particular string that contains HTML.

Example String:

<p>Hello World</p>

Is there a way in Ruby to count the words in between the p tags? Or any tag for that matter?

Examples:

<p>Hello World</p>
<h2>Hello World</h2>
<li>Hello World</li>

Thanks in advance!

Edit (here is my working code)

Controller:

class DashboardController < ApplicationController
  def index
    @pages = Page.find(:all)
    @word_count = []
  end

end

View:

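<% @pages.each do |page| %>
  <%# Reset per page; otherwise counts from earlier pages leak into later totals. %>
  <% @word_count = [] %>

  <% page.current_state.elements.each do |el| %>
    <%# Note: .uniq counts distinct words in each element, not every occurrence. %>
    <% @count = Hpricot(el.description).inner_text.split.uniq.size %>
    <% @word_count << @count %>
  <% end %>

  <li><strong>Page Name: <%= page.slug %> (Word Count: <%= @word_count.inject(0) { |sum, n| sum + n } %>)</strong></li>
<% end %>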


Here's how you can do it:

require 'hpricot'
content = "<p>Hello World...."
doc = Hpricot(content)
doc.inner_text.split.uniq

This will give you:

[
  [0] "Hello",
  [1] "World"
]

(Side note: the output is formatted with awesome_print, which I warmly recommend.)


Sure

  1. Use Nokogiri to parse the HTML/XML and XPath to find the element and its text value.
  2. Split on whitespace to count the words


You'll want to use something like Hpricot to remove the HTML, then it's just a case of counting words in plain text.

Here is an example of stripping the HTML: http://underpantsgnome.com/2007/01/20/hpricot-scrub/


First, start with something able to parse HTML, like Hpricot. Then use a simple regular expression to do what you want (for example, you can simply split on spaces and count the pieces).
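Once the markup is gone, the counting part needs nothing beyond plain Ruby. Here is a minimal sketch, assuming the text has already been extracted by a parser; the helper name `word_counts` and the regular expression are illustrative choices, not something from the answers here.

```ruby
# Hypothetical helper: counts words in plain text (markup already removed).
# Scanning for runs of letters, digits, and apostrophes skips stray
# punctuation tokens such as "--" that a plain split would count as words.
def word_counts(text)
  words = text.scan(/[[:alnum:]']+/)
  {
    total:  words.size,                        # every occurrence
    unique: words.map(&:downcase).uniq.size    # distinct words, ignoring case
  }
end

sample = "The cat and the dog -- really!"
p sample.split.size      # naive whitespace split also counts "--"
p word_counts(sample)
```

Note that total and unique counts differ as soon as a word repeats, so calling `uniq` before `size` answers a slightly different question than "how many words are there".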
