Ruby: Extracting Words From String
I'm trying to parse words out of a string and put them into an array. I've tried the following thing:
@string1 = "oriented design, decomposition, encapsulation, and testing. Uses "
puts @string1.scan(/\s([^\,\.\s]*)/)
It seems to do the trick, but it's a bit shaky (for example, I would have to list more special characters). Is there a better way to do this in Ruby?
Optional: I have a CS course description. I intend to extract all the words from it and place them in a string array, remove the most common words in the English language from that array, and then use the remaining words as tags that users can use to search for CS courses.
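For the optional part, here is a minimal sketch of the extract, filter, and tag idea. The method name `extract_tags` and the tiny stop-word list are made up for illustration; a real list of common English words would be much longer and would come from wherever you keep it.

```ruby
# Hypothetical helper: returns the unique, lowercased words of text
# that are not in stop_words. The default stop-word list is a tiny
# illustrative sample, not a real list of common English words.
def extract_tags(text, stop_words = %w[a an and the of to in is for uses])
  text.downcase
      .scan(/[[:alpha:]]+(?:'[[:alpha:]]+)*/) # runs of letters, inner apostrophes allowed
      .reject { |word| stop_words.include?(word) }
      .uniq
end

description = "oriented design, decomposition, encapsulation, and testing. Uses "
p extract_tags(description)
# => ["oriented", "design", "decomposition", "encapsulation", "testing"]
```

Lowercasing before filtering keeps "Uses" and "uses" from being treated as different words, and `uniq` avoids duplicate tags.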
Use the split method.
words = @string1.split(/\W+/)
will split the string into an array based on a regular expression. \W matches any "non-word" character, and the "+" matches one or more of them in a row, so a run of consecutive delimiters (such as ", ") is treated as a single separator.
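A quick sketch of what this gives on the string from the question, plus two edge cases worth knowing about. The variable names are just for illustration.

```ruby
text = "oriented design, decomposition, encapsulation, and testing. Uses "
words = text.split(/\W+/)
p words
# => ["oriented", "design", "decomposition", "encapsulation", "and", "testing", "Uses"]

# An apostrophe is a non-word character, so contractions are cut in two.
contraction = "can't stop".split(/\W+/)
p contraction # => ["can", "t", "stop"]

# A leading delimiter produces an empty first element; reject it if unwanted.
leading = " leading space".split(/\W+/)
p leading                  # => ["", "leading", "space"]
p leading.reject(&:empty?) # => ["leading", "space"]
```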
For me, the best way to split sentences is:
line.split(/[^[[:word:]]]+/)
It works perfectly even with multilingual words and punctuation marks:
line = 'English words, Polski Żurek!!! crème fraîche...'
line.split(/[^[[:word:]]]+/)
=> ["English", "words", "Polski", "Żurek", "crème", "fraîche"]
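To see why the POSIX bracket matters: in Ruby 1.9 and later, \w and \W only know about ASCII letters, digits, and underscore, while [[:word:]] is Unicode-aware. Here is a small comparison, with `scan` shown as an alternative that collects the words directly instead of splitting on what lies between them (that part is my variation, not the code above). The accented letters are written as escapes so the example does not depend on how the file is saved.

```ruby
phrase = "cr\u00E8me fra\u00EEche..." # "crème fraîche..."

ascii_split   = phrase.split(/\W+/)             # \W treats è and î as delimiters
unicode_split = phrase.split(/[^[[:word:]]]+/)  # [[:word:]] treats them as letters
unicode_scan  = phrase.scan(/[[:word:]]+/)      # same words, collected directly

p ascii_split   # => ["cr", "me", "fra", "che"]
p unicode_split # => ["crème", "fraîche"]
p unicode_scan  # => ["crème", "fraîche"]
```

One practical difference: `scan` never returns an empty string, whereas `split` returns an empty first element when the text starts with a delimiter.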
Well, you could split the string on spaces if that's your delimiter of interest
@string1.split(' ')
Or split on non-word characters or word boundaries
\W # Any non-word character
\b # Any word boundary (a zero-width position, not a character)
Or on whitespace
\s # Any whitespace character
Hint: try testing each of these on http://rubular.com
And note that Ruby 1.9 has some differences from 1.8.
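If you don't want to paste into Rubular, the patterns above can be compared side by side in irb. A small sketch on a made-up sample string:

```ruby
sample = "design, and  testing."

on_space      = sample.split(' ')   # a single-space string collapses runs of whitespace
on_whitespace = sample.split(/\s/)  # every whitespace character is its own delimiter
on_non_word   = sample.split(/\W+/) # runs of non-word characters are delimiters
on_boundary   = sample.split(/\b/)  # zero-width split: punctuation and spaces are kept

p on_space      # => ["design,", "and", "testing."]
p on_whitespace # => ["design,", "and", "", "testing."]
p on_non_word   # => ["design", "and", "testing"]
p on_boundary   # => ["design", ", ", "and", "  ", "testing", "."]
```

Splitting on \b keeps the separators as their own elements, so it is more useful for tokenizing than for pulling out just the words.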
For Rails you can use something like this:
@string1.split(/\s/).delete_if(&:blank?)
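Note that `blank?` comes from ActiveSupport, so the line above only works with Rails (or ActiveSupport) loaded. Outside Rails, a rough plain-Ruby equivalent might look like this; the sample text is made up:

```ruby
text = "  oriented   design\tand testing \n"

# split(/\s/) leaves empty strings wherever two whitespace characters touch.
raw = text.split(/\s/)
p raw # => ["", "", "oriented", "", "", "design", "and", "testing"]

# reject returns a new array without them (delete_if would modify raw in place).
with_reject = raw.reject(&:empty?)
p with_reject # => ["oriented", "design", "and", "testing"]

# split with no argument splits on runs of whitespace and skips leading whitespace.
plain_split = text.split
p plain_split # => ["oriented", "design", "and", "testing"]
```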
I would write something like this:
@string1
.split(/,+|\s+/) # any run of ',' or any run of whitespace characters (space, tab, newline)
.reject(&:empty?)
.map { |w| w.gsub(/\A\W+|\W+\z/, '') } # \A\W+ => any leading punctuation; \W+\z => any trailing punctuation
irb(main):047:0> @string1 = "oriented design, 'with', !!qwe, and testing. can't rubyisgood#)(*#%)(*, and,rails,is,good"
=> "oriented design, 'with', !!qwe, and testing. can't rubyisgood#)(*#%)(*, and,rails,is,good"
irb(main):048:0> @string1.split(/,+|\s+/).reject(&:empty?).map { |w| w.gsub(/\W+$|^\W+^*/, '')}
=> ["oriented", "design", "with", "qwe", "and", "testing", "can't", "rubyisgood", "and", "rails", "is", "good"]
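The same idea can be wrapped in a small method. This is a sketch with a made-up name (`extract_words`) and two small variations of my own: it anchors with \A and \z, and it rejects empty strings after stripping, so a token made only of punctuation (like "--") disappears instead of becoming "".

```ruby
# Hypothetical helper: split on commas and whitespace, strip leading and
# trailing non-word characters from each token, then drop empty results.
def extract_words(text)
  text.split(/[,\s]+/)
      .map { |token| token.gsub(/\A\W+|\W+\z/, "") }
      .reject(&:empty?)
end

input = "oriented design, 'with', !!qwe, and testing. can't rubyisgood#)(*#%)(*, and,rails,is,good"
p extract_words(input)
# => ["oriented", "design", "with", "qwe", "and", "testing", "can't", "rubyisgood", "and", "rails", "is", "good"]

p extract_words("ruby -- rails") # => ["ruby", "rails"]
```

Because only the ends of each token are stripped, an apostrophe inside a word such as "can't" is preserved.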