Strip words beginning with a specific letter from a sentence using regex

2022-12-19 03:07 Q&A author:

I'm not sure how to use regular expressions in a function to grab all the words in a sentence that start with a particular letter. I know that I can do:

word =~ /^#{letter}/

to check whether a word starts with the letter, but how do I go from word to word? Do I need to convert the string to an array and then iterate through each word, or is there a faster way using regex? I'm using Ruby, so that would look like:

sentence = "this is a test"   # example input (assumed)
letter = "t"                  # example input (assumed)
matching_words = []
sentence.split(" ").each do |word|
  matching_words.push(word) if word =~ /^#{letter}/
end
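One detail worth noting about the check in the question: interpolating letter straight into a regex means a character such as "." or "+" is read as regex syntax, and ^ matches at the start of any line, not just the start of the string. A minimal sketch of a safer check, using a hypothetical helper named starts_with_letter?, escapes the letter with Regexp.escape and anchors with \A (String#match? needs Ruby 2.4 or later):

```ruby
# Hypothetical helper: true if +word+ begins with +letter+.
def starts_with_letter?(word, letter)
  # Regexp.escape makes "." match a literal dot rather than "any character";
  # \A anchors at the start of the whole string, whereas ^ would also
  # match right after any newline inside +word+.
  word.match?(/\A#{Regexp.escape(letter)}/)
end

sentence = "this is a test"
matching_words = sentence.split(" ").select { |word| starts_with_letter?(word, "t") }
p matching_words # ["this", "test"]
```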

Scan may be a good tool for this:

#!/usr/bin/ruby1.8

s = "I think Paris in the spring is a beautiful place"
p s.scan(/\b[it][[:alpha:]]*/i)
# => ["I", "think", "in", "the", "is"]

\b means "word boundary."
[[:alpha:]] matches any alphabetic character, upper- or lowercase. The i flag makes [it] match "I" and "T" as well.
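Applied to the original question, the scan approach can be wrapped in a method that takes the letter as a parameter. This is a minimal sketch; the method name words_starting_with is hypothetical, and, as in the example above, the i flag makes the match case-insensitive:

```ruby
# Hypothetical helper: every word in +sentence+ that begins with +letter+,
# ignoring case.
def words_starting_with(sentence, letter)
  # \b           - a word boundary, so the letter must start a word
  # [[:alpha:]]* - the rest of the word's letters
  sentence.scan(/\b#{Regexp.escape(letter)}[[:alpha:]]*/i)
end

s = "I think Paris in the spring is a beautiful place"
p words_starting_with(s, "t") # ["think", "the"]
p words_starting_with(s, "p") # ["Paris", "place"]
```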

You can use \b. It matches word boundaries: the invisible, zero-width spot just before and after a word. (You can't see them, but oh, they're there!) Here's the regex:

/\b(a\w*)\b/

The \w matches a word character: a letter, a digit, or an underscore.

You can see me testing it here: http://rubular.com/regexes/13347
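In Ruby, that pattern can be used with String#scan, with one quirk: when the regex contains a capture group, scan returns an array of group arrays rather than an array of strings. A short illustration on a made-up sentence:

```ruby
sentence = "an apple a day"

# With a capture group, each element is an array of that match's groups
grouped = sentence.scan(/\b(a\w*)\b/)
p grouped # [["an"], ["apple"], ["a"]]

# Flatten the groups, or simply drop the parentheses
words = grouped.flatten
plain = sentence.scan(/\ba\w*\b/)
p words # ["an", "apple", "a"]
p plain # ["an", "apple", "a"]
```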

Similar to Anon.'s answer:

/\b(a\w*)/g

and then read back all the results. Libraries often expose captured groups as $n, where n is the number of the parenthesized group, and many return the results of a /g search as an array, so in this case you might get an array of all the matching words for group 1. You'll want to double-check how your library returns global matches; there's a lot of variation, sadly.

As for \w versus [a-zA-Z]: you can sometimes get faster execution from the built-in classes, since the engine can have an optimized path for them. They are not equivalent, though: \w also matches digits and the underscore.

The /g at the end makes it a "global" search, so it finds every match instead of stopping at the first. Some languages and libraries still work line by line, though, so to search an entire file you may need /gm to make it multi-line; check what /m means in yours, since it varies.
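Ruby is one of the languages where this varies: it has no /g modifier at all (writing one is a syntax error), and its /m flag lets . match a newline rather than changing how ^ behaves. Instead, the method you call decides how many matches you get. A small sketch with a made-up string:

```ruby
text = "an apple\na day"

# String#[] (like =~ or match) stops at the first match
first = text[/\ba\w*/]

# String#scan finds every match in the whole string, across newlines,
# without needing any extra flag
all = text.scan(/\ba\w*/)

p first # "an"
p all   # ["an", "apple", "a"]
```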

If you want to remove the matches, as your title (but not your question) suggests, try:

/\ba\w*//g

which performs a search-and-replace in languages that use the /<search>/<replacement>/ syntax. Sometimes you need an "s" at the front, as in s/\ba\w*//g; it depends on the language or library. In Ruby's case, use:

string.gsub(/(\b)a\w*(\b)/, "\\1\\2")

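Since \b is zero-width, the surrounding non-word characters are never consumed, so they are retained; \1 and \2 (which capture empty strings) just mark where you could optionally put replacement text. Use gsub to replace every match, or sub to replace only the first.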
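Because the groups are empty, an empty replacement string gives the same result as "\\1\\2". Removing words does leave runs of spaces behind, though. A sketch with a made-up sentence, tidying up afterwards with squeeze and strip:

```ruby
sentence = "an apple a day keeps the doctor away"

# Remove every word starting with "a"; the empty groups change nothing
with_groups = sentence.gsub(/(\b)a\w*(\b)/, "\\1\\2")
without_groups = sentence.gsub(/\ba\w*\b/, "")

# sub removes only the first match
first_only = sentence.sub(/\ba\w*\b/, "")

# Collapse the leftover runs of spaces and trim the ends
cleaned = without_groups.squeeze(" ").strip

p cleaned    # "day keeps the doctor"
p first_only # " apple a day keeps the doctor away"
```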

/\ba[a-z]*\b/i

will match any word starting with 'a' (or 'A', since the i flag makes it case-insensitive).

The \b marks a word boundary: we only want to match starting from the beginning of a word, after all.

Then there's the character we want our word to start with.

Then come as many letters as possible, followed by another word boundary.
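The choice of [a-z] over \w matters here: \w also accepts digits and the underscore, so the two patterns can disagree. A quick comparison on a made-up sentence:

```ruby
sentence = "An apple a day, and a1 alike"

# Letters only: "a1" fails because there is no word boundary between "a" and "1"
letters_only = sentence.scan(/\ba[a-z]*\b/i)

# \w also accepts digits, so "a1" counts as a word here
word_chars = sentence.scan(/\ba\w*/i)

p letters_only # ["An", "apple", "a", "and", "alike"]
p word_chars   # ["An", "apple", "a", "and", "a1", "alike"]
```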

To match all words starting with t, use:

\bt\w+

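That will match test but not footest; \b means "word boundary". Note that \w+ needs at least one more character after the t, so a lone "t" won't match; use \w* if it should.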
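A quick check of + versus * in Ruby, with a made-up string:

```ruby
text = "t test footest tt"

# \w+ needs at least one more word character after the t
at_least_two = text.scan(/\bt\w+/)

# \w* also accepts the bare letter
any_length = text.scan(/\bt\w*/)

p at_least_two # ["test", "tt"]
p any_length   # ["t", "test", "tt"]
```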

Personally, I think regex is overkill for this; a simple select is more than capable of solving this particular problem.

"this is a test".split(' ').select{ |word| word[0,1] == 't' } 

result => ["this", "test"]

Or, if you are determined to use a regex, go with grep:

"this is a test".split(' ').grep(/^t/)

result => ["this", "test"]
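The select route can also be written with start_with?, which avoids building a pattern at all. One behavioral difference to be aware of: splitting on whitespace keeps punctuation attached to words, whereas a \w-based scan stops at it. A sketch on a made-up sentence:

```ruby
sentence = "This is a test. Try it!"

# split with no argument splits on runs of whitespace;
# downcase makes the comparison case-insensitive
by_select = sentence.split.select { |word| word.downcase.start_with?("t") }

# A regex scan stops at punctuation because \w excludes it
by_scan = sentence.scan(/\bt\w*/i)

p by_select # ["This", "test.", "Try"]
p by_scan   # ["This", "test", "Try"]
```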

Hope this helps.
