Quick question about regex

I have a word list, but it contains some words like East's.

I need to find the words in the list that contain only a-z and A-Z. How can I do that?

I am using grep. What should I put after grep?

grep *** myfile.txt

Thanks!


The regexp you want is ^[a-zA-Z]+$

For grep:

vinko@parrot:~$ more a.txt
Hi
Hi Dude
Hi's
vinko@parrot:~$ egrep '^[a-zA-Z]+$' a.txt
Hi

In pseudocode:

 regexp = "^[a-zA-Z]+$";
 foreach word in list
      if regexp.matches(word)
          do_something_with(word)
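The pseudocode maps directly onto a bash loop that uses the `[[ ... =~ ... ]]` operator. A minimal sketch, assuming bash 3.2 or later and one word per line; the temporary word list and its contents are invented for illustration, and collecting into an array stands in for `do_something_with`.

```shell
#!/usr/bin/env bash
# Hypothetical word list file, one word per line.
list=$(mktemp)
printf '%s\n' Hi 'Hi Dude' "Hi's" East "East's" > "$list"

# Keep the regex in a variable: a quoted pattern written directly
# inside [[ ... =~ ... ]] is matched as a literal string in bash 3.2+.
regexp='^[a-zA-Z]+$'

matches=()
while IFS= read -r word; do
    if [[ $word =~ $regexp ]]; then
        matches+=("$word")   # stands in for do_something_with "$word"
    fi
done < "$list"
rm -f "$list"

printf '%s\n' "${matches[@]}"   # Hi, East
```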


The grep syntax is:

grep '^[[:alpha:]]\+$' input.txt

Documentation for grep's pattern syntax is here.
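One caveat on the command above: `\+` in a basic regular expression is a GNU extension that POSIX does not define. A sketch comparing it with two portable spellings; the sample lines are invented, and only the first command assumes GNU grep.

```shell
# Hypothetical sample input.
input=$(printf '%s\n' Hi 'Hi Dude' "Hi's")

# GNU basic regex: \+ means "one or more" (GNU extension).
gnu=$(printf '%s\n' "$input" | grep '^[[:alpha:]]\+$')

# Portable basic regex: one letter followed by zero or more letters.
bre=$(printf '%s\n' "$input" | grep '^[[:alpha:]][[:alpha:]]*$')

# Portable extended regex: + is a standard operator under grep -E.
ere=$(printf '%s\n' "$input" | grep -E '^[[:alpha:]]+$')

printf '%s|%s|%s\n' "$gnu" "$bre" "$ere"   # Hi|Hi|Hi
```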


Use fgrep if you want to match against a word list.

fgrep -f word_list_file myfile.txt
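Without `-f`, fgrep treats the name `word_list_file` itself as the literal search string. A small sketch of the `-f` form; both files and their contents are hypothetical. `grep -F` is the POSIX spelling of `fgrep`, and adding `-x` restricts each pattern to whole-line matches.

```shell
dir=$(mktemp -d)
# Hypothetical pattern file: one fixed string per line.
printf '%s\n' "East's" 'Hi' > "$dir/word_list_file"
# Hypothetical text to search.
printf '%s\n' "East's" 'East' 'Hi Dude' > "$dir/myfile.txt"

# -F: patterns are literal strings, so apostrophes and regex
#     metacharacters in the word list are matched as-is.
# -f: read the patterns from a file, one per line.
substr=$(grep -F -f "$dir/word_list_file" "$dir/myfile.txt")

# -x: additionally require each pattern to match a whole line.
whole=$(grep -F -x -f "$dir/word_list_file" "$dir/myfile.txt")

rm -r "$dir"
printf '%s\n---\n%s\n' "$substr" "$whole"
```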


[a-z]+

using the case insensitive option, or

[A-Za-z]+

without the case insensitive option.

Post the data and the language for more help.

For grep:

egrep -i '^[a-z]+$' wordlist.dat

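With egrep (or grep -E) nothing in that pattern needs escaping. With plain grep, which uses basic regular expressions, keep the brackets as they are (escaping them as \[ \] would make them match literal bracket characters) and escape only the plus: grep -i '^[a-z]\+$' wordlist.dat. Note that \+ is a GNU extension.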
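The anchors are what make the egrep command work: an unanchored `[a-z]+` selects any line that merely contains a letter, so East's slips through. A sketch with an invented word list, showing the unanchored pattern next to the anchored one in both extended and GNU basic syntax.

```shell
# Hypothetical word list.
words=$(printf '%s\n' East "East's" Hi 'Hi Dude')

# Unanchored: selects every line that contains at least one letter.
loose=$(printf '%s\n' "$words" | grep -E -i '[a-z]+')

# Anchored extended regex: the whole line must consist of letters.
strict=$(printf '%s\n' "$words" | grep -E -i '^[a-z]+$')

# Same anchored pattern as a GNU basic regex: only + gets a backslash.
strict_bre=$(printf '%s\n' "$words" | grep -i '^[a-z]\+$')

printf '%s\n' "$loose" --- "$strict" --- "$strict_bre"
```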


GNU grep

grep -wEo "[[:alpha:]]+" file
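Note that this command prints matches rather than lines: `-o` outputs each run of letters on its own line, so it splits multi-word lines and pulls the alphabetic pieces out of East's instead of discarding the word. A sketch with invented input, assuming GNU grep.

```shell
# Hypothetical input: a clean word, a phrase, a word with an apostrophe.
input=$(printf '%s\n' Hi 'Hi Dude' "East's")

# -E: extended regex; -o: print only the matched text, one match per line;
# -w: the match must be a whole word (bounded by non-word characters).
pieces=$(printf '%s\n' "$input" | grep -wEo '[[:alpha:]]+')

printf '%s\n' "$pieces"   # Hi, Hi, Dude, East, s (one per line)
```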


Or filter out all words that contain funnies (any character outside a-z and A-Z):

grep -v '[^a-zA-Z]'
Is there a prize for the shortest answer? :)
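One difference from the anchored `^[a-zA-Z]+$` approach: an empty line contains no character outside a-zA-Z, so `grep -v '[^a-zA-Z]'` keeps blank lines too. A sketch with invented input.

```shell
# Hypothetical input; the second line is empty.
input=$(printf '%s\n' Hi '' "Hi's" Dude)

# Inverted match: drop every line that has at least one non-letter.
inverted=$(printf '%s\n' "$input" | grep -v '[^a-zA-Z]')

# Anchored match: keep lines made of one or more letters (empty lines fail).
anchored=$(printf '%s\n' "$input" | grep -E '^[a-zA-Z]+$')

printf '[%s]\n[%s]\n' "$inverted" "$anchored"
```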

Note that there are portability differences between [[:alpha:]] and [A-Za-z]. [A-Za-z] works in more versions of grep, but [[:alpha:]] takes wide-character environments and internationalization into account (for example, accented characters, when the locale includes them).
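A sketch of the locale dependence, assuming GNU grep and a UTF-8 locale named C.UTF-8 (that name is an assumption; available locales vary by system, see `locale -a`). The word café is written with octal escapes so the script itself stays ASCII.

```shell
# "café" in UTF-8: é is the two bytes 0xC3 0xA9 (octal 303 251).
word=$(printf 'caf\303\251')

# In the C locale, bytes above 0x7F are not letters, so neither pattern
# matches. grep exits 1 when nothing matches, hence "|| true".
c_alpha=$(printf '%s\n' "$word" | LC_ALL=C grep -E '^[[:alpha:]]+$' || true)
c_range=$(printf '%s\n' "$word" | LC_ALL=C grep -E '^[A-Za-z]+$' || true)

# In a UTF-8 locale, [[:alpha:]] should also cover é, so the word matches.
# If C.UTF-8 is not installed, grep stays in the C locale and prints nothing.
utf8_alpha=$(printf '%s\n' "$word" | LC_ALL=C.UTF-8 grep -E '^[[:alpha:]]+$' || true)

printf 'C alpha: [%s]\nC range: [%s]\nUTF-8 alpha: [%s]\n' \
    "$c_alpha" "$c_range" "$utf8_alpha"
```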
