Can grep show only words that match search pattern?

2022-12-08 08:21 问答作者：

Is there a way to make grep output "words" from files that match the search expression?

If I want to find all the instances of, say, "th" in a number of files, I can do:

grep "th" *

but the output will be something like (bold is by me);

some-text-file : the c开发者_运维技巧at sat on the mat  
some-other-text-file : the quick brown fox  
yet-another-text-file : i hope this explains it thoroughly

What I want it to output, using the same search, is:

the
the
the
this
thoroughly

Is this possible using grep? Or using another combination of tools?

Try grep -o:

grep -oh "\w*th\w*" *

Edit: matching from Phil's comment.

From the docs:

-h, --no-filename
    Suppress the prefixing of file names on output. This is the default
    when there is only  one  file  (or only standard input) to search.
-o, --only-matching
    Print  only  the matched (non-empty) parts of a matching line,
    with each such part on a separate output line.

Cross distribution safe answer (including windows minGW?)

grep -h "[[:alpha:]]*th[[:alpha:]]*" 'filename' | tr ' ' '\n' | grep -h "[[:alpha:]]*th[[:alpha:]]*"

If you're using older versions of grep (like 2.4.2) which do not include the -o option, then use the above. Else use the simpler to maintain version below.

Linux cross distribution safe answer

grep -oh "[[:alpha:]]*th[[:alpha:]]*" 'filename'

To summarize: -oh outputs the regular expression matches to the file content (and not its filename), just like how you would expect a regular expression to work in vim/etc... What word or regular expression you would be searching for then, is up to you! As long as you remain with POSIX and not perl syntax (refer below)

More from the manual for grep

-o      Print each match, but only the match, not the entire line.
-h      Never print filename headers (i.e. filenames) with output lines.
-w      The expression is searched for as a word (as if surrounded by
         `[[:<:]]' and `[[:>:]]';

The reason why the original answer does not work for everyone

The usage of \w varies from platform to platform, as it's an extended "perl" syntax. As such, those grep installations that are limited to work with POSIX character classes use [[:alpha:]] and not its perl equivalent of \w. See the Wikipedia page on regular expression for more

Ultimately, the POSIX answer above will be a lot more reliable regardless of platform (being the original) for grep

As for support of grep without -o option, the first grep outputs the relevant lines, the tr splits the spaces to new lines, the final grep filters only for the respective lines.

(PS: I know most platforms by now would have been patched for \w.... but there are always those that lag behind)

Credit for the "-o" workaround from @AdamRosenfield answer

It's more simple than you think. Try this:

egrep -wo 'th.[a-z]*' filename.txt #### (Case Sensitive)

egrep -iwo 'th.[a-z]*' filename.txt  ### (Case Insensitive)

Where,

 egrep: Grep will work with extended regular expression.
 w    : Matches only word/words instead of substring.
 o    : Display only matched pattern instead of whole line.
 i    : If u want to ignore case sensitivity.

You could translate spaces to newlines and then grep, e.g.:

cat * | tr ' ' '\n' | grep th

Just awk, no need combination of tools.

# awk '{for(i=1;i<=NF;i++){if($i~/^th/){print $i}}}' file
the
the
the
this
thoroughly

grep command for only matching and perl

grep -o -P 'th.*? ' filename

I was unsatisfied with awk's hard to remember syntax but I liked the idea of using one utility to do this.

It seems like ack (or ack-grep if you use Ubuntu) can do this easily:

# ack-grep -ho "\bth.*?\b" *

the
the
the
this
thoroughly

If you omit the -h flag you get:

# ack-grep -o "\bth.*?\b" *

some-other-text-file
1:the

some-text-file
1:the
the

yet-another-text-file
1:this
thoroughly

As a bonus, you can use the --output flag to do this for more complex searches with just about the easiest syntax I've found:

# echo "bug: 1, id: 5, time: 12/27/2010" > test-file
# ack-grep -ho "bug: (\d*), id: (\d*), time: (.*)" --output '$1, $2, $3' test-file

1, 5, 12/27/2010

cat *-text-file | grep -Eio "th[a-z]+"

You can also try pcregrep. There is also a -w option in grep, but in some cases it doesn't work as expected.

From Wikipedia:

cat fruitlist.txt
apple
apples
pineapple
apple-
apple-fruit
fruit-apple

grep -w apple fruitlist.txt
apple
apple-
apple-fruit
fruit-apple

To search all the words with start with "icon-" the following command works perfect. I am using Ack here which is similar to grep but with better options and nice formatting.

ack -oh --type=html "\w*icon-\w*" | sort | uniq

I had a similar problem, looking for grep/pattern regex and the "matched pattern found" as output.

At the end I used egrep (same regex on grep -e or -G didn't give me the same result of egrep) with the option -o

so, I think that could be something similar to (I'm NOT a regex Master) :

egrep -o "the*|this{1}|thoroughly{1}" filename

grep --color -o -E "Begin.{0,}?End" file.txt

? - Match as few as possible until the End

Tested on macos terminal

You could pipe your grep output into Perl like this:

grep "th" * | perl -n -e'while(/(\w*th\w*)/g) {print "$1\n"}'

$ grep -w

Excerpt from grep man page:

-w: Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character.

`ripgrep`

Here are the example using ripgrep:

rg -o "(\w+)?th(\w+)?"

It'll match all words matching th.

继续阅读：grep words

Can grep show only words that match search pattern?

`ripgrep`

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

ripgrep

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

`ripgrep`

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？