How to grep for two words existing on the same line? [duplicate]

2023-03-15 21:33 问答作者：

This question already has answers here: Match two strings in one line with grep (24 answers) Closed 3 years ago.

How do I grep for lines that contain two input words on the line? I'm looking for lines that contain both words, how do I do that? I tried pipe like this:

grep -c "word1" | grep -r "word2" logs开发者_运维百科

It just stucks after the first pipe command.

Why?

Why do you pass -c? That will just show the number of matches. Similarly, there is no reason to use -r. I suggest you read man grep.

To grep for 2 words existing on the same line, simply do:

grep "word1" FILE | grep "word2"

grep "word1" FILE will print all lines that have word1 in them from FILE, and then grep "word2" will print the lines that have word2 in them. Hence, if you combine these using a pipe, it will show lines containing both word1 and word2.

If you just want a count of how many lines had the 2 words on the same line, do:

grep "word1" FILE | grep -c "word2"

Also, to address your question why does it get stuck : in grep -c "word1", you did not specify a file. Therefore, grep expects input from stdin, which is why it seems to hang. You can press Ctrl+D to send an EOF (end-of-file) so that it quits.

Prescription

One simple rewrite of the command in the question is:

grep "word1" logs | grep "word2"

The first grep finds lines with 'word1' from the file 'logs' and then feeds those into the second grep which looks for lines containing 'word2'.

However, it isn't necessary to use two commands like that. You could use extended grep (grep -E or egrep):

grep -E 'word1.*word2|word2.*word1' logs

If you know that 'word1' will precede 'word2' on the line, you don't even need the alternatives and regular grep would do:

grep 'word1.*word2' logs

The 'one command' variants have the advantage that there is only one process running, and so the lines containing 'word1' do not have to be passed via a pipe to the second process. How much this matters depends on how big the data file is and how many lines match 'word1'. If the file is small, performance isn't likely to be an issue and running two commands is fine. If the file is big but only a few lines contain 'word1', there isn't going to be much data passed on the pipe and using two command is fine. However, if the file is huge and 'word1' occurs frequently, then you may be passing significant data down the pipe where a single command avoids that overhead. Against that, the regex is more complex; you might need to benchmark it to find out what's best — but only if performance really matters. If you run two commands, you should aim to select the less frequently occurring word in the first grep to minimize the amount of data processed by the second.

Diagnosis

The initial script is:

grep -c "word1" | grep -r "word2" logs

This is an odd command sequence. The first grep is going to count the number of occurrences of 'word1' on its standard input, and print that number on its standard output. Until you indicate EOF (e.g. by typing Control-D), it will sit there, waiting for you to type something. The second grep does a recursive search for 'word2' in the files underneath directory logs (or, if it is a file, in the file logs). Or, in my case, it will fail since there's neither a file nor a directory called logs where I'm running the pipeline. Note that the second grep doesn't read its standard input at all, so the pipe is superfluous.

With Bash, the parent shell waits until all the processes in the pipeline have exited, so it sits around waiting for the grep -c to finish, which it won't do until you indicate EOF. Hence, your code seems to get stuck. With Heirloom Shell, the second grep completes and exits, and the shell prompts again. Now you have two processes running, the first grep and the shell, and they are both trying to read from the keyboard, and it is not determinate which one gets any given line of input (or any given EOF indication).

Note that even if you typed data as input to the first grep, you would only get any lines that contain 'word2' shown on the output.

Footnote:

At one time, the answer used:

grep -E 'word1.*word2|word2.*word1' "$@"
grep 'word1.*word2' "$@"

This triggered the comments below.

you could use awk. like this...

cat <yourFile> | awk '/word1/ && /word2/'

Order is not important. So if you have a file and...

a file named , file1 contains:

word1 is in this file as well as word2
word2 is in this file as well as word1
word4 is in this file as well as word1
word5 is in this file as well as word2

then,

/tmp$ cat file1| awk '/word1/ && /word2/'

will result in,

word1 is in this file as well as word2
word2 is in this file as well as word1

yes, awk is slower.

The main issue is that you haven't supplied the first grep with any input. You will need to reorder your command something like

grep "word1" logs | grep "word2"

If you want to count the occurences, then put a '-c' on the second grep.

`git grep`

Here is the syntax using git grep combining multiple patterns using Boolean expressions:

git grep -e pattern1 --and -e pattern2 --and -e pattern3

^{The above command will print lines matching all the patterns at once.}

If the files aren't under version control, add --no-index param.

Search files in the current directory that is not managed by Git.

Check man git-grep for help.

How to grep for two words existing on the same line? [duplicate]

Prescription

Diagnosis

`git grep`

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

Prescription

Diagnosis

git grep

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

`git grep`

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？