
Using grep to find all emails

How do I properly construct a regular expression for the Linux grep program to find all email addresses in, say, the /etc directory? Currently, my script is the following:

grep -srhw "[[:alnum:]]*@[[:alnum:]]*" /etc

It works OK, and I see some of the emails, but when I modify it to catch one or more characters before and after the "@" sign ...

grep -srhw "[[:alnum:]]+@[[:alnum:]]+" /etc

.. it stops working altogether.

Also, it doesn't catch emails of the form "Name.LastName@site.com".

Help!


Here is another example

grep -Eiorh '([[:alnum:]_.-]+@[[:alnum:]_.-]+?\.[[:alpha:].]{2,6})' "$@" * | sort | uniq > emails.txt

This variant works with third-level domains.
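
As a rough sketch of how the extract, sort and de-duplicate pipeline behaves, here it is run on made-up input from stdin instead of files. The sample text and variable names are assumptions for illustration, and the pattern is written with a plain + because POSIX extended regexes have no lazy +? quantifier:

```shell
# Hypothetical sample text containing a repeated third-level-domain address.
sample='Mail admin@mail.example.co.uk or admin@mail.example.co.uk again
Also bob@example.com'

# -E extended regex, -i ignore case, -o print only the matched part.
# sort | uniq collapses repeated addresses into one line each.
unique=$(printf '%s\n' "$sample" \
  | grep -Eio '[[:alnum:]_.-]+@[[:alnum:]_.-]+\.[[:alpha:].]{2,6}' \
  | sort | uniq)

printf '%s\n' "$unique"
```

Redirecting the final output to a file, as in the command above, keeps one address per line.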


By default, grep uses basic regular expressions, in which several special characters, including +, are treated as literals unless they are escaped. You'll want to do one of these two:

grep -srhw "[[:alnum:]]\+@[[:alnum:]]\+" /etc

egrep -srhw "[[:alnum:]]+@[[:alnum:]]+" /etc
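
A quick way to see the difference, as a minimal sketch assuming GNU grep (the \+ escape in basic mode is a GNU extension, and the sample text and variable names are made up for illustration):

```shell
# Hypothetical sample input.
sample='contact root@localhost today
no address here'

# Basic regex (the default): an unescaped + is a literal plus sign,
# so this finds nothing in the sample. "|| true" hides grep's exit status 1.
bre_plain=$(printf '%s\n' "$sample" | grep -o '[[:alnum:]]+@[[:alnum:]]+' || true)

# Basic regex with the escaped \+ meaning "one or more".
bre_escaped=$(printf '%s\n' "$sample" | grep -o '[[:alnum:]]\+@[[:alnum:]]\+')

# Extended regex (-E, what egrep runs): + is a quantifier without escaping.
ere=$(printf '%s\n' "$sample" | grep -E -o '[[:alnum:]]+@[[:alnum:]]+')

printf 'plain BRE: [%s]\nescaped BRE: [%s]\nERE: [%s]\n' \
  "$bre_plain" "$bre_escaped" "$ere"
```

The first command comes back empty because the pattern is looking for a literal "+" next to the "@", which is why the modified script in the question appears to stop working.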


I modified your regex to include punctuation (like .-_ etc) by changing it to

egrep -ho "[[:graph:]]+@[[:graph:]]+"

This is still pretty clean and matches... well, almost anything with an @ in it, of course. It also matches third-level domains and addresses with '%' or '+' in them. See http://www.delorie.com/gnu/docs/grep/grep_8.html for good documentation on the character classes used.

In my example, the addresses were surrounded by white space, making matching quite easy. If you grep through a mail server log for example, you can add < > to make it match only the addresses:

egrep -ho "<[[:graph:]]+@[[:graph:]]+>"
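
To illustrate how trigger-happy [[:graph:]] is, here is a small sketch on a made-up log-style line (the sample text and variable names are assumptions, and grep -E is used in place of egrep):

```shell
# Hypothetical line with one bracketed and one parenthesised address.
sample='From: Alice <alice@example.com>, see (bob@example.org).'

# Loose pattern: grabs every non-blank run around an @,
# including the surrounding punctuation.
loose=$(printf '%s\n' "$sample" | grep -E -ho '[[:graph:]]+@[[:graph:]]+')

# Bracketed pattern: only text wrapped in < > is reported.
strict=$(printf '%s\n' "$sample" | grep -E -ho '<[[:graph:]]+@[[:graph:]]+>')

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

The loose form keeps the trailing comma and the parentheses, while the bracketed form returns just the one address in angle brackets, still with the brackets attached.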

@thomas, @glowcoder and @oedo are all right. The RFC that defines what an email address can look like is quite a fun read. (I've been using GNU grep 2.9 above, as included in Ubuntu.)

Also check out zpea's version below, it should make for a less trigger-happy matcher.


I have used this one to filter email addresses, identified by the 'at' symbol and delimited by white space, within a text:

egrep -o "[^[:space:]]+@[^[:space:]]+" | tr -d "<>"

Of course, you can use grep -E instead of egrep (extended grep). Note that the tr command is used to remove typical email delimiters.
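
A minimal sketch of that pipeline on made-up input, using the grep -E spelling (the sample text and the variable name are assumptions for illustration):

```shell
# Hypothetical text with one address in angle brackets and one bare.
sample='Reply to <carol@example.net> or dave@example.org please'

# Take every whitespace-delimited token containing an @,
# then strip the < and > delimiters with tr.
addresses=$(printf '%s\n' "$sample" \
  | grep -E -o '[^[:space:]]+@[^[:space:]]+' \
  | tr -d '<>')

printf '%s\n' "$addresses"
```

Other delimiters, such as quotes or trailing commas, would survive this filter unless they are added to the tr set.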


grep -E -o -r "[A-Za-z0-9][A-Za-z0-9._%+-]+@[A-Za-z0-9][A-Za-z0-9.-]+\.[A-Za-z]{2,6}" /etc

This is adapted from an answer that is not mine originally, but I found it super helpful. It's from here:

http://www.shellhacks.com/en/RegEx-Find-Email-Addresses-in-a-File-using-Grep

They suggest:

grep -E -o -r "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" /etc

But it has certain false positives, like '+person..@example.com' or 'person@..com', and the whitespace constraints miss things like "mailto:person@example.com" (not technically an email but contains one); so I tweaked it a little bit.

(Do what you want with the options to grep, I don't know them very well)
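
One way to check what each pattern actually extracts is to run both over the same made-up input. This is only a sketch: it assumes GNU grep (for \b), and the sample addresses and variable names are hypothetical:

```shell
# Hypothetical sample: one plausible address and one malformed one.
sample='good@example.com
person@..com'

# The pattern from the linked page and the tweaked version from this answer.
original='\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b'
tweaked='[A-Za-z0-9][A-Za-z0-9._%+-]+@[A-Za-z0-9][A-Za-z0-9.-]+\.[A-Za-z]{2,6}'

from_original=$(printf '%s\n' "$sample" | grep -E -o "$original")
from_tweaked=$(printf '%s\n' "$sample" | grep -E -o "$tweaked")

printf 'original:\n%s\n\ntweaked:\n%s\n' "$from_original" "$from_tweaked"
```

On this input the original pattern reports both lines, while the tweaked one drops 'person@..com' because it requires a letter or digit right after the "@". Note that the tweak also requires at least two characters before the "@", so it is worth testing against your own data.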


This recursive one works great for me:

grep -rIhEo "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" /etc/*


Just wanted to mention that a slight variation of this works great for grabbing mentions from things like Twitter tweets:

grep -Eiorh '(@[[:alnum:]_.-]+)' "$@" * | sort | uniq -c


Seems to work but picks up file names with @

egrep -osrwh "[[:alnum:]._%+-]+@[[:alnum:]]+\.[a-zA-Z]{2,6}" ~/.thunderbird/