Using grep to find all emails
How do I properly construct a regular expression for the Linux "grep" program to find all email addresses in, say, the /etc directory? Currently, my script is the following:
grep -srhw "[[:alnum:]]*@[[:alnum:]]*" /etc
It works OK, and I see some of the emails, but when I modify it to catch one or more characters before and after the "@" sign ...
grep -srhw "[[:alnum:]]+@[[:alnum:]]+" /etc
... it stops working entirely.
Also, it doesn't catch emails of the form "Name.LastName@site.com".
Help!
Here is another example:
grep -Eiorh '([[:alnum:]_.-]+@[[:alnum:]_.-]+?\.[[:alpha:].]{2,6})' "$@" * | sort | uniq > emails.txt
This variant also works with third-level domains.
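As a rough illustration of the same idea, here is a minimal sketch run against a throwaway file. The sample lines and variable names are made up, and the pattern is a simplified assumption rather than the exact one above: the "?" after the second "+" is dropped, since lazy quantifiers are not part of POSIX extended regular expressions.

```shell
# Hypothetical sample input, for illustration only.
sample=$(mktemp)
printf '%s\n' \
  'Write to Name.LastName@mail.example.com today' \
  'cc: admin@example.org' \
  'again: Name.LastName@mail.example.com' > "$sample"

# Simplified pattern (an assumption, not the original): no "?" after "+".
pattern='[[:alnum:]_.-]+@[[:alnum:]_.-]+\.[[:alpha:]]{2,6}'

# -o prints only the matched text, -h hides file names;
# sort -u removes duplicate addresses.
emails=$(grep -Eioh "$pattern" "$sample" | LC_ALL=C sort -u)

printf '%s\n' "$emails"
rm -f "$sample"
```

Because dots are allowed on both sides of the "@", dotted names and multi-level domains are matched as one address.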
By default, grep uses basic regular expressions, in which several special characters, including +, must be escaped to get their special meaning. You'll want to do one of these two:
grep -srhw "[[:alnum:]]\+@[[:alnum:]]\+" /etc
egrep -srhw "[[:alnum:]]+@[[:alnum:]]+" /etc
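To see why the unescaped version finds nothing useful, here is a small sketch on a throwaway file (the sample lines and variable names are made up). In a basic regular expression an unescaped + is just a literal plus sign, while grep -E treats it as "one or more".

```shell
# Hypothetical sample data, for illustration only.
sample=$(mktemp)
printf '%s\n' 'contact: root@localhost' 'literal: a+@b+' > "$sample"

# Basic regex (plain grep): an unescaped + is an ordinary character,
# so this only matches a literal "+" on each side of the "@".
bre_hits=$(grep -o '[[:alnum:]]+@[[:alnum:]]+' "$sample")

# Extended regex (grep -E): + means "one or more".
ere_hits=$(grep -Eo '[[:alnum:]]+@[[:alnum:]]+' "$sample")

printf 'BRE: %s\nERE: %s\n' "$bre_hits" "$ere_hits"
rm -f "$sample"
```

Note that \+ in a basic regular expression is a GNU extension; grep -E is the more portable choice.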
I modified your regex to include punctuation (like . - _ etc.) by changing it to:
egrep -ho "[[:graph:]]+@[[:graph:]]+"
This is still pretty clean and matches... well, almost anything with an @ in it, of course. It also matches third-level domains and addresses with '%' or '+' in them. See http://www.delorie.com/gnu/docs/grep/grep_8.html for good documentation on the character classes used.
In my example, the addresses were surrounded by white space, making matching quite easy. If you grep through a mail server log for example, you can add < > to make it match only the addresses:
egrep -ho "<[[:graph:]]+@[[:graph:]]+>"
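A quick sketch of the difference, using a made-up log-style line (the line and variable names are hypothetical):

```shell
# Hypothetical mail-log style line, for illustration only.
line='from=<alice@example.com>, size=1024'

# Without delimiters, [[:graph:]] also swallows adjacent punctuation.
loose=$(printf '%s\n' "$line" | grep -Eo '[[:graph:]]+@[[:graph:]]+')

# Anchoring on < and > narrows the match to the bracketed address.
bracketed=$(printf '%s\n' "$line" | grep -Eo '<[[:graph:]]+@[[:graph:]]+>')

printf 'loose:     %s\nbracketed: %s\n' "$loose" "$bracketed"
```

One caveat: since < and > are themselves [[:graph:]] characters, two bracketed addresses with no white space between them would likely be returned as a single match.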
@thomas, @glowcoder and @oedo are all right. The RFC that defines what an email address can look like is quite a fun read. (I used GNU grep 2.9, as included in Ubuntu, for the examples above.)
Also check out zpea's version below, it should make for a less trigger-happy matcher.
I have used this one to filter email addresses, identified by the 'at' symbol and delimited by white space, within a text:
egrep -o "[^[:space:]]+@[^[:space:]]+" | tr -d "<>"
Of course, you can use grep -E instead of egrep (extended grep). Note that the tr command is used to remove typical email delimiters.
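For example, on a made-up line of text (the input and variable names are hypothetical), the pipeline behaves roughly like this:

```shell
# Hypothetical input text, for illustration only.
text='To: <bob@example.org> and carol@example.net'

# grep -Eo emits each white-space-delimited token containing "@";
# tr -d then strips any angle brackets around it.
addresses=$(printf '%s\n' "$text" | grep -Eo '[^[:space:]]+@[^[:space:]]+' | tr -d '<>')

printf '%s\n' "$addresses"
```

Other trailing punctuation, such as a comma or a full stop right after the address, would still be kept, so the set passed to tr may need extending for real text.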
grep -E -o -r "[A-Za-z0-9][A-Za-z0-9._%+-]+@[A-Za-z0-9][A-Za-z0-9.-]+\.[A-Za-z]{2,6}" /etc
This is adapted from an answer that is not mine originally, but I found it super helpful. It's from here:
http://www.shellhacks.com/en/RegEx-Find-Email-Addresses-in-a-File-using-Grep
They suggest:
grep -E -o -r "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" /etc
But it has certain false positives, like '+person..@example.com' or 'person@..com', and the whitespace constraints miss things like "mailto:person@example.com" (not technically an email but contains one); so I tweaked it a little bit.
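A small sketch comparing the two patterns on made-up inputs (the inputs and variable names are hypothetical, and \b is assumed to be available, as it is in GNU grep):

```shell
# Both patterns are copied from the text; the inputs are hypothetical.
suggested='\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b'
tweaked='[A-Za-z0-9][A-Za-z0-9._%+-]+@[A-Za-z0-9][A-Za-z0-9.-]+\.[A-Za-z]{2,6}'

bad='person@..com'
good='mailto:person@example.com'

# "|| true" keeps the script going when grep finds nothing (exit status 1).
suggested_bad=$(printf '%s\n' "$bad" | grep -Eo "$suggested" || true)
tweaked_bad=$(printf '%s\n' "$bad" | grep -Eo "$tweaked" || true)
tweaked_good=$(printf '%s\n' "$good" | grep -Eo "$tweaked" || true)

printf 'suggested on bad: [%s]\n' "$suggested_bad"
printf 'tweaked on bad:   [%s]\n' "$tweaked_bad"
printf 'tweaked on good:  [%s]\n' "$tweaked_good"
```

Requiring an alphanumeric character right after the "@" is what rejects 'person@..com'. A side effect is that the tweaked pattern needs at least two characters before the "@", so a one-letter local part would not match.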
(Do what you want with the options to grep; I don't know them very well.)
This recursive one works great for me:
grep -rIhEo "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" /etc/*
Just wanted to mention that a slight variation of this works great for grabbing mentions from things like Twitter posts:
grep -Eiorh '(@[[:alnum:]_.-]+)' "$@" * | sort | uniq -c
This seems to work, but it also picks up file names containing @:
egrep -osrwh "[[:alnum:]._%+-]+@[[:alnum:]]+\.[a-zA-Z]{2,6}" ~/.thunderbird/