Remove email address from string
How can I remove an email address from a string, along with all digits and special characters?
A sample string might be:
"Hello world my # is 123 mail me @ test@test.com"
The output string should be:
"Hello world my is mail me"
I googled this and found that I could use the following regular expression:
"[^A-Za-z0-9\\.\\@_\\-~#]+"
but that example was more about validating email IDs than removing them. I am new to Java!
As pointed out by others, you could use regular expressions to clean up your String and replace the unwanted parts with an empty string "". To do so, have a look at the replaceAll(String regex, String replacement) method of the String class, and at the Pattern class for the syntax of regular expressions in Java.
Below is some code demonstrating one way to clean the provided sample String (maybe not the most elegant, though):
public class RemoveEmail {
    // Simple email pattern: local part, '@', one or more "label." parts, final label
    private static final String EMAIL_PATTERN =
            "([^.@\\s]+)(\\.[^.@\\s]+)*@([^.@\\s]+\\.)+([^.@\\s]+)";

    public static void main(String[] args) {
        String input = "Hello world my # is 123 mail me @ test@test.com";
        String output = input.replaceAll(EMAIL_PATTERN, "") // Replace emails by an empty string
                .replaceAll("\\p{Punct}", "")        // Replace all punctuation, i.e. one of
                                                     // !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
                .replaceAll("\\d", "")               // Replace any digit by an empty string
                .replaceAll("\\p{Blank}{2,}+", " ")  // Replace any blank (a space or a tab) repeated
                                                     // more than once by a single space
                .trim();                             // Drop the trailing space left where the email was
        System.out.println(output);
    }
}
Running this code produces the following output:
Hello world my is mail me
If you need to remove more garbage (or less, like punctuation), well, you've got the principle. Adapt it to suit your needs.
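String.replaceAll(regex, replacement) is documented as equivalent to Pattern.compile(regex).matcher(str).replaceAll(replacement), so every call recompiles its regex. If the same cleanup runs over many strings, one option is to compile the patterns once and reuse them. The sketch below assumes the same four replacements as the snippet above, in the same order; the class name TextCleaner and the method clean are hypothetical, and the final trim() is an addition that drops leftover edge spaces.

```java
import java.util.regex.Pattern;

public class TextCleaner {
    // Patterns are compiled once and reused (Pattern instances are immutable and thread-safe).
    private static final Pattern EMAIL =
            Pattern.compile("([^.@\\s]+)(\\.[^.@\\s]+)*@([^.@\\s]+\\.)+([^.@\\s]+)");
    private static final Pattern PUNCT = Pattern.compile("\\p{Punct}");
    private static final Pattern DIGIT = Pattern.compile("\\d");
    private static final Pattern BLANKS = Pattern.compile("\\p{Blank}{2,}+");

    // Hypothetical helper: same replacement chain as the snippet above, plus trim().
    static String clean(String input) {
        String s = EMAIL.matcher(input).replaceAll("");  // drop email addresses
        s = PUNCT.matcher(s).replaceAll("");             // drop punctuation
        s = DIGIT.matcher(s).replaceAll("");             // drop digits
        s = BLANKS.matcher(s).replaceAll(" ");           // collapse runs of blanks
        return s.trim();                                 // drop leading/trailing blanks
    }

    public static void main(String[] args) {
        System.out.println(clean("Hello world my # is 123 mail me @ test@test.com")); // Hello world my is mail me
        System.out.println(clean("Ping a.b@c.d.org, ok?"));                           // Ping ok
    }
}
```

The second call shows a quirk worth knowing: [^.@\s] also matches a comma, so in "a.b@c.d.org, ok?" the comma right after the address is consumed as part of the match.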
You can use String#replaceAll() for this: just let it replace any regex matches with an empty string "". The regex you mentioned, however, is not very robust. A better one is this (copied from here and slightly changed for use in plain vanilla text):
string = string.replaceAll("([^.@\\s]+)(\\.[^.@\\s]+)*@([^.@\\s]+\\.)+([^.@\\s]+)", "");
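To make "not very robust" concrete, here is a small comparison (the class name is hypothetical) that runs both the regex quoted in the question and the one above against the sample string. The question's pattern is a negated character class, so replaceAll deletes every character outside the allowed set; on this sample that is only the spaces, and the address itself survives.

```java
public class CompareRegexes {
    // Regex quoted in the question: matches runs of characters NOT in the allowed set.
    static final String QUESTION_REGEX = "[^A-Za-z0-9\\.\\@_\\-~#]+";
    // Regex from this answer: matches a whole address such as test@test.com.
    static final String EMAIL_REGEX = "([^.@\\s]+)(\\.[^.@\\s]+)*@([^.@\\s]+\\.)+([^.@\\s]+)";

    public static void main(String[] args) {
        String sample = "Hello world my # is 123 mail me @ test@test.com";
        // Only the spaces are outside the allowed set, so only they are removed:
        // prints Helloworldmy#is123mailme@test@test.com
        System.out.println(sample.replaceAll(QUESTION_REGEX, ""));
        // Only the address is removed; the lone "@" has nothing before it, so it stays.
        // prints "Hello world my # is 123 mail me @ " (note the trailing space)
        System.out.println(sample.replaceAll(EMAIL_REGEX, ""));
    }
}
```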
Hope this helps.
Check out the Java regular expression Pattern class and its uses. There's a useful tutorial here which includes replacement methods.
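As a rough sketch of Pattern and Matcher used directly (not taken from the tutorial): Matcher.find() together with group() lists what the email regex from the answers here would remove, and Matcher.replaceAll("") then removes it. Listing the matches first can help when checking the regex against your own data before deleting anything. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindThenRemove {
    // Same email regex as in the answers, compiled once.
    static final Pattern EMAIL =
            Pattern.compile("([^.@\\s]+)(\\.[^.@\\s]+)*@([^.@\\s]+\\.)+([^.@\\s]+)");

    // Collects every substring the pattern would remove, in order of appearance.
    static List<String> findEmails(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            found.add(m.group()); // group() with no argument is the whole match
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Write to john.doe@example.co.uk or test@test.com today";
        System.out.println(findEmails(text));                // [john.doe@example.co.uk, test@test.com]
        System.out.println(EMAIL.matcher(text).replaceAll(""));
    }
}
```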
An aside: this is a particularly robust regexp to use for RFC822-compliant email addresses :-) You should be able to come up with something more concise for your needs! There's a discussion of email regexps and trade-offs here.
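As one example of a more concise (and much looser) pattern, the sketch below assumes that any run of non-whitespace characters with an @ that has at least one non-whitespace character on each side counts as an address. LOOSE_EMAIL and removeLooseEmails are hypothetical names, and the trade-off shows up in the second call: a token such as 2@3 is removed too, even though it is not an address.

```java
public class ConciseEmailRegex {
    // Hypothetical simplification, far looser than RFC 822:
    // non-whitespace, then '@', then more non-whitespace.
    static final String LOOSE_EMAIL = "\\S+@\\S+";

    static String removeLooseEmails(String text) {
        return text.replaceAll(LOOSE_EMAIL, "");
    }

    public static void main(String[] args) {
        // The lone "@" has a space on both sides, so it is not matched.
        System.out.println(removeLooseEmails("Hello world my # is 123 mail me @ test@test.com"));
        // Trade-off: anything shaped like x@y goes, address or not.
        System.out.println(removeLooseEmails("price 2@3 each"));
    }
}
```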
From your example, it looks like you want to remove not just email addresses but all non-alphabetic characters (apart from spaces), so this is trivial:
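str = str.replaceAll("([^.@\\s]+)(\\.[^.@\\s]+)*@([^.@\\s]+\\.)+([^.@\\s]+)", "") // remove email addresses
         .replaceAll("[^\\p{Alpha} ]", "")  // remove everything except letters and spaces
         .replaceAll("[ ]{2,}+", " ")       // collapse runs of spaces into a single space
         .trim();                           // drop the trailing space left where the email was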
See the Pattern JavaDocs for information about what the special character class \p{Alpha} means.
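One detail from those JavaDocs that matters here: by default \p{Alpha} is a POSIX class covering US-ASCII letters only ([a-zA-Z]), so accented letters are stripped along with the punctuation. Enabling UNICODE_CHARACTER_CLASS, here via the embedded flag (?U), makes it match any alphabetic character. A small sketch with hypothetical method names:

```java
public class AlphaClass {
    // Default behaviour: \p{Alpha} is [a-zA-Z], so non-ASCII letters are removed too.
    static String keepAsciiLetters(String s) {
        return s.replaceAll("[^\\p{Alpha} ]", "");
    }

    // (?U) turns on UNICODE_CHARACTER_CLASS, under which \p{Alpha}
    // matches any Unicode alphabetic character.
    static String keepUnicodeLetters(String s) {
        return s.replaceAll("(?U)[^\\p{Alpha} ]", "");
    }

    public static void main(String[] args) {
        String s = "caf\u00e9 #42";                  // "café #42"
        System.out.println(keepAsciiLetters(s));    // "caf " (the accented letter is gone)
        System.out.println(keepUnicodeLetters(s));  // "café "
    }
}
```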