How to clean a file of invalid e-mail addresses
I'm looking for a script or similar tool that takes a CSV file as input. It would parse the file line by line and check whether the current line contains a valid email (e.g. user@domain.ext).
I think this must already exist somewhere.
A local HTML file with some JavaScript/jQuery would be perfect.
I need this to check lists of manually entered emails that were never verified.
Thanks Michel
You won't be able to read or write local files with JavaScript, so I wrote this in Ruby. If the code isn't useful to you, maybe the regular expression will be.
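#!/usr/bin/ruby
# Loose pattern: local part, "@", one or more dot-separated domain labels, alphabetic TLD.
# It is not anchored, so it matches an address anywhere in the CSV line.
EMAIL = /[\w.+-]+@(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}/

# File.foreach reads line by line and closes the file when done.
File.foreach("somefile.csv") do |line|
  if line =~ EMAIL
    puts "Good email!"
  else
    puts "FAIL"
  end
end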
1) Be warned that validating email addresses is extremely hard. In fact it's impossible to do perfectly. It's a trade-off between expression complexity, coverage, and accuracy (false positives). See the zillions of other questions on SO about validating e-mail addresses, and also see here: http://www.regular-expressions.info/email.html
2) You have a flat file (.csv). You can't read this in with JavaScript and process it in a browser. You'll need to look at some other language. Perl and Java, to mention two languages at random, have good regex support.
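To make the trade-off from point 1 concrete, here is a minimal JavaScript sketch that runs a loose and a stricter pattern over the same addresses. Both patterns and all sample addresses are assumptions chosen for illustration, not taken from any standard.

```javascript
// Two hypothetical patterns, chosen only to illustrate the trade-off.
// Loose: something, "@", something, ".", something (no whitespace or extra "@").
const loose = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
// Strict: ASCII letters, digits and a few symbols; TLD limited to 2-6 letters.
const strict = /^[a-zA-Z0-9_.-]+@(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,6}$/;

const samples = [
  "user@domain.ext",        // ordinary address
  "first+tag@example.org",  // legitimate, but "+" is not in the strict character class
  "a@b.c!d",                // junk after the dot
  "no-at-sign.example.com"  // no "@" at all
];

// Report how each pattern judges one address.
function classify(address) {
  return { address, loose: loose.test(address), strict: strict.test(address) };
}

const results = samples.map(classify);
console.log(results);
```

The loose pattern lets the junk address through, while the strict one rejects the legitimate plus-tagged address. Which error matters more depends on the list being cleaned.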
Since you asked for a JavaScript solution, here is some plain JS, assuming you have some file containing the CSV data like this:
<pre id="csv">
a,b,567@noe.invalid!tld
1,2,me@example.net
4,5,so@ex.com
</pre>
Here is the script. You might want to change the separator, the line break, or the regular expression used to check the addresses. Depending on your validation requirements, have a look at the other regular expressions here: Validate email address in JavaScript?
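var separator = ',',
    linebreak = '\n',
    mailColumn = 2, // zero-based index of the column holding the address (third column in the sample)
    regex = /^([a-zA-Z0-9_\.\-])+\@(([a-zA-Z0-9\-])+\.)+([a-zA-Z0-9]{2,4})+$/,
    csv = document.getElementById('csv'),
    // textContent returns the raw text, without HTML entity encoding
    lines = csv.textContent.trim().split(linebreak),
    fields,
    i;
for (i = 0; i < lines.length; i++)
{
    fields = lines[i].split(separator);
    // report only the lines whose mail column matches the pattern
    if (regex.test(fields[mailColumn]))
    {
        document.write(fields[mailColumn] + ' is valid<br/>');
    }
}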
Hello and thank you for your answers. I eventually managed to do it with a local html file and javascript. Here is the method:
- Have a local webserver (like xampp)
- Create an html file that will use jquery (for example) to load the csv file in AJAX
- Split the loaded file on the \n (newline) character to an array
- Process each array element according to what is needed (split again on ";" etc.)
- Fill two textarea fields: one with the valid email addresses, the other with the invalid ones
- Manually fix or delete addresses in the invalid textarea
- Copy/paste the valid textarea into a new clean file
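The splitting and sorting steps above can be sketched as a small standalone function. This is only an illustration: the name partitionEmails, the default ";" separator, the column index, and the simple pattern are all assumptions, and the naive split does not handle quoted CSV fields.

```javascript
// Hypothetical helper: split CSV text into valid and invalid e-mail entries.
// The pattern is a deliberately simple assumption; swap in whichever regex suits the data.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function partitionEmails(csvText, separator = ";", emailColumn = 0) {
  const valid = [];
  const invalid = [];
  // Accept both \r\n and \n line endings.
  for (const line of csvText.split(/\r?\n/)) {
    if (line.trim() === "") continue; // skip blank lines
    // Naive split: quoted fields containing the separator are not handled.
    const field = (line.split(separator)[emailColumn] || "").trim();
    (EMAIL_PATTERN.test(field) ? valid : invalid).push(field);
  }
  return { valid, invalid };
}

// Made-up sample data with mixed line endings and a blank line.
const sample = "me@example.net;Alice\r\nnot an address;Bob\n\nso@ex.com;Carol\n";
const { valid, invalid } = partitionEmails(sample);
console.log(valid.join("\n"));   // text for the "valid" textarea
console.log(invalid.join("\n")); // text for the "invalid" textarea
```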
To get it working, I place the CSV file next to the HTML file, then I use an "input type=file" to select it.
And voila :-)
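As an alternative to the AJAX request, browsers that implement the File API can read the selected file directly, which should remove the need for a local web server. A minimal sketch, assuming the element ids fileName and valid used in the page below; readSelectedFile is a hypothetical helper name, not part of the original code.

```javascript
// Sketch of reading the chosen file in the browser with the File API.
// Returns false when no file is selected, true once a read has been started.
function readSelectedFile(input, onText) {
  const file = input.files && input.files[0];
  if (!file) return false;
  const reader = new FileReader();
  // reader.result holds the file content as a string once loading finishes.
  reader.onload = () => onText(reader.result);
  reader.readAsText(file);
  return true;
}

// Only wire up the handler when running inside a browser page.
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", () => {
    // Assumed ids: "fileName" (file input) and "valid" (textarea).
    const input = document.getElementById("fileName");
    input.addEventListener("change", () => {
      readSelectedFile(input, (text) => {
        document.getElementById("valid").value = text;
      });
    });
  });
}
```

Here the raw text is simply dumped into one textarea; in practice the callback would run the same split-and-validate loop as the AJAX success handler.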
Raw code :
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>Validation email</title>
<script src="jquery.js"></script>
<script>
function isValidEmailAddress(emailAddress) {
var pattern = new RegExp(/^(("[\w-\s]+")|([\w-]+(?:\.[\w-]+)*)|("[\w-\s]+")([\w-]+(?:\.[\w-]+)*))(@((?:[\w-]+\.)*\w[\w-]{0,66})\.([a-z]{2,6}(?:\.[a-z]{2})?)$)|(@\[?((25[0-5]\.|2[0-4][0-9]\.|1[0-9]{2}\.|[0-9]{1,2}\.))((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[0-9]{1,2})\.){2}(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[0-9]{1,2})\]?$)/i);
return pattern.test(emailAddress);
};
function no_accent (my_string) {
var new_string = "";
var pattern_accent = new Array('À','Á','Â','Ã','Ä','Å','Æ','Ç','È','É','Ê','Ë','Ì','Í','Î','Ï','Ð','Ñ','Ò','Ó','Ô','Õ','Ö','Ø','Ù','Ú','Û','Ü','Ý','Þ','ß','à','á','â','ã','ä','å','æ','ç','è','é','ê','ë','ì','í','î','ï','ð','ñ','ò','ó','ô','õ','ö','ø','ù','ú','û','ü','ý','ý','þ','ÿ');
var pattern_replace_accent = new Array('A','A','A','A','A','A','A','C','E','E','E','E','I','I','I','I','D','N','O','O','O','O','O','O','U','U','U','U','Y','b','s','a','a','a','a','a','a','a','c','e','e','e','e','i','i','i','i','d','n','o','o','o','o','o','o','u','u','u','u','y','y','b','y');
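if (my_string && my_string!= "") {
// preg_replace is a PHP function and does not exist in JavaScript; map each character instead
new_string = my_string.split('').map(function (ch) {
var idx = pattern_accent.indexOf(ch);
return idx === -1 ? ch : pattern_replace_accent[idx];
}).join('');
}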
return new_string;
}
$(document).ready(function() {
$('#checkMail').click(function() {
$('#invalid').val('');
$('#valid').val('');
$.ajax({
type: "GET",
url: $('#fileName').val().split(/[\\\/]/).pop(), // keep only the file name: browsers may prepend a fake or absolute path
dataType: "text",
cache:false,
success: function(text) {
alert("Start process");
var reg = /\r?\n/; // accept both Windows (\r\n) and Unix (\n) line endings
var monTab = text.split(reg);
for (var cpt = 0; cpt < monTab.length; cpt++) {
//do some custom check here if needed
if (isValidEmailAddress(monTab[cpt])){
//add to valid textarea
document.getElementById('valid').value += monTab[cpt] + "\r\n";
} else {
//add to invalid textarea
document.getElementById('invalid').value += monTab[cpt] + "\r\n";
}
}
alert("Process over!");
}
});//close $.ajax
});
});
</script>
</head>
<body>
<input type="file" name="myfile" size="50" id="fileName"> (put csv file next to this html file)<br/>
<input type="button" value="Process" id="checkMail">
<br/>
Invalid addresses: <br/>
<textarea id="invalid" cols="80" rows="20"></textarea>
<br/>
Valid addresses: <br/>
<textarea id="valid" cols="80" rows="20"></textarea>
</body>
</html>