How to remove duplicates from List<string> without LINQ? [duplicate]
Possible Duplicate:
Remove duplicates from a List<T> in C#
i have a List like below (so big email list):
source list :item 0 : jumper@yahoo.com|32432
item 1 : goodzila@yahoo.com|32432|test23
item 2 : alibaba@yahoo.com|32432|test65
item 3 : blabla@yahoo.com|32432|test32
the important part of each item is email address and the other parts(separated with pipes are not important) but i want to keep them in final list.
as i said my list is to big and i think it's not recommended to use ano开发者_JAVA技巧ther list.how can i remove duplicate emails (entire item) form that list without using LINQ ?
my codes are like below :private void WorkOnFile(UploadedFile file, string filePath)
{
File.SetAttributes(filePath, FileAttributes.Archive);
FileSecurity fSecurity = File.GetAccessControl(filePath);
fSecurity.AddAccessRule(new FileSystemAccessRule(@"Everyone",
FileSystemRights.FullControl,
AccessControlType.Allow));
File.SetAccessControl(filePath, fSecurity);
string[] lines = File.ReadAllLines(filePath);
List<string> list_lines = new List<string>(lines);
var new_lines = list_lines.Select(line => string.Join("|", line.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries)));
List<string> new_list_lines = new List<string>(new_lines);
int Duplicate_Count = 0;
RemoveDuplicates(ref new_list_lines, ref Duplicate_Count);
File.WriteAllLines(filePath, new_list_lines.ToArray());
}
private void RemoveDuplicates(ref List<string> list_lines, ref int Duplicate_Count)
{
char[] splitter = { '|' };
list_lines.ForEach(delegate(string line)
{
// ??
});
}
EDIT :
some duplicate email addrresses in that list have different parts -> what can i do about them : meangoodzila@yahoo.com|32432|test23
and
goodzila@yahoo.com|asdsa|324234
Thanks in advance.
say you have a list of possible duplicates:
List<string> emailList ....
Then the unique list is the set of that list:
HashSet<string> unique = new HashSet<string>( emailList )
private void RemoveDuplicates(ref List<string> list_lines, ref int Duplicate_Count)
{
Duplicate_Count = 0;
List<string> list_lines2 = new List<string>();
HashSet<string> hash = new HashSet<string>();
foreach (string line in list_lines)
{
string[] split = line.Split('|');
string firstPart = split.Length > 0 ? split[0] : string.Empty;
if (hash.Add(firstPart))
{
list_lines2.Add(line);
}
else
{
Duplicate_Count++;
}
}
list_lines = list_lines2;
}
The easiest thing to do is to iterate through the lines in the file and add them to a HashSet. HashSets won't insert the duplicate entries and it won't generate an exception either. At the end you'll have a unique list of items and no exceptions will be generated for any duplicates.
1 - Get rid of your pipe separated string (create an dto class corresponding to the data it's representing)
2 - which rule do you want to apply to select two object with the same id ?
Or maybe this code can be useful for you :) It's using the same method as the one in @xanatos answer
string[] lines= File.ReadAllLines(filePath);
Dictionary<string, string> items;
foreach (var line in lines )
{
var key = line.Split('|').ElementAt(0);
if (!items.ContainsKey(key))
items.Add(key, line);
}
List<string> list_lines = items.Values.ToList();
First, I suggest to you load the file via stream. Then, create a type that represent your rows and load them into a HashSet(for performance considerations).
Look (Ive removed some of your code to make it simple):
public struct LineType
{
public string Email { get; set; }
public string Others { get; set; }
public override bool Equals(object obj)
{
return this.Email.Equals(((LineType)obj).Email);
}
}
private static void WorkOnFile(string filePath)
{
StreamReader stream = File.OpenText(filePath);
HashSet<LineType> hashSet = new HashSet<LineType>();
while (true)
{
string line = stream.ReadLine();
if (line == null)
break;
string new_line = string.Join("|", line.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries));
LineType lineType = new LineType()
{
Email = new_line.Split('|')[3],
Others = new_line
};
if (!hashSet.Contains(lineType))
hashSet.Add(lineType);
}
}
精彩评论