How to remove duplicates from List<string> without LINQ? [duplicate]

This question already has answers here: Closed 11 years ago.

Possible Duplicate:

Remove duplicates from a List<T> in C#

I have a list like the one below (a very large email list):

Source list:

item 0 : jumper@yahoo.com|32432  
item 1 : goodzila@yahoo.com|32432|test23  
item 2 : alibaba@yahoo.com|32432|test65  
item 3 : blabla@yahoo.com|32432|test32 

The important part of each item is the email address. The other parts (separated by pipes) are not important, but I want to keep them in the final list.

As I said, my list is very big, and I think it's not recommended to use another list.

How can I remove items with duplicate emails (the entire item) from that list without using LINQ?

My code is below:

private void WorkOnFile(UploadedFile file, string filePath)
{
    File.SetAttributes(filePath, FileAttributes.Archive);

    FileSecurity fSecurity = File.GetAccessControl(filePath);
    fSecurity.AddAccessRule(new FileSystemAccessRule(@"Everyone",
                                                    FileSystemRights.FullControl,
                                                    AccessControlType.Allow));
    File.SetAccessControl(filePath, fSecurity);

    string[] lines = File.ReadAllLines(filePath);
    List<string> list_lines = new List<string>(lines);
    var new_lines = list_lines.Select(line => string.Join("|", line.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries)));
    List<string> new_list_lines = new List<string>(new_lines);
    int Duplicate_Count = 0;
    RemoveDuplicates(ref new_list_lines, ref Duplicate_Count);
    File.WriteAllLines(filePath, new_list_lines.ToArray());
}

private void RemoveDuplicates(ref List<string> list_lines, ref int Duplicate_Count)
{
    char[] splitter = { '|' };
    list_lines.ForEach(delegate(string line)
    {
        // ??
    });
}

EDIT:

Some duplicate email addresses in the list have different trailing parts. What should I do about them? For example:

goodzila@yahoo.com|32432|test23
and
goodzila@yahoo.com|asdsa|324234

Thanks in advance.


Say you have a list of possible duplicates:

List<string> emailList ....

Then the unique items are the set built from that list:

HashSet<string> unique = new HashSet<string>(emailList);


private void RemoveDuplicates(ref List<string> list_lines, ref int Duplicate_Count)
{
    Duplicate_Count = 0;
    List<string> list_lines2 = new List<string>();
    HashSet<string> hash = new HashSet<string>();

    foreach (string line in list_lines)
    {
        string[] split = line.Split('|');
        string firstPart = split.Length > 0 ? split[0] : string.Empty;

        if (hash.Add(firstPart)) 
        {
            list_lines2.Add(line);
        }
        else
        {
            Duplicate_Count++;
        }
    }

    list_lines = list_lines2;
}
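
The question also asks to avoid allocating a second list. One possible sketch, not part of the answer above, combines the same HashSet idea with List<T>.RemoveAll so the list is filtered in place. The class name EmailDedup and the helper GetEmail are hypothetical, and the code assumes RemoveAll evaluates the predicate from the first element to the last, which is how current implementations behave. The lambda is a plain delegate, so no LINQ is involved.

```csharp
using System.Collections.Generic;

public static class EmailDedup
{
    // Removes, in place, every line whose email (the text before the first '|')
    // already appeared earlier in the list. Returns the number of lines removed.
    public static int RemoveDuplicatesInPlace(List<string> lines)
    {
        var seen = new HashSet<string>();

        // HashSet<T>.Add returns false when the key is already present,
        // so the predicate is true only for later duplicates.
        // Assumption: RemoveAll tests elements in list order.
        return lines.RemoveAll(line => !seen.Add(GetEmail(line)));
    }

    // Hypothetical helper: everything before the first pipe is the key.
    private static string GetEmail(string line)
    {
        int pipe = line.IndexOf('|');
        return pipe < 0 ? line : line.Substring(0, pipe);
    }
}
```

The return value can be assigned to Duplicate_Count, and the only extra memory is the set of distinct email keys.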


The easiest approach is to iterate through the lines in the file and add each one to a HashSet. A HashSet does not insert duplicate entries, and it does not throw an exception for them either; Add simply returns false. At the end you have a set of unique items.
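
One caveat is worth illustrating with the two lines from the question's EDIT: a HashSet of whole lines treats them as different items, because only the email part matches. A minimal sketch (the class and method names are hypothetical) that compares both keying choices:

```csharp
using System.Collections.Generic;

public static class HashSetDemo
{
    // Counts distinct items when the whole line is the key.
    public static int CountUniqueLines(IEnumerable<string> lines)
    {
        return new HashSet<string>(lines).Count;
    }

    // Counts distinct items when only the first pipe-separated field is the key.
    public static int CountUniqueEmails(IEnumerable<string> lines)
    {
        var emails = new HashSet<string>();
        foreach (string line in lines)
        {
            // Add silently ignores a key that is already present.
            emails.Add(line.Split('|')[0]);
        }
        return emails.Count;
    }
}
```

For goodzila@yahoo.com|32432|test23 and goodzila@yahoo.com|asdsa|324234, the first method reports 2 and the second reports 1, so the set needs to be keyed on the email alone.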


1 - Get rid of your pipe-separated string (create a DTO class corresponding to the data it represents).

2 - Which rule do you want to apply to choose between two objects with the same ID?
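
Both points can be sketched together. The class below is a hypothetical DTO (the names EmailEntry, Rest, and KeepFirst are not from the question), and "keep the first entry seen for each email" is only one possible rule, assumed here for illustration.

```csharp
using System.Collections.Generic;

// Hypothetical DTO: one parsed line of the file.
public sealed class EmailEntry
{
    public string Email { get; private set; }
    public string Rest { get; private set; }   // everything after the first '|'

    private EmailEntry(string email, string rest)
    {
        Email = email;
        Rest = rest;
    }

    // Splits a line at the first pipe; a line without a pipe is all email.
    public static EmailEntry Parse(string line)
    {
        int pipe = line.IndexOf('|');
        return pipe < 0
            ? new EmailEntry(line, string.Empty)
            : new EmailEntry(line.Substring(0, pipe), line.Substring(pipe + 1));
    }

    // Rebuilds the pipe-separated form for writing back to the file.
    public override string ToString()
    {
        return Rest.Length == 0 ? Email : Email + "|" + Rest;
    }
}

public static class EmailEntries
{
    // Assumed rule: the first entry seen for each email wins.
    public static List<EmailEntry> KeepFirst(IEnumerable<string> lines)
    {
        var seen = new HashSet<string>();
        var result = new List<EmailEntry>();
        foreach (string line in lines)
        {
            EmailEntry entry = EmailEntry.Parse(line);
            if (seen.Add(entry.Email))
                result.Add(entry);
        }
        return result;
    }
}
```

With a typed entry, a different rule (for example, preferring the entry with more fields) only changes the body of the loop.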


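Or maybe this code will be useful for you :) It uses the same method as the one in @xanatos's answer.

string[] lines = File.ReadAllLines(filePath);
Dictionary<string, string> items = new Dictionary<string, string>();

foreach (var line in lines)
{
    // The email (first pipe-separated field) is the key; the first line seen for each email wins.
    var key = line.Split('|')[0];
    if (!items.ContainsKey(key))
        items.Add(key, line);
}
// Copy the surviving lines into a list without LINQ's ToList().
List<string> list_lines = new List<string>(items.Values);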
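
For the case raised in the question's EDIT, where the same email appears with different trailing parts, the opposite rule may be wanted: keep the most recent line. A possible sketch, assuming that rule (the names LastWins and KeepLast are hypothetical): the dictionary maps each email to its position in the output list, so a later duplicate overwrites the earlier line while the original order is preserved.

```csharp
using System.Collections.Generic;

public static class LastWins
{
    // Keeps one line per email; a later duplicate replaces the earlier line
    // at the position where that email first appeared.
    public static List<string> KeepLast(IEnumerable<string> lines)
    {
        var position = new Dictionary<string, int>();   // email -> index in result
        var result = new List<string>();

        foreach (string line in lines)
        {
            string key = line.Split('|')[0];
            int index;
            if (position.TryGetValue(key, out index))
            {
                result[index] = line;            // overwrite the earlier duplicate
            }
            else
            {
                position.Add(key, result.Count);
                result.Add(line);
            }
        }
        return result;
    }
}
```

This variant does allocate a second list, so it trades memory for a stable output order.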


First, I suggest loading the file through a stream. Then create a type that represents your rows and load them into a HashSet (for performance).

Look (I've removed some of your code to keep it simple):

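public struct LineType
{
    public string Email { get; set; }
    public string Others { get; set; }

    // Two lines are equal when their email addresses match.
    public override bool Equals(object obj)
    {
        return obj is LineType && string.Equals(Email, ((LineType)obj).Email);
    }

    // HashSet<T> looks items up by hash code, so it must agree with Equals.
    public override int GetHashCode()
    {
        return Email == null ? 0 : Email.GetHashCode();
    }
}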
private static void WorkOnFile(string filePath)
{
    HashSet<LineType> hashSet = new HashSet<LineType>();

    // 'using' closes the reader even if an exception is thrown.
    using (StreamReader stream = File.OpenText(filePath))
    {
        string line;
        while ((line = stream.ReadLine()) != null)
        {
            string new_line = string.Join("|", line.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries));

            LineType lineType = new LineType()
            {
                // The email is the first field in the question's sample data.
                Email = new_line.Split('|')[0],
                Others = new_line
            };

            // Add does nothing (and returns false) if an equal item is already present.
            hashSet.Add(lineType);
        }
    }
}