Counting occurrences of a string in an array and then removing duplicates

2022-12-15 02:58 Q&A author:

I am fairly new to C# programming and I am stuck on my little ASP.NET project.

My website currently examines Twitter statuses for URLs and adds those URLs to an array, all via regular-expression pattern matching. Clearly more than one person will post a status with a specific URL, so I do not want to list duplicates, and I want to count the number of times a particular URL is mentioned in, say, 100 tweets.

Now I have a List<String> which I can sort so that all duplicate URLs are next to each other. My idea was to compare list[i] with list[i+1]: if they match, increment a counter (count++); if they don't, assume that this is the end of the duplicates and add the URL and the count value to a new array.

This would remove duplicates and give me a count of the number of occurrences for each URL. At the moment, what I have is not working, and I do not know why (like I say, I am not very experienced with it all).

For the code below, assume that a JSON feed has been searched by keyword and the results stored in srchResponse.results. The results that contain URLs are added to sList, a string list that contains only the URLs, not the whole message.

I want to put one of each URL (no duplicates), a count (as a string) of the number of occurrences of that URL, and the username, message, and user image URL all into my jagged array called 'urls[100][]'. I have made the array 100 rows long to make sure everything fits, but generally this is too big. Each 'row' will have 5 elements in it.

The debugger stops on the line if (sList[i] == sList[i + 1]), which is the crux of my idea, so clearly the logic is not working. Any suggestions at all will be seriously appreciated!

Here is sample code:

    var sList = new ArrayList();

    string[][] urls = new string[100][];

    int ctr = 0;
    int j = 1;

    foreach (Result res in srchResponse.results)
    {
        string content = res.text;
        string pattern = @"((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)";
        MatchCollection matches = Regex.Matches(content, pattern);

        foreach (Match match in matches)
        {
            GroupCollection groups = match.Groups;

            sList.Add(groups[0].Value.ToString());
        }
    }

    sList.Sort();
    foreach (Result res in srchResponse.results)
    {
        for (int i = 0; i < 100; i++)
        {
            if (sList[i] == sList[i + 1])
            {
                j++;
            }
            else
            {
                urls[ctr][0] = sList[i].ToString();
                urls[ctr][1] = j.ToString();
                urls[ctr][2] = res.text;
                urls[ctr][3] = res.from_user;
                urls[ctr][4] = res.profile_image_url;
                ctr++;
                j = 1;
            }
        }
    }

The code then goes on to append each result, along with its HTML, to a StringBuilder.

Now edited.

The description of your algorithm seems fine. I don't know what's wrong with the implementation; I haven't read it that carefully. (The fact that you are using an ArrayList is an immediate red flag; why aren't you using a more strongly typed generic collection?)
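One concrete reason the ArrayList matters here, offered as a sketch: ArrayList's indexer returns object, so sList[i] == sList[i + 1] compiles to a reference comparison rather than a string comparison. URLs pulled out of different tweets by a regex are separate string instances, so that test can be false even when the text is identical. The class and method names below are illustrative only.

```csharp
using System.Collections;
using System.Collections.Generic;

public static class EqualityDemo
{
    // Builds two equal strings that are distinct objects, as regex matches
    // taken from different tweets would be.
    static (string, string) TwoCopies(string text) =>
        (new string(text.ToCharArray()), new string(text.ToCharArray()));

    // ArrayList stores object, so == here is object reference equality.
    public static bool CompareViaArrayList(string text)
    {
        var (a, b) = TwoCopies(text);
        var list = new ArrayList { a, b };
        return list[0] == list[1];
    }

    // List<string> keeps the static type string, so == uses string's
    // value-equality operator.
    public static bool CompareViaList(string text)
    {
        var (a, b) = TwoCopies(text);
        var list = new List<string> { a, b };
        return list[0] == list[1];
    }
}
```

With the strongly typed List<string>, the same comparison behaves the way the question's algorithm expects.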

However, I have a suggestion. This is exactly the sort of problem that LINQ was intended to solve. Instead of writing all that error-prone code yourself, just describe the transformation you're interested in, and let the compiler work it out for you.

Suppose you have a list of strings and you wish to determine the number of occurrences of each:

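using System;
using System.Linq;

var notes = new []{ "Do", "Fa", "La", "So", "Mi", "Do", "Re" };

var counts = from note in notes
             group note by note into g
             select new { Note = g.Key, Count = g.Count() };

// Groups come out in first-seen order, so this prints "Note Do occurs 2 times."
// first, followed by one line for each remaining note with a count of 1.
foreach (var count in counts)
    Console.WriteLine("Note {0} occurs {1} times.", count.Note, count.Count);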

Which I hope you agree is much easier to read than all that array logic you wrote. And of course, you now have your sequence of unique items: a sequence of counts, each of which contains a unique Note.
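Applied to the question, the same grouping idea might look like the sketch below. It rests on several assumptions: Tweet is a hypothetical stand-in for the asker's Result type (its fields mirror from_user, text and profile_image_url), the URL regex is a simplified https?://\S+ rather than the asker's pattern, and the user details kept for each URL are those of the first tweet that mentioned it. A list of small records (C# 9) replaces the fixed-size string[100][] jagged array, so there is no row count to guess.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the asker's Result type.
public record Tweet(string FromUser, string Text, string ProfileImageUrl);

// One row per distinct URL, replacing a row of the jagged array.
public record UrlSummary(string Url, int Count, string FromUser, string Text, string ProfileImageUrl);

public static class UrlCounter
{
    // Simplified URL pattern (an assumption): http(s):// followed by non-whitespace.
    static readonly Regex UrlPattern = new Regex(@"https?://\S+");

    public static List<UrlSummary> Summarize(IEnumerable<Tweet> tweets)
    {
        return tweets
            // Pair every URL found in a tweet with the tweet it came from.
            .SelectMany(t => UrlPattern.Matches(t.Text).Cast<Match>(),
                        (t, m) => new { Url = m.Value, Tweet = t })
            // One group per distinct URL.
            .GroupBy(x => x.Url)
            .Select(g => new UrlSummary(
                g.Key,
                g.Count(),
                // Keep the details of the first tweet that mentioned the URL.
                g.First().Tweet.FromUser,
                g.First().Tweet.Text,
                g.First().Tweet.ProfileImageUrl))
            // Most-mentioned URLs first (OrderByDescending is a stable sort).
            .OrderByDescending(s => s.Count)
            .ToList();
    }
}
```

The resulting rows can then be written out to the StringBuilder in a foreach loop, just as the question does with the array.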

I'd recommend using a more sophisticated data structure than an array. A Set will guarantee that you have no duplicates.

Older versions of the .NET collections didn't include a Set, but .NET 3.5 added System.Collections.Generic.HashSet<T>, and there are also third-party implementations available.
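A minimal sketch of using HashSet<T> to drop duplicate URLs while keeping them in first-seen order (the class and method names are illustrative). It relies on HashSet<T>.Add returning false when the item is already present. A set on its own does not record how often each URL appeared; for counts, the LINQ grouping query in the first answer is still needed.

```csharp
using System;
using System.Collections.Generic;

public static class SetDemo
{
    // Returns each URL once, in the order it was first seen.
    public static List<string> DistinctInOrder(IEnumerable<string> urls)
    {
        // Ordinal comparison: URLs are compared character by character.
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var result = new List<string>();
        foreach (var url in urls)
        {
            // Add returns false for a duplicate, so repeats are skipped.
            if (seen.Add(url))
                result.Add(url);
        }
        return result;
    }
}
```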

Your loop fails because when i == 99, i + 1 == 100, which is outside the bounds of your list if it holds 100 items (and if it holds fewer, sList[i + 1] runs off the end even sooner).
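For comparison, here is a sketch of the question's own sort-and-compare-neighbours idea with the bounds problem fixed: the loop runs to the list's Count instead of a hard-coded 100, and it only reads element i + 1 when that element exists. It uses List<string> instead of ArrayList and returns tuples instead of filling the jagged array; the names are illustrative.

```csharp
using System;
using System.Collections.Generic;

public static class RunCounter
{
    // Sorts a copy of the list, then walks runs of equal neighbours,
    // emitting each distinct string once together with its run length.
    public static List<(string Url, int Count)> CountRuns(List<string> urls)
    {
        var sorted = new List<string>(urls);
        sorted.Sort(StringComparer.Ordinal);

        var result = new List<(string Url, int Count)>();
        int runLength = 1;
        for (int i = 0; i < sorted.Count; i++)
        {
            // Only look at i + 1 when it exists, so the index never goes out of range.
            bool sameAsNext = i + 1 < sorted.Count && sorted[i] == sorted[i + 1];
            if (sameAsNext)
            {
                runLength++;
            }
            else
            {
                // End of a run (or of the list): record the URL and its count.
                result.Add((sorted[i], runLength));
                runLength = 1;
            }
        }
        return result;
    }
}
```

Separately, new string[100][] only allocates the outer array: each urls[ctr] row is null until it is assigned (for example urls[ctr] = new string[5];), so the assignments in the question would throw a NullReferenceException even once the index problem is fixed.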

But as others have pointed out, .NET 3.5 has ways of doing what you want more elegantly.

If you don't need to know how many duplicates a specific entry has, you could use these LINQ extension methods:

.Count()
.Distinct()
.Count()
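A minimal sketch of one way those methods could be combined, assuming the goal is a de-duplicated list plus the total number of repeated entries rather than per-URL counts (the class and method names are illustrative). Note that the documentation describes Distinct() as returning an unordered sequence, so the sketch does not rely on its output order.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DistinctDemo
{
    // Distinct() drops repeated values using default string equality.
    public static List<string> Unique(IEnumerable<string> urls) =>
        urls.Distinct().ToList();

    // Total entries minus distinct entries = number of entries that were repeats.
    // Pass a materialized collection, since the input is enumerated twice.
    public static int RepeatCount(IEnumerable<string> urls) =>
        urls.Count() - urls.Distinct().Count();
}
```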
