List<T> -- Proper Collection to Use for Large Amount of Items?

2023-01-14 14:19 问答作者：

Edit: Ok, since it's clear I'm taking the wrong approach with this, I'll explain what I was intending to do. The overall intent is to (as an exercise) verify all valid email addresses according to spec. This portion was to generate a portion of the data-set to verify the algorithm against.

As an exercise, I'm writing a program that will generate all possible email addresses. This will result in ~~₈₀^81⁶⁵~~ ≈ 1.4e122 possible items. I'm currently using List<T>s to store the generated items but my understanding is that it has a maximum capacity of Int32.MaxValue. I'm guessing a proper solution isn't going to involve Lists of Lists of Lists. This is what I have so far.

private void GenerateLocalPart()
{
    List<string> validLocalSymbols = new List<string>()
    {
        ".", "!", "#", "$", "%", "&", "*", "+", "-",
        "/", "^", "_", "`", "{", "|", "}", "~", "\"",
    };
    List<string> validLocalNumbers = new List<string>()
    {
        "0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
    };
    List<string> validLocalLowercase = new List<string>()
    {
        "a", "b", "c", "d", "e", "f", "g", "h", "i", "j",
        "k", "l", "m", "n", "o", "p", "q", "r", "s", "t",
        "u", "v", "w", "x", "y", "z",
    };
    List<string> validLocalUppercase = new List<string>()
    {
        "A", "B", "C", "D", "E", "F", "G", "H", "I", "J",
        "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T",
        "U", "V", "W", "X", "Y", "Z",
    };

    List<string> validLocalPartCharacters = new List<string>();
    validLocalPartCharacters.AddRange(validLocalSymbols);
    validLocalPartCharacters.AddRange(validLocalNumbers);
    validLocalPar开发者_高级运维tCharacters.AddRange(validLocalLowercase);
    validLocalPartCharacters.AddRange(validLocalUppercase);

    List<string> targetSequence           = validLocalLowercase;
    int lengthOfStringToGenerate          = 5;
    int numberOfDifferentSourceCharacters = targetSequence.Count;
    List<List<string>> localPart          = new List<List<string>>();
    List<string> localPartSeed            = new List<string>();

    localPart.Add(localPartSeed);
    foreach (string character in targetSequence)
        localPartSeed.Add(character);

    for (int i = 1; i < lengthOfStringToGenerate; i++)
    {
        List<string> bufferList = new List<string>();
        localPart.Add(bufferList);
        foreach (string lastListString in localPart[i - 1])
            foreach (string character in targetSequence)
                bufferList.Add(lastListString + character);
    }

    Console.WriteLine("Break here.");
}

lengthOfStringToGenerate is a maximum length of the strings (so it generates all combinations from 1 to lengthOfStringToGenerate). localPart will end up with an amount of Lists equivalent to the lengthOfStringToGenerate. Is there a different type of collection that I should be using? Is there a different overall approach I should be taking?

Where were you expecting to store all this data? List<T> will always store its values in memory... but even if you write something to store the results to disk, you're still not going to be able to hold 1.4e122 items. Have you really taken in just how big that number is? Even at a single bit per item, that's way more than the capacity of the universe, if the whole of the universe was one big hard disk.

The largest unit of data I've ever heard of being talked about in a meaningful way is an exabyte, which is 10¹⁸ bytes. For most people, a petabyte (10¹⁵ bytes) is a pretty huge amount of data. What you're considering makes those quantities seem microscopically small.

What are you trying to do with the data afterwards? And when would you expect such an algorithm to ever actually finish?

Your project might even be a bigger threat to reality than the LHC and possible micro black-holes, see Arthur C. Clarke’s ‘The Nine Billion Names of God’

Actually, given that your algorithm is entirely deterministic why store the values in the first place? Why not represent them as a in-memory IEnumerable stream that generates the next possible email address each time Next is called. You could store your progress so far (good luck with this) and use the “current position” (or rather “position reached”) as an input to the generator for subsequent runs. As Jon points out, it’s not just a space problem – there’s the total runtime to contend with too (another reason perhaps to use a non-persistent, pull based architecture – assuming you can use the results as they become available then at least this way you’ll derive some useful effect early).

Cute project, one does naturally wonder: why.

List<T> -- Proper Collection to Use for Large Amount of Items?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？