What is the fastest way to parse this string

I have a string in the following format:

[Season] [Year] [Vendor] [Geography]

so an example might be: Spring 2009 Nielsen MSA

I need to be able to parse out Season and Year in the fastest way possible. I don't care about prettiness or cleverness. Just raw speed. The language is C# using VS2008, but the assembly is being built for .NET 2.0.


If you only need the season and year, then:

int firstSpace = text.IndexOf(' ');
string season = text.Substring(0, firstSpace);
int secondSpace = text.IndexOf(' ', firstSpace + 1);
int year = int.Parse(text.Substring(firstSpace + 1, 
                                    secondSpace - firstSpace - 1));
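For reference, here is the same two-`IndexOf` approach packaged as a reusable method. `SeasonYearParser` and `Parse` are hypothetical names. Like the snippet above, it assumes the string always has at least two spaces and a numeric year, and it sticks to C# 3 / .NET 2.0 features so it should build in a VS2008 project targeting .NET 2.0.

```csharp
using System;

// Hypothetical helper wrapping the two-IndexOf approach.
// Assumes at least two spaces and a numeric year; no validation is done.
public static class SeasonYearParser
{
    public static void Parse(string text, out string season, out int year)
    {
        // The first space ends the season.
        int firstSpace = text.IndexOf(' ');
        season = text.Substring(0, firstSpace);

        // The second space ends the year; search from just after the first one.
        int secondSpace = text.IndexOf(' ', firstSpace + 1);
        year = int.Parse(text.Substring(firstSpace + 1, secondSpace - firstSpace - 1));
    }
}
```

Usage: `string season; int year; SeasonYearParser.Parse("Spring 2009 Nielsen MSA", out season, out year);`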

If you can assume the year is always four digits, this is even faster:

int firstSpace = text.IndexOf(' ');
string season = text.Substring(0, firstSpace);
int year = int.Parse(text.Substring(firstSpace + 1, 4));
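If the four-digit assumption holds, you can also skip both `Substring` and `int.Parse` for the year and do the digit arithmetic directly. This is a sketch under that assumption only (it does not assume a century); `FourDigitYear.Read` is a hypothetical helper, and whether it actually beats `int.Parse` is something to measure rather than take on trust.

```csharp
using System;

// Hypothetical helper: reads a four-digit year by character arithmetic.
// Assumes exactly four ASCII digits start at 'start'; nothing is validated.
public static class FourDigitYear
{
    public static int Read(string text, int start)
    {
        return 1000 * (text[start] - '0')
             + 100 * (text[start + 1] - '0')
             + 10 * (text[start + 2] - '0')
             + (text[start + 3] - '0');
    }
}
```

It drops in as `int year = FourDigitYear.Read(text, firstSpace + 1);`.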

If additionally you know that all years are in the 21st century, it can get stupidly optimal:

int firstSpace = text.IndexOf(' ');
string season = text.Substring(0, firstSpace);
int year = 2000 + 10 * (text[firstSpace + 3] - '0') 
                + text[firstSpace + 4] - '0';

which becomes even less readable but possibly faster (depending on what the JIT does) as:

int firstSpace = text.IndexOf(' ');
string season = text.Substring(0, firstSpace);
int year = 1472 + 10 * text[firstSpace + 3] + text[firstSpace + 4];

Personally I think that's at least one step too far though :)
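Where the 1472 comes from: the character `'0'` has code 48, so the two subtractions can be folded into the constant. With $c_3$ and $c_4$ the character codes at `text[firstSpace + 3]` and `text[firstSpace + 4]`:

```latex
\begin{aligned}
\text{year} &= 2000 + 10\,(c_3 - 48) + (c_4 - 48) \\
            &= 2000 - 480 - 48 + 10\,c_3 + c_4 \\
            &= 1472 + 10\,c_3 + c_4
\end{aligned}
```

For "2009", $c_3 = 48$ and $c_4 = 57$, giving $1472 + 480 + 57 = 2009$.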

EDIT: Okay, taking this to extremes... you're only going to have a few seasons, right? Suppose they're "Spring", "Summer", "Fall" and "Winter"; then you can do:

string season;
int yearStart;
if (text[0] == 'S')
{
    season = text[1] == 'p' ? "Spring" : "Summer";
    yearStart = 7;
}
else if (text[0] == 'F')
{
    season = "Fall";
    yearStart = 5;
}
else
{
    season = "Winter";
    yearStart = 7;
}

int year = 1472 + 10 * text[yearStart + 2] + text[yearStart + 3];

This has the advantage that it will reuse the same string objects. Of course, it assumes that there's never anything wrong with the data...
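To see the string-reuse point concretely, this sketch contrasts the two ways of getting the season. `Substring` allocates a new string on every call, while the dispatch returns one of four literals, and string literals are interned, so every "Spring" line hands back the very same object. `StringReuseDemo` and its methods are hypothetical names, and the dispatch version assumes the same four season names as above.

```csharp
using System;

public static class StringReuseDemo
{
    // Substring approach: allocates a fresh string on every call.
    public static string SeasonBySubstring(string text)
    {
        return text.Substring(0, text.IndexOf(' '));
    }

    // Dispatch approach: returns an interned literal, so the same season
    // always yields the same object. Assumes only Spring, Summer, Fall, Winter.
    public static string SeasonByDispatch(string text)
    {
        if (text[0] == 'S')
            return text[1] == 'p' ? "Spring" : "Summer";
        return text[0] == 'F' ? "Fall" : "Winter";
    }
}
```

Fewer allocations also means less work for the garbage collector when you are parsing many lines.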

Using Split as shown in Spidey's answer is certainly simpler than any of this, but I suspect it'll be slightly slower. To be honest, I'd at least try that first... have you measured the simplest code and found that it's too slow? The difference is likely to be very slight - certainly compared with whatever network or disk access you've got reading in the data in the first place.
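A minimal way to do that measurement, as a sketch: time `Split` against the `IndexOf`/`Substring` version on the same input. `SplitVsIndexOf` and its methods are hypothetical names; `Stopwatch` is available from .NET 2.0 (but `Stopwatch.Restart` only from .NET 4, hence the second `StartNew`). The numbers will vary by machine and build, so treat them as indicative only.

```csharp
using System;
using System.Diagnostics;

// Hypothetical micro-benchmark: Split versus IndexOf/Substring for the year.
// Run a Release build outside the debugger; results are indicative only.
public static class SplitVsIndexOf
{
    // Split on spaces and take the second word.
    public static int YearBySplit(string text)
    {
        string[] words = text.Split(' ');
        return int.Parse(words[1]);
    }

    // Two IndexOf calls and one Substring.
    public static int YearByIndexOf(string text)
    {
        int firstSpace = text.IndexOf(' ');
        int secondSpace = text.IndexOf(' ', firstSpace + 1);
        return int.Parse(text.Substring(firstSpace + 1, secondSpace - firstSpace - 1));
    }

    // Times 'iterations' calls of each method, reporting elapsed milliseconds.
    public static void Time(string text, int iterations, out long splitMs, out long indexOfMs)
    {
        long sink = 0; // consume every result so the work is observably used

        // Warm up both methods so JIT compilation is not part of the timings.
        sink += YearBySplit(text) + YearByIndexOf(text);

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            sink += YearBySplit(text);
        splitMs = sw.ElapsedMilliseconds;

        sw = Stopwatch.StartNew(); // no Restart before .NET 4
        for (int i = 0; i < iterations; i++)
            sink += YearByIndexOf(text);
        indexOfMs = sw.ElapsedMilliseconds;

        GC.KeepAlive(sink);
    }
}
```

Compare both figures with how long it takes just to read the same number of lines from your real data source.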


To add to the other answers, if you are expecting them to be in this format:

Spring xxxx
Summer xxxx
Autumn xxxx
Winter xxxx

then, because all four season names are exactly six letters long, an even faster way would be:

string season = text.Substring(0, 6);
int year = int.Parse(text.Substring(7, 4));

That is rather nasty, though. :)

I wouldn't even consider coding like this.
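If you do keep the fixed offsets from the "Spring xxxx / Autumn xxxx" answer, a cheap compromise is to check the layout those offsets depend on before slicing. This is a sketch; `FixedOffsetParser.TryParse` is a hypothetical name, and it assumes six-letter season names followed by one space and a four-digit year, returning `false` for anything else (such as "Fall").

```csharp
using System;

// Hypothetical guarded version of the fixed-offset parse.
// Assumes six-letter seasons (Spring, Summer, Autumn, Winter), one space,
// then a four-digit year; any other layout returns false.
public static class FixedOffsetParser
{
    public static bool TryParse(string text, out string season, out int year)
    {
        season = null;
        year = 0;

        // Check the positions the offsets rely on.
        if (text == null || text.Length < 11 || text[6] != ' ')
            return false;
        for (int i = 7; i < 11; i++)
            if (text[i] < '0' || text[i] > '9')
                return false;

        season = text.Substring(0, 6);
        year = int.Parse(text.Substring(7, 4));
        return true;
    }
}
```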


Try this.

        string str = "Spring 2009 Nielsen MSA";
        string[] words = str.Split(' ');
        str = words[0] + " " + words[1];


string input = "Spring 2009 Nielsen MSA";

int yearStart = input.IndexOf(' ') + 1;   // index just after the first space

string season = input.Substring(0, yearStart - 1);
string year = input.Substring(yearStart, input.IndexOf(' ', yearStart) - yearStart);


string[] split = stringName.Split(' ');
string seasonAndYear = split[0] + " " + split[1];


Class Parser:

using System.IO;
using System.Text;

public class Parser : StringReader {

    public Parser(string s) : base(s) {
    }

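    // Skips leading whitespace, then returns the next run of non-whitespace
    // characters; returns an empty string once the input is exhausted.
    public string NextWord() {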
        while ((Peek() >= 0) && (char.IsWhiteSpace((char) Peek())))
            Read();
        StringBuilder sb = new StringBuilder();
        do {
            int next = Read();
            if (next < 0)
                break;
            char nextChar = (char) next;
            if (char.IsWhiteSpace(nextChar))
                break;
            sb.Append(nextChar);
        } while (true);
        return sb.ToString();
    }
}

Use:

    string str = "Spring 2009 Nielsen MSA";
    Parser parser = new Parser(str);
    string season = parser.NextWord();
    string year = parser.NextWord();
    string vendor = parser.NextWord();
    string geography = parser.NextWord();


I'd go with Spidey's suggestion, which should give decent enough performance with simple, easy-to-follow, easy-to-maintain code.

But if you really need to push the performance envelope (and C# is the only tool available), then a couple of loops in series that search for the spaces, followed by pulling the strings out with Substring, would probably marginally outdo it.

You could do the same with IndexOf instead of the loops, but rolling your own may be slightly faster (but you'd have to profile that).
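A sketch of that "couple of loops in series" idea: two plain `while` loops find the spaces, then `Substring` pulls out the pieces. `LoopScanner.Parse` is a hypothetical name; it assumes well-formed input with at least two spaces (otherwise the loops run off the end and throw `IndexOutOfRangeException`), and any speed advantage over `IndexOf` still needs profiling.

```csharp
using System;

// Hypothetical hand-rolled scanner: plain loops instead of IndexOf.
// Assumes at least two spaces and a numeric year; no validation is done.
public static class LoopScanner
{
    public static void Parse(string text, out string season, out int year)
    {
        // First loop: walk to the space that ends the season.
        int firstSpace = 0;
        while (text[firstSpace] != ' ')
            firstSpace++;

        // Second loop: continue to the space that ends the year.
        int secondSpace = firstSpace + 1;
        while (text[secondSpace] != ' ')
            secondSpace++;

        season = text.Substring(0, firstSpace);
        year = int.Parse(text.Substring(firstSpace + 1, secondSpace - firstSpace - 1));
    }
}
```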
