Parsing URL string to remove unwanted stuff (C++)
Was asked this in an interview; my solution kinda sucked, so I am wondering if anyone can do better.
Given a URL string in this form:
http://www.foo.com?key1=value1&key2=value2&key3=value3 and given a key,
I want to create a function that takes the URL and a key and returns the original string WITHOUT that key and its value.
Example:
input:
http://www.foo.com?key1=value1&key2=value2&key3=value3
remove: key2 and its value
output:
http://www.foo.com?key1=value1&key3=value3
My solution was something like this:
void parseURL(string str, string key)
{
    int i;
    i = str.find_first_of("?");
    string s = str.substr(i);
    int start = s.find(key);
    int end = 0;
    if (start != string::npos)
        end = s.find_first_of("&", start);
    string news = str.substr(0, i) + s.substr(0, start-1) + s.substr(end);
    cout << news;
}
But it's ugly and it will fail a couple of test cases. I know someone has a more clever way to do this. Anyone?
The biggest conceptual problem with your solution is that it assumes the given key doesn't occur anywhere else in the query part of the URL, either as part of a value or as part of another key. In other words, given the input http://www.example.com?keystone=value1&key=value2, looking for key will delete keystone=value1 by accident. Or, given the input http://www.example.com?key1=key2&key2=value2, looking for key2 will match inside the first pair's value and mangle it (leaving something like http://www.example.com?key1=&key2=value2), which is again not what you want.
Assuming you can't or don't want to use a regular expression library for this, the best improvement you can make is to extract each key in its entirety (everything between a ? or & and the subsequent =) until one of them matches the key you're looking for, and then delete that pair as before.
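A minimal sketch of that key-by-key approach, in case it helps. The function name removeQueryParam is my own invention, and the sketch assumes the URL has no fragment ("#...") and that keys are not percent-encoded; a pair with no "=" is treated as a bare key.

```cpp
#include <string>

// Hypothetical helper (the name is an assumption, not from the question).
// Returns `url` with every query pair whose key is exactly `key` removed.
std::string removeQueryParam(const std::string& url, const std::string& key)
{
    const std::string::size_type q = url.find('?');
    if (q == std::string::npos)
        return url;                          // no query string at all

    std::string result = url.substr(0, q);   // everything before the '?'
    char separator = '?';                    // first kept pair follows '?', later ones '&'

    std::string::size_type pos = q + 1;
    while (pos <= url.size()) {
        std::string::size_type amp = url.find('&', pos);
        if (amp == std::string::npos)
            amp = url.size();

        // One "key=value" pair, and the whole key in front of its '='.
        const std::string pair = url.substr(pos, amp - pos);
        const std::string name = pair.substr(0, pair.find('='));

        // Keep the pair unless its complete key equals the one to remove.
        if (!pair.empty() && name != key) {
            result += separator;
            result += pair;
            separator = '&';
        }
        pos = amp + 1;
    }
    return result;
}
```

With the URL from the question, removeQueryParam("http://www.foo.com?key1=value1&key2=value2&key3=value3", "key2") gives http://www.foo.com?key1=value1&key3=value3. Because whole keys are compared, looking for key leaves keystone=value1 alone.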
Depending on the assumptions made in the question, you may also want to consider how to handle URL-encoded characters (e.g. looking for "multi word key" should match multi%20word%20key).
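For the URL-encoding point, one option is to decode each key before comparing it. The following is only a sketch: percentDecode and hexValue are hypothetical names, only %XX escapes are handled, and "+" is deliberately not treated as a space.

```cpp
#include <string>

// Hypothetical helper: numeric value of a hex digit, or -1 if `c` is not one.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: replaces each well-formed %XX escape with the byte it
// encodes. Malformed escapes (e.g. "%zz" or a trailing '%') are copied as-is.
std::string percentDecode(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            const int hi = hexValue(in[i + 1]);
            const int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;                  // skip the two hex digits just consumed
                continue;
            }
        }
        out += in[i];
    }
    return out;
}
```

Comparing percentDecode(name) against the (already decoded) search key would then let "multi word key" match multi%20word%20key.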
I would certainly have tried std::tr1::regex (the TR1 regex library; it is std::regex if your compiler has a C++0x implementation), but I guess I would have spent too much time on the regex syntax.
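For what it's worth, here is roughly how a std::regex version might look. This is a sketch under assumptions, not a definitive implementation: removeQueryParamRegex is a made-up name, the URL is assumed to have no fragment, and only pairs written as key=value (with an "=") are removed. The key is escaped first so that characters such as "." are matched literally.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (the name is an assumption). Removes every "key=value"
// pair whose key is exactly `key` from the query part of `url`.
std::string removeQueryParamRegex(const std::string& url, const std::string& key)
{
    const std::string::size_type q = url.find('?');
    if (q == std::string::npos)
        return url;                              // no query string at all

    // Put a backslash in front of regex metacharacters in the key.
    static const std::regex meta(R"([.^$|()\[\]{}*+?\\])");
    const std::string escaped = std::regex_replace(key, meta, R"(\$&)");

    // The key must start the query or follow '&', and be followed by '=',
    // so it cannot match inside a value or inside a longer key.
    const std::regex pairRe("(^|&)" + escaped + "=[^&]*");
    std::string query = std::regex_replace(url.substr(q + 1), pairRe, "");

    // Removing the first pair leaves a leading '&' behind; drop it.
    if (!query.empty() && query[0] == '&')
        query.erase(0, 1);

    return query.empty() ? url.substr(0, q)
                         : url.substr(0, q) + "?" + query;
}
```

The regex is applied only to the part after the "?", which keeps the pattern short and makes the separator clean-up a simple string operation.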
Sounds like it is begging for regexps. In Perl it would be something like
$url =~ s/^((.*)\?(.*))[\?&]$key=[^&]*&?(.*)$/$1$4/;
That is: match the stuff before the "?" that introduces the parameters, then any parameters preceding your key, then the key itself (which has to sit between a "?" or "&" and an "=", so you can't get partial matches elsewhere in the string) and its value, and finally whatever (if anything) comes after it.
You can translate these regexps into .NET fairly straightforwardly, I think, but I learned them in vi and in Perl, so that's where I start from.