Regular expression to remove one parameter from query string
I'm looking for a regular expression to remove a single parameter from a query string, and I want to do it in a single regular expression if possible.
Say I want to remove the foo parameter. Right now I use this:
/&?foo\=[^&]+/
That works as long as foo is not the first parameter in the query string. If it is, then my new query string starts with an ampersand. (For example, "foo=123&bar=456" gives a result of "&bar=456".) Right now, I'm just checking after the regex if the query string starts with an ampersand, and chopping it off if it does.
Example edge cases:
Input | Expected Output
-------------------------+--------------------
foo=123 | (empty string)
foo=123&bar=456 | bar=456
bar=456&foo=123 | bar=456
abc=789&foo=123&bar=456 | abc=789&bar=456
Edit
OK, as pointed out in the comments, there are way more edge cases than I originally considered. I got the following regex to work with all of them:
/&foo(\=[^&]*)?(?=&|$)|^foo(\=[^&]*)?(&|$)/
This is modified from Mark Byers's answer, which is why I'm accepting that one, but Roger Pate's input helped a lot too.
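For other parameter names, the same pattern can be built from a string. Here is a minimal Python sketch of that idea; the remove_param name and its arguments are made up for illustration, and it assumes the query string has no leading "?" and that names are compared as written (no percent-decoding).

```python
import re

def remove_param(query, name):
    """Remove parameter `name` from a query string that has no leading '?'."""
    key = re.escape(name)  # match the name literally, e.g. "per-page" or "a.b"
    # Same two alternatives as the regex above:
    #   &name[=value] when followed by & or the end of the string, or
    #   name[=value] at the very start, also consuming the & after it
    pattern = r"&" + key + r"(=[^&]*)?(?=&|$)|^" + key + r"(=[^&]*)?(&|$)"
    return re.sub(pattern, "", query)

print(remove_param("abc=789&foo=123&bar=456", "foo"))
print(remove_param("foox=123&bar=456", "foo"))
```

Escaping the name matters as soon as it contains regex metacharacters such as "." or "+".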
Here is the full suite of test cases I'm using, and a JavaScript snippet which tests them:
$(function() {
var regex = /&foo(\=[^&]*)?(?=&|$)|^foo(\=[^&]*)?(&|$)/;
var escapeHtml = function (str) {
var map = {
'&': '&amp;',
'<': '&lt;',
'>': '&gt;',
'"': '&quot;',
"'": '&#039;'
};
return str.replace(/[&<>"']/g, function(m) { return map[m]; });
};
//test cases
var tests = [
'foo' , 'foo&bar=456' , 'bar=456&foo' , 'abc=789&foo&bar=456'
,'foo=' , 'foo=&bar=456' , 'bar=456&foo=' , 'abc=789&foo=&bar=456'
,'foo=123' , 'foo=123&bar=456' , 'bar=456&foo=123' , 'abc=789&foo=123&bar=456'
,'xfoo' , 'xfoo&bar=456' , 'bar=456&xfoo' , 'abc=789&xfoo&bar=456'
,'xfoo=' , 'xfoo=&bar=456' , 'bar=456&xfoo=' , 'abc=789&xfoo=&bar=456'
,'xfoo=123', 'xfoo=123&bar=456', 'bar=456&xfoo=123', 'abc=789&xfoo=123&bar=456'
,'foox' , 'foox&bar=456' , 'bar=456&foox' , 'abc=789&foox&bar=456'
,'foox=' , 'foox=&bar=456' , 'bar=456&foox=' , 'abc=789&foox=&bar=456'
,'foox=123', 'foox=123&bar=456', 'bar=456&foox=123', 'abc=789&foox=123&bar=456'
];
//expected results
var expected = [
'' , 'bar=456' , 'bar=456' , 'abc=789&bar=456'
,'' , 'bar=456' , 'bar=456' , 'abc=789&bar=456'
,'' , 'bar=456' , 'bar=456' , 'abc=789&bar=456'
,'xfoo' , 'xfoo&bar=456' , 'bar=456&xfoo' , 'abc=789&xfoo&bar=456'
,'xfoo=' , 'xfoo=&bar=456' , 'bar=456&xfoo=' , 'abc=789&xfoo=&bar=456'
,'xfoo=123', 'xfoo=123&bar=456', 'bar=456&xfoo=123', 'abc=789&xfoo=123&bar=456'
,'foox' , 'foox&bar=456' , 'bar=456&foox' , 'abc=789&foox&bar=456'
,'foox=' , 'foox=&bar=456' , 'bar=456&foox=' , 'abc=789&foox=&bar=456'
,'foox=123', 'foox=123&bar=456', 'bar=456&foox=123', 'abc=789&foox=123&bar=456'
];
for(var i = 0; i < tests.length; i++) {
var output = tests[i].replace(regex, '');
var success = (output == expected[i]);
$('#output').append(
'<tr class="' + (success ? 'passed' : 'failed') + '">'
+ '<td>' + (success ? 'PASS' : 'FAIL') + '</td>'
+ '<td>' + escapeHtml(tests[i]) + '</td>'
+ '<td>' + escapeHtml(output) + '</td>'
+ '<td>' + escapeHtml(expected[i]) + '</td>'
+ '</tr>'
);
}
});
#output {
border-collapse: collapse;
}
#output tr.passed { background-color: #af8; }
#output tr.failed { background-color: #fc8; }
#output td, #output th {
border: 1px solid black;
padding: 2px;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<table id="output">
<tr>
<th>Succ?</th>
<th>Input</th>
<th>Output</th>
<th>Expected</th>
</tr>
</table>
If you want to do this in just one regular expression, you could do this:
/&foo(=[^&]*)?|^foo(=[^&]*)?&?/
This is because you need to match either an ampersand before the foo=..., or one after, or neither, but not both.
To be honest, I think it's better the way you did it: removing the trailing ampersand in a separate step.
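A quick way to check this single-regex approach, sketched in Python (the strip_foo name is only for illustration). It handles the four cases from the question, but nothing stops it from matching the start of a longer name such as foox, which is what the lookahead in the question's edit takes care of.

```python
import re

# Either "&foo[=value]" anywhere, or "foo[=value]" at the start plus the
# ampersand that follows it (if any).
PATTERN = re.compile(r"&foo(=[^&]*)?|^foo(=[^&]*)?&?")

def strip_foo(query):
    return PATTERN.sub("", query)

for query in ["foo=123", "foo=123&bar=456", "bar=456&foo=123",
              "abc=789&foo=123&bar=456", "bar=456&foox=1"]:
    print(repr(query), "->", repr(strip_foo(query)))
```

The last input comes out as "bar=456x=1", because "&foo" matches inside "&foox".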
/(?<=&|\?)foo(=[^&]*)?(&|$)/
Uses lookbehind and the last group to "anchor" the match, and allows a missing value. Change the \? to ^ if you've already stripped off the question mark from the query string.
Regex is still not a substitute for a real parser of the query string, however.
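As a rough idea of what the non-regex route could look like, here is a small Python sketch that splits on "&" and filters by key. The remove_param name is hypothetical, and it assumes the leading "?" is already gone and that keys are compared exactly as written (a percent-encoded spelling of the same key would not be recognized).

```python
def remove_param(query, name):
    """Drop every `name` or `name=value` pair from a query string (no leading '?')."""
    kept = [
        pair
        for pair in query.split("&")
        # The key is everything before the first '='; a bare "foo" has no '='.
        if pair.split("=", 1)[0] != name
    ]
    return "&".join(kept)

print(remove_param("abc=789&foo=123&bar=456", "foo"))
print(remove_param("foo=1&xfoo=2&foo=3", "foo"))
```

Because whole pairs are kept or dropped, there is no ampersand bookkeeping to get wrong, and repeated parameters are all removed.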
Update: Test script: (run it at codepad.org)
import re
regex = r"(^|(?<=&))foo(=[^&]*)?(&|$)"
cases = {
"foo=123": "",
"foo=123&bar=456": "bar=456",
"bar=456&foo=123": "bar=456",
"abc=789&foo=123&bar=456": "abc=789&bar=456",
"oopsfoo=123": "oopsfoo=123",
"oopsfoo=123&bar=456": "oopsfoo=123&bar=456",
"bar=456&oopsfoo=123": "bar=456&oopsfoo=123",
"abc=789&oopsfoo=123&bar=456": "abc=789&oopsfoo=123&bar=456",
"foo": "",
"foo&bar=456": "bar=456",
"bar=456&foo": "bar=456",
"abc=789&foo&bar=456": "abc=789&bar=456",
"foo=": "",
"foo=&bar=456": "bar=456",
"bar=456&foo=": "bar=456",
"abc=789&foo=&bar=456": "abc=789&bar=456",
}
failures = 0
for query, expected in cases.items():
    got = re.sub(regex, "", query)
    if got != expected:
        print("failed: input=%r expected=%r got=%r" % (query, expected, got))
        failures += 1
if not failures:
    print("Success")
It shows where my approach failed, Mark has the right of it—which should show why you shouldn't do this with regex.. :P
The problem is associating the query parameter with exactly one ampersand. If you must use regex (in case you haven't picked up on it :P, I'd use a separate parser, which might use regex inside it but would still actually understand the format), one solution is to make sure there's exactly one ampersand per parameter: replace the leading ? with a &.
This gives /&foo(=[^&]*)?(?=&|$)/, which is very straightforward and the best you're going to get. Remove the leading & in the final result (or change it back into a ?, etc.). Modifying the test script to do this uses the same cases as above, and changes the loop to:
failures = 0
for query, expected in cases.items():
    query = "&" + query
    got = re.sub(regex, "", query)
    if got[:1] == "&":
        got = got[1:]
    if got != expected:
        print("failed: input=%r expected=%r got=%r" % (query, expected, got))
        failures += 1
if not failures:
    print("Success")
Having a query string that starts with & is harmless, so why not leave it that way? In any case, I suggest that you search for the trailing ampersand and use \b to match the beginning of foo without taking in a previous character:
/\bfoo\=[^&]+&?/
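To see what this pattern does and does not cover, here is a small Python sketch (strip_foo is a made-up name, and the backslash before "=" is dropped because it isn't needed). It assumes that a leftover trailing ampersand is acceptable, in the same spirit as the leading one.

```python
import re

# \b only requires a word/non-word transition before "foo"; it is not the
# same thing as "start of a parameter".
PATTERN = re.compile(r"\bfoo=[^&]+&?")

def strip_foo(query):
    return PATTERN.sub("", query)

for query in ["foo=123&bar=456", "bar=456&foo=123", "xfoo=1", "foo=", "a-foo=1"]:
    print(repr(query), "->", repr(strip_foo(query)))
```

A last-position foo leaves "bar=456&" behind, an empty value ("foo=") is not matched because of the +, and a name like a-foo is cut down to "a-" since "-" counts as a word boundary.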
It's a bit silly but I started trying to solve this with a regexp and wanted to finally get it working :)
$str[] = 'foo=123';
$str[] = 'foo=123&bar=456';
$str[] = 'bar=456&foo=123';
$str[] = 'abc=789&foo=123&bar=456';
foreach ($str as $string) {
echo preg_replace('#(?:^|\b)(&?)foo=[^&]+(&?)#e', "'$1'=='&' && '$2'=='&' ? '&' : ''", $string), "\n";
}
The replace part is messed up because apparently it gets confused if the captured characters are '&'s.
Also, it doesn't match afoo and the like.
Thanks. Yes it uses backslashes for escaping, and you're right, I don't need the /'s.
This seems to work, though it doesn't do it in one line as requested in the original question.
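// Requires: using System.Text.RegularExpressions;
public static string RemoveQueryStringParameter(string url, string keyToRemove)
{
    // Escape the key so regex metacharacters in it are matched literally
    string key = Regex.Escape(keyToRemove);
    // if first parameter, leave ?, take away trailing &
    string pattern = @"\?" + key + @"(=[^&]*)?(&|$)";
    url = Regex.Replace(url, pattern, "?");
    // if subsequent parameter, take away leading &
    // (the lookahead keeps a longer name such as "foox" from matching "foo")
    pattern = "&" + key + @"(=[^&]*)?(?=&|$)";
    url = Regex.Replace(url, pattern, "");
    return url;
}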
I based a Java implementation on yours, and it seems to work:
public static String removeParameterFromQueryString(String queryString,String paramToRemove) {
Preconditions.checkArgument(queryString != null,"Empty querystring");
Preconditions.checkArgument(paramToRemove != null,"Empty param");
String oneParam = "^"+paramToRemove+"(=[^&]*)$";
String begin = "^"+paramToRemove+"(=[^&]*)(&?)";
String end = "&"+paramToRemove+"(=[^&]*)$";
String middle = "(?<=[&])"+paramToRemove+"(=[^&]*)&";
String removedMiddleParams = queryString.replaceAll(middle,"");
String removedBeginParams = removedMiddleParams.replaceAll(begin,"");
String removedEndParams = removedBeginParams.replaceAll(end,"");
return removedEndParams.replaceAll(oneParam,"");
}
I had trouble in some cases with your implementation because sometimes it did not delete a &, so I did it in multiple steps, which seems easier to understand.
I had a problem with your version, particularly when a param was in the query string multiple times (like param1=toto&param2=xxx&param1=YYY&param3=ZZZ&param1....)
It's never too late, right?
I did this using a conditional lookbehind to be sure it doesn't mess up the &s:
/(?(?<=\?)(foo=[^&]+)&*|&(?1))/g
If ? is behind, we catch foo=bar and the trailing & if it exists.
If ? is not behind, we catch &foo=bar.
(?1) represents the 1st capturing group; in this example it's the same as (foo=[^&]+).
Actually, I needed a one-liner for two similar parameters, page and per-page, so I altered the expression a bit:
/(?(?<=\?)((per-)?page=[^&]+)&*|&(?1))/g
Works like a charm.
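The conditional and the (?1) call are PCRE features; as far as I know, Python's built-in re module supports neither lookaround conditionals nor subroutine calls. A rough equivalent there is to spell the two branches out as a plain alternation. This is only a sketch, with remove_foo as an illustrative name:

```python
import re

# Branch 1: right after "?", match foo=value plus any ampersands after it.
# Branch 2: otherwise, match "&foo=value" (but not when that "&" itself
#           directly follows the "?", mirroring the conditional).
PATTERN = re.compile(r"(?<=\?)foo=[^&]+&*|(?<!\?)&foo=[^&]+")

def remove_foo(url):
    return PATTERN.sub("", url)

for url in ["/list?foo=1&bar=2", "/list?bar=2&foo=1",
            "/list?a=1&foo=1&b=2", "/list?foo=1"]:
    print(url, "->", remove_foo(url))
```

Like the original, this needs a non-empty value ([^&]+) and leaves a bare "?" behind when foo was the only parameter.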
You can use the following regex:
[?&](?<name>.*?)=[^&]*&?
If you want an exact match, you can replace (?<name>.*?) with a URL parameter name, e.g.:
[?&]foo=[^&]*&?
to match any variable like foo=xxxx in any URL.
For anyone interested in replacing GET request parameters:
The following regex also works for more general GET queries (starting with ?), where the marked answer fails if the parameter to be removed is the first one (after the ?).
This (JS flavour) regex can be used to remove the parameter regardless of position (first, last, or in between), leaving the query in a well-formed state.
So just use a regex replace with an empty string.
/&s=[^&]*()|\?s=[^&]*$|s=[^&]*&/
Basically it matches one of the three cases mentioned above (hence the 2 pipes)
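Here is the same three-way alternation tried out in Python, as a sketch only: remove_s is a made-up name, the empty () group from the pattern is left out since it matches nothing, and count=1 mimics a JavaScript replace without the g flag.

```python
import re

# middle/last: "&s=value"   only parameter: "?s=value" at the end
# first of several: "s=value&"
PATTERN = re.compile(r"&s=[^&]*|\?s=[^&]*$|s=[^&]*&")

def remove_s(query):
    return PATTERN.sub("", query, count=1)

for query in ["?s=1&b=2", "?a=1&s=2&b=3", "?a=1&s=2", "?s=1", "?cats=1&b=2"]:
    print(query, "->", remove_s(query))
```

One thing to watch: the third alternative is not anchored to the start of a parameter, so "?cats=1&b=2" turns into "?catb=2". Putting a (?<=\?) lookbehind in front of that branch would be one way to tighten it, if the regex flavour supports lookbehind.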