I can't quite seem to nail this regular expression. What's wrong with it?
I'm trying to extract the values of key/value pairs from a string. For example, using:
key=cat key2=dog
I use the expression:
([^=])([\w-\s]*)\s
Which gives me:
cat dog
However, in reality the string to search is likely to contain other non-alphabetic characters, like this:
192.168.20.31 Url=/flash/56553550_hi.mp4 Log=SESSIONSTART
[16/Dec/2010:13:44:17 +0000] flash/56553550_hi.mp4 0 192.168.20.31 1
[16/Dec/2010:13:44:17 +0000] 0 0 0 [0 No Error]
[0 No Error [rtmp://helix.pete.videolibraryserver.com/flash/56553550_hi.mp4]
I need to be able to pluck the URL out of it. However, I'm not sure how to add a catch-all for all of these character types to my original regexp. Could someone show me?
Try this out. Works like a beauty for me:
((?<=key[0-9]?=)[^\s]*(\s|$))+
(?<=regex) is a zero-width (non-consuming) look-behind. Here it ensures that the value is preceded by key[0-9]?= before it is matched. You can adjust the [0-9] to suit your exact needs, but the ? makes that digit optional anyway. The value part, [^\s], matches anything that is not whitespace. The * keeps consuming, and (\s|$) terminates the value when it finds a space or end-of-string. Note that a variable-length look-behind like this one works in .NET but is not supported by every regex engine.
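As a rough illustration of the same idea in another engine: Python's `re` module only accepts fixed-width look-behinds, so this sketch swaps the look-behind for a capturing group. The names `VALUE_PATTERN` and `extract_values` are made up for the example, and it assumes keys are literally `key` followed by an optional digit.

```python
import re

# Python's `re` rejects variable-width look-behinds such as (?<=key[0-9]?=),
# so the value is captured in a group instead of being the whole match.
VALUE_PATTERN = re.compile(r"key[0-9]?=(\S*)")


def extract_values(text):
    """Return the value of every key=... or keyN=... pair found in text."""
    return VALUE_PATTERN.findall(text)


print(extract_values("key=cat key2=dog"))  # ['cat', 'dog']
```

With a single capturing group, `findall` returns just the captured values, which plays the same role as the look-behind: the key is required but not part of the result.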
Update
I started looking at the blob of data you gave as what you're actually searching over and modified the expression thus:
([^\s]+)=(.+?(?=([^\s]+=|$)))
Works great on the header data you provided (if you're copy/pasting into a tester, remember to remove the hard returns).
Matches:
Url, /flash/56553550_hi.mp4
Log, SESSIONSTART [16/Dec/2010:13:44:17 +0000] flash/56553550_hi.mp4 0 192.168.20.31 1 [16/Dec/2010:13:44:17 +0000] 0 0 0 [0 No Error] [0 No Error [rtmp://helix.pete.videolibraryserver.com/flash/56553550_hi.mp4]
To not match the key (only the value):
[^\s]+=(.+?(?=([^\s]+=|$)))
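For anyone who wants to try the key-and-value expression outside a tester, here is a minimal sketch using Python's `re` module. The helper name `parse_pairs` and the shortened `sample` text are assumptions for illustration. It collapses the hard returns into spaces first, as mentioned above, and strips the trailing space that the lazy match leaves on each value.

```python
import re

# Key, then '=', then a lazy value that stops just before the next "key=" or at end of input.
PAIR_PATTERN = re.compile(r"([^\s]+)=(.+?(?=([^\s]+=|$)))")


def parse_pairs(text):
    """Map each key to its value, after flattening line breaks into spaces."""
    flat = " ".join(text.split())
    return {m.group(1): m.group(2).strip() for m in PAIR_PATTERN.finditer(flat)}


# Shortened version of the log data from the question (an assumption for brevity).
sample = """192.168.20.31 Url=/flash/56553550_hi.mp4 Log=SESSIONSTART
[16/Dec/2010:13:44:17 +0000] 0 0 0 [0 No Error]"""

pairs = parse_pairs(sample)
print(pairs["Url"])  # /flash/56553550_hi.mp4
print(pairs["Log"])  # SESSIONSTART [16/Dec/2010:13:44:17 +0000] 0 0 0 [0 No Error]
```

Because no later token contains an `=`, the `Log` value runs to the end of the input, which matches the behaviour described above.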
Try this to capture the non-space characters following Url=:
\bUrl=(\S*)
Or, if you want something more general to match all key/value pairs, try this:
\b(\S*)=(\S*)
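A quick way to check both patterns is a sketch like the following, written for Python's `re` module. The `text` value is a shortened copy of the log line from the question, and the variable names are made up for the example.

```python
import re

# Shortened log line from the question (an assumption for brevity).
text = "192.168.20.31 Url=/flash/56553550_hi.mp4 Log=SESSIONSTART [16/Dec/2010:13:44:17 +0000]"

# Specific pattern: only the value that follows "Url=".
url_match = re.search(r"\bUrl=(\S*)", text)
url = url_match.group(1) if url_match else None

# General pattern: every key=value pair, as (key, value) tuples.
all_pairs = re.findall(r"\b(\S*)=(\S*)", text)

print(url)        # /flash/56553550_hi.mp4
print(all_pairs)  # [('Url', '/flash/56553550_hi.mp4'), ('Log', 'SESSIONSTART')]
```

Since `\S*` stops at the first whitespace, the `Log` value here is only `SESSIONSTART`; values that contain spaces need a different terminator.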
Assuming your Url value allows only alphanumeric characters, '_', '/', and '.', this regex should extract the value of the URL into a named group called url:
Url=(?<url>(\w|/|\.)*)
The code to extract the value is:
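using System;
using System.Text.RegularExpressions;

// inputString holds the log text to search.
// The named group "url" captures word characters, '/', and literal '.'.
Regex regex = new Regex(@"Url=(?<url>(\w|/|\.)*)");
MatchCollection matchCollection = regex.Matches(inputString);
foreach (Match match in matchCollection)
{
    Console.WriteLine(match.Groups["url"].Value);
}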