Can't quite seem to nail this regular expression. What's up with it?

I'm trying to strip out key/value pairs from a string. For example, using:

key=cat key2=dog 

I use the expression:

([^=])([\w-\s]*)\s

Which gives me:

cat dog

However, in reality the string to search is likely to contain other non-alphabetic characters, like this:

192.168.20.31 Url=/flash/56553550_hi.mp4 Log=SESSIONSTART 
[16/Dec/2010:13:44:17 +0000] flash/56553550_hi.mp4 0 192.168.20.31 1 
[16/Dec/2010:13:44:17 +0000] 0 0 0 [0 No Error] 
[0 No Error [rtmp://helix.pete.videolibraryserver.com/flash/56553550_hi.mp4] 

And I need to be able to pluck out the URL from it. However, I'm not sure how to inject a catch-all for all the character types into my original regexp. Could someone show me?


Try this out. Works like a beauty for me:

((?<=key[0-9]?=)[^\s]*(\s|$))+

(?<=regex) is a zero-width (non-consuming) look-behind. It ensures that the value is preceded by key[0-9]?=. You can adjust the [0-9] to suit your exact needs; the ? makes that digit optional. The value part matches anything that is not whitespace, [^\s]; the * keeps consuming, and the value ends when the pattern finds whitespace or the end of the string, (\s|$). Note that a variable-length look-behind like this one works in .NET but is not supported by every regex engine.
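To see the look-behind in action, here is a minimal C# sketch run against the sample string from the question. It assumes .NET's Regex class and uses a slightly simplified pattern (\S* in place of [^\s]*(\s|$), so the trailing whitespace is not part of the match); the variable names are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input from the question.
string input = "key=cat key2=dog";

// The look-behind asserts that "key", an optional digit, and "=" sit
// immediately before the match without consuming them.
// \S* (equivalent to [^\s]*) then consumes the value up to the next whitespace.
var valuePattern = new Regex(@"(?<=key[0-9]?=)\S*");

string[] values = valuePattern.Matches(input)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(" ", values)); // prints: cat dog
```

Because the key lives inside the look-behind, each match is the value alone and no capture group is needed.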


Update

I started looking at the blob of data you gave as what you're actually searching over and modified the expression thus:

([^\s]+)=(.+?(?=([^\s]+=|$)))

Works great on the header data you provided (if you're copy/pasting into a tester, remember to remove the hard returns).

Matches:

Url,/flash/56553550_hi.mp4

Log,SESSIONSTART [16/Dec/2010:13:44:17 +0000] flash/56553550_hi.mp4 0 192.168.20.31 1 [16/Dec/2010:13:44:17 +0000] 0 0 0 [0 No Error] [0 No Error [rtmp://helix.pete.videolibraryserver.com/flash/56553550_hi.mp4]

To capture only the value and not the key:

[^\s]+=(.+?(?=([^\s]+=|$)))
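As a rough sketch of how the key/value version of this expression could be used from C#, the snippet below collects the pairs into a dictionary. The log line is a shortened, single-line copy of the sample data, and the names logLine, pairPattern, and pairs are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Shortened, single-line version of the sample log data (hard returns removed).
string logLine = "192.168.20.31 Url=/flash/56553550_hi.mp4 Log=SESSIONSTART " +
                 "[16/Dec/2010:13:44:17 +0000] flash/56553550_hi.mp4 0 192.168.20.31 1";

// Group 1 is the key; group 2 is the lazily matched value, which stops
// right before the next "key=" or at the end of the string.
var pairPattern = new Regex(@"([^\s]+)=(.+?(?=([^\s]+=|$)))");

var pairs = new Dictionary<string, string>();
foreach (Match m in pairPattern.Matches(logLine))
{
    // The lazy value keeps the space that separates it from the next key,
    // so trim it before storing.
    pairs[m.Groups[1].Value] = m.Groups[2].Value.Trim();
}

Console.WriteLine(pairs["Url"]); // prints: /flash/56553550_hi.mp4
Console.WriteLine(pairs["Log"]);
```

One thing to watch: a value that itself contains an = sign (a URL with a query string, for example) would be cut at that point, since the look-ahead treats any run of non-space characters followed by = as the next key.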


Try this, to capture non-space characters following Url=:

\bUrl=(\S*)

Or, if you want something more general to match all key/value pairs, try this:

\b(\S*)=(\S*)
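Both patterns can be tried from C# along these lines. This is only a sketch: the input is a shortened copy of the sample log line, and the variable names are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Shortened copy of the sample log line.
string line = "192.168.20.31 Url=/flash/56553550_hi.mp4 Log=SESSIONSTART [16/Dec/2010:13:44:17 +0000]";

// Specific pattern: group 1 holds the non-space characters after "Url=".
Match urlMatch = Regex.Match(line, @"\bUrl=(\S*)");
string url = urlMatch.Success ? urlMatch.Groups[1].Value : "";
Console.WriteLine(url); // prints: /flash/56553550_hi.mp4

// General pattern: group 1 is the key, group 2 is the value.
var found = new List<string>();
foreach (Match m in Regex.Matches(line, @"\b(\S*)=(\S*)"))
{
    found.Add(m.Groups[1].Value + " -> " + m.Groups[2].Value);
}

foreach (string entry in found)
{
    Console.WriteLine(entry);
}
// prints:
// Url -> /flash/56553550_hi.mp4
// Log -> SESSIONSTART
```

Since \S* stops at the first whitespace, the Log value here is only SESSIONSTART; values that contain spaces need a different approach.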


Assuming your Url value allows only alphanumeric characters, '_', '.', and '/', this regex should extract the value of the URL into a named group, url:

Url=(?<url>(\w|/|\.)*)

The code to extract the value is:


using System;
using System.Text.RegularExpressions;

// The named group "url" captures the value that follows "Url=".
Regex regex = new Regex(@"Url=(?<url>(\w|/|\.)*)");
MatchCollection matchCollection = regex.Matches(inputString);

foreach (Match match in matchCollection)
{
    Console.WriteLine(match.Groups["url"].Value);
}