Simple regex for matching up to an optional character?

I'm sure this is a simple question for someone at ease with regular expressions:

I need to match everything up until the character #

I don't want the string following the # character, just the stuff before it, and the character itself should not be matched. This is the most important part, and what I'm mainly asking. As a second question, I would also like to know how to match the rest, after the # character. But not in the same expression, because I will need that in another context.

Here's an example string:

topics/install.xml#id_install

I want only topics/install.xml. And for the second question (separate expression) I want id_install


First expression:

^([^#]*)

Second expression:

#(.*)$
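A minimal C# sketch of how these two expressions could be applied, assuming the standard System.Text.RegularExpressions API. The class and method names (FragmentSplit, Before, After) are made up for illustration and are not part of the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative only.
public static class FragmentSplit
{
    // Everything before the first '#', or the whole string if there is no '#'.
    public static string Before(string input) =>
        Regex.Match(input, "^([^#]*)").Groups[1].Value;

    // Everything after the first '#', or "" if there is no '#'
    // (a group that did not participate in a match has an empty Value).
    public static string After(string input) =>
        Regex.Match(input, "#(.*)$").Groups[1].Value;

    public static void Main()
    {
        string example = "topics/install.xml#id_install";
        Console.WriteLine(Before(example)); // topics/install.xml
        Console.WriteLine(After(example));  // id_install
    }
}
```

Because the first expression uses [^#]* rather than .*, it also works when the string has no # at all: the whole string is returned.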


[a-zA-Z0-9]*[\#]

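If your string contains any other characters (such as the / and . in the example), you need to add them inside the first square bracket, escaped where necessary. Note that this pattern also includes the # itself in the match.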


I don't use C#, but I will assume that it uses PCRE-style syntax. If so,

"([^#]*)#.*"

with a call to 'match'. A call to 'search' does not need the trailing ".*"

The parens define the 'keep group'; the [^#] means any character that is not a '#'

You probably tried something like

"(.*)#.*"

and found that it fails when multiple '#' signs are present (keeping the leading '#'s)? That is because ".*" is greedy, and will match as much as it can.

Your matcher should have a method that looks something like 'group(...)'. Most matchers return the entire matched sequence as group(0), the first paren-matched group as group(1), and so forth.
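To make the greedy versus negated-class difference concrete, here is a small C# sketch, assuming .NET's Regex class, where the groups are read through Groups[n] rather than a group(n) method. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative only.
public static class GreedyDemo
{
    // Greedy: ".*" grabs as much as it can, so group 1 runs up to the LAST '#'.
    public static string Greedy(string input) =>
        Regex.Match(input, "(.*)#.*").Groups[1].Value;

    // Negated class: "[^#]*" cannot cross a '#', so group 1 stops at the FIRST '#'.
    public static string Negated(string input) =>
        Regex.Match(input, "([^#]*)#.*").Groups[1].Value;

    public static void Main()
    {
        string input = "a#b#c";
        Console.WriteLine(Greedy(input));  // a#b
        Console.WriteLine(Negated(input)); // a

        // Groups[0] is the entire matched sequence.
        Console.WriteLine(Regex.Match(input, "([^#]*)#.*").Groups[0].Value); // a#b#c
    }
}
```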

PCRE is so important that I strongly encourage you to search for it on Google, learn it, and always have it in your programming toolkit.


Use look ahead and look behind:

  • To get all characters up to, but not including the pound (#): .*?(?=\#)
  • To get all characters following, but not including the pound (#): (?<=\#).*

If you don't mind using groups, you can do it all in one shot:

  • (.*?)\#(.*) Your answers will be in group(1) and group(2). Notice the non-greedy construct, *?, which will attempt to match as little as possible instead of as much as possible.
  • If you want to allow for missing # section, use ([^\#]*)(?:\#(.*))?. It uses a non-collecting group to test the second half, and if it finds it, returns everything after the pound.
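The lookaround and optional-group patterns above could be tried out in C# roughly as follows. This is a sketch under the assumption that default Regex options are used, in which case # needs no escaping, so the backslashes are dropped here. The class and method names are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative only.
public static class OptionalFragment
{
    // Lookahead: text before the first '#'. Fails to match if there is no '#'.
    public static Match BeforeLookahead(string input) =>
        Regex.Match(input, ".*?(?=#)");

    // Lookbehind: text after the first '#'. Fails to match if there is no '#'.
    public static Match AfterLookbehind(string input) =>
        Regex.Match(input, "(?<=#).*");

    // Optional-group pattern: group 1 is the part before '#'.
    public static string Path(string input) =>
        Regex.Match(input, "([^#]*)(?:#(.*))?").Groups[1].Value;

    // Group 2 is the part after '#', or null when the '#' section is missing.
    public static string Fragment(string input)
    {
        Group g = Regex.Match(input, "([^#]*)(?:#(.*))?").Groups[2];
        return g.Success ? g.Value : null;
    }

    public static void Main()
    {
        string example = "topics/install.xml#id_install";
        Console.WriteLine(BeforeLookahead(example).Value); // topics/install.xml
        Console.WriteLine(AfterLookbehind(example).Value); // id_install
        Console.WriteLine(Path(example));                  // topics/install.xml
        Console.WriteLine(Fragment(example));              // id_install

        // Without a '#', the lookahead version does not match at all,
        // while the optional-group version still returns the whole string.
        Console.WriteLine(BeforeLookahead("topics/install.xml").Success); // False
        Console.WriteLine(Path("topics/install.xml"));                    // topics/install.xml
    }
}
```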

Honestly though, for your situation, it is probably easier to use the Split method provided in String.

More on lookahead and lookbehind


first: /[^\#]*(?=\#)/ (edit: this is faster than /.*?(?=\#)/)

second: /(?<=\#).*/


For something like this in C# I would usually skip the regular expressions stuff altogether and do something like:

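// split[0] is always present; split[1] exists only if the string contains a '#'.
string[] split = exampleString.Split('#');
string firstString = split[0];
string secondString = split.Length > 1 ? split[1] : "";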
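One possible refinement of the Split approach, sketched here as an assumption rather than as part of the answer: passing a count of 2 to String.Split keeps any later # characters in the second piece, and checking the array length covers strings with no # at all. The class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper class; the names are illustrative only.
public static class SplitDemo
{
    // Split on the first '#' only: at most two pieces are returned.
    public static string[] SplitOnFirstHash(string input) =>
        input.Split(new[] { '#' }, 2);

    public static void Main()
    {
        string[] parts = SplitOnFirstHash("topics/install.xml#id_install");
        Console.WriteLine(parts[0]); // topics/install.xml
        Console.WriteLine(parts[1]); // id_install

        // A later '#' stays inside the second piece.
        Console.WriteLine(SplitOnFirstHash("a#b#c")[1]); // b#c

        // No '#': only one piece comes back, so check Length before using parts[1].
        Console.WriteLine(SplitOnFirstHash("topics/install.xml").Length); // 1
    }
}
```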