
Regular Expression to find strings between two tokens, while EXCLUDING the tokens, when the start token is the same as the end token [closed]

This question is unlikely to help any future visitors; it is only relevant to a small geographic area, a specific moment in time, or an extraordinarily narrow situation that is not generally applicable to the worldwide audience of the internet. For help making this question more broadly applicable, visit the help center. Closed 11 years ago.

An extension of "Regular Expression to find a string included between two characters, while EXCLUDING the delimiters".

The solution to that question modified a tiny bit:

(?<=\#)(.*?)(?=\#)

Given the string "The #iPhone 4# is made by #apple#.", that solution returns:

["iPhone 4", " is made by ", "apple"]

I'm not sure whether this is possible with a regex alone, but in this case " is made by " should not be returned. It just happens to sit between the two #-wrapped strings, so it is itself wrapped in # characters.

Clarification: the regex needs to support a variable number of #foo# strings in the parent string. There will not always be exactly two.

Update

Due to the varied responses, and the realization that this problem is more simply solved without regex, I'm voting to close the question. Answer: do this without regex, in the language of your choice.
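A minimal sketch of the no-regex route in Python (the function name `extract_hash_tokens` is hypothetical). Splitting on `#` puts every delimited token at an odd index, so a slice with step 2 keeps exactly the wrapped strings, for any number of `#foo#` pairs. It assumes `#` never appears inside a token, and it ignores an unmatched trailing `#`.

```python
def extract_hash_tokens(text: str) -> list:
    """Return the substrings wrapped in matching pairs of '#' (no regex)."""
    parts = text.split("#")
    # An odd number of '#' leaves an unterminated last piece; drop it.
    if len(parts) % 2 == 0:
        parts = parts[:-1]
    # Pieces at odd indices sit between an opening and a closing '#'.
    return parts[1::2]


print(extract_hash_tokens("The #iPhone 4# is made by #apple#."))
# ['iPhone 4', 'apple']
```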


Very close to @Gerben's answer, but this one works for me. There must be an odd number of '#' characters before the token (including the '#' that opens the token):

(?<=^[^#]*#([^#]*#[^#]*#)*)([^#]*)(?=#)
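Python's built-in `re` module only accepts fixed-width lookbehind, so compiling the pattern above raises `re.error` there; it needs an engine with variable-length lookbehind (for example .NET). A minimal sketch of the same "odd number of `#` before the token" rule in Python, assuming candidates are found with a fixed-width lookbehind and the parity is checked in ordinary code (both function names are hypothetical):

```python
import re

ODD_HASH_PATTERN = r"(?<=^[^#]*#([^#]*#[^#]*#)*)([^#]*)(?=#)"


def compiles_in_stdlib_re(pattern: str) -> bool:
    """Report whether Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:  # variable-width lookbehind is rejected here
        return False


def odd_hash_tokens(text: str) -> list:
    """Keep only runs of non-'#' text preceded by an odd number of '#'."""
    tokens = []
    # Fixed-width lookbehind: any run of non-'#' chars between two '#'.
    for m in re.finditer(r"(?<=#)([^#]*)(?=#)", text):
        # The same condition the variable-length lookbehind expresses.
        if text.count("#", 0, m.start()) % 2 == 1:
            tokens.append(m.group(1))
    return tokens


print(compiles_in_stdlib_re(ODD_HASH_PATTERN))  # False
print(odd_hash_tokens("The #iPhone 4# is made by #apple#."))
# ['iPhone 4', 'apple']
```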

You can't just take (?<=\#)(.*?)(?=\#) and ignore every other match in the match list before processing on...?
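A sketch of that suggestion in Python: run the original lookaround pattern and keep every other result with a slice. It assumes the string starts outside a token (so the tokens are the even-indexed matches) and that `#` never appears inside a token.

```python
import re

text = "The #iPhone 4# is made by #apple#."
# The lookaround pattern also returns the text *between* tokens.
matches = re.findall(r"(?<=#)(.*?)(?=#)", text)
print(matches)  # ['iPhone 4', ' is made by ', 'apple']
# Tokens and in-between text alternate, so keep indices 0, 2, 4, ...
tokens = matches[::2]
print(tokens)   # ['iPhone 4', 'apple']
```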


The zero-width assertions make the regex match the text between every pair of adjacent delimiters, instead of resuming after each "consumed" delimiter. You have to change the matching code so that it extracts, for instance, the first capture group rather than the whole match. It would help if you posted the code you are using now so we could tell you how to modify it, but your example is formatted in a Pythonesque way, so something like this:

import re
stringlist = re.findall(r"#([^#]*)#", string)  # string: the text to search

Sorry, not at my computer, and my Python is not very good, so I'll probably have to get back to you with corrections.

Update: fixed and substantially simplified the code
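To illustrate the point about capture groups: with a consuming pattern, group 1 of each match holds only the text inside the delimiters, and because both `#` characters are consumed, the text between tokens is never matched. A short sketch using `re.finditer` on a string with a variable number of tokens (the sample string is made up):

```python
import re

text = "#one# and #two# or #three#, but not this"
# Each match consumes an opening and a closing '#'; group(1) is the inside.
for m in re.finditer(r"#([^#]*)#", text):
    print(m.group(0), "->", m.group(1))
# #one# -> one
# #two# -> two
# #three# -> three

# re.findall returns group 1 directly when the pattern has exactly one group.
print(re.findall(r"#([^#]*)#", text))  # ['one', 'two', 'three']
```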


The solution doesn't return what you say it does (it works on square brackets rather than hash marks), but the point is what you put inside the parentheses: the parentheses determine what gets captured.

#([^#]*)#[^#]*#([^#]*)#
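A quick check of this pattern in Python (the three-token sample string is made up). Each match yields a tuple of two groups, so it captures tokens in pairs; with an odd number of tokens, the last one is not captured.

```python
import re

PAIR_PATTERN = r"#([^#]*)#[^#]*#([^#]*)#"

print(re.findall(PAIR_PATTERN, "The #iPhone 4# is made by #apple#."))
# [('iPhone 4', 'apple')]
print(re.findall(PAIR_PATTERN, "#a# x #b# y #c#"))
# [('a', 'b')]; 'c' has no partner, so it is lost
```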


Instead of .*, use [^\]]* (in case ] is the delimiter; with # as the delimiter, that is [^#]*).
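A small comparison in Python with `#` as the delimiter: a greedy `.*` runs to the last `#` in the string, while the negated class `[^#]*` cannot cross a delimiter, so each match stops at the next `#`.

```python
import re

text = "The #iPhone 4# is made by #apple#."
# Greedy: '.*' swallows the inner '#' characters and ends at the last one.
print(re.findall(r"#(.*)#", text))     # ['iPhone 4# is made by #apple']
# Negated class: '[^#]*' stops at the first '#' it meets.
print(re.findall(r"#([^#]*)#", text))  # ['iPhone 4', 'apple']
```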

EDITED

So you have a list like #text#,#text#,... and want to extract the items of the list:

#([^#]*)#(?:,|$)
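A per-item sketch in Python (the sample list is made up). The end anchor goes in an alternation, `(?:,|$)`, because inside a character class such as `[,$]` the `$` is a literal dollar sign; the second call shows the last item being lost with that form. Note also that a repeated group like `(...)+` only keeps its last repetition, which is why this sketch matches one item at a time.

```python
import re

items = "#alpha#,#beta gamma#,#delta#"
# '(?:,|$)': each item is followed by a comma or by the end of the string.
print(re.findall(r"#([^#]*)#(?:,|$)", items))
# ['alpha', 'beta gamma', 'delta']
# With '[,$]' the last item needs a literal ',' or '$' after it, so it is lost.
print(re.findall(r"#([^#]*)#[,$]", items))
# ['alpha', 'beta gamma']
```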


Not sure if this works, but the idea is that it only matches a # that has an even number of # characters before it.

(?<=(?:^[^#]*#[^#]*#)*#)([^#]*)(?=#)

But what language are you using? It would be a lot easier to do this without relying on a regex alone.


I am not familiar enough with regular expressions to give you a regular expression answer. But it seems that every second item of your list is to be discarded. Why not iterate the list and do that?

This is how I would do it:

import re

text = "The #iPhone 4# is made by #apple#"
# '#.*?#' lazily matches each #...# pair; strip('#') removes the delimiters
cleanlist = [match.strip('#') for match in re.findall('#.*?#', text, re.UNICODE)]
print(cleanlist)
# ['iPhone 4', 'apple']
