开发者

finding and returning a string with a specified prefix

I am close but I am not sure what to do with the restuling match object. If I do

p = re.search('[/@.* /]', str)

I'll get any words that start with @ and end up with a space. This is what I want. However this returns a Match object that I dont' know what to do with. What's the most computationally efficient way of finding and returning a string which is prefixed with a @?

For example,

"Hi there @guy"

After doing the proper calculations, I wou开发者_JAVA百科ld be returned

guy


The following regular expression do what you need:

import re
s = "Hi there @guy"
p = re.search(r'@(\w+)', s)
print p.group(1)

It will also work for the following string formats:

  • s = "Hi there @guy " # notice the trailing space
  • s = "Hi there @guy," # notice the trailing comma
  • s = "Hi there @guy and" # notice the next word
  • s = "Hi there @guy22" # notice the trailing numbers
  • s = "Hi there @22guy" # notice the leading numbers


That regex does not do what you think it does.

s = "Hi there @guy"
p = re.search(r'@([^ ]+)', s) # this is the regex you described
print p.group(1) # first thing matched inside of ( .. )

But as usually with regex, there are tons of examples that break this, for example if the text is s = "Hi there @guy, what's with the comma?" the result would be guy,.

So you really need to think about every possible thing you want and don't want to match. r'@([a-zA-Z]+)' might be a good starting point, it literally only matches letters (a .. z, no unicode etc).


p.group(0) should return guy. If you want to find out what function an object has, you can use the dir(p) method to find out. This will return a list of attributes and methods that are available for that object instance.


As it's evident from the answers so far regex is the most efficient solution for your problem. Answers differ slightly regarding what you allow to be followed by the @:

[^ ] anything but space
\w   in python-2.x is equivalent to [A-Za-z0-9_], in py3k is locale dependent

If you have better idea what characters might be included in the user name you might adjust your regex to reflect that, e.g., only lower case ascii letters, would be:

[a-z]

NB: I skipped quantifiers for simplicity.


(?<=@)\w+

will match a word if it's preceded by a @ (without adding it to the match, a so-called positive lookbehind). This will match "words" that are composed of letters, numbers, and/or underscore; if you don't want those, use (?<=@)[^\W\d_]+

In Python:

>>> strg = "Hi there @guy!"
>>> p = re.search(r'(?<=@)\w+', strg)
>>> p.group()
'guy'


You say: """If I do p = re.search('[/@.* /]', str) I'll get any words that start with @ and end up with a space."" But this is incorrect -- that pattern is a character class which will match ONE character in the set @/.* and space. Note: there's a redundant second / in the pattern. For example:

>>> re.findall('[/@.* /]', 'xxx@foo x/x.x*x xxxx')
['@', ' ', '/', '.', '*', ' ']
>>>

You say that you want "guy" returned from "Hi there @guy" but that conflicts with "and end up with a space".

Please edit your question to include what you really want/need to match.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜