Python: matching some characters in a string
I'm trying to extract/match data from a string using a regular expression, but I can't seem to get it right.
I want to extract i386 from the following string (the text between the last - and .iso):
/xubuntu/daily/current/lucid-alternate-i386.iso
This should also work in case of:
/xubuntu/daily/current/lucid-alternate-amd64.iso
And the result should be either i386 or amd64 given the case.
Thanks a lot for your help.
You could also use split in this case (instead of regex):
>>> s = "/xubuntu/daily/current/lucid-alternate-i386.iso"
>>> s.split(".iso")[0].split("-")[-1]
'i386'
split gives you a list of the pieces the string was split into. Then, using Python's indexing syntax ([0] for the first piece, [-1] for the last), you can pick out the part you need.
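To make the two split steps easier to follow, here is a small sketch that prints nothing fancy and just names the intermediate values. The variable names (path, stem, parts, arch) are my own and not part of the answer above.

```python
path = "/xubuntu/daily/current/lucid-alternate-i386.iso"

# Step 1: split on ".iso" and keep the part before it.
stem = path.split(".iso")[0]

# Step 2: split that part on "-" and keep the last piece.
parts = stem.split("-")
arch = parts[-1]

print(parts)
print(arch)
```

Note that this relies on ".iso" appearing only at the end of the path and on the architecture being the last dash-separated piece.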
If you will be matching several of these lines, calling re.compile() once and saving the resulting regular expression object for reuse is more efficient.
>>> import re
>>> s1 = "/xubuntu/daily/current/lucid-alternate-i386.iso"
>>> s2 = "/xubuntu/daily/current/lucid-alternate-amd64.iso"
>>> pattern = re.compile(r'^.+-(.+)\..+$')
>>> m = pattern.match(s1)
>>> m.group(1)
'i386'
>>> m = pattern.match(s2)
>>> m.group(1)
'amd64'
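As a rough sketch of what reusing the compiled pattern over several lines could look like: the pattern is the one from this answer, while the name ARCH_PATTERN, the paths list, and the extra "README" entry (added only to show the no-match case) are assumptions for illustration.

```python
import re

# Compile once, reuse for every path.
ARCH_PATTERN = re.compile(r'^.+-(.+)\..+$')

paths = [
    "/xubuntu/daily/current/lucid-alternate-i386.iso",
    "/xubuntu/daily/current/lucid-alternate-amd64.iso",
    "README",  # hypothetical entry with no dash or extension
]

archs = []
for p in paths:
    m = ARCH_PATTERN.match(p)
    # match() returns None when the pattern does not apply.
    archs.append(m.group(1) if m else None)

print(archs)
```

Checking for None before calling group() avoids an AttributeError on lines that do not follow the expected naming scheme.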
r"/([^-]*)\.iso/"
The bit you want will be in the first capture group.
First off, let's make our life simpler and only get the file name.
>>> import os
>>> os.path.split("/xubuntu/daily/current/lucid-alternate-i386.iso")
('/xubuntu/daily/current', 'lucid-alternate-i386.iso')
Now it's just a matter of catching all the letters between the last dash and the '.iso'.
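One possible way to finish this off, as a sketch rather than the only option: take the file name from os.path.split and search it for the text between the last dash and '.iso'. The regular expression and the variable names here are my own assumptions, not something given in the answer.

```python
import os
import re

path = "/xubuntu/daily/current/lucid-alternate-i386.iso"

# Keep only the file name, e.g. 'lucid-alternate-i386.iso'.
filename = os.path.split(path)[1]

# A dash, then one or more non-dash characters, then '.iso' at the end.
m = re.search(r"-([^-]+)\.iso$", filename)
arch = m.group(1) if m else None

print(arch)
```

Because [^-]+ cannot contain a dash, the capture can only start after the last dash in the file name.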
In Python, the expression should be written without the leading and trailing slashes:
import re
line = '/xubuntu/daily/current/lucid-alternate-i386.iso'
rex = re.compile(r"([^-]*)\.iso")
m = rex.search(line)
print(m.group(1))
This prints i386.
import re
reobj = re.compile(r"(\w+)\.iso$")
match = reobj.search(subject)
if match:
    result = match.group(1)
else:
    result = ""
Here, subject holds the file name and path.
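For anyone who wants to try this directly, here is a minimal sketch that wraps the same pattern in a function. The function name extract_arch is hypothetical; the pattern and the empty-string fallback are taken from the snippet above.

```python
import re

reobj = re.compile(r"(\w+)\.iso$")

def extract_arch(subject):
    """Return the word just before a trailing '.iso', or "" if none is found."""
    match = reobj.search(subject)
    if match:
        return match.group(1)
    return ""

print(extract_arch("/xubuntu/daily/current/lucid-alternate-i386.iso"))
print(extract_arch("/xubuntu/daily/current/lucid-alternate-amd64.iso"))
```

Since \w does not match a dash, the capture stops at the last dash before '.iso'.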
>>> import os
>>> path = "/xubuntu/daily/current/lucid-alternate-i386.iso"
>>> file, ext = os.path.splitext(os.path.split(path)[1])
>>> processor = file[file.rfind("-") + 1:]
>>> processor
'i386'