cross-platform splitting of path in python
I'd like something that has the same effect as this:
>>> path 开发者_如何学Python= "/foo/bar/baz/file"
>>> path_split = path.rsplit('/')[1:]
>>> path_split
['foo', 'bar', 'baz', 'file']
But that will work with Windows paths too. I know that there is an os.path.split()
but that doesn't do what I want, and I didn't see anything that does.
Python 3.4 introduced a new module pathlib
. pathlib.Path
provides file system related methods, while pathlib.PurePath
operates completely independent of the file system:
>>> from pathlib import PurePath
>>> path = "/foo/bar/baz/file"
>>> path_split = PurePath(path).parts
>>> path_split
('\\', 'foo', 'bar', 'baz', 'file')
You can use PosixPath and WindowsPath explicitly when desired:
>>> from pathlib import PureWindowsPath, PurePosixPath
>>> PureWindowsPath(path).parts
('\\', 'foo', 'bar', 'baz', 'file')
>>> PurePosixPath(path).parts
('/', 'foo', 'bar', 'baz', 'file')
And of course, it works with Windows paths as well:
>>> wpath = r"C:\foo\bar\baz\file"
>>> PurePath(wpath).parts
('C:\\', 'foo', 'bar', 'baz', 'file')
>>> PureWindowsPath(wpath).parts
('C:\\', 'foo', 'bar', 'baz', 'file')
>>> PurePosixPath(wpath).parts
('C:\\foo\\bar\\baz\\file',)
>>>
>>> wpath = r"C:\foo/bar/baz/file"
>>> PurePath(wpath).parts
('C:\\', 'foo', 'bar', 'baz', 'file')
>>> PureWindowsPath(wpath).parts
('C:\\', 'foo', 'bar', 'baz', 'file')
>>> PurePosixPath(wpath).parts
('C:\\foo', 'bar', 'baz', 'file')
Huzzah for Python devs constantly improving the language!
The OP specified "will work with Windows paths too". There are a few wrinkles with Windows paths.
Firstly, Windows has the concept of multiple drives, each with its own current working directory, and 'c:foo'
and 'c:\\foo'
are often not the same. Consequently it is a very good idea to separate out any drive designator first, using os.path.splitdrive(). Then reassembling the path (if required) can be done correctly by
drive + os.path.join(*other_pieces)
Secondly, Windows paths can contain slashes or backslashes or a mixture. Consequently, using os.sep
when parsing an unnormalised path is not useful.
More generally:
The results produced for 'foo'
and 'foo/'
should not be identical.
The loop termination condition seems to be best expressed as "os.path.split() treated its input as unsplittable".
Here's a suggested solution, with tests, including a comparison with @Spacedman's solution
import os.path
def os_path_split_asunder(path, debug=False):
parts = []
while True:
newpath, tail = os.path.split(path)
if debug: print repr(path), (newpath, tail)
if newpath == path:
assert not tail
if path: parts.append(path)
break
parts.append(tail)
path = newpath
parts.reverse()
return parts
def spacedman_parts(path):
components = []
while True:
(path,tail) = os.path.split(path)
if not tail:
return components
components.insert(0,tail)
if __name__ == "__main__":
tests = [
'',
'foo',
'foo/',
'foo\\',
'/foo',
'\\foo',
'foo/bar',
'/',
'c:',
'c:/',
'c:foo',
'c:/foo',
'c:/users/john/foo.txt',
'/users/john/foo.txt',
'foo/bar/baz/loop',
'foo/bar/baz/',
'//hostname/foo/bar.txt',
]
for i, test in enumerate(tests):
print "\nTest %d: %r" % (i, test)
drive, path = os.path.splitdrive(test)
print 'drive, path', repr(drive), repr(path)
a = os_path_split_asunder(path)
b = spacedman_parts(path)
print "a ... %r" % a
print "b ... %r" % b
print a == b
and here's the output (Python 2.7.1, Windows 7 Pro):
Test 0: ''
drive, path '' ''
a ... []
b ... []
True
Test 1: 'foo'
drive, path '' 'foo'
a ... ['foo']
b ... ['foo']
True
Test 2: 'foo/'
drive, path '' 'foo/'
a ... ['foo', '']
b ... []
False
Test 3: 'foo\\'
drive, path '' 'foo\\'
a ... ['foo', '']
b ... []
False
Test 4: '/foo'
drive, path '' '/foo'
a ... ['/', 'foo']
b ... ['foo']
False
Test 5: '\\foo'
drive, path '' '\\foo'
a ... ['\\', 'foo']
b ... ['foo']
False
Test 6: 'foo/bar'
drive, path '' 'foo/bar'
a ... ['foo', 'bar']
b ... ['foo', 'bar']
True
Test 7: '/'
drive, path '' '/'
a ... ['/']
b ... []
False
Test 8: 'c:'
drive, path 'c:' ''
a ... []
b ... []
True
Test 9: 'c:/'
drive, path 'c:' '/'
a ... ['/']
b ... []
False
Test 10: 'c:foo'
drive, path 'c:' 'foo'
a ... ['foo']
b ... ['foo']
True
Test 11: 'c:/foo'
drive, path 'c:' '/foo'
a ... ['/', 'foo']
b ... ['foo']
False
Test 12: 'c:/users/john/foo.txt'
drive, path 'c:' '/users/john/foo.txt'
a ... ['/', 'users', 'john', 'foo.txt']
b ... ['users', 'john', 'foo.txt']
False
Test 13: '/users/john/foo.txt'
drive, path '' '/users/john/foo.txt'
a ... ['/', 'users', 'john', 'foo.txt']
b ... ['users', 'john', 'foo.txt']
False
Test 14: 'foo/bar/baz/loop'
drive, path '' 'foo/bar/baz/loop'
a ... ['foo', 'bar', 'baz', 'loop']
b ... ['foo', 'bar', 'baz', 'loop']
True
Test 15: 'foo/bar/baz/'
drive, path '' 'foo/bar/baz/'
a ... ['foo', 'bar', 'baz', '']
b ... []
False
Test 16: '//hostname/foo/bar.txt'
drive, path '' '//hostname/foo/bar.txt'
a ... ['//', 'hostname', 'foo', 'bar.txt']
b ... ['hostname', 'foo', 'bar.txt']
False
Someone said "use os.path.split
". This got deleted unfortunately, but it is the right answer.
os.path.split(path)
Split the pathname path into a pair, (head, tail) where tail is the last pathname component and head is everything leading up to that. The tail part will never contain a slash; if path ends in a slash, tail will be empty. If there is no slash in path, head will be empty. If path is empty, both head and tail are empty. Trailing slashes are stripped from head unless it is the root (one or more slashes only). In all cases, join(head, tail) returns a path to the same location as path (but the strings may differ).
So it's not just splitting the dirname and filename. You can apply it several times to get the full path in a portable and correct way. Code sample:
dirname = path
path_split = []
while True:
dirname, leaf = split(dirname)
if leaf:
path_split = [leaf] + path_split #Adds one element, at the beginning of the list
else:
#Uncomment the following line to have also the drive, in the format "Z:\"
#path_split = [dirname] + path_split
break
Please credit the original author if that answer gets undeleted.
Use the functionality provided in os.path
, e.g.
os.path.split(path)
Like written elsewhere you can call it multiple times to split longer paths.
Here's an explicit implementation of the approach that just iteratively
uses os.path.split
; uses a slightly different loop termination condition than the accepted answer.
def splitpath(path):
parts=[]
(path, tail)=os.path.split( path)
while path and tail:
parts.append( tail)
(path,tail)=os.path.split(path)
parts.append( os.path.join(path,tail) )
return map( os.path.normpath, parts)[::-1]
This should satisfy os.path.join( *splitpath(path) )
is path
in the sense that they both indicate the same file/directory.
Tested in linux:
In [51]: current='/home/dave/src/python'
In [52]: splitpath(current)
Out[52]: ['/', 'home', 'dave', 'src', 'python']
In [53]: splitpath(current[1:])
Out[53]: ['.', 'dave', 'src', 'python']
In [54]: splitpath( os.path.join(current, 'module.py'))
Out[54]: ['/', 'home', 'dave', 'src', 'python', 'module.py']
In [55]: splitpath( os.path.join(current[1:], 'module.py'))
Out[55]: ['.', 'dave', 'src', 'python', 'module.py']
I hand checked a few of the DOS paths, using the by replacing os.path
with ntpath
module, look OK to me, but I'm not too familiar with the ins and outs of DOS paths.
Use the functionality provided in os.path, e.g.
os.path.split(path)
(This answer was by someone else and was mysteriously and incorrectly deleted, since it's a working answer; if you want to split each part of the path apart, you can call it multiple times, and each call will pull a component off of the end.)
One more try with maxplit option, which is a replacement for os.path.split()
def pathsplit(pathstr, maxsplit=1):
"""split relative path into list"""
path = [pathstr]
while True:
oldpath = path[:]
path[:1] = list(os.path.split(path[0]))
if path[0] == '':
path = path[1:]
elif path[1] == '':
path = path[:1] + path[2:]
if path == oldpath:
return path
if maxsplit is not None and len(path) > maxsplit:
return path
So keep using os.path.split until you get to what you want. Here's an ugly implementation using an infinite loop:
import os.path
def parts(path):
components = []
while True:
(path,tail) = os.path.split(path)
if tail == "":
components.reverse()
return components
components.append(tail)
Stick that in parts.py, import parts, and voila:
>>> parts.parts("foo/bar/baz/loop")
['foo', 'bar', 'baz', 'loop']
Probably a nicer implementation using generators or recursion out there...
精彩评论