How to split a dos path into its components in Python
I have a string variable which represents a dos path e.g:
var = "d:\stuff\morestuff\furtherdown\THEFILE.txt"
I want to split this string into:
[ "d", "stuff", "morestuff", "furtherdown", "THEFILE.txt" ]
I have tried using split()
and replace()
but they either only process the first backslash or they insert hex numbers into the string.
I need to convert this string variable into a raw string somehow so that I can parse it.
What's the best way to do this?
I should also add that the contents of var
i.e. the path that I'm trying to parse, is actually the return value of a command line query. It's not path data that I generate myself. It's stored in a file, and the command line tool is not going to escape the backslashes.
I would do
import os
path = os.path.normpath(path)
path.split(os.sep)
First normalize the path string into a proper string for the OS. Then os.sep
must be safe to use as a delimiter in string function split.
I've been bitten loads of times by people writing their own path fiddling functions and getting it wrong. Spaces, slashes, backslashes, colons -- the possibilities for confusion are not endless, but mistakes are easily made anyway. So I'm a stickler for the use of os.path
, and recommend it on that basis.
(However, the path to virtue is not the one most easily taken, and many people when finding this are tempted to take a slippery path straight to damnation. They won't realise until one day everything falls to pieces, and they -- or, more likely, somebody else -- has to work out why everything has gone wrong, and it turns out somebody made a filename that mixes slashes and backslashes -- and some person suggests that the answer is "not to do that". Don't be any of these people. Except for the one who mixed up slashes and backslashes -- you could be them if you like.)
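As a minimal sketch of the normalize-then-split idea: the version below uses ntpath (the Windows flavour of os.path, importable on any OS) so that a DOS path splits the same way no matter where the script runs. The function name split_dos_path is made up for illustration, and the mixed-separator input is just a test case.

```python
import ntpath  # Windows flavour of os.path, available on every platform


def split_dos_path(path):
    """Hypothetical helper: normalize a DOS path, then split on the backslash."""
    # normpath converts "/" to "\" and collapses doubled separators
    normalized = ntpath.normpath(path)
    return normalized.split(ntpath.sep)


# Mixed and doubled separators are cleaned up before splitting
print(split_dos_path(r"d:\stuff/morestuff\\furtherdown\THEFILE.txt"))
```

On Windows itself os.path is ntpath, so this is the same as the snippet above; the explicit import only matters if the code might run elsewhere.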
You can get the drive and path+file like this:
drive, path_and_file = os.path.splitdrive(path)
Get the path and the file:
path, file = os.path.split(path_and_file)
Getting the individual folder names is not especially convenient, but it is the sort of honest middling discomfort that heightens the pleasure of later finding something that actually works well:
folders = []
while 1:
    path, folder = os.path.split(path)
    if folder != "":
        folders.append(folder)
    else:
        # reached the root (or an empty string): keep the root, then stop
        if path != "":
            folders.append(path)
        break
folders.reverse()
(This pops a "\"
at the start of folders
if the path was originally absolute. You could lose a bit of code if you didn't want that.)
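Putting the three steps together, here is a rough sketch wrapped in a function. The name drive_and_folders is hypothetical, and I've used ntpath instead of os.path on the assumption that the input is always a DOS-style path, even if the script runs on another OS.

```python
import ntpath  # Windows flavour of os.path, usable on any platform


def drive_and_folders(path):
    """Hypothetical helper: return (drive, [root, folder, ..., file])."""
    drive, rest = ntpath.splitdrive(path)
    folders = []
    while True:
        rest, folder = ntpath.split(rest)
        if folder != "":
            folders.append(folder)
        else:
            # nothing left to peel off: keep the root separator, if any
            if rest != "":
                folders.append(rest)
            break
    folders.reverse()
    return drive, folders


drive, folders = drive_and_folders(r"d:\stuff\morestuff\furtherdown\THEFILE.txt")
print(drive)    # d:
print(folders)  # ['\\', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']
```

Relative paths simply come back without the leading "\" entry.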
In Python >=3.4 this has become much simpler. You can now use pathlib.Path.parts
to get all the parts of a path.
Example:
>>> from pathlib import Path
>>> Path('C:/path/to/file.txt').parts
('C:\\', 'path', 'to', 'file.txt')
>>> Path(r'C:\path\to\file.txt').parts
('C:\\', 'path', 'to', 'file.txt')
On a Windows install of Python 3 this will assume that you are working with Windows paths, and on *nix it will assume that you are working with posix paths. This is usually what you want, but if it isn't you can use the classes pathlib.PurePosixPath
or pathlib.PureWindowsPath
as needed:
>>> from pathlib import PurePosixPath, PureWindowsPath
>>> PurePosixPath('/path/to/file.txt').parts
('/', 'path', 'to', 'file.txt')
>>> PureWindowsPath(r'C:\path\to\file.txt').parts
('C:\\', 'path', 'to', 'file.txt')
>>> PureWindowsPath(r'\\host\share\path\to\file.txt').parts
('\\\\host\\share\\', 'path', 'to', 'file.txt')
Edit: There is also a backport to python 2 available: pathlib2
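For the exact list asked for in the question (drive letter without the colon), something along these lines should do. It is only a sketch: it assumes the path is absolute with a drive letter, so that parts[0] is the drive-plus-root entry, and the variable name components is mine.

```python
from pathlib import PureWindowsPath

# PureWindowsPath parses Windows paths on any OS and never touches the disk
p = PureWindowsPath(r"d:\stuff\morestuff\furtherdown\THEFILE.txt")

# p.drive is "d:"; parts[0] is "d:\\", so skip it and use the bare letter instead
components = [p.drive.rstrip(":")] + list(p.parts[1:])
print(components)  # ['d', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']
```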
You can simply use the most Pythonic approach (IMHO):
import os
your_path = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"
path_list = your_path.split(os.sep)
print(path_list)
Which will give you:
['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']
The clue here is to use os.sep
instead of '\\'
or '/'
, as this makes it system independent.
To remove colon from the drive letter (although I don't see any reason why you would want to do that), you can write:
path_list[0] = path_list[0][0]
For a somewhat more concise solution, consider the following:
def split_path(p):
    a,b = os.path.split(p)
    return (split_path(a) if len(a) and len(b) else []) + [b]
The problem here starts with how you're creating the string in the first place.
a = "d:\stuff\morestuff\furtherdown\THEFILE.txt"
Done this way, Python is trying to special case these: \s
, \m
, \f
, and \T
. In your case, \f
is being treated as a formfeed (0x0C) while the other backslashes are handled correctly. What you need to do is one of these:
b = "d:\\stuff\\morestuff\\furtherdown\\THEFILE.txt" # doubled backslashes
c = r"d:\stuff\morestuff\furtherdown\THEFILE.txt" # raw string, no doubling necessary
Then once you split either of these, you'll get the result you want.
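A small illustration of the difference, using a shortened made-up string rather than the full path (the names cooked and raw are just for the demo):

```python
cooked = "more\further"  # "\f" is parsed as a single form feed character (0x0C)
raw = r"more\further"    # the raw literal keeps the backslash and the "f"

print(len(cooked), len(raw))  # 11 12
print(cooked.split("\\"))     # ['more\x0curther']  (no backslash left to split on)
print(raw.split("\\"))        # ['more', 'further']
```

Note this only concerns literals typed into source code. A string read from a file or from a command's output already contains real backslashes and needs no conversion.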
I can't actually contribute a real answer to this one (as I came here hoping to find one myself), but to me the number of differing approaches and all the caveats mentioned is the surest indicator that Python's os.path module desperately needs this as a built-in function.
The stuff about mypath.split("\\")
would be better expressed as mypath.split(os.sep)
. sep
is the path separator for your particular platform (e.g., \
for Windows, /
for Unix, etc.), and the Python build knows which one to use. If you use sep
, then your code will be platform agnostic.
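One caveat worth sketching: os.sep follows the machine the code runs on, not the data. The snippet below uses posixpath.sep and ntpath.sep to stand in for what os.sep would be on Unix and on Windows respectively, so it behaves the same wherever you try it. The variable names are only illustrative.

```python
import ntpath
import posixpath

dos_path = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"

# What split(os.sep) would do on a Unix host, where os.sep == "/"
with_posix_sep = dos_path.split(posixpath.sep)
# What split(os.sep) would do on a Windows host, where os.sep == "\\"
with_nt_sep = dos_path.split(ntpath.sep)

print(with_posix_sep)  # the whole path comes back as one element
print(with_nt_sep)     # ['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']
```

So if the paths always come from a DOS tool, splitting on ntpath.sep may be the safer choice.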
The functional way, with a generator.
def split(path):
    (drive, head) = os.path.splitdrive(path)
    while (head != os.sep):
        (head, tail) = os.path.split(head)
        yield tail
In action:
>>> print([x for x in split(os.path.normpath('/path/to/filename'))])
['filename', 'to', 'path']
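A possible variation on the generator, sketched under the assumption that you want it to terminate for relative paths too (the loop above only stops once it hits a root separator). The name iter_parts is hypothetical, and ntpath is used so that DOS paths are handled on any OS.

```python
import ntpath


def iter_parts(path):
    """Hypothetical generator: yield components from right to left."""
    drive, head = ntpath.splitdrive(path)
    while True:
        new_head, tail = ntpath.split(head)
        if new_head == head:
            # split() made no progress: we are at the root or at ""
            break
        head = new_head
        if tail:  # skip the empty tail produced by a trailing separator
            yield tail


print(list(iter_parts(r"d:\path\to\filename")))            # ['filename', 'to', 'path']
print(list(reversed(list(iter_parts(r"relative\path")))))  # ['relative', 'path']
```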
You can recursively os.path.split
the string
import os
def parts(path):
    p,f = os.path.split(path)
    return parts(p) + [f] if f else [p]
Testing this against some path strings, and reassembling the path with os.path.join
>>> for path in [
...         r'd:\stuff\morestuff\furtherdown\THEFILE.txt',
...         '/path/to/file.txt',
...         'relative/path/to/file.txt',
...         r'C:\path\to\file.txt',
...         r'\\host\share\path\to\file.txt',
...         ]:
...     print parts(path), os.path.join(*parts(path))
...
['d:\\', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt'] d:\stuff\morestuff\furtherdown\THEFILE.txt
['/', 'path', 'to', 'file.txt'] /path\to\file.txt
['', 'relative', 'path', 'to', 'file.txt'] relative\path\to\file.txt
['C:\\', 'path', 'to', 'file.txt'] C:\path\to\file.txt
['\\\\', 'host', 'share', 'path', 'to', 'file.txt'] \\host\share\path\to\file.txt
The first element of the list may need to be treated differently depending on how you want to deal with drive letters, UNC paths and absolute and relative paths. Changing the last [p]
to [os.path.splitdrive(p)]
forces the issue by splitting the drive letter and directory root out into a tuple.
import os
def parts(path):
    p,f = os.path.split(path)
    return parts(p) + [f] if f else [os.path.splitdrive(p)]
[('d:', '\\'), 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']
[('', '/'), 'path', 'to', 'file.txt']
[('', ''), 'relative', 'path', 'to', 'file.txt']
[('C:', '\\'), 'path', 'to', 'file.txt']
[('', '\\\\'), 'host', 'share', 'path', 'to', 'file.txt']
Edit: I have realised that this answer is very similar to that given above by user1556435. I'm leaving my answer up as the handling of the drive component of the path is different.
really easy and simple way to do it:
var.replace('\\', '/').split('/')
It works for me:
>>> a=r"d:\stuff\morestuff\furtherdown\THEFILE.txt"
>>> a.split("\\")
['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']
Sure you might need to also strip out the colon from the first component, but keeping it makes it possible to re-assemble the path.
The r
modifier marks the string literal as "raw"; notice how embedded backslashes are not doubled.
I use the following. Because it relies on os.path.basename, it doesn't add any slashes to the returned list. It also works with any platform's slashes, i.e. Windows's \\
or Unix's /
. Furthermore, it doesn't add the \\\\
that Windows uses for server paths :)
def SplitPath( split_path ):
    pathSplit_lst = []
    while os.path.basename(split_path):
        pathSplit_lst.append( os.path.basename(split_path) )
        split_path = os.path.dirname(split_path)
    pathSplit_lst.reverse()
    return pathSplit_lst
So for:
'\\\\server\\folder1\\folder2\\folder3\\folder4'
You get:
['server','folder1','folder2','folder3','folder4']
Just like others explained, your problem stemmed from using \
, which is the escape character in a string literal/constant. OTOH, if you had that file path string from another source (read from a file, the console, or returned by an os function), there wouldn't have been a problem splitting on '\\'. (Note that r'\' is not a valid literal, because a raw string cannot end in a single backslash.)
And just like others suggested, if you want to use \
in a program literal, you have to either double it as \\
or prefix the whole literal with r
, like so r'lite\ral'
or r"lite\ral"
, to avoid the parser converting that \
and r
to a CR (carriage return) character.
There is one more way though - just don't use backslash \
pathnames in your code! Since last century Windows recognizes and works fine with pathnames which use forward slash as directory separator /
! Somehow not many people know that.. but it works:
>>> var = "d:/stuff/morestuff/furtherdown/THEFILE.txt"
>>> var.split('/')
['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']
This by the way will make your code work on Unix, Windows and Mac... because all of them do use /
as directory separator... even if you don't want to use the predefined constants of module os
.
Let's assume you have a file filedata.txt
with content:
d:\stuff\morestuff\furtherdown\THEFILE.txt
d:\otherstuff\something\otherfile.txt
You can read and split the file paths:
>>> for i in open("filedata.txt").readlines():
...     print i.strip().split("\\")
...
['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']
['d:', 'otherstuff', 'something', 'otherfile.txt']
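The same idea in Python 3 form, as a sketch. To keep it self-contained I'm faking the file with io.StringIO (the name fake_file is mine); with a real file you would use open("filedata.txt") in a with block instead. The doubled backslashes are only needed because the sample text is typed as a literal here; the text itself contains single backslashes, exactly as a file would.

```python
import io

# Stand-in for open("filedata.txt"); the contents hold single backslashes
fake_file = io.StringIO(
    "d:\\stuff\\morestuff\\furtherdown\\THEFILE.txt\n"
    "d:\\otherstuff\\something\\otherfile.txt\n"
)

# Strip the newline, skip blank lines, split each path on the backslash
rows = [line.strip().split("\\") for line in fake_file if line.strip()]
for row in rows:
    print(row)
```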
re.split() can help a little more than string.split()
import re
var = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"
re.split( r'[\\/]', var )
['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']
If you also want to support Linux and Mac paths, just add filter(None, result), which removes the unwanted '' entries that split() produces because those paths start with '/' or '//', for example '//mount/...' or '/var/tmp/'. (On Python 3, wrap it in list(), since filter() returns an iterator.)
import re
var = "/var/stuff/morestuff/furtherdown/THEFILE.txt"
result = re.split( r'[\\/]', var )
list( filter( None, result ) )
['var', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']
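Both steps can be folded into one small function. This is just a sketch and split_any is a made-up name. Be aware that it throws away the information about whether the path was absolute, since the leading empty string is filtered out.

```python
import re


def split_any(path):
    """Hypothetical helper: split on runs of "/" or "\\" and drop empty parts."""
    return [part for part in re.split(r"[\\/]+", path) if part]


print(split_any(r"d:\stuff\morestuff\furtherdown\THEFILE.txt"))
print(split_any("/var/stuff/morestuff/furtherdown/THEFILE.txt"))
```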
The line of code below can handle:
- C:/path/path
- C://path//path
- C:\path\path
- C:\\path\\path
path = re.split(r'[/\\]+', path)
One recursive for the fun.
Not the most elegant answer, but should work everywhere:
import os
def split_path(path):
    head = os.path.dirname(path)
    tail = os.path.basename(path)
    if head == os.path.dirname(head):
        return [tail]
    return split_path(head) + [tail]
Adapted the solution of @Mike Robins avoiding empty path elements at the beginning:
def parts(path):
    p,f = os.path.split(os.path.normpath(path))
    return parts(p) + [f] if f and p else [p] if p else []
os.path.normpath()
is actually required only once and could be done in a separate entry function to the recursion.
I'm not actually sure if this fully answers the question, but I had a fun time writing this little function that keeps a stack, sticks to os.path-based manipulations, and returns the list/stack of items.
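import os

def components(path):
    ret = []
    while len(path) > 0:
        head, crust = os.path.split(path)
        if head == path:  # reached the root (e.g. "/"): keep it and stop
            ret.insert(0, head)
            break
        path = head
        ret.insert(0, crust)
    return ret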
from os import path as os_path
and then
def split_path_iter(string, lst):
    head, tail = os_path.split(string)
    # stop at a bare name (head == '') or at the root, where split() makes no progress
    if head == '' or head == string:
        return [string] + lst
    else:
        return split_path_iter(head, [tail] + lst)

def split_path(string):
    return split_path_iter(string, [])
or, inspired by the above answers (more elegant):
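def split_path(string):
    head, tail = os_path.split(string)
    # stop at a bare name (head == '') or at the root, where split() makes no progress
    if head == '' or head == string:
        return [string]
    else:
        return split_path(head) + [tail]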
It is a shame that Python's os.path doesn't have something like os.path.splitall.
Anyhow, this is what works for me, credit: https://www.oreilly.com/library/view/python-cookbook/0596001673/ch04s16.html
import os
a = '/media//max/Data/'
def splitall(path):
    # https://www.oreilly.com/library/view/python-cookbook/0596001673/ch04s16.html
    allparts = []
    while 1:
        parts = os.path.split(path)
        if parts[0] == path:  # sentinel for absolute paths
            allparts.insert(0, parts[0])
            break
        elif parts[1] == path:  # sentinel for relative paths
            allparts.insert(0, parts[1])
            break
        else:
            path = parts[0]
            allparts.insert(0, parts[1])
    return allparts
x = splitall(a)
print(x)
z = os.path.join(*x)
print(z)
output:
['/', 'media', 'max', 'Data', '']
/media/max/Data/
use ntpath.split()
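To expand on that a little: ntpath is the module that os.path points to on Windows, and it can be imported on any platform, so DOS paths are parsed the same way everywhere. A quick sketch of what its functions return for the path in the question:

```python
import ntpath

path = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"

head, tail = ntpath.split(path)        # everything before the last "\", and the last part
drive, rest = ntpath.splitdrive(path)  # the drive letter, and the remainder

print(head)   # d:\stuff\morestuff\furtherdown
print(tail)   # THEFILE.txt
print(drive)  # d:
print(rest)   # \stuff\morestuff\furtherdown\THEFILE.txt
```

Note that split() only peels off the last component, so you still need a loop or recursion like the ones in the other answers to get the full list.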