Is Python's os.path.join slow?
I've been told os.path.join
is horribly slow in python and I should use string concatenation ('%s/%s' % (x, y)
) instead. Is there really that big a difference and if so how ca开发者_StackOverflown I track it?
$ python -mtimeit -s 'import os.path' 'os.path.join("/root", "file")'
1000000 loops, best of 3: 1.02 usec per loop
$ python -mtimeit '"/root" + "file"'
10000000 loops, best of 3: 0.0223 usec per loop
So yes, it's nearly 50 times slower. 1 microsecond is still nothing though, so I really wouldn't factor the difference in. Use os.path.join
: it's cross-platform, more readable and less bug-prone.
EDIT: Two people have now commented that the import
explains the difference. This is not true, as -s
is a setup flag thus the import
is not factored into the reported runtime. Read the docs.
I don't know who told you not to use it, but they're wrong.
- Even if it were slow, it would never be slow to a program-breaking extent. I've never noticed it being remotely slow.
- It's key to cross-platform programming. Line separators etc. differ by platform, and
os.path.join
will always join paths correctly regardless of platform. - Readability. Everyone knows what
join
is doing. People might have to do a double take for string concatenation for paths.
Also be aware that periods in function calls are known to be slow. Compare:
python -mtimeit -s "import os.path;x=range(10)" "os.path.join(x)"
1000000 loops, best of 3: 0.405 usec per loop
python -mtimeit -s "from os.path import join;x=range(10)" "join(x)"
1000000 loops, best of 3: 0.29 usec per loop
So that's a slowdown of 40% just by having periods in your function invocation syntax.
Curiously, these two are different speeds:
$ python -mtimeit -s "from os.path import sep;join=sep.join;x=map(str,range(10))" "join(x)"
1000000 loops, best of 3: 0.253 usec per loop
$ python -mtimeit -s "from os.path import join;x=map(str,range(10))" "join(x)"
1000000 loops, best of 3: 0.285 usec per loop
It may be nearly 50 times faster, but unless you're doing it in a CPU bound tight inner loop, the speed difference isn't going to matter at all. The portability difference on the other hand will make the difference between whether or not your program can be easily ported to a non-Unix platform or not.
So, please use os.path.join
unless you've profiled and discovered that it really is a major impediment to your program's performance.
You should use os.path.join
simply for portability.
I don't get the point of comparing os.path.join
(which works for any number or parts, on any platform) with something as trivial as string formatting two paths.
To answer the question in the title, "Is Python's os.path.join slow?" you have to at least compare it with a remotely similar function to find out what speed you can expect from a function like this.
As you can see below, compared to a similar function, there is nothing slow about os.path.join
:
python -mtimeit -s "x = tuple(map(str, range(10)))" "'/'.join(x)"
1000000 loops, best of 3: 0.26 usec per loop
python -mtimeit -s "from os.path import join;x = tuple(range(10))" "join(x)"
1000000 loops, best of 3: 0.27 usec per loop
python -mtimeit -s "x = tuple(range(3))" "('/%s'*len(x)) % x"
1000000 loops, best of 3: 0.456 usec per loop
python -mtimeit -s "x = tuple(map(str, range(3)))" "'/'.join(x)"
10000000 loops, best of 3: 0.178 usec per loop
In this hot controversy, I dare to propose:
(I know, I know , there is timeit, but I'm not so trained with timeit, and clock() seems to me to be sufficient for the case)
import os
from time import clock
separ = os.sep
ospath = os.path
ospathjoin = os.path.join
A,B,C,D,E,F,G,H = [],[],[],[],[],[],[],[]
n = 1000
for essays in xrange(100):
te = clock()
for i in xrange(n):
xa = os.path.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
A.append(clock()-te)
te = clock()
for i in xrange(n):
xb = ospath.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
B.append(clock()-te)
te = clock()
for i in xrange(n):
xc = ospathjoin('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
C.append(clock()-te)
te = clock()
for i in xrange(n):
xd = 'C:\WINNT\system32'+os.sep+'Microsoft\Crypto'+os.sep+'RSA\MachineKeys'
D.append(clock()-te)
te = clock()
for i in xrange(n):
xe = '%s\\%s\\%s' % ('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
E.append(clock()-te)
te = clock()
for i in xrange(n):
xf = 'C:\WINNT\system32'+separ+'Microsoft\Crypto'+separ+'RSA\MachineKeys'
F.append(clock()-te)
te = clock()
for i in xrange(n):
xg = os.sep.join(('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys'))
G.append(clock()-te)
te = clock()
for i in xrange(n):
xh = separ.join(('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys'))
H.append(clock()-te)
print min(A), "os.path.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')"
print min(B), "ospath.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')"
print min(C), "ospathjoin('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')"
print min(D), "'C:\WINNT\system32'+os.sep+'Microsoft\Crypto'+os.sep+'RSA\MachineKeys'"
print min(E), "'%s\\%s\\%s' % ('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')"
print min(F), "'C:\WINNT\system32'+separ+'Microsoft\Crypto'+separ+'RSA\MachineKeys'"
print min(G), "os.sep.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')"
print min(H), "separ.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')"
print 'xa==xb==xc==xd==xe==xf==xg==xh==',xa==xb==xc==xd==xe==xf==xg==xh
result
0.0284533369465 os.path.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
0.0277652606686 ospath.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
0.0272489939364 ospathjoin('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
0.00398598145854 'C:\WINNT\system32'+os.sep+'Microsoft\Crypto'+os.sep+'RSA\MachineKeys'
0.00375075603184 '%s\%s\%s' % ('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
0.00330824168994 'C:\WINNT\system32'+separ+'Microsoft\Crypto'+separ+'RSA\MachineKeys'
0.00292467338726 os.sep.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
0.00261401937956 separ.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
True
with
separ = os.sep
ospath = os.path
ospathjoin = os.path.join
Everyone sholud know one inevident feature of os.path.join()
os.path.join( 'a', 'b' ) == 'a/b'
os.path.join( 'a', '/b' ) == '/b'
精彩评论