Returning the lowest index for the first non-whitespace character in a string in Python
What's the shortest way to do this in Python?
string = "   xyz"
must return index = 3
>>> s = "   xyz"
>>> len(s) - len(s.lstrip())
3
>>> next(i for i, j in enumerate('   xyz') if j.strip())
3
or
>>> import string
>>> next(i for i, j in enumerate('   xyz') if j not in string.whitespace)
3
In versions of Python < 2.6, which lack the built-in next(), you'll have to do:
(...).next()
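One caveat with the next(...) forms above: if the string is empty or contains only whitespace, the generator is exhausted and next() raises StopIteration. A minimal sketch, assuming Python 2.6+ or 3 and a hypothetical helper name first_non_ws, passes a default as the second argument to next():

```python
def first_non_ws(s, default=-1):
    """Return the index of the first non-whitespace character, or `default`.

    first_non_ws is a hypothetical helper name, not part of the answers above.
    """
    # The second argument to next() is returned when the generator is empty.
    return next((i for i, ch in enumerate(s) if not ch.isspace()), default)


print(first_non_ws("   xyz"))  # 3
print(first_non_ws("   "))     # -1
```

Whether -1, None, or len(s) is the right default depends on what the caller wants for an all-whitespace string.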
Looks like the "regexes can do anything" brigade have taken the day off, so I'll fill in:
>>> tests = [u'foo', u' foo', u'\xA0foo']
>>> import re
>>> for test in tests:
...     print len(re.match(r"\s*", test, re.UNICODE).group(0))
...
0
1
1
>>>
FWIW: time taken is O(the_answer), not O(len(input_string))
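For readers on Python 3, where str patterns are Unicode-aware by default, the same idea might look like the sketch below. The name leading_ws_len is an assumption for illustration. Because re.match anchors at position 0, the end of the match is the length of the leading whitespace.

```python
import re

# Compiled once; \s* can match the empty string, so match() never returns None.
_LEADING_WS = re.compile(r"\s*")


def leading_ws_len(s):
    """Return the number of leading whitespace characters in s (hypothetical helper)."""
    return _LEADING_WS.match(s).end()


for test in ["foo", " foo", "\xa0foo"]:
    print(leading_ws_len(test))
```

As with the original, the regex engine stops at the first non-whitespace character, so the work done is proportional to the answer rather than to the length of the input.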
Many of the previous solutions iterate at several points, and some make copies of the data (the string). re.match(), strip(), enumerate(), and isspace() all duplicate work behind the scenes. The expressions
next(idx for idx, chr in enumerate(s) if not chr.isspace())
next(idx for idx, chr in enumerate(s) if chr not in string.whitespace)  # needs import string
are good choices for testing strings against various kinds of leading whitespace, such as vertical tabs, but that adds cost too.
However, if your string uses only space or tab characters, the following more basic solution is clear and fast, and also uses less memory.
def get_indent(astr):
    """Return index of first non-space character of a sequence else False."""
    try:
        iter(astr)
    except TypeError:
        raise
    # OR for not raising exceptions at all
    # if not hasattr(astr, '__getitem__'): return False
    idx = 0
    while idx < len(astr) and astr[idx] == ' ':
        idx += 1
    # Check for emptiness first so astr[0] cannot raise IndexError.
    if not astr or astr[0] != ' ':
        return False
    return idx
Although this may not be the absolute fastest or visually simplest solution, it has some benefits: you can easily transfer it to other languages and other versions of Python, and it is likely the easiest to debug, as there is little magic behavior. If you put the meat of the function inline with your code instead of in a function, you'd remove the function call overhead and make this solution similar in byte code to the other solutions.
Additionally, this solution allows for more variations, such as adding a test for tabs:
or astr[idx] == '\t':
Alternatively, you can test the entire data as an iterable once instead of checking whether each line is iterable. Remember that ""[0] raises an exception whereas ""[0:] does not.
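The tab variation and the empty-string remark can be sketched together as follows. This is a Python 3 sketch under assumptions: get_indent_ws and its indent_chars parameter are hypothetical names, and unlike get_indent it returns 0 rather than False when there is no indentation.

```python
def get_indent_ws(line, indent_chars=" \t"):
    """Count the leading characters of `line` that appear in `indent_chars`."""
    idx = 0
    # The length check runs first, so an empty string never reaches line[idx].
    while idx < len(line) and line[idx] in indent_chars:
        idx += 1
    return idx


print(get_indent_ws("\t  xyz"))  # 3
print(get_indent_ws(""))         # 0
print(repr(""[0:]))              # '' (slicing an empty string does not raise)
```

Returning a plain integer keeps the result usable in arithmetic and slicing without a special case for False.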
If you wanted to push the solution to inline you could go the non-Pythonic route:
i = 0
while i < len(s) and s[i] == ' ': i += 1
print i
3
import re

def prefix_length(s):
    # Raw string so that \s reaches the regex engine unchanged.
    m = re.match(r'(\s+)', s)
    if m:
        return len(m.group(0))
    return 0
>>> string = "   xyz"
>>> next(idx for idx, chr in enumerate(string) if not chr.isspace())
3
>>> string = "   xyz"
>>> map(str.isspace,string).index(False)
3
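The map(...).index(False) form relies on Python 2, where map() returns a list. A sketch of a Python 3 equivalent, assuming the string contains at least one non-whitespace character (otherwise index() raises ValueError):

```python
s = "   xyz"
# In Python 3, map() returns a lazy iterator, so build a list before calling index().
idx = list(map(str.isspace, s)).index(False)
print(idx)  # 3
```

Note that this builds a list as long as the whole string, so it does more work than the generator-based answers when the indentation is short.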