Why doesn't line.split('\s') do the same as line.split()?
I have a very simple program that parses a CSV-style file of text records, one per row, whose columns are separated by a single tab character.
I understand that split() splits on whitespace by default, so explicitly specifying a whitespace pattern isn't needed, but why doesn't an explicitly specified whitespace pattern work? Or is '\s' or r'\s' not the right pattern/regex? I searched Stack Overflow and found mentions of string split() being an older method, which I don't really understand since I am very new to Python. Does string split() not support regex?
Here is my code:
#!/usr/bin/env python
import os
import re
import sys

f = open(sys.argv[1])
for line in f:
    field = line.split()
    field2 = line.split('\s')
    print(field[1], field2[1])
f.close()
I tried doing line.split(r'\s') and that doesn't work either, but line.split('\t') works.
Because \t really represents a tab character in a string (just as \n is the newline character; both are valid string escape sequences), whereas \s is a special regular-expression character class that matches whitespace.
str.split does not accept regular expressions. If you want to split with regular expressions, you have to use re.split.
Demonstration:
>>> import re
>>> s = r"This\sis a weird\sstring"
>>> s.split(r"\s") # treated literally
['This', 'is a weird', 'string']
>>> re.split(r"\s", s) # regex
['This\\sis', 'a', 'weird\\sstring']
string.split() takes a string as its argument and splits on that literal string. That's all. \t is an ASCII tab character, while \s is simply a backslash followed by the letter s in this case.
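One way to see the difference is to inspect the strings themselves. The following is a minimal Python 3 sketch; the variable names and the sample line are made up for illustration and are not from the question.

```python
tab = "\t"        # a recognized escape sequence: one tab character
not_ws = r"\s"    # not a string escape: a backslash followed by the letter s

print(len(tab))      # 1
print(len(not_ws))   # 2

# A made-up tab-separated record, similar in shape to the asker's rows.
line = "alpha\tbeta\tgamma\n"

print(line.split(tab))      # ['alpha', 'beta', 'gamma\n']
print(line.split(not_ws))   # ['alpha\tbeta\tgamma\n']
```

Since the two-character sequence never occurs in the line, split() returns the whole line as a single element, which is why indexing field2[1] fails in the question's code.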
For a regex split, you want to import re and use re.split().
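Applied to the loop from the question, that might look roughly like the sketch below. It assumes Python 3, and it uses an in-memory io.StringIO object with made-up sample rows as a stand-in for the file opened from sys.argv[1].

```python
import io
import re

# Stand-in for the tab-separated input file (hypothetical sample data).
sample = io.StringIO("id1\tAlice\t30\nid2\tBob\t25\n")

second_column = []
for line in sample:
    # Strip the line ending first so it does not produce an extra empty field,
    # then split on any single whitespace character using the regex \s.
    fields = re.split(r"\s", line.rstrip("\n"))
    second_column.append(fields[1])

print(second_column)   # ['Alice', 'Bob']
```

If the fields themselves may contain spaces, splitting on '\t' with the plain string method is probably the safer choice, since \s would also split on those spaces.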
The string.split() function does not take a regular expression parameter. Try re.split():
>>> import re
>>> re.split(r"\s+", "a b")
['a', 'b']
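Note that even re.split(r"\s+", ...) is not an exact replacement for split() with no arguments: the no-argument form also discards leading and trailing whitespace, while the regex version leaves empty strings at the ends. A small Python 3 sketch with a made-up input string:

```python
import re

text = "  a \t b\n"   # leading spaces, mixed separators, trailing newline

print(text.split())                     # ['a', 'b']
print(re.split(r"\s+", text))           # ['', 'a', 'b', '']
print(re.split(r"\s+", text.strip()))   # ['a', 'b']
```

Calling strip() before the regex split gives the same result as split() for this input.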