Why doesn't line.split('\s') do the same as line.split()?

I have a very very simple program that parses a csv file that has rows of text records whose columns are separated by a single tab character.

I understand split() by default splits on whitespace, so explicitly specifying a whitespace pattern isn't needed, but my question is: why won't an explicitly specified pattern for whitespace work? Or is '\s' or r'\s' not the right pattern/regex? I searched on Stack Overflow and found mentions of string split() being an older method, which I don't really understand since I am very new to Python. Does string split() not support regex?

Here is my code:

#!/usr/bin/env python
import os
import re
import sys

f = open(sys.argv[1])
for line in f:
    field = line.split()
    field2 = line.split('\s')
    print field[1], field2[1]
f.close()

I tried doing line.split(r'\s') and that doesn't work either, but line.split('\t') works.


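Because \t really represents a tab character in a string (just as \n is the newline character; both are valid string escape sequences), but \s is a special regular expression character class for whitespace. It is not a string escape sequence, so in an ordinary string it is just a backslash followed by the letter s.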

str.split does not accept regular expressions. If you want to split with regular expressions, you have to use re.split.

Demonstration:

>>> import re
>>> text = r"This\sis a weird\sstring"
>>> text.split(r"\s")                  # treated literally
['This', 'is a weird', 'string']
>>> re.split(r"\s", text)              # regex
['This\\sis', 'a', 'weird\\sstring']


string.split() takes a string as its argument and splits on that exact string. That's all. \t is an ASCII tab character, while \s is simply a backslash followed by the letter s in this case.

For a regex split, you want to import re and use re.split().
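As a rough sketch of that difference, here is a made-up tab-separated record (the sample data and variable names are illustrative only, not taken from the question). It also suggests why field2[1] in the question fails: splitting on the literal backslash-plus-s string leaves the line in one piece, so there is no index 1.

```python
import re

# Hypothetical tab-separated record; the data is made up for illustration.
line = "alice\t42\tNew York\n"

# str.split with an explicit separator matches that exact string.
fields_tab = line.rstrip("\n").split("\t")

# The two-character string backslash + s does not occur in the line,
# so nothing is split and the whole line comes back as one element.
fields_literal = line.split(r"\s")

# re.split treats its first argument as a regex; \t there matches a tab.
fields_regex = re.split(r"\t", line.rstrip("\n"))

print(fields_tab)
print(fields_literal)
print(fields_regex)
```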


The string.split() function does not take a regular expression parameter. Try re.split():

>>> import re
>>> re.split(r"\s+", "a  b")
['a', 'b']
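One caveat, sketched here with made-up input: re.split(r"\s+", ...) is not an exact drop-in for split() with no arguments. The no-argument form also discards leading and trailing whitespace, while the regex version returns empty strings at the ends unless the line is stripped first.

```python
import re

# Made-up sample with leading spaces and a trailing newline.
line = "  a\tb  c\n"

# No-argument split: runs of whitespace are one separator, ends are trimmed.
no_arg = line.split()

# Regex split: whitespace at the start and end yields empty strings.
regex = re.split(r"\s+", line)

# Stripping first makes the regex version agree with the no-argument split.
regex_stripped = re.split(r"\s+", line.strip())

print(no_arg)
print(regex)
print(regex_stripped)
```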