
Python - Ignore FIRST character (tab) every line when reading

This is a continuation of my former questions (check them if you are curious).

I can already see the light at the end of the tunnel, but I've got one last problem.

For some reason, every line starts with a TAB character.

How can I ignore that first character ("tab" (\t) in my case)?

filename = "terem.txt"

OraRend = collections.namedtuple('OraRend', 'Nap, OraKezdese, OraBefejezese, Azonosito, Terem, OraNeve, Emelet')


csv.list_dialects()
for line in csv.reader(open(filename, "rb"), delimiter='\t', lineterminator='\t\t', doublequote=False, skipinitialspace=True):
    print line  
    orar = OraRend._make(line) # Here comes the trouble!

The text file:

http://pastebin.com/UYg4P4J1

(Can't really paste it here with all the tabs.)

I have found lstrip, strip and other methods, but all of them eat every leading whitespace character, not just the first one, so filling the tuple fails.


You could do line = line[1:] to strip just the first character. But if you do this, you should add an assertion that the first character is indeed a tab, to avoid mangling lines that have no leading tab.
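A minimal sketch of that check, written for Python 3. The helper name drop_leading_tab and the sample string are made up for illustration and are not part of the original code.

```python
def drop_leading_tab(line):
    """Remove exactly one leading tab, failing loudly if it is missing."""
    # Guard against lines that do not start with a tab, so the slice
    # below never cuts off a character of real data.
    assert line.startswith("\t"), "expected a leading tab: %r" % line
    return line[1:]


# Made-up sample line; only the first tab is removed.
print(repr(drop_leading_tab("\tfirst\tsecond\n")))
```

If a missing tab should be tolerated rather than treated as an error, the assertion could be replaced by a plain if check.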

There is an easier alternative that also handles several other cases and doesn't break things if the things to be removed aren't there. You can strip all leading and trailing whitespace with line = line.strip(). Alternatively, use .lstrip() to strip only leading whitespace, and add '\t' as argument to either method call if you want to leave other whitespace in place and just remove tabs.
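To make the difference between these variants concrete, here is a small Python 3 sketch. The sample line is invented for the example and is not taken from the real file.

```python
line = "\t  Nap\tTerem \t\n"  # made-up sample line

stripped_both = line.strip()       # leading and trailing whitespace removed
stripped_left = line.lstrip()      # only leading whitespace removed
stripped_tabs = line.lstrip("\t")  # only leading tabs removed, spaces kept

for result in (stripped_both, stripped_left, stripped_tabs):
    print(repr(result))
```

Note that the tabs between fields are untouched in all three cases, since the strip methods only look at the ends of the string.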


To remove the first character from a string:

>>> s = "Hello"
>>> s
'Hello'
>>> s[1:]
'ello'


From the docs:

str.lstrip([chars])

Return a copy of the string with leading characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a prefix; rather, all combinations of its values are stripped.

If you want to only remove the tab at the beginning of a line, use

str.lstrip("\t")

This has the benefit that you don't have to check that the first character is, in fact, a tab. However, if a line can start with more than one tab and you want to keep everything from the second tab onward, you're going to have to use str[1:] instead.
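The difference matters when the first field is empty, so that a line starts with two tabs. A short Python 3 sketch with a made-up line:

```python
line = "\t\tsecond\tthird"  # made-up line: leading tab, then an empty first field

with_lstrip = line.lstrip("\t").split("\t")  # every leading tab is removed
with_slice = line[1:].split("\t")            # only one character is removed

print(with_lstrip)  # the empty first field is lost
print(with_slice)   # the empty first field survives
```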


Consider this. You don't need to pass a "file" to csv.reader. Any file-like object that yields a sequence of string values works nicely.

import collections
import csv

filename = "terem.txt"

OraRend = collections.namedtuple('OraRend', 'Nap, OraKezdese, OraBefejezese, Azonosito, Terem, OraNeve, Emelet')

with open(filename, "rb") as source:
    # Strip leading whitespace from each line before the csv module sees it.
    cleaned = (line.lstrip() for line in source)
    rdr = csv.reader(cleaned, delimiter='\t', lineterminator='\t\t', doublequote=False, skipinitialspace=True)
    for line in rdr:
        print line
        orar = OraRend._make(line)  # one OraRend per row
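For readers on Python 3, the same idea might look roughly like the sketch below. It is not the original code: the function name read_rows and the sample data are invented, it drops exactly one leading tab instead of calling lstrip(), and it assumes every non-empty line holds exactly seven fields once that tab is gone. The real file may need further cleaning.

```python
import collections
import csv
import io

OraRend = collections.namedtuple(
    "OraRend", "Nap, OraKezdese, OraBefejezese, Azonosito, Terem, OraNeve, Emelet"
)


def read_rows(source):
    """Yield one OraRend per non-empty line, dropping a single leading tab."""
    # csv.reader accepts any iterable of strings, so a generator is enough.
    cleaned = (line[1:] if line.startswith("\t") else line for line in source)
    for fields in csv.reader(cleaned, delimiter="\t"):
        if fields:  # csv.reader yields [] for blank lines; skip those
            yield OraRend._make(fields)


# Made-up sample standing in for terem.txt; real use would pass
# open("terem.txt", newline="") instead.
sample = io.StringIO("\tn\tk\tb\ta\tt\to\te\n")
rows = list(read_rows(sample))
print(rows[0].Terem)
```

Slicing off one character instead of using lstrip() keeps an empty first field intact, at the cost of needing the startswith check.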