Formatted Input in Python

I have a peculiar problem. I need to read from a text file, using Python, only those substrings that lie within predefined ranges of offsets, say 5-8 and 12-16.

For example, if a line in the file is something like:

abcdefghi akdhflskdhfhglskdjfhghsldk

then I would like to read the two words "efgh" and "kdhfl", because in the word "efgh" the offset of the character "e" is 5 and that of "h" is 8. The same goes for the other word, "kdhfl", which spans offsets 12 to 16.

Please note that whitespace also counts toward the offset. In fact, the whitespace in my file does not occur consistently in every line and cannot be depended upon to extract the words of interest, which is why I have to rely on the offsets.

I hope I've been able to make the question clear.

Awaiting answers!

Edit -

Yes, the amount of whitespace in each line can change, and it also counts toward the offsets. For example, consider these two lines:

abcz d 
a bc d 

In both cases, I view the offset of the final character "d" as the same. As I said, the white spaces in the file are not consistent and I cannot rely on them. I need to pick up the characters based on their offsets. Does your answer still hold?


Assuming it's a file:

with open("file") as f:
    for line in f:
        # 1-based offsets 5-8 and 12-16 become the slices [4:8] and [11:16]
        print(line[4:8], line[11:16])


To extract pieces from offsets simply read each line into a string and then access a substring with a slice ([from:to]).
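A minimal sketch of that idea, assuming the offsets in the question are 1-based and inclusive (so 5-8 corresponds to the slice [4:8]). The helper name extract_fields is made up for illustration and is not from the question:

```python
def extract_fields(line, ranges):
    """Return the substrings of line at the given 1-based, inclusive offset ranges."""
    # A 1-based inclusive range (start, end) is the 0-based slice [start - 1:end].
    return [line[start - 1:end] for start, end in ranges]


sample = "abcdefghi akdhflskdhfhglskdjfhghsldk"
fields = extract_fields(sample, [(5, 8), (12, 16)])
print(fields)  # ['efgh', 'kdhfl']
```

Note that slicing past the end of a short line does not raise an error; it just returns a shorter (possibly empty) string, so lines that are too short may need a separate length check.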

It's unclear what you're saying about the inconsistent whitespace. If whitespace adds to the offset, it must be consistent to be meaningful. If the whitespace amount can change but actually accounts for the offsets, you can't reliably extract your data.

In your added example, as long as d's offset stays the same, you can extract it with slicing.

>>> s = 'a bc d'
>>> s[5:6]
'd'
>>> s = 'abc  d'
>>> s[5:6]
'd'


What's to stop you from using a regular expression? Besides the whitespace, do the offsets vary?

/^.{4}(.{4}).{3}(.{5})/
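In Python, the regular-expression approach might look like the sketch below. The repeat counts are chosen here so that the two groups land on the question's offsets 5-8 and 12-16, and the names FIELDS and match_fields are hypothetical:

```python
import re

# Skip 4 characters, capture 4 (offsets 5-8), skip 3, capture 5 (offsets 12-16).
FIELDS = re.compile(r".{4}(.{4}).{3}(.{5})")


def match_fields(line):
    """Return the two captured substrings, or None if the line is too short."""
    m = FIELDS.match(line)
    return m.groups() if m else None


print(match_fields("abcdefghi akdhflskdhfhglskdjfhghsldk"))  # ('efgh', 'kdhfl')
print(match_fields("too short"))  # None
```

Unlike plain slicing, the pattern fails to match when a line has fewer than 16 characters, which may or may not be the behaviour you want.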