
Split a file based on a string

I am trying to split one big file into individual entries. Each entry ends with the string "//". So I tried to use

#!/usr/bin/python
import sys,os   
uniprotFile=open("UNIPROT-data.txt") #read original alignment file  
uniprotFileContent=uniprotFile.read() 
uniprotFileList=uniprotFileContent.split("//")
for items in uniprotFileList:
        seqInfoFile=open('%s.dat'%items[5:14],'w')
        seqInfoFile.write(str(items))

But I realised that there is another string containing "//" (http://www.uniprot.org/terms), so it splits there as well and I don't get the result I want. I tried using a regex but was not able to figure it out.


Use a regex that only splits on "//" if it is not preceded by ":":

import re
myre = re.compile("(?<!:)//")
uniprotFileList = myre.split(uniprotFileContent)
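
To see how the lookbehind behaves, here is a minimal sketch run on a made-up two-entry sample. The `sample` text and the `split_entries` helper are assumptions for illustration only, not real UniProt data or part of the answer above.

```python
import re

# Split on "//" only when it is not immediately preceded by ":".
ENTRY_SEPARATOR = re.compile(r"(?<!:)//")


def split_entries(text):
    """Return the non-empty chunks of text separated by a bare "//"."""
    chunks = ENTRY_SEPARATOR.split(text)
    return [chunk.strip() for chunk in chunks if chunk.strip()]


# Hypothetical sample, not real UniProt data.
sample = (
    "ID   ENTRY_ONE\n"
    "CC   See http://www.uniprot.org/terms\n"
    "//\n"
    "ID   ENTRY_TWO\n"
    "//\n"
)

entries = split_entries(sample)
print(len(entries))  # 2: the "//" inside the URL is left alone
```

The empty chunk that follows the final "//" is filtered out, so no empty entry is produced.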


I am using the code with a modified split pattern, and it works fine for me:

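#!/usr/bin/python
# Read the whole UniProt flat file into memory.
with open("UNIPROT-data.txt") as uniprotFile:
    uniprotFileContent = uniprotFile.read()
# Split where "//" is immediately followed by a newline.
uniprotFileList = uniprotFileContent.split("//\n")
for items in uniprotFileList:
    if not items.strip():
        continue  # skip the empty chunk after the final "//"
    # Characters 5-16 of the entry are used as the output file name.
    with open('%s.dat' % items[5:17], 'w') as seqInfoFile:
        seqInfoFile.write(items)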
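
Splitting on "//\n" reads the whole file at once and would miss a final "//" that has no trailing newline. As an alternative sketch (not the method used above), the entries can be collected line by line, treating a line that is exactly "//" as the terminator. The `iter_entries` name and the sample lines are hypothetical.

```python
def iter_entries(lines):
    """Yield each entry as one string; an entry ends at a line that is exactly "//"."""
    current = []
    for line in lines:
        if line.rstrip("\r\n") == "//":
            if current:
                yield "".join(current)
            current = []
        else:
            current.append(line)
    if current:  # trailing text without a closing "//"
        yield "".join(current)


# Hypothetical sample lines; an open file object can be passed the same way.
sample_lines = [
    "ID   ENTRY_ONE\n",
    "CC   See http://www.uniprot.org/terms\n",
    "//\n",
    "ID   ENTRY_TWO\n",
    "//",
]

entries = list(iter_entries(sample_lines))
print(len(entries))  # 2
```

Because only whole lines are compared, the "//" inside the URL never counts as a terminator.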


You're confusing \ (backslash) and / (slash). You don't need to escape a slash, just use "/". For a backslash, you do need to escape it, so use "\\".

Secondly, if you split on a backslash, it will not split on a slash, or vice versa.


Split using a regular expression that doesn't permit the "http:" part before your // marker. For example: "([^:])\/\/"
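
One thing to watch with this pattern in Python: the character before "//" is part of the match and sits in a capture group, so `re.split` removes it from the entry and returns it as a separate list element. The sketch below compares it with a negative lookbehind on a made-up string. The `text` value is an assumption for illustration, and the slashes are left unescaped because Python does not require escaping them.

```python
import re

# Hypothetical sample: two entries, the second containing a URL.
text = "AB\n//\nCD http://x\n//\n"

# Character-class version: the character before "//" is captured, so
# re.split returns it as its own list element.
consuming = re.split(r"([^:])//", text)

# Lookbehind version: the preceding character is only checked, not consumed.
lookbehind = re.split(r"(?<!:)//", text)

print(consuming)   # ['AB', '\n', '\nCD http://x', '\n', '\n']
print(lookbehind)  # ['AB\n', '\nCD http://x\n', '\n']
```

Both versions leave the URL intact, but the lookbehind result needs no extra clean-up of the captured characters.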


You appear to be splitting on the wrong characters. Based on your question, you should split on "\\" (a single backslash), not "//". Open a prompt and inspect the strings you're using. You'll see something like:

>>> "\\"
'\\'
>>> "\"
SyntaxError
>>> r"\"
SyntaxError
>>> "//"
'//'

So, you can use "\\". Note that a raw string cannot end in a single backslash, so r"\" is not a valid alternative here. Raw strings are still worth using for clarity in longer splitting and regex patterns.
