Split a file based on a string

2023-03-02 04:43 Q&A author:

I am trying to split one big file into individual entries. Each entry ends with the string "//". When I try to use the following code:

#!/usr/bin/python
import sys,os   
uniprotFile=open("UNIPROT-data.txt") #read original alignment file  
uniprotFileContent=uniprotFile.read() 
uniprotFileList=uniprotFileContent.split("//")
for items in uniprotFileList:
        seqInfoFile=open('%s.dat'%items[5:14],'w')
        seqInfoFile.write(str(items))

But I realised that "//" also appears in another string (http://www.uniprot.org/terms), so the file is split there as well and I don't get the result I want. I tried using a regex but was not able to figure it out.

Use a regex that only splits on // if it's not preceded by a colon:

import re
myre = re.compile("(?<!:)//")
uniprotFileList = myre.split(uniprotFileContent)
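To see what the lookbehind does, here is a minimal sketch on a made-up two-entry sample (the entry names and text are hypothetical, not taken from the real file). The "//" inside the URL is left alone, while the "//" terminators are used as split points.

```python
import re

# Hypothetical sample: two entries, the first containing a URL with "//".
sample = (
    "ID   ABC_HUMAN\n"
    "CC   See http://www.uniprot.org/terms\n"
    "//\n"
    "ID   XYZ_MOUSE\n"
    "//\n"
)

# Split on "//" only when the character before it is not ":".
entry_separator = re.compile(r"(?<!:)//")

# Drop the whitespace-only piece left after the final terminator.
entries = [part for part in entry_separator.split(sample) if part.strip()]

for entry in entries:
    print(repr(entry))
```

Note that every entry after the first starts with the newline that followed the previous "//", so any fixed slice used to build a file name would need a strip() first.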

I am using the code with modified split pattern and it works fine for me:

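#!/usr/bin/python
# Read the whole UniProt file into memory.
with open("UNIPROT-data.txt") as uniprotFile:
    uniprotFileContent = uniprotFile.read()
# Entries end with "//" on its own line, so split on "//\n".
uniprotFileList = uniprotFileContent.split("//\n")
for items in uniprotFileList:
    if not items.strip():
        continue  # skip the empty piece after the last "//"
    # items[5:17] is taken from the first line and used as the file name.
    with open('%s.dat' % items[5:17], 'w') as seqInfoFile:
        seqInfoFile.write(items)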
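If the file is too large to read in one go, the same idea can be applied line by line. This is only a sketch, assuming every entry ends with a line that contains nothing but "//"; iter_entries is a hypothetical helper name, and the sample text is made up.

```python
import io

def iter_entries(lines):
    """Yield each entry as one string; an entry ends at a line that is exactly '//'."""
    current = []
    for line in lines:
        if line.rstrip("\r\n") == "//":
            if current:
                yield "".join(current)
            current = []
        else:
            current.append(line)
    if current:  # trailing text with no final terminator
        yield "".join(current)

# Hypothetical sample standing in for the real file.
sample = io.StringIO(
    "ID   ABC_HUMAN\n"
    "CC   See http://www.uniprot.org/terms\n"
    "//\n"
    "ID   XYZ_MOUSE\n"
    "//\n"
)
entries = list(iter_entries(sample))
print(len(entries))
```

With a real file, the open file object can be passed in place of the StringIO, since iterating over it also yields one line at a time.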

You're confusing \ (backslash) and / (slash). You don't need to escape a slash, just use "/". For a backslash, you do need to escape it, so use "\\".

Secondly, if you split with a backslash it will not split on a slash or vice-versa.

Split using a regular expression that doesn't permit the "http:" part before your // marker. For example: "([^:])\/\/". In Python the slashes do not need escaping, so "([^:])//" is equivalent.
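One thing to be aware of with this pattern: the character before "//" is part of the match, and because it sits in a capturing group, re.split returns it as a separate list item. A small sketch on a made-up string, compared with a lookbehind that matches only the "//" itself:

```python
import re

# Hypothetical text: a URL followed by two "//" terminator lines.
text = "a http://x b\n//\nc\n//\n"

# The character class consumes the preceding character; the capturing
# group makes re.split include that character in the result list.
consuming = re.split(r"([^:])//", text)

# A negative lookbehind checks the preceding character without consuming it.
lookbehind = re.split(r"(?<!:)//", text)

print(consuming)
print(lookbehind)
```

Both variants leave the URL intact, but the first one needs the captured characters filtered out or rejoined before the pieces are written to files.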

You appear to be splitting on the wrong characters. If the delimiter in your file is a backslash rather than a slash, you should split on "\\", not "//". Open a prompt and inspect the strings you're using. You'll see something like:

>>> "\\"
'\\'
>>> "\"
SyntaxError
>>> r"\"
SyntaxError
>>> "//"
'//'

So, for a backslash, use "\\". Note that r"\" is not a valid literal, because a raw string cannot end in a single backslash.
