string matching replace with case sensitive in python

2023-03-24 00:35 问答作者：

I am quite new in python & trying to do some new stuff.I have two list in a dictionary.Let's say,

List开发者_如何转开发1:                              List2:
Anterior                            cord
cuneate nucleus                     Medulla oblongata
nucleus                             Spinal cord
Intermediolateral nucleus           Spinal 
                                    sksdsj
british                             7

And I have some text lines as below:

<s id="5239778-2">The name refers collectively to the cuneate nucleus and gracile nucleus, which are present at the junction between the spinal cord and the medulla oblongata.</s>
<s id="3691284-1">In the medulla oblongata, the arcuate nucleus is a group of neurons located on the anterior surface of the medullary pyramids.</s>
<s id="21120-99">Anterior horn cells, motoneurons located in the spinal.</s>
<s id="1053949-16">The Anterior cord syndrome results from injury to the anterior part of the spinal cord, causing weakness and loss of pain and thermal sensations below the injury site but preservation of proprioception that is usually carried in the posterior part of the spinal cord.</s>
<s id="69-7">...Meanwhile is the studio 7 album by British pop band 10cc.</s>

I have to get return those line whose belongs string both from list1 & list2.So,I have tried with the following code:

result = ""
if list1 in line and list2 in line:
    i1 = re.sub('(?i)(\s+)(%s)(\s+)'%list1, '\\1<e1>\\2</e1>\\3', line)
    i2 = re.sub('(?i)(\s+)(%s)(\s+)'%list2, '\\1<e2>\\2</e2>\\3', i1)
    result = result + i2 + "\n"
    continue

But I am getting the following result:

<s id="5239778-2">The name refers collectively to the <e1>cuneate nucleus</e1> and gracile nucleus, which are present at the junction between the spinal cord and the medulla oblongata.</s>
<s id="3691284-1">In the medulla oblongata, the arcuate <e1>nucleus</e1> is a group of neurons located on the anterior surface of the medullary pyramids.</s>
<s id="21120-99">Anterior horn cells, motoneurons located in the spinal.</s>
<s id="1053949-16">The <e1>Anterior</e1> <e2>cord</e2> syndrome results from injury to the <e1>anterior</e1> part of the spinal cord, causing weakness and loss of pain and thermal sensations below the injury site but preservation of proprioception that is usually carried in the posterior part of the spinal cord.</s>
<s id="69-7">...Meanwhile is the studio 7 album by British pop band 10cc.</s>

Here,Only the result line-4, I got that matches string from both list that is what I want.But, I don't want to get those line which match only one or no string(eg. result line-1 & 3).Also,if matches string from both list , should it tag them(eg. result line-2).

Any kind of help will be greatly appreciated.

Basically, you want to put some words in <e1> tags and other words in <e2> tags. Is that right?

If so, then something like this will do:

#!/usr/bin/python

from __future__ import print_function
import re

text = '''\
<s id="5239778-2">The name refers collectively to the cuneate nucleus and gracile nucleus, which are present at the junction between the spinal cord and the medulla oblongata.</s>
<s id="3691284-1">In the medulla oblongata, the arcuate nucleus is a group of neurons located on the anterior surface of the medullary pyramids.</s>
<s id="21120-99">Anterior horn cells, motoneurons located in the spinal cord.</s>
<s id="1053949-16">The Anterior cord syndrome results from injury to the anterior part of the spinal cord, causing weakness and loss of pain and thermal sensations below the injury site but preservation of proprioception that is usually carried in the posterior part of the spinal cord.</s>'''

list1 = ('Anterior', 'cuneate nucleus', 'Intermediolateral nucleus')
list2 = ('cord', 'Medulla oblongata', 'Spinal cord')

# put phrases in \b so that they match the whole words
re1 = re.compile("(%s)" % "|".join(r"\b%s\b" % i for i in list1), re.IGNORECASE)
re2 = re.compile("(%s)" % "|".join(r"\b%s\b" % i for i in list2), re.IGNORECASE)

for line in text.split("\n"):
    line = re1.sub(r"<e1>\1</e1>", line)
    line = re2.sub(r"<e2>\1</e2>", line)
    print(line)

Output:

<s id="5239778-2">The name refers collectively to the <e1>cuneate nucleus</e1> and gracile nucleus, which are present at the junction between the <e2>spinal cord</e2> and the <e2>medulla oblongata</e2>.</s>
<s id="3691284-1">In the <e2>medulla oblongata</e2>, the arcuate nucleus is a group of neurons located on the <e1>anterior</e1> surface of the medullary pyramids.</s>
<s id="21120-99"><e1>Anterior</e1> horn cells, motoneurons located in the <e2>spinal cord</e2>.</s>
<s id="1053949-16">The <e1>Anterior</e1> <e2>cord</e2> syndrome results from injury to the <e1>anterior</e1> part of the <e2>spinal cord</e2>, causing weakness and loss of pain and thermal sensations below the injury site but preservation of proprioception that is usually carried in the posterior part of the <e2>spinal cord</e2>.</s>

How about this:

result = ""
lines = ['<s id="5239778-2">The name refers collectively to the cuneate nucleus and gracile nucleus, which are present at the junction between the spinal cord and the medulla oblongata.</s>',
'<s id="3691284-1">In the medulla oblongata, the arcuate nucleus is a group of neurons located on the anterior surface of the medullary pyramids.</s>',
'<s id="21120-99">Anterior horn cells, motoneurons located in the spinal cord.</s>',
'<s id="1053949-16">The Anterior cord syndrome results from injury to the anterior part of the spinal cord, causing weakness and loss of pain and thermal sensations below the injury site but preservation of proprioception that is usually carried in the posterior part of the spinal cord.</s>']

for line in lines:
    for item1 in list1:
        if line.find(item1) != -1:
            for item2 in list2:
                if line.find(item2) != -1:
                      result = result + line + '\n'
                      break
            break
print result

继续阅读：python string-matching

string matching replace with case sensitive in python

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？