How can I move forward to a subsequent portion of text once I've already printed a searched portion of text in Python?

Posted 2023-03-03 00:31

I would like to search through a text file and print out a line and its subsequent 3 lines if a keyword is found in the line AND a different keyword is found within the subsequent 3 lines.

My code right now prints too much information. Is there a way to move forward to the next section of text once a portion is already printed?

text = """

here is some text 1
I want to print out this line and the following 3 lines only once keyword 2
print this line since it has a keyword2 3
print this line keyword 4
print this line 5
I don't want to print this line but I want to start looking for more text starting at this line 6
Don't print this line 7
Not this line either 8
I want to print out this line again and the following 3 lines only once keyword 9
please print this line keyword 10
please print this line it has the keyword2 11
please print this line 12
Don't print this line 13
Start again searching here 14
etc.
"""

text2 = open("tmp.txt","w")
text2.write(text)
text2.close()

searchlines = open("tmp.txt").readlines()

data = []

for m, line in enumerate(searchlines):
    line = line.lower()
    if "keyword" in line and any("keyword2" in l.lower() for l in searchlines[m:m+4]):
        for line2 in searchlines[m:m+4]:
            data.append(line2)
print(''.join(data))

The output right now is:

I want to print out this line and the following 3 lines only once keyword 2
print this line since it has a keyword2 3
print this line keyword 4
print this line 5
print this line since it has a keyword2 3
print this line keyword 4
print this line 5
I don't want to print this line but I want to start looking for more text starting at this line 6
I want to print out this line again and the following 3 lines only once keyword 9
please print this line keyword 10
please print this line it has the keyword2 11
please print this line 12
please print this line keyword 10
please print this line it has the keyword2 11
please print this line 12
Don't print this line 13
please print this line it has the keyword2 11
please print this line 12
Don't print this line 13
Start again searching here 14

I would like it to print out only:

I want to print out this line and the following 3 lines only once keyword 2
print this line since it has a keyword2 3
print this line keyword 4
print this line 5
I want to print out this line again and the following 3 lines only once keyword 9
please print this line keyword 10
please print this line it has the keyword2 11
please print this line 12

As someone else has pointed out, your first keyword, "keyword", is a substring of your second keyword, "keyword2". I've therefore implemented this with compiled regexp objects, so that the word boundary anchor \b can tell the two apart.

import re
from io import StringIO

text = """

here is some text 1
I want to print out this line and the following 3 lines only once keyword 2
print this line since it has a keyword2 3
print this line keyword 4
print this line 5
I don't want to print this line but I want to start looking for more text starting at this line 6
Don't print this line 7
Not this line either 8
I want to print out this line again and the following 3 lines only once keyword 9
please print this line keyword 10
please print this line it has the keyword2 11
please print this line 12
Don't print this line 13
Start again searching here 14
etc.
"""


def my_scan(data, search1, search2):
  # Sliding window holding the last 4 lines read
  buffer = []
  for line in data:
    buffer.append(line)
    if len(buffer) > 4:
      buffer.pop(0)
    if len(buffer) == 4: # Valid search block
      # First line must match search1, one of the next 3 lines must match search2
      if search1.search(buffer[0]) and search2.search("\n".join(buffer[1:4])):
        for item in buffer:
          yield item
        buffer = []  # empty the window so the scan resumes after this block

# First search term: "keyword" as a whole word
s1 = re.compile(r'\bkeyword\b')
# Second search term: "keyword2" as a whole word
s2 = re.compile(r'\bkeyword2\b')

for row in my_scan(StringIO(text), s1, s2):
  print(row.rstrip())

Produces:

I want to print out this line and the following 3 lines only once keyword 2
print this line since it has a keyword2 3
print this line keyword 4
print this line 5
I want to print out this line again and the following 3 lines only once keyword 9
please print this line keyword 10
please print this line it has the keyword2 11
please print this line 12
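
The four-line window used by the scanner can also be kept in a collections.deque with maxlen=4, which drops the oldest line automatically instead of calling pop(0). The following is only a minimal sketch under that assumption; the function name scan_blocks, the size parameter and the shortened sample list are made up for illustration and are not part of the answer above.

```python
import re
from collections import deque

def scan_blocks(lines, first, second, size=4):
    """Yield each block of `size` lines whose first line matches `first`
    and whose remaining lines contain a match for `second`."""
    window = deque(maxlen=size)   # the oldest line drops off automatically
    for line in lines:
        window.append(line)
        if len(window) < size:
            continue              # not enough lines collected yet
        rest = list(window)[1:]
        if first.search(window[0]) and any(second.search(l) for l in rest):
            yield list(window)    # copy, because the window is cleared next
            window.clear()        # skip ahead: restart after this block

# Hypothetical sample data, shortened from the question's text
sample = [
    "intro",
    "start keyword here",
    "has keyword2 inside",
    "another keyword line",
    "plain line",
    "not printed",
]
first = re.compile(r"\bkeyword\b")
second = re.compile(r"\bkeyword2\b")

blocks = list(scan_blocks(sample, first, second))
for block in blocks:
    print("\n".join(block))
```

Clearing the window after a match is what moves the search forward: the next candidate block can only start on the line after the one just yielded.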

So you want to print every block of 4 lines that contains both keywords?

Anyway, that's what I just came up with. Maybe you can use it:

text = """

here is some text 1
I want to print out this line and the following 3 lines only once keyword 2
print this line since it has a keyword2 3
print this line keyword 4
print this line 5
I don't want to print this line but I want to start looking for more text starting at this line 6
Don't print this line 7
Not this line either 8
I want to print out this line again and the following 3 lines only once keyword 9
please print this line keyword 10
please print this line it has the keyword2 11
please print this line 12
Don't print this line 13
Start again searching here 14
etc.
""".splitlines()

keywords = ['keyword', 'keyword2']

buffer, kw = [], set()
for line in text:
    if len(buffer) == 0:                 # first line of a block
        found = [k for k in keywords if k in line]
        if found:                        # append the line once, even if several keywords match
            kw.update(found)
            buffer.append(line)
    else:                                # continuous lines
        buffer.append(line)
        for k in keywords:
            if k in line:
                kw.add(k)
        if len(buffer) > 3:
            if len(kw) >= 2:             # just print blocks with enough keywords
                print('\n'.join(buffer))
            buffer, kw = [], set()

Your keywords are overlapping: "keyword" is a substring of "keyword2".

Also, your data implies you don't want to see line 13, but according to the problem statement it should be printed.

I changed your first keyword from "keyword" to "firstkey" like this and your code works (except for line 13).

$ diff /tmp/q /tmp/q2
4c4
< I want to print out this line and the following 3 lines only once keyword 2
---
> I want to print out this line and the following 3 lines only once firstkey 2
6c6
< print this line keyword 4
---
> print this line firstkey 4
11,12c11,12
< I want to print out this line again and the following 3 lines only once keyword 9
< please print this line keyword 10
---
> I want to print out this line again and the following 3 lines only once firstkey 9
> please print this line firstkey 10
30c30
<     if "keyword" in line and any("keyword2" in l.lower() for l in searchlines[m:m+4]):
---
>     if "firstkey" in line and any("keyword2" in l.lower() for l in searchlines[m:m+4]):
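
If renaming the keyword is not an option, the overlap can also be avoided by matching whole words only. A small sketch, assuming the standard re module and one line taken from the question's text; the variable names are illustrative:

```python
import re

line = "print this line since it has a keyword2 3"

# Plain substring test: "keyword" is found inside "keyword2".
substring_hit = "keyword" in line

# Word-boundary test: \b only matches between a word character and a
# non-word character (or the string edge), so "keyword2" is not "keyword".
boundary_hit = re.search(r"\bkeyword\b", line) is not None

print(substring_hit, boundary_hit)  # True False
```

With the boundary test, a line that contains only "keyword2" no longer starts a new block.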

First, you could correct your code like this:

text = """
0//
1// here is some text 1
A2// I want to print out this line and the following 3 lines only once keyword 2
b3// print this line since it has a keyword2 3
b4// print this line keyword 4
b5// print this line 5
6// I don't want to print this line but I want to start looking for more text starting at this line 6
7// Don't print this line 7
8// Not this line either 8
A9// I want to print out this line again and the following 3 lines only once keyword 9
b10// please print this line keyword 10
b11// please print this line it has the keyword2 11
b12// please print this line 12
13// Don't print this line 13
14// Start again searching here 14
15// etc.
"""
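lines = text.splitlines(True)
# splitlines(True) keeps the newlines
searchlines = [line.lower() for line in lines]   # lowercase copy, used only for searching

data, again = [], -1

for m, line in enumerate(searchlines):
    # m > again skips lines that were already printed as part of a block;
    # "keyword2" is looked for in the 3 lines after line m
    if "keyword" in line and m > again and "keyword2" in ''.join(searchlines[m+1:m+4]):
        data.extend(lines[m:m+4])
        again = m + 3    # last line of the printed block, so the search resumes at m + 4

print(''.join(data))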
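
The same "move forward after printing" idea can also be written with an explicit index instead of the again marker: advance by the block size after a hit, and by one line otherwise. This is only a sketch; find_blocks, its size parameter and the shortened sample list are assumptions made for illustration, and whole-word regexes are used so that "keyword2" does not count as "keyword".

```python
import re

def find_blocks(lines, first, second, size=4):
    """Return non-overlapping blocks of up to `size` lines: the first line
    must match `first`, one of the following lines must match `second`."""
    blocks = []
    i = 0
    while i < len(lines):
        block = lines[i:i + size]   # may be shorter near the end of the list
        following = block[1:]
        if first.search(lines[i]) and any(second.search(l) for l in following):
            blocks.append(block)
            i += size               # jump past the block just collected
        else:
            i += 1                  # no hit: try the next line
    return blocks

# Hypothetical sample data, shortened from the question's text
lines = [
    "some text 1",
    "first block keyword 2",
    "has keyword2 3",
    "keyword 4",
    "plain 5",
    "search resumes here 6",
    "second block keyword 7",
    "keyword 8",
    "has keyword2 9",
    "plain 10",
    "not printed 11",
]
first = re.compile(r"\bkeyword\b", re.IGNORECASE)
second = re.compile(r"\bkeyword2\b", re.IGNORECASE)

blocks = find_blocks(lines, first, second)
for block in blocks:
    print("\n".join(block))
```

Because the index jumps by size after a hit, the line right after a printed block (line 6 in the sample) is the first one examined again.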

Second, a short regex solution is

text = """
0//
1// here is some text 1
A2// I want to print out this line and the following 3 lines only once keyword 2
b3// print this line since it has a keyword2 3
b4// print this line keyword 4
b5// print this line 5
6// I don't want to print this line but I want to start looking for more text starting at this line 6
7// Don't print this line 7
8// Not this line either 8
A9// I want to print out this line again and the following 3 lines only once keyword 9
b10// please print this line keyword 10
b11// please print this line it has the keyword2 11
b12// please print this line 12
13// Don't print this line 13
14// Start again searching here 14
15// etc.
"""

import re

# Groups 2 and 3 record whether keyword2 was found on the 2nd or 3rd line;
# the 4th line has to contain keyword2 only if neither of them matched.
regx = re.compile('(^.*?(?<=[ \t]){0}(?=[ \t]).*\r?\n'
                  '.*?((?<=[ \t]){1}(?=[ \t]))?.*\r?\n'
                  '.*?((?<=[ \t]){1}(?=[ \t]))?.*\r?\n'
                  '.*?(?(2)|(?(3)|{1})).*)'.\
                  format('keyword','keyword2'),re.MULTILINE|re.IGNORECASE)

print('\n'.join(m.group(1) for m in regx.finditer(text)))

Result:

A2// I want to print out this line and the following 3 lines only once keyword 2
b3// print this line since it has a keyword2 3
b4// print this line keyword 4
b5// print this line 5
A9// I want to print out this line again and the following 3 lines only once keyword 9
b10// please print this line keyword 10
b11// please print this line it has the keyword2 11
b12// please print this line 12
