Extract substructure from a text file using bash or python

2022-12-22 22:28

I have a huge text file, which follows the structure:

SET
TAG1
...
...
SET
...
SET
TAG2
...
...
SET
...
...

For a specific tag (e.g. TAG54), I would like to extract its individual "substructure", which would be:

SET
TAG54
...
...
SET

Each substructure for a given TAG_i always contains:

first line: SET
second line: TAG_i (in this case TAG54)
an arbitrary number of lines
last line: SET

I wonder what would be the best way to do this, whether in bash or python, so for a given TAG, one can "extract" this substructure.

Thanks

Here's a Python approach: you pass in the open file handle as the first argument and the tag number as the second, and get back a list of the relevant lines (including newline characters), or an empty list if the tag is not found in the file:

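def lookfor(f, tagnum):
    tag = 'TAG%s\n' % tagnum
    # Skip ahead until the tag line is found.
    for line in f:
        if line == tag:
            break
    else:  # file finished, tag not found
        return []
    # The line before the tag is SET by construction of the file.
    result = ['SET\n', tag]
    # Collect everything up to and including the closing SET.
    for line in f:
        result.append(line)
        if line == 'SET\n':
            break
    return result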

This should perform reasonably well, since it reads the file line by line and stops at the closing SET. If you want other forms of arguments and/or results, it shouldn't be hard to tweak accordingly, of course.
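As one possible tweak, here is a minimal sketch that takes the full tag string instead of the number, strips the newlines, and accepts the tag only when the line just before it is SET. The name extract_block and the sample data are illustrative assumptions, not part of the original answer.

```python
import io


def extract_block(lines, tag):
    """Return the lines of the SET/<tag>/.../SET block, or [] if absent."""
    previous = None
    block = []
    for raw in lines:
        line = raw.rstrip("\n")
        if block:
            # Inside the block: collect lines until the closing SET.
            block.append(line)
            if line == "SET":
                return block
        elif line == tag and previous == "SET":
            # The tag counts only when the line just before it is SET.
            block = ["SET", tag]
        previous = line
    return []  # tag not found, or the file ended before the closing SET


# Hypothetical sample data standing in for the real file.
sample = io.StringIO("SET\nTAG1\na\nSET\nx\nSET\nTAG54\nb\nc\nSET\ny\n")
print(extract_block(sample, "TAG54"))
```

For a real file, something like "with open(path) as f: extract_block(f, 'TAG54')" should work the same way, since the function only iterates over lines.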

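If your system's grep supports -P (Perl-compatible regular expressions), combine it with -z and -o. By default grep matches one line at a time, so \n in a pattern can never match. -z makes grep treat the whole file as a single record (which means the file is held in memory), and -o prints only the matched block. With -z the output ends in a NUL byte, which tr removes:

grep -Pzo 'SET\nTAG54\n(?:.*\n)*?SET\n' file.txt | tr -d '\0'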
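The same kind of multi-line match can be sketched in Python with the re module, assuming the file is small enough to read into a single string. The function name extract_with_regex and the sample text are made up for illustration.

```python
import re


def extract_with_regex(text, tag):
    """Return the first SET/<tag>/.../SET block in text, or None."""
    pattern = re.compile(
        # ^ and $ anchor to line boundaries because of re.MULTILINE;
        # the lazy group consumes whole lines up to the first closing SET.
        r"^SET\n" + re.escape(tag) + r"\n(?:.*\n)*?SET$",
        re.MULTILINE,
    )
    match = pattern.search(text)
    return match.group(0) if match else None


# Hypothetical sample data standing in for the real file contents.
text = "SET\nTAG1\na\nSET\nx\nSET\nTAG54\nb\nc\nSET\ny\n"
print(extract_with_regex(text, "TAG54"))
```

Because the tag must be followed directly by a newline, asking for TAG5 does not pick up the TAG54 block.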

gawk:

BEGIN {
  state=0
}

state==0 && $0=="TAG54" {
  print "SET"
  state=1
}

state==1 {
  print
}

state==1 && $0=="SET" {
  exit
}

csplit -f tags input.txt '%^TAG54$%-1' '/^SET$/+1' '%.*%' '{*}'

$ awk -vRS="SET" '/TAG54/{print RT$0RT}' file
SET
TAG54
...
...
SET

If you are doing this from a shell script, pass your shell variable to awk using -v. The awk variable name must match the one used in the pattern (tag here). For example:

#!/bin/bash
read -r -p "what's your tag? " tag
awk -v RS="SET" -v tag="$tag" '$0~tag{print RT$0RT}' file
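If several tags have to be looked up in the same file, a single pass that indexes every block may be preferable to rescanning per tag. The following is only a sketch: index_blocks is a hypothetical helper, and it assumes that every tag line starts with "TAG" and directly follows a SET line.

```python
import io


def index_blocks(lines):
    """Map each tag to its SET/<tag>/.../SET block, reading the input once."""
    blocks = {}
    previous = None
    current = None  # lines of the block being collected, if any
    for raw in lines:
        line = raw.rstrip("\n")
        if current is not None:
            current.append(line)
            if line == "SET":
                # Closing SET reached: store the block under its tag line.
                blocks[current[1]] = current
                current = None
        elif previous == "SET" and line.startswith("TAG"):
            # Assumption: a tag line always follows a SET line directly.
            current = ["SET", line]
        previous = line
    return blocks


# Hypothetical sample data standing in for the real file.
sample = io.StringIO("SET\nTAG1\na\nSET\nx\nSET\nTAG54\nb\nc\nSET\ny\n")
blocks = index_blocks(sample)
print("\n".join(blocks.get("TAG54", [])))
```

The trade-off is memory: all blocks are kept in the dictionary, so for a truly huge file the per-tag scans above may still be the better fit.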
