Find line number of last occurrence of pattern before a specific line number

Posted 2023-03-06 12:37

I have a large file of events like so:

<event>
...
...multiple lines describing the event
...
</event>
<event>
...
...
</event>

When an error occurs I get the line number where the error occurred, which always ends up being somewhere within the event tags. I want to split the file into the events processed before the error occurred and those from the error onwards. I know that I can do the split using

csplit -k filename line_number_to_split_on

What I need to do is find the line number of the last event tag before the error line. The files are quite large. For example, I had an error listed on line 1007425, and from looking at the file the event tag was on line 1007397. I'd like a way to do this in a shell script. Any ideas?

Given $LINE as the line number where the error occurs, and $FILE as the input file, you can do:

$ nl -ba "$FILE" | sed -n -e '/<event>/p' -e "${LINE}q" | tail -n 1

(You can use the '=' command in sed to get line numbers instead of nl, but I like nl better and = is not very portable. Also, it prints each line number on a line of its own, which is a bit of a pain.)
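For comparison, here is a minimal sketch of the '=' variant mentioned above. The function name last_event_before and its argument order are made up for illustration. With -n, sed prints nothing but the line numbers emitted by '=', so the separate-line behaviour is not a problem when only the number is wanted. As far as I know '=' is part of POSIX sed, but check your platform.

```shell
# Hypothetical helper: print the line number of the last <event> tag
# at or before a given line.
#   $1 = input file, $2 = line number where the error occurred
last_event_before() {
    # '=' prints the current line number for every line matching <event>;
    # "${2}q" stops reading once the error line has been processed.
    sed -n -e '/<event>/=' -e "${2}q" "$1" | tail -n 1
}
```

Called as last_event_before "$FILE" "$LINE", it prints only the number, so the result can be passed straight to csplit. If no tag precedes the error line it prints nothing.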

As an alternative to piping to tail, you could do:

$ nl -ba "$FILE" | sed -n -e '/<event>/h' -e "${LINE}{x; p; q;}"

I'm not sure about performance on large files but it works.

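#!/bin/sh
# Usage: ./test.sh ERROR_LINE
total=$(wc -l < EVENTFILE)
error=$1 ### Line number where error occurred
# Line N of the original file is line total-N+1 of the reversed file,
# so the error line itself is line total-error+1 after tac.
from=$((total - error + 1))
num=$(tac EVENTFILE | awk '/<event>/{print NR}' | while read -r n; do
    echo "${n}"
    # Stop at the first <event> found at or before the error line.
    if test "${n}" -ge "${from}"; then
        break
    fi
done | tail -n 1)
# Convert the reversed line number back to the original numbering.
echo $((total - num + 1))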

Test data.

 1  <event>
 2  .
 3  .
 4  .
 5  </event>
 6  <event>
 7  ..
 8  ..
 9  ..
10  </event>
11  <event>
12  ...
13  ...
14  ...
15  </event>

Output

foo@ell:/tmp/test$ ./test.sh 3
1
foo@ell:/tmp/test$ ./test.sh 8
6
foo@ell:/tmp/test$ ./test.sh 14
11
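To tie the lookup back to the csplit step from the question, here is a rough end-to-end sketch. It swaps in a single awk pass for the lookup, which is not the method of either answer above. The function name split_at_event and the output prefix argument are hypothetical. csplit given a line number N writes lines 1 to N-1 to the first piece and the rest to the second.

```shell
# Hypothetical helper: split a file just before the last <event> tag
# that starts at or before the error line.
#   $1 = input file, $2 = error line number, $3 = output file prefix
split_at_event() {
    # Remember the most recent <event> line; stop reading at the error line.
    start=$(awk -v limit="$2" '
        /<event>/   { last = NR }
        NR >= limit { exit }
        END         { if (last) print last }
    ' "$1")
    # Give up if no <event> tag was seen before the error line.
    [ -n "$start" ] || return 1
    # -s: quiet, -k: keep pieces on error, -f: prefix for the output files.
    csplit -s -k -f "$3" "$1" "$start"
}
```

On the test data above, split_at_event EVENTFILE 8 part should leave lines 1 to 5 in part00 and lines 6 to 15 in part01. Because awk exits at the error line, the rest of a large file is not scanned during the lookup.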

Your input looks like XML. The best way to do it would be to use an XML parser. Parsing XML by hand is not much fun. Depending on the XML parser, the start line numbers are part of the element metadata. (For example, for SAX there's the Locator.)

Update:

I thought that using the right tool was a good idea. If you can't use an XML parser, you have to write your own for your XML subset. You should start by looking at the XML standard and see which features you actually need. It would remove a lot of complexity if you did not have to support recursion, XML entities and XML CDATA. Once you have this information, your question can be answered.
