find line number of last occurrence of pattern before specific line number
I have a large file of events like so:
<event>
...
...multiple lines describing the event
...
</event>
<event>
...
...
<event>
When an error occurs, I get the line number where it occurred, which always ends up being somewhere within the event tags. I want to split the file into the events processed before the error occurred and everything from the error onwards. I know that I can do the split using
csplit -k filename line_number_to_split_on
What I need to do is find the line number of the last event tag before the error line. The files are quite large. For example, I had an error listed on line 1007425, and from looking at the file the event tag was on line 1007397. I'd like a way to do this in a shell script. Any ideas?
Given $LINE as the line number where the error occurs, and $FILE as the input file, you can do:
$ nl -ba "$FILE" | sed -n -e '/<event>/p' -e "${LINE}q" | tail -n 1
(You can use the '=' operator in sed to get line numbers instead of nl, but I like nl better and = is not very portable. Also, it inserts additional newlines that are a bit of a pain.)
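For comparison, the '=' variant mentioned above might look roughly like the sketch below. It assumes a sed that supports '=' and relies on -n so that only the line numbers are printed. The function name last_event_sed is made up for illustration and is not part of the original answer.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not from the answer above):
# print the number of the last <event> line at or before line $2 of file $1.
last_event_sed() {
    file=$1
    line=$2
    # '=' prints the current line number; 'q' stops reading at the error line.
    # With -n nothing else is printed, so the output is one number per match.
    sed -n -e '/<event>/=' -e "${line}q" "$file" | tail -n 1
}
```

Called as last_event_sed "$FILE" "$LINE", it prints just the line number, without the text of the matching line that the nl version also shows.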
As an alternative to piping to tail, you could do:
$ nl -ba "$FILE" | sed -n -e '/<event>/h' -e "$LINE"'{x; p; q;}'
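The same idea (remember the most recent event tag, stop at the error line) can also be sketched with awk instead of nl and sed. This is a different tool from the one used above, shown only as an alternative. The function name last_event_awk is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): single pass with awk.
# $1 is the input file, $2 is the line number where the error occurred.
last_event_awk() {
    awk -v n="$2" '
        /<event>/ { last = NR }          # remember the latest opening tag
        NR >= n   { print last; exit }   # stop reading at the error line
    ' "$1"
}
```

The printed number could then be handed to the csplit command from the question, for example csplit -k "$FILE" "$(last_event_awk "$FILE" "$LINE")". Because awk exits at the error line, the rest of a large file is never read.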
I'm not sure about performance on large files but it works.
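#!/bin/sh
# Usage: ./test.sh ERROR_LINE
total=$(wc -l < EVENTFILE)
error=$1 ### Line number where error occurred
# Position of the error line when the file is read bottom-up
from=$((total-error+1))
# Read the file in reverse; the first <event> at or past that position
# is the last <event> at or before the error line in the original order
num=$(tac EVENTFILE | awk -v from="$from" '/<event>/ && NR >= from {print NR; exit}')
# Convert the reversed position back to an original line number
echo $((total-num+1))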
Test data.
1 <event>
2 .
3 .
4 .
5 </event>
6 <event>
7 ..
8 ..
9 ..
10 </event>
11 <event>
12 ...
13 ...
14 ...
15 </event>
Output
foo@ell:/tmp/test$ ./test.sh 3
1
foo@ell:/tmp/test$ ./test.sh 8
6
foo@ell:/tmp/test$ ./test.sh 14
11
Your input looks like XML. The best way to do this would be to use an XML parser. Parsing XML by hand is not much fun. Depending on the XML parser, the start line numbers are available as part of the element metadata. (For example, for SAX there's the Locator.)
Update:
I thought that using the right tool would be a good idea. If you can't use an XML parser, you have to write your own for your XML subset. You should start by looking at the XML standard to see which features you actually need. It would remove a lot of complexity if you did not have to support recursion, XML entities and XML CDATA. Once you have this information, your question can be answered.