Extracting text in between strings
How do I extract the text between two strings that follow a very specific pattern, from a file full of lines like this one? Example:
input:a_log.gz:make=BMW&year=2000&owner=Peter
I essentially want to capture the part make=BMW&year=2000. I know for a fact that the line starts with "input:(any number of characters).gz:" and ends with "owner=Peter".
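Use the regex: input:.*?\.gz:(.*?)&?owner=Peter
The capture group will contain everything between the second colon and "owner=Peter", with the trailing ampersand trimmed. Note that the lazy quantifier .*? is Perl-style syntax, so this needs a PCRE-style engine rather than POSIX basic or extended regular expressions.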
Give this a try:
sed -n 's/.*:\([^&]*&[^&]*\)&.*/\1/p' file
This extracts everything between the second colon and the second ampersand, regardless of what comes before and after. If a line contains more colons or ampersands, it may not work properly.
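If the number of ampersands varies, a stricter variant can anchor on the known prefix and suffix from the question instead of counting delimiters. A minimal sketch, assuming every wanted line really starts with "input:" and ends with "&owner=Peter"; extract_anchored is a hypothetical name.

```shell
# Hypothetical helper: keep only what sits between "input:....gz:" and
# the trailing "&owner=Peter"; lines that do not match print nothing.
extract_anchored() {
  sed -n 's/^input:.*\.gz:\(.*\)&owner=Peter$/\1/p'
}

# Works with two parameters or with more.
printf '%s\n' \
  'input:a_log.gz:make=BMW&year=2000&owner=Peter' \
  'input:b_log.gz:make=Audi&year=2010&color=red&owner=Peter' | extract_anchored
```

The trade-off is that this version silently skips any line that does not end exactly in "&owner=Peter".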
You can use the shell (bash/ksh):
$ s="input:a_log.gz:make=BMW&year=2000&owner=Peter"
$ s=${s##*gz:}
$ echo ${s%%owner=Peter*}
make=BMW&year=2000&
If you want sed:
$ echo ${s} | sed 's/input.*gz://;s/owner=Peter//'
make=BMW&year=2000&
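Both versions above leave the trailing ampersand in the output. If that is unwanted, including the "&" in the suffix pattern should remove it as well. A small sketch using only parameter expansion; strip_wrapper and rest are names made up for this example.

```shell
# Hypothetical helper: strip the "input:...gz:" prefix and the
# "&owner=Peter" suffix from the string passed as the first argument.
strip_wrapper() {
  rest=${1##*gz:}                         # drop everything up to the last "gz:"
  printf '%s\n' "${rest%%&owner=Peter*}"  # drop "&owner=Peter" and what follows
}

strip_wrapper 'input:a_log.gz:make=BMW&year=2000&owner=Peter'
```

No external process is started here, which can matter when looping over many lines in the shell.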
>echo "input:a_log.gz:make=BMW&year=2000&owner=Peter" | sed -e "s/input:.*\.gz://g" -e "s/&owner.*//g"
make=BMW&year=2000
I didn't see an answer using awk:
awk '{ match($0, /input:.*\.gz:/);
m = RSTART+RLENGTH;
n = index($0, "&owner=Peter") - m;
print substr($0,m,n)
}'
The method is sort of a mix between the sh version (substrings by parameter expansion) and the sed version (regular expressions). This is because awk REs lack backreferences.
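The awk script above assumes every line matches. For a file that may also contain other lines, a guarded variant along these lines might be safer. This is a sketch, not a replacement for the answer; extract_awk, start and stop are names made up for this example.

```shell
# Hypothetical helper: same idea as the awk answer, but lines without
# the prefix or without "&owner=Peter" after it are skipped.
extract_awk() {
  awk '{
    if (match($0, /input:.*\.gz:/)) {
      start = RSTART + RLENGTH            # first character after "...gz:"
      stop  = index($0, "&owner=Peter")   # 0 if the suffix is missing
      if (stop > start)
        print substr($0, start, stop - start)
    }
  }'
}

printf '%s\n' 'input:a_log.gz:make=BMW&year=2000&owner=Peter' | extract_awk
```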