sed - how to remove everything but a defined pattern?
I have to remove everything but 1, 2, or 3 digits (0-9, or 10-99, or 100) preceding % (I don't want to see the %, though) from another command's output and pipe it forward to another command. I know that
sed -n '/%/p'
will show only the line(s) containing %, but that's not what I want. How can I get rid of the rest of the unwanted text and leave only these numbers to then pipe them to another command?
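If you're not completely tied to sed, this is almost exactly what grep -o does (the match still includes the %, which can be stripped afterwards, for example with tr -d '%'):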
grep -o '[0-9]\{1,3\}%'
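If the % itself should not appear in the result, one possibility is to strip it in a second step. This is only a sketch: it assumes a grep that supports -o (GNU and BSD grep do, but POSIX does not require it), and percent_values is a hypothetical helper name, not something from the question.

```shell
# Hypothetical helper: print every 1-3 digit run that is followed by %,
# one per line, with the % removed.
percent_values() {
  # grep -o prints only the matched text; tr then deletes the % signs.
  grep -o '[0-9]\{1,3\}%' | tr -d '%'
}

printf 'abc\n123%%\nthis is 456%% and nothing more\n456\n' | percent_values
# prints:
# 123
# 456
```

Note that this pattern is not anchored on a non-digit, so a longer number such as 1234% would be reported as 234.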
EDIT: I have misunderstood the OP and posted an invalid answer. I changed it to an answer that, I believe, would solve the problem in the more general scenario.
For a file such as the one below:
$ cat input
abc
123%
123
abc%
this is 456% and nothing more
456
Use sed -n -E 's/(^|.*[^0-9])([0-9]{1,3})%.*/\2/p' input
$ sed -n -E 's/(^|.*[^0-9])([0-9]{1,3})%.*/\2/p' input
123
456
The -n flag makes sed suppress the automatic output of each line. Then we use the -E flag, which enables extended regular expressions. (Older versions of GNU sed spell this flag -r; recent versions accept both.)
Now comes the s/// command. The group (^|.*[^0-9]) matches either the beginning of the line (^) or a series of zero or more characters (.*) ending in a non-digit character ([^0-9]). [0-9]{1,3} matches one to three digits and is captured as the second group (by the ( and ) group delimiters); it only matches when it is preceded by (^|.*[^0-9]) and followed by %. The trailing .* then matches everything after the %, so the pattern as a whole covers the entire line. We replace all of it with the second group (([0-9]{1,3})) using the backreference \2. Since we passed -n to sed, nothing would be printed by default, but we also passed the p flag to the s/// command, so a line is printed whenever the replacement is carried out. Note that this p is a flag of s///, not the p command, because it comes right after the last /.
sed -e 's/[^0-9]*\([0-9]*\)%.*/\1/'
captures the digits in a group, and because the rest of the pattern matches everything around them (the leading [^0-9]* and the trailing .*), all of that gets discarded.
(my pattern matches any number of digits: sed's basic regular expressions spell the interval as [0-9]\{1,3\} rather than the handy [0-9]{1,3} that you see in perlre and others, so I elected to keep it simple to illustrate the principle you cared about)
Edit: fixed the quoting and replaced the leading .* with [^0-9]* to avoid the greedy match consuming the numbers. Once again, this is more straightforward with perlre, where you can use a non-greedy .*?
Here's my shot:
sed -e '/^[0-9]\{1,3\}%$/ bnum' -e d -e ':num' -e 's/%//'
If the line is 1-3 digits followed by a %, it removes the %-sign. Otherwise, it removes the entire line. So, for input such as
adsf
50
52%
1
12%
test%
1234%
%%%
85%
bye
It yields
52
12
85
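The same whole-line filter can also be written without a branch, by letting a single anchored substitution both test and rewrite the line. This is a sketch of that alternative, not the command from this answer; whole_line_percent is a hypothetical name.

```shell
# Hypothetical helper: keep only lines that consist of 1-3 digits plus %.
whole_line_percent() {
  # ^ and $ anchor the match to the whole line; -n with the p flag
  # drops every line on which the substitution did not happen.
  sed -n 's/^\([0-9]\{1,3\}\)%$/\1/p'
}

printf 'adsf\n50\n52%%\n1\n12%%\ntest%%\n1234%%\n%%%%%%\n85%%\nbye\n' | whole_line_percent
# prints:
# 52
# 12
# 85
```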
Use awk instead of sed.
$ cat file
one two 100% three
10% four 1% five
$ awk '{
for(i=1;i<=NF;i++)
if ($i ~/%$/) { print $i+0} }
' file
100
10
1
For each field, check whether there is a % sign at the end. If so, print the number ($i+0 converts the field to a number, which drops the trailing %). Only a minimal regular expression is used.
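The check above accepts any field ending in %, so a field such as abc% would print 0. If only fields made of one to three digits plus % should count, the test can be tightened as in the sketch below; percent_fields is a hypothetical name, and the pattern avoids interval syntax on the assumption that some older awk implementations lack it.

```shell
# Hypothetical helper: print the numeric part of every whitespace-separated
# field that is exactly one to three digits followed by %.
percent_fields() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^[0-9][0-9]?[0-9]?%$/)
        print $i + 0   # numeric conversion drops the trailing %
  }'
}

printf 'one two 100%% three\n10%% four 1%% five\nabc%% 1234%%\n' | percent_fields
# prints:
# 100
# 10
# 1
```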
sed -n "/[0-9]\{1,2\}%/ s/^[^0-9]*\([0-9]\{1,2\}\)%.*/\1/p
/100%/ s/.*/100/p
"
100% has to be handled separately, because otherwise numbers such as 987% (or 123%, if the pattern merely allowed a 1 in the first position) would also be sent to the output.
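With extended regular expressions the two cases can be folded into one alternation. The following is a sketch under the assumption that sed accepts -E and that valid values are 0 through 100; percent_0_to_100 is a hypothetical name.

```shell
# Hypothetical helper: print a percentage only if it is 100 or has
# one or two digits, and is not part of a longer number.
percent_0_to_100() {
  # (^|.*[^0-9]) ensures no digit directly precedes the captured number.
  sed -n -E 's/(^|.*[^0-9])(100|[0-9]{1,2})%.*/\2/p'
}

printf 'disk 5%% used\n987%%\n100%%\n42%%\n' | percent_0_to_100
# prints:
# 5
# 100
# 42
```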