sed - how to remove everything but a defined pattern?

I have to remove everything but 1, 2, or 3 digits (0-9, or 10-99, or 100) preceding % (I don't want to see the %, though) from another command's output and pipe it forward to another command. I know that

sed -n '/%/p'

will show only the line(s) containing %, but that's not what I want. How can I get rid of the rest of the unwanted text and leave only these numbers to then pipe them to another command?


If you're not completely tied to sed, this is what grep -o is for: it prints only the matched part of each line (the match here still includes the %):

grep -o '[0-9]\{1,3\}%'
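
Since the question asks for the digits without the % sign, one way to finish the job is to pipe the matches through tr. This is a minimal sketch; the function name percent_values and the sample input are made up for illustration. Note that a longer run such as 1234% would still yield its last three digits, because the pattern is not anchored.

```shell
# Hypothetical helper: print every 1-3 digit percentage found on stdin,
# one per line, with the trailing % removed.
percent_values() {
  grep -o '[0-9]\{1,3\}%' | tr -d '%'
}

# Made-up sample input; %% is how printf spells a literal %.
printf 'disk 42%% used\nno match here\n7%% and 100%%\n' | percent_values
# prints 42, 7 and 100, one per line
```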


EDIT: I have misunderstood the OP and posted an invalid answer. I changed it to an answer that, I believe, would solve the problem in the more general scenario.

For a file such as the one below:

$ cat input
abc
123%
123
abc%
this is 456% and nothing more
456

Use sed -n -E 's/(^|.*[^0-9])([0-9]{1,3})%.*/\2/p' input

$ sed -n -E 's/(^|.*[^0-9])([0-9]{1,3})%.*/\2/p' input
123
456

The -n flag suppresses sed's automatic printing of each line. The -E flag enables extended regular expressions. (GNU sed has traditionally spelled this flag -r; recent versions accept -E as well.)

Now comes the s/// command. The group (^|.*[^0-9]) matches either the beginning of the line (^) or a series of zero or more characters (.*) ending in a non-digit character ([^0-9]). The second group, ([0-9]{1,3}), matches one to three digits, and the pattern requires it to be preceded by the first group and followed by %. Because the first group ends at the start of the line or at a non-digit, a run of four or more digits before % cannot match. The trailing .* then matches everything after the %, so the pattern covers the whole line. We replace all of it with the second group using the backreference \2. Since we passed -n to sed, nothing is printed by default, but the p flag on the s/// command prints the resulting line whenever the substitution succeeds. Note that this p is a flag of s///, not the p command, because it comes right after the last /.
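
To see the effect of the first group, the same command can be fed a few extra lines. This is only a sketch: the wrapper name extract_percent and the sample lines are assumptions, not part of the original answer.

```shell
# Hypothetical wrapper around the sed command discussed above; reads stdin.
extract_percent() {
  sed -n -E 's/(^|.*[^0-9])([0-9]{1,3})%.*/\2/p'
}

# Made-up sample input: a bare percentage, a four-digit run,
# a percentage inside text, and a % with no digits before it.
printf '123%%\n1234%%\nusage: 456%% of quota\nabc%%\n' | extract_percent
# prints 123 and 456; the 1234% and abc% lines produce no output
```

Each matching line yields a single number, so lines holding several percentages would need a different approach, such as grep -o.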


sed -e 's/[^0-9]*\([0-9]*\)%.*/\1/' captures the digits in a group, and because the rest of the pattern (the leading [^0-9]* and the trailing .*) matches the surrounding text, that text is discarded.

(My pattern matches any number of digits to keep the principle simple. sed's basic regular expressions do support bounded repetition, but it is spelled [0-9]\{1,3\} rather than the [0-9]{1,3} you see in perlre and others.)

Edit: fixed the quoting and replaced the leading .* with [^0-9]* so that the greedy match does not consume the numbers. Once again, this is more straightforward with perlre, where you can use a non-greedy .*?
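
For completeness, here is a sketch of the same idea with a one-to-three digit bound in basic regular expression syntax, plus -n and the p flag so that lines without a match are dropped. The function name and sample input are hypothetical. The pattern is still unanchored, so it does not reject a longer digit run such as 1234%.

```shell
# Hypothetical variant of the command above, using the BRE interval \{1,3\}.
# -n plus the p flag prints only lines where the substitution happened.
digits_before_percent() {
  sed -n 's/[^0-9]*\([0-9]\{1,3\}\)%.*/\1/p'
}

# Made-up sample input.
printf 'abc\nthis is 456%% and nothing more\n7%%\n' | digits_before_percent
# prints 456 and 7
```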


Here's my shot:

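sed -E -e '/^[0-9]{1,3}%$/ b num' -e 'd' -e ':num' -e 's/%//'

If the line consists of 1-3 digits followed by a %, the script branches to the num label and removes the % sign. Otherwise, it deletes the entire line. The {1,3} interval needs -E (in basic syntax it would be written \{1,3\}), and the label commands are passed as separate -e expressions because a label portably ends at a newline. So, for input such as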

adsf
50
52%
 1
 12%
test%
1234%
%%%
85%
bye

It yields

52
85
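
The same filtering can probably be written without a branch, by restricting the substitution to matching lines and printing only on success. This is a sketch rather than the author's command; the function name is made up, and the input is the sample from above.

```shell
# Hypothetical branch-free equivalent: on lines that are exactly 1-3 digits
# followed by %, strip the % and print; everything else is suppressed by -n.
only_percent_lines() {
  sed -n -E '/^[0-9]{1,3}%$/ s/%//p'
}

# Same sample input as above (%% is printf's literal %).
printf 'adsf\n50\n52%%\n 1\n 12%%\ntest%%\n1234%%\n%%%%%%\n85%%\nbye\n' | only_percent_lines
# prints 52 and 85
```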


Use awk instead of sed.

$ cat file
one two 100% three
10% four 1% five

$ awk '{
  for (i = 1; i <= NF; i++)
    if ($i ~ /%$/) { print $i + 0 }
}' file
100
10
1

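For each field, check whether it ends with a % sign. If it does, print the number: $i+0 forces a numeric conversion, which keeps the leading digits and drops the %. Only a minimal regular expression is needed. Note that a non-numeric field such as test% prints 0.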
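
If non-numeric fields and values above 100 should be skipped, the test can be tightened a little. The following is a sketch under that assumption; the function name and the extra sample fields (test%, 987%) are invented for illustration.

```shell
# Hypothetical stricter variant: accept only fields made of digits followed
# by %, and only when the numeric value is at most 100.
percent_fields() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^[0-9]+%$/ && $i + 0 <= 100)
        print $i + 0
  }'
}

# Made-up sample input based on the file above.
printf 'one two 100%% three\n10%% four 1%% five\ntest%% 987%%\n' | percent_fields
# prints 100, 10 and 1
```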


sed -n "/[0-9]\{1,2\}%/ s/^[^0-9]*\([0-9]\{1,2\}\)%.*/\1/p
/100%/ s/.*/100/p
"

100% is handled by a separate rule because widening the first pattern to three digits would also send numbers such as 987% to the output (or 123%, if the pattern only required a leading 1).
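
With extended regular expressions, the two rules might be folded into one substitution by using an alternation. This is a sketch, not the original script: the function name and sample lines are assumptions, and, like the script above, it expects the percentage to be the first run of digits on the line.

```shell
# Hypothetical single-expression variant: accept either the literal 100
# or one to two digits, directly followed by %.
percent_0_to_100() {
  sed -n -E 's/^[^0-9]*(100|[0-9]{1,2})%.*/\1/p'
}

# Made-up sample input.
printf '100%%\n987%%\nload 42%% now\n5%%\n123%%\n' | percent_0_to_100
# prints 100, 42 and 5; 987% and 123% are rejected
```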
