sed - how to remove everything but a defined pattern?

2023-03-29 04:24

I have to remove everything but 1, 2, or 3 digits (0-9, or 10-99, or 100) preceding % (I don't want to see the %, though) from another command's output and pipe it forward to another command. I know that

sed -n '/%/p'

will show only the line(s) containing %, but that's not what I want. How can I get rid of the rest of the unwanted text and leave only these numbers to then pipe them to another command?

If you're not completely tied to sed, this is exactly what grep -o does:

grep -o '[0-9]\{1,3\}%'
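The match printed by grep -o still includes the % sign, so one more step is needed to drop it. A minimal sketch, assuming a grep that supports -o (GNU and BSD grep both do) and using tr to delete the sign; the function name extract_percent is made up for illustration:

```shell
# extract_percent: hypothetical helper; reads text on stdin and prints
# each run of 1-3 digits that is directly followed by %, without the %.
extract_percent() {
  grep -o '[0-9]\{1,3\}%' | tr -d '%'
}

printf 'disk at 45%% full\nno match here\nload 7%%\n' | extract_percent
```

Because grep -o matches anywhere in the line, a longer number such as 1234% would still yield 234 here; the sed answers below are stricter about that.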

EDIT: I have misunderstood the OP and posted an invalid answer. I changed it to an answer that, I believe, would solve the problem in the more general scenario.

For a file such as the one below:

$ cat input
abc
123%
123
abc%
this is 456% and nothing more
456

Use sed -n -E 's/(^|.*[^0-9])([0-9]{1,3})%.*/\2/p' input

$ sed -n -E 's/(^|.*[^0-9])([0-9]{1,3})%.*/\2/p' input
123
456

The -n flag makes sed suppress the automatic printing of each line. The -E flag enables extended regular expressions. (GNU sed traditionally spells this flag -r; recent GNU versions accept -E as well.)

Now comes the s/// command. The group (^|.*[^0-9]) matches either the beginning of the line (^) or a series of zero or more characters (.*) ending in a non-digit character ([^0-9]). ([0-9]{1,3}) matches one to three digits and captures them as the second group; it only matches where it is preceded by (^|.*[^0-9]) and followed by %. The trailing .* then matches everything after the %, so the whole line is covered by the pattern. We replace all of it with the second group, using the backreference \2. Since we passed -n to sed, nothing would be printed by default, but the p flag on the s/// command prints the line whenever the replacement is made. Note that this p is a flag of s///, not the separate p command, because it comes right after the final /.
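To see the command working on piped input rather than a file, here is a small sketch. The wrapper name percent_only and the sample lines are assumptions for illustration, and it presumes a sed that accepts -E:

```shell
# percent_only: hypothetical wrapper around the sed command explained above.
percent_only() {
  sed -n -E 's/(^|.*[^0-9])([0-9]{1,3})%.*/\2/p'
}

# 1234% is skipped: four digits cannot satisfy
# "start of line or non-digit, then 1-3 digits, then %".
printf 'usage: 7%%\n1234%%\n100%% done\nplain text\n' | percent_only
```

Since the substitution replaces the whole line, this prints at most one number per input line.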

sed -e 's/[^0-9]*\([0-9]*\)%.*/\1/' captures the digits in a group, and because the rest of the pattern (the leading [^0-9]* and the trailing .*) matches the surrounding text, that text is discarded.

(My pattern matches any number of digits. Basic sed regular expressions lack the bare [0-9]{1,3} shortcut that you see in perlre and others (the bounded form has to be written [0-9]\{1,3\}), so I elected to keep it simple to illustrate the principle you cared about.)

Edit: fixed the quoting and replaced the leading .* with [^0-9]* so that the greedy match does not consume the numbers. Once again, this is more straightforward with perlre, where you can use a non-greedy .*?
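The effect of the greedy leading .* can be checked directly. The following sketch runs both variants on one sample line; the variable names are just for illustration:

```shell
line='this is 456% and nothing more'

# Greedy leading .* swallows the digits, so the group captures nothing.
greedy=$(printf '%s\n' "$line" | sed 's/.*\([0-9]*\)%.*/\1/')

# [^0-9]* stops at the first digit, leaving 456 for the group.
fixed=$(printf '%s\n' "$line" | sed 's/[^0-9]*\([0-9]*\)%.*/\1/')

printf 'greedy: "%s"\nfixed:  "%s"\n' "$greedy" "$fixed"
```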

Here's my shot:

sed -E -e '/^[0-9]{1,3}%$/ b num' -e d -e ':num' -e 's/%//'

If the entire line is 1-3 digits followed by a %, it removes the %-sign. Otherwise, it deletes the line. So, for input such as

adsf
50
52%
 1
 12%
test%
1234%
%%%
85%
bye

It yields

52
85
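For comparison, the same whole-line filter can also be sketched as a single substitution instead of a branch. This is an alternative formulation rather than the answer's own command, and it assumes a sed that accepts -E; the helper name is hypothetical:

```shell
# strip_percent_lines: hypothetical helper; prints only lines that consist
# of 1-3 digits plus %, with the % removed.
strip_percent_lines() {
  sed -n -E 's/^([0-9]{1,3})%$/\1/p'
}

printf 'adsf\n50\n52%%\n 12%%\ntest%%\n1234%%\n%%%%%%\n85%%\nbye\n' | strip_percent_lines
```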

Use awk instead of sed.

$ cat file
one two 100% three
10% four 1% five

$ awk '{
  for (i = 1; i <= NF; i++)
    if ($i ~ /%$/) print $i + 0   # field ends in %; +0 keeps only the leading number
}' file
100
10
1

For each field, check whether it ends with a % sign. If it does, print the number ($i+0 converts the field to a number, which drops the trailing %). Only a minimal regular expression is used, so a non-numeric field such as abc% would print 0.
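If fields like abc% or out-of-range values like 987% can occur, the check can be tightened. A hedged sketch of one way to do that, using a stricter regular expression plus a numeric bound; the helper name strict_percent is an assumption, not part of the answer:

```shell
# strict_percent: hypothetical helper; prints fields that are all digits
# followed by %, and whose numeric value is at most 100.
strict_percent() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^[0-9]+%$/ && ($i + 0) <= 100) print $i + 0
  }'
}

printf 'one two 100%% three\n10%% four 1%% five abc%% 987%%\n' | strict_percent
```

The numeric comparison is used instead of a {1,3} interval because not every awk implementation supports interval expressions.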

sed -n "/[0-9]\{1,2\}%/ s/^[^0-9]*\([0-9]\{1,2\}\)%.*/\1/p
/100%/ s/.*/100/p
"

100% is handled by its own rule because allowing three digits in the first pattern would also send numbers such as 987% (or 123%, if only the first digit were restricted to 1) to the output.
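A quick way to check that behaviour is to feed the two-rule script a few sample lines. The wrapper name and the input below are made up for illustration:

```shell
# two_rule_percent: hypothetical wrapper around the two-rule script above.
two_rule_percent() {
  sed -n '/[0-9]\{1,2\}%/ s/^[^0-9]*\([0-9]\{1,2\}\)%.*/\1/p
/100%/ s/.*/100/p'
}

# 987% is dropped by the first rule; 100% is picked up by the second.
printf 'cpu 45%%\n987%%\n100%%\nx 5%% y\n' | two_rule_percent
```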
