Using sed command for range of numbers

I have a CSV file containing city names and numbers:

New York , 23456      
chicago, 123,456,789,889981(2-6)    
phoenix  123,76(0-3)    

Some numbers in the file carry a range, and I want to replace each range with the individual numbers it stands for. For example, I want to change 889981(2-6) to 8899812,8899813,8899814,8899815,8899816 and insert the result in the same line. Can I do this in sed? It needs to scan the entire file and do the replacement.


sed is not well suited to arithmetic; doing this in sed is probably not impossible, but it is not simple either. My recommendation would be to use a proper scripting language such as awk, Perl, or Python: perhaps Python if you are not familiar with any of them, awk if you want the smallest possible memory footprint, and by all means Perl if you already know it.

perl -pe 's/(\d+)\((\d+)-(\d+)\)$/ join (",", 
          (join ("", $1, $2) .. join ("", $1, $3))) /ge' file
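
For readers who would rather follow the Python suggestion, here is a minimal sketch of the same idea. It assumes that a range always has the form prefix(start-end) and that each number in the range is appended to the prefix as text, as in the question's example. The names RANGE_RE and expand_ranges are made up for illustration, and the sample lines are hard-coded instead of being read from a file.

```python
import re

# A numeric prefix followed by a parenthesised range, e.g. 889981(2-6).
RANGE_RE = re.compile(r"(\d+)\((\d+)-(\d+)\)")


def expand_ranges(line):
    """Replace each prefix(start-end) token with the numbers it stands for."""

    def expand(match):
        prefix = match.group(1)
        start, end = int(match.group(2)), int(match.group(3))
        # Append each value to the prefix as text: "889981" + "2" -> "8899812".
        return ",".join(f"{prefix}{n}" for n in range(start, end + 1))

    return RANGE_RE.sub(expand, line)


# Sample lines taken from the question.
sample = [
    "New York , 23456",
    "chicago, 123,456,789,889981(2-6)",
    "phoenix  123,76(0-3)",
]
for line in sample:
    print(expand_ranges(line))
```

Unlike the Perl one-liner, this sketch does not anchor the pattern to the end of the line, so it would also expand a range that appears in the middle of a line.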


No, this is beyond what you can do with just a regular expression. You will need to add something more powerful, like perl, python or awk, or whatever you feel most at home with.


Requires gawk for the 3-argument match() function:

gawk '
    BEGIN {OFS = FS = ","}
    match($NF, /([0-9]+)\(([0-9]+)-([0-9]+)\)/, ary) {
        NF--
        for (n=ary[2]; n <= ary[3]; n++) {
            $(NF+1) = 10 * ary[1] + n
        }
    }
    {print}
' file

I assume (based on the sample) that the range only occurs in the last comma-separated field.


Solution using awk (@glenn jackman will probably post something that does this in less than 5 lines):

# join.awk --- join an array into a string
function join(array, start, end, sep,    result, i)
{
    if (sep == "")
       sep = " "
    else if (sep == SUBSEP) # magic value
       sep = ""
    result = array[start]
    for (i = start + 1; i <= end; i++)
        result = result sep array[i]
    return result
}


# range.awk --- expand "889981(2-6)" into "8899812,...,8899816"
function range(input,    a, b, c, i, n) {
    # a[1] is the prefix, a[2] is the "start-stop" part of the range
    split(input, a, "[()]")
    # b[1] is the start of the range, b[2] is the end
    split(a[2], b, "-")
    # build each number by appending the range value to the prefix
    n = 0
    for (i = b[1] + 0; i <= b[2] + 0; i++)
        c[++n] = a[1] i
    return join(c, 1, n, ",")
}

BEGIN { FS = OFS = "," }

# replace every field that contains a range, then print the line
{
    for (i = 1; i <= NF; i++)
        if ($i ~ /[0-9]+\([0-9]+-[0-9]+\)/)
            $i = range($i)
    print
}


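This might work for you (GNU sed only; the e flag passes the replacement to the shell, so /bin/sh must support brace expansion, as bash does):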

sed 's/^\(.*\)\b\([0-9]\+\)(\([0-9]\)-\([0-9]\))/echo "\1" {\2\3..\2\4}/e;s/\([0-9]\),\? \([0-9]\)/\1,\2/g' file
New York , 23456      
chicago, 123,456,789,8899812,8899813,8899814,8899815,8899816
phoenix  123,760,761,762,763