开发者

Is my MIPS compiler crazy, or am I crazy for choosing MIPS?

I am using a MIPS CPU (PIC32) in an embedded project, but I am starting to question my choice. I understand that a RISC CPU like MIPS will generate more instructions than one might expect, but I didn't think it would be like this. Here is a snippet from the disassembly listing:

225:                         LATDSET = 0x0040;
    sw          s1,24808(s2)
    sw          s4,24808(s2)
    sw          s4,24808(s2)
    sw          s1,24808(s2)
    sw          s4,24808(s3)
    sw          s4,24808(s3)
    sw          s1,24808(s3)

226:           开发者_开发问答              {

227:                             porte = PORTE;
    lw          t1,24848(s4)
    andi        v0,t1,0xffff
    lw          v1,24848(s6)
    andi        ra,v1,0xffff
    lw          v1,24848(s6)
    andi        ra,v1,0xffff
    lw          v0,24848(s6)
    andi        t2,v0,0xffff
    lw          a2,24848(s5)
    andi        v1,a2,0xffff
    lw          t2,24848(s5)
    andi        v1,t2,0xffff
    lw          v0,24848(s5)
    andi        t2,v0,0xffff

228:                             if (porte & 0x0004)
    andi        t2,v0,0x4
    andi        s8,ra,0x4
    andi        s8,ra,0x4
    andi        ra,t2,0x4
    andi        a1,v1,0x4
    andi        a2,v1,0x4
    andi        a2,t2,0x4

229:                                 pst_bytes_somi[0] |= sliding_bit;
    or          t3,t4,s0
    xori        a3,t2,0x0
    movz        t3,s0,a3
    addu        s0,t3,zero
    or          t3,t4,s1
    xori        a3,s8,0x0
    movz        t3,s1,a3
    addu        s1,t3,zero
    or          t3,t4,s1
    xori        a3,s8,0x0
    movz        t3,s1,a3
    addu        s1,t3,zero
    or          v1,t4,s0
    xori        a3,ra,0x0
    movz        v1,s0,a3
    addu        s0,v1,zero
    or          a0,t4,s2
    xori        a3,a1,0x0
    movz        a0,s2,a3
    addu        s2,a0,zero
    or          t3,t4,s2
    xori        a3,a2,0x0
    movz        t3,s2,a3
    addu        s2,t3,zero
    or          v1,t4,s0
    xori        a3,a2,0x0
    movz        v1,s0,a3

This seems like a crazy number of instructions for simple reading / writing and testing variables at fixed addresses. On a different CPU, I could probably get each C statement down to about 1..3 instructions, without resorting to hand-written asm. Obviously the clock rate is fairly high, but it's not 10x higher than what I would have in a different CPU (e.g. dsPIC).

I have optimisation set to maximum. Is my C compiler terrible (It's gcc 3.4.4)? Or is this typical of MIPS?


Finally figured out the answer. The disassembly listing is totally misleading. The compiler is doing loop unrolling, and what we're seeing under each C statement is actually 8x the number of instructions, because it's unrolling the loop 8x. The instructions are not at consecutive addresses! Turning off loop unrolling in the compiler options produces this:

225:                         LATDSET = 0x0040;
    sw          s3,24808(s2)
226:                         {
227:                             porte = PORTE;
    lw          t1,24848(s5)
    andi        v0,t1,0xffff
228:                             if (porte & 0x0004)
    andi        t2,v0,0x4
229:                                 pst_bytes_somi[0] |= sliding_bit;
    or          t3,t4,s0
    xori        a3,t2,0x0
    movz        t3,s0,a3
    addu        s0,t3,zero
230:                 

Panic over everyone.


I think your compiler is misbehaving... Check for example this statement:

228:                             if (porte & 0x0004)
    andi        t2,v0,0x4  (1)
    andi        s8,ra,0x4  (2)
    andi        s8,ra,0x4  (3)
    andi        ra,t2,0x4  (4)
    andi        a1,v1,0x4  (5)
    andi        a2,v1,0x4  (6)
    andi        a2,t2,0x4  (7)

It is obvious that there are instructions that basically do nothing. Instruction (3) does nothing as new as stores in s8 the same result computed by instruction (2). Instruction (6) also has no effect, as it is overriden by the next instruction (7), I believe any compiler which does some static analysis phase would at least remove instructions (3) and (6).

Similar analysis would apply to other portions of your code. For example in the first statement you can see some registers (v0 and v0) is loaded with the same value twice.

I think your compiler is not doing a good job at optimizing the compiled code.


MIPS is basically the embodiment of everything that was stupid about RISC design. These days x86 (and x86_64) have absorbed pretty much all the worthwhile ideas out of RISC, and ARM has evolved to be much more efficient than traditional RISC while still staying true to the RISC concept of keeping a small, systematic instruction set.

To answer the question, I'd say you're crazy for choosing MIPS, or perhaps more importantly, for choosing it without first learning a bit about the MIPS ISA and why it's so bad and how much inefficiency you need to put up with if you want to use it. I'd choose ARM for low-power/embedded systems in most situations, or better yet Intel Atom if you can afford a bit more power consumption.

Edit: Actually, a second reason you may be crazy... From the comments, it seems you're using 16-bit integers. You should never use smaller-than-int types in C except in arrays or in a structure that will be allocated in large numbers (either in an array or some other way such as a linked list/tree/etc.). Using small types will never give any benefit except for saving space (which is irrelevant until you have a large number of values of such type) and is almost surely less efficient than using "normal" types. In the case of MIPS, the difference is extreme. Switch to int and see if your problem goes away.


The only thing that I can think of is perhaps, perhaps, the compiler might be injecting extra nonsense instructions to mate up the speed of the CPU with a much slower data bus speed. Even that explanation isn't quite sufficient, as the store / load instructions similarly have redundancy.

As the compiler is suspect, don't forget that focusing efforts into the compiler can blind you to a type of tunnel vision. Perhaps errors are latent in other parts of the tool chain too.

Where did you get the compiler? I find that some of the "easy" sources often ship some pretty horrible tools. Embedded development friends of mine typically compile their own tool chain with sometimes much better results.


I tried compiling the following code with CodeSourcery MIPS GCC 4.4-303 with -O4. I tried it with uint32_t and uint16_t:

#include <stdint.h>
void foo(uint32_t PORTE, uint32_t pst_bytes_somi[], uint32_t sliding_bit) {
    uint32_t LATDSET = 0x0040;
    {
        uint32_t porte = PORTE;
        if (porte & 0x0004)
            pst_bytes_somi[0] |= sliding_bit;
        if (porte & LATDSET)
            pst_bytes_somi[1] |= sliding_bit;
    }
}

Here is the disassembly with uint32_t integers:

        uint32_t porte = PORTE;
        if (porte & 0x0004)
   0:   30820004    andi    v0,a0,0x4
   4:   10400004    beqz    v0,18 <foo+0x18>
   8:   00000000    nop
./foo32.c:7
            pst_bytes_somi[0] |= sliding_bit;
   c:   8ca20000    lw  v0,0(a1)
  10:   00461025    or  v0,v0,a2
  14:   aca20000    sw  v0,0(a1)
./foo32.c:8
        if (porte & LATDSET)
  18:   30840040    andi    a0,a0,0x40
  1c:   10800004    beqz    a0,30 <foo+0x30>
  20:   00000000    nop
./foo32.c:9
            pst_bytes_somi[1] |= sliding_bit;
  24:   8ca20004    lw  v0,4(a1)
  28:   00463025    or  a2,v0,a2
  2c:   aca60004    sw  a2,4(a1)
  30:   03e00008    jr  ra
  34:   00000000    nop

Here is the disassembly with uint16_t integers:

        if (porte & 0x0004)
   4:   30820004    andi    v0,a0,0x4
   8:   10400004    beqz    v0,1c <foo+0x1c>
   c:   30c6ffff    andi    a2,a2,0xffff
./foo16.c:7
            pst_bytes_somi[0] |= sliding_bit;
  10:   94a20000    lhu v0,0(a1)
  14:   00c21025    or  v0,a2,v0
  18:   a4a20000    sh  v0,0(a1)
./foo16.c:8
        if (porte & LATDSET)
  1c:   30840040    andi    a0,a0,0x40
  20:   10800004    beqz    a0,34 <foo+0x34>
  24:   00000000    nop
./foo16.c:9
            pst_bytes_somi[1] |= sliding_bit;
  28:   94a20002    lhu v0,2(a1)
  2c:   00c23025    or  a2,a2,v0
  30:   a4a60002    sh  a2,2(a1)
  34:   03e00008    jr  ra
  38:   00000000    nop

As you can see each C statement maps into two to three instructions. Using 16 bit integers makes the function only one instruction longer.


Have you turned on compiler optimizations? Unoptimzied code has much redundancy in it.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜