Best practice for function to handle 1-256 bytes

2023-01-12 12:42 Q&A author:

I have some functions that are designed to handle 1-256 bytes. They run on an embedded C platform where passing a byte is much faster and more compact than passing an int (one instruction versus three). What is the preferred way of coding this?

1. Accept an int, early-exit if zero, and otherwise copy the LSB of the count value to an unsigned char and use that in a do {} while(--count); loop (a parameter value of 256 will get converted to 0, but will run 256 times).
2. Accept an unsigned char, early-exit if zero, and have a special version of the function for 256 bytes (those cases will be known in advance).
3. Accept an unsigned char, and run 256 times if it's zero.
4. Have a function like the above, but call it via wrapper functions that behave as (0-255) and (256 only).
5. Have a function like the above, but call it via wrapper macros that behave as (0-255) and (256 only).
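
For illustration only, the first option might look roughly like the sketch below. The names `read_block` and `next_byte` are hypothetical; `next_byte` merely stands in for the real SPI transfer, and the rejection of counts above 256 is an added assumption rather than part of the option as stated. It also assumes an 8-bit unsigned char.

```c
#include <stddef.h>

/* Hypothetical stand-in for the real per-byte SPI transfer. */
static unsigned char next_byte(void)
{
    static unsigned char value;
    return value++;
}

/* int interface, 8-bit loop counter.
   Returns the number of bytes stored so the behaviour can be checked. */
unsigned int read_block(unsigned char *dest, int count)
{
    unsigned char n;
    unsigned int done = 0;

    if (count <= 0 || count > 256)
        return 0;               /* early exit; range check is an assumption */

    n = (unsigned char)count;   /* 256 becomes 0 with an 8-bit unsigned char */
    do {
        *dest++ = next_byte();
        done++;
    } while (--n);              /* n == 0 on entry wraps, giving 256 passes */

    return done;
}
```

With a 256-byte buffer, `read_block(buf, 256)` runs the loop 256 times even though the 8-bit counter starts at zero, while `read_block(buf, 0)` returns without touching the buffer.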

It is expected that the inner loop of the function will probably represent 15%-30% of processor execution time when the system is busy; it will sometimes be used for small numbers of bytes, and sometimes for large ones. The memory chip used by the function has a per-transaction overhead, and I prefer to have my memory-access function do the start-transaction/do-stuff/end-transaction sequence internally.

The most efficient code would be to simply accept an unsigned char and regard a parameter value of 0 as a request to do 256 bytes, relying on the caller to avoid any accidental attempts to read 0 bytes. That seems a bit dangerous, though. Have others dealt with such issues on embedded systems? How were they handled?

EDIT The platform is a PIC18Fxx (128K code space; 3.5K RAM), connecting to an SPI flash chip; reading 256 bytes when fewer are expected would potentially overrun read buffers in the PIC. Writing 256 bytes instead of 0 would corrupt data in the flash chip. The PIC's SPI port is limited to one byte every 12 instruction times if one doesn't check busy status; it will be slower if one does. A typical write transaction requires sending 4 bytes in addition to the data to be received; a read requires an extra byte for "SPI turnaround" (the fastest way to access the SPI port is to read the last byte just before sending the next one).

The compiler is HiTech PICC-18std.

I've generally liked HiTech's PICC-16 compilers; HiTech seems to have diverted their energies away from the PICC-18std product toward their PICC-18pro line, which has even slower compilation times, seems to require the use of 3-byte 'const' pointers rather than two-byte pointers, and has its own ideas about memory allocation. Maybe I should look more at the PICC-18pro, but when I tried compiling my project on an eval version of PICC-18pro it didn't work, and I didn't figure out exactly why (perhaps something about variable layout not agreeing with my asm routines), so I just kept using PICC-18std.

Incidentally, I just discovered that PICC-18 particularly likes do {} while(--bytevar); and particularly dislikes do {} while(--intvar); I wonder what's going through the compiler's "mind" when it generates the latter?

  do
  {
    local_test++;
    --lpw;
  } while(lpw);

  2533                           ;newflashpic.c: 792: do
  2534                           ;newflashpic.c: 793: {
  2535  0144A8  2AD9                incf    fsr2l,f,c
  2536                           ;newflashpic.c: 795: } while(--lpw);
  2537  0144AA  0E00                movlw   low ?_var_test
  2538  0144AC  6EE9                movwf   fsr0l,c
  2539  0144AE  0E01                movlw   high ?_var_test
  2540  0144B0  6EEA                movwf   fsr0h,c
  2541  0144B2  06EE                decf    postinc0,f,c
  2542  0144B4  0E00                movlw   0
  2543  0144B6  5AED                subwfb  postdec0,f,c
  2544  0144B8  50EE                movf    postinc0,w,c
  2545  0144BA  10ED                iorwf   postdec0,w,c
  2546  0144BC  E1F5                bnz l242

The compiler loads a pointer to the variable, not even using the LFSR instruction (which would take two words) but a combination of MOVLW/MOVWF (taking four). Then it uses this pointer to do the decrement and compare. While I'll admit that do{}while(--wordvar); cannot yield as nice code as do{}while(wordvar--); the code is better than what the latter format actually generates. Doing a separate decrement and while-test (e.g. while (--lpw,lpw)) yields sensible code, but it seems a bit ugly. The post-decrement operator could yield the best code for a down-counting loop:

  decf _lpw
  btfss _STATUS,0 ; Skip next inst if carry (i.e. wasn't zero)
   decf _lpw+1
  bc    loop  ; Carry will be clear only if lpw was zero

but it instead generates worse code than --lpw. The best code would be for an up-counting loop:

  infsnz  _lpw
   incfsz _lpw+1
   bra loop

but the compiler doesn't generate that.
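
In C, an up-counting loop of that shape could be written by negating the count first and incrementing until the counter wraps to zero. This is only a sketch: `loop_up` is a hypothetical name, and whether any particular compiler turns it into the infsnz/incfsz sequence above is not something the source confirms.

```c
#include <stdint.h>

/* Runs the loop body `count` times (1..65535) by counting a negated
   16-bit counter up until it wraps to zero.
   Returns the number of passes so the behaviour can be checked. */
unsigned long loop_up(uint16_t count)
{
    uint16_t lpw = (uint16_t)(0u - count);  /* two's-complement negate */
    unsigned long passes = 0;

    do {
        passes++;               /* loop body goes here */
    } while (++lpw != 0);       /* exits when the 16-bit counter wraps */

    return passes;
}
```

As with the byte-sized do {} while(--count); loop, a count of zero is not a no-op here: it would run 65536 times.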

EDIT 2 Another approach I might use: allocate a global 16-bit variable for the number of bytes, and write the functions so that the counter is always zeroed before exit. Then if only an 8-bit value is required, it would only be necessary to load 8 bits. I'd use macros for stuff so they could be tweaked for best efficiency. On the PIC, using |= on a variable which is known to be zero is never slower than using =, and is sometimes faster. For example, intvar |= 15 or intvar |= 0x300 would be two instructions (each case only has to bother with one byte of the result and can ignore the other); intvar |= 4 (or any power of 2) is one instruction. Obviously on some other processors, intvar = 0x300 would be faster than intvar |= 0x300; if I use a macro it could be tweaked as appropriate.
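
A minimal sketch of that idea, with hypothetical names (`xfer_count`, `SET_COUNT`, `transfer`) and a counter in place of the real byte transfer. The zero-count guard is an added assumption:

```c
#include <stdint.h>

/* Invariant: zero whenever no transfer is in progress. */
uint16_t xfer_count;

/* Only valid while xfer_count == 0, which is what makes |= equivalent to =.
   On another processor this macro could be changed to plain assignment. */
#define SET_COUNT(n)  (xfer_count |= (n))

unsigned long bytes_moved;      /* instrumentation for this sketch only */

void transfer(void)
{
    if (xfer_count == 0)
        return;                 /* assumed guard against a zero request */
    do {
        bytes_moved++;          /* stands in for moving one byte */
    } while (--xfer_count);     /* leaves the counter zeroed on exit */
}
```

A caller would write `SET_COUNT(15); transfer();` for a short request or `SET_COUNT(0x300); transfer();` for a long one; in both cases the counter is back at zero afterwards, so the next `SET_COUNT` starts from a known state.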

Your inner function should copy count + 1 bytes, e.g.,

 do /* copy one byte */ while(count-- != 0);

If the post-decrement is slow, other alternatives are:

 ... /* copy one byte */
 while (count != 0) { /* copy one byte */; count -= 1; }

 for (;;) { /* copy one byte */; if (count == 0) break; count -= 1; }

The caller/wrapper can do:

 if (count > 0 && count <= 256) inner((uint8_t)(count-1));

or

 if (((unsigned)(count - 1)) < 256u) inner((uint8_t)(count-1));

if it's faster in your compiler.
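
Put together, the inner function and its checking wrapper might look like the sketch below. The name `copy_checked`, the memory-to-memory copy, and the return value are assumptions made so that the example is self-contained; the real inner function would talk to the flash chip instead.

```c
#include <stdint.h>

/* Copies count_minus_1 + 1 bytes, i.e. 1..256 bytes. */
static void inner(uint8_t *dst, const uint8_t *src, uint8_t count_minus_1)
{
    do {
        *dst++ = *src++;                /* copy one byte */
    } while (count_minus_1-- != 0);
}

/* Wrapper with the natural interface: count is the number of bytes wanted.
   Returns 1 if the copy was done, 0 if count was outside 1..256. */
int copy_checked(uint8_t *dst, const uint8_t *src, int count)
{
    if (count > 0 && count <= 256) {
        inner(dst, src, (uint8_t)(count - 1));
        return 1;
    }
    return 0;
}
```

The range test is written in the two-comparison form because it cannot overflow for any int argument; the single unsigned comparison is an equivalent alternative for counts in the normal range.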

FWIW, I'd choose some variant of option #1. The function's interface remains sensible, intuitive, and seems less likely to be called incorrectly (you might want to think about what you want to do if a value larger than 256 is passed in - a debug-build-only assertion might be appropriate).

I don't think the minor 'hack'/micro-optimization to loop the correct number of times using an 8-bit counter would really be a maintenance problem, and it seems you've done considerable analysis to justify it.

I wouldn't argue against wrappers if someone preferred them, but I'd personally lean toward option 1 ever-so-slightly.

However, I would argue against having the public interface require the caller to pass in a value one less than they wanted to read.

If an int parameter costs 3 instructions and a char parameter costs 1, you could pass an extra char parameter for the extra 1 bit you're missing. It seems pretty silly that your (presumably 16-bit) int takes more than twice as many instructions as an 8-bit char.
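
One way to read that suggestion, as a sketch with hypothetical names (`fill`, `count_low`, `count_high`): the second byte-sized parameter carries the ninth bit, so a nonzero `count_high` means 256 and `count_low` covers 0-255. Filling a buffer stands in for the real transfer, and an 8-bit byte is assumed.

```c
#include <stdint.h>

/* count_high != 0 requests 256 bytes; otherwise count_low bytes (0..255).
   Returns the number of bytes written so the behaviour can be checked. */
unsigned int fill(uint8_t *dst, uint8_t value,
                  uint8_t count_low, uint8_t count_high)
{
    unsigned int written = 0;

    if (count_high)
        count_low = 0;          /* 0 in the 8-bit counter means 256 passes */
    else if (count_low == 0)
        return 0;               /* genuine request for zero bytes */

    do {
        *dst++ = value;
        written++;
    } while (--count_low);

    return written;
}
```

Here `fill(buf, 0xAA, 0, 0)` writes nothing, `fill(buf, 0xAA, 5, 0)` writes 5 bytes, and `fill(buf, 0xAA, 0, 1)` writes 256.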
