Reading memory in the correct order: need some help

2023-03-20 07:26 Q&A author:

We are storing records in memory as follows

----------------------------------------------
|EventID | Timestamp | Variable Data | Length  |
----------------------------------------------

Lengths of these fields are as follows

EventID + Timestamp take 12 bytes together. The Length field is 4 bytes and indicates the length of the variable data field.

Millions of such records are placed one after the other, and I have a pointer pointing to the current index. To read all the records, I read the 4 bytes to the left of the pointer to get the length, fetch that particular record, and repeat this until I have read the complete memory space. The problem with this method is that it reads the records in the reverse of the order in which they were entered.

I need to devise a method that will allow me to read these records in the same order they were entered, with minimal space complexity.
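For reference, the backward step described above can be sketched in C roughly as follows. This is only an illustration under assumptions: the name prev_record is hypothetical, the Length field is assumed to be a 4-byte unsigned integer in host byte order, and the 12-byte EventID + Timestamp header is treated as opaque.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { HEADER_SIZE = 12, LENGTH_SIZE = 4 };  /* EventID + Timestamp, Length */

/* Given `end`, one past the last byte of a record, returns a pointer to that
   record's EventID and stores the data length in *data_len. Returns NULL when
   no complete record fits between base and end (including end == base) or
   when the trailing length points outside the buffer. */
static const unsigned char *prev_record(const unsigned char *base,
                                        const unsigned char *end,
                                        uint32_t *data_len)
{
    assert(base <= end);
    size_t avail = (size_t)(end - base);
    if (avail < HEADER_SIZE + LENGTH_SIZE)
        return NULL;

    uint32_t len;
    memcpy(&len, end - LENGTH_SIZE, sizeof len);  /* memcpy avoids unaligned reads */
    if (len > avail - HEADER_SIZE - LENGTH_SIZE)
        return NULL;

    *data_len = len;
    return end - LENGTH_SIZE - len - HEADER_SIZE;
}
```

Starting with p equal to the current index and calling p = prev_record(base, p, &len) until it returns NULL visits every record, newest first, which is exactly the ordering problem described here.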

I have another great solution for you!

Read your records in reverse order (end to beginning) and swap the in-memory values of the EventID and Length fields.
When accessing rows afterwards, just keep the new layout in mind: Length | Timestamp | Data | EventID
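A minimal sketch of this idea in C, under the assumption that EventID and Length are both 4 bytes (the swap does not work if their sizes differ) and that Timestamp is 8 bytes. The function names swap_id_and_length and next_swapped are hypothetical, and values are assumed to be in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { ID_SIZE = 4, TS_SIZE = 8, LENGTH_SIZE = 4 };

/* One backward pass that exchanges the EventID at the front of each record
   with the Length at its end. Returns the record count, or -1 if a length
   does not fit in the buffer (records visited before that stay swapped). */
static long swap_id_and_length(unsigned char *base, unsigned char *end)
{
    assert(ID_SIZE == LENGTH_SIZE);            /* swap needs equal sizes */
    long count = 0;
    while (end > base) {
        size_t avail = (size_t)(end - base);
        uint32_t len;
        unsigned char id[ID_SIZE];
        if (avail < ID_SIZE + TS_SIZE + LENGTH_SIZE)
            return -1;
        memcpy(&len, end - LENGTH_SIZE, sizeof len);
        if (len > avail - (ID_SIZE + TS_SIZE + LENGTH_SIZE))
            return -1;
        unsigned char *rec = end - LENGTH_SIZE - len - TS_SIZE - ID_SIZE;
        memcpy(id, rec, ID_SIZE);              /* save EventID          */
        memcpy(rec, &len, LENGTH_SIZE);        /* Length moves to front */
        memcpy(end - ID_SIZE, id, ID_SIZE);    /* EventID moves to end  */
        end = rec;
        count++;
    }
    return count;
}

/* After the swap a record starts with its Length, so one forward step is:
   read Length, then skip Length + Timestamp + Data + EventID.
   Returns a pointer to the following record. */
static unsigned char *next_swapped(unsigned char *rec, uint32_t *data_len)
{
    memcpy(data_len, rec, LENGTH_SIZE);
    return rec + LENGTH_SIZE + TS_SIZE + *data_len + ID_SIZE;
}
```

The swap costs no extra memory, but it modifies the stored records, so any other reader of the same memory has to know about the new layout.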

As the variable-length data section comes before the length, it is impossible to read the data starting from the beginning memory address. Assuming no changes can be made to the architecture or storage, one possible option is to use your current system to build an index of the variable data lengths. Then, once you reach the beginning of the data, you read the records in the correct order, using the previously built index to determine each variable data length.

However, you mention this dataset contains millions of records, so storing an index of all variable data lengths before processing may not be feasible. One solution to this problem is to index only every other entry, or every fourth, eighth, etc., depending upon your specific requirements. Then you could start at each indexed record and work backwards, temporarily saving the data lengths until you reach a record you have already processed. Then work forward again using this saved data.

For example, let's say you index every 8 records on your first pass. Then you would start at record 8 and save the length of that record. Then go to 7, 6, 5, 4, 3, 2, 1. Now you've saved 8 lengths, so process records 1, 2, 3, 4, 5, 6, 7, and 8. Now you don't know the length of 9, so jump to 16. Then record the lengths of 16, 15, 14, ..., 9. Then, as before, process 9, 10, 11, ..., 16. Now repeat.
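The two passes can be sketched in C as below. This is one possible reading of the scheme, not a definitive implementation: the names record_start, build_marks and block_starts are hypothetical, offsets are used instead of pointers, blocks are counted from the end of memory (so the earliest block may hold fewer than STRIDE records), and all stored lengths are assumed to be valid 4-byte values in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { HEADER_SIZE = 12, LENGTH_SIZE = 4, STRIDE = 8 };

/* Start offset of the record that ends at offset `end` (the usual backward
   step). Lengths are assumed valid here. */
static size_t record_start(const unsigned char *mem, size_t end)
{
    uint32_t len;
    assert(end >= HEADER_SIZE + LENGTH_SIZE);
    memcpy(&len, mem + end - LENGTH_SIZE, sizeof len);
    return end - LENGTH_SIZE - len - HEADER_SIZE;
}

/* Pass 1: walk backward over mem[0..size) and save the end offset of every
   STRIDE-th record. marks[0] is `size`; later entries move toward offset 0.
   Returns the number of marks, or 0 if `cap` is too small or size is 0. */
static size_t build_marks(const unsigned char *mem, size_t size,
                          size_t *marks, size_t cap)
{
    size_t n = 0, seen = 0, end = size;
    while (end > 0) {
        if (seen % STRIDE == 0) {
            if (n == cap)
                return 0;
            marks[n++] = end;
        }
        end = record_start(mem, end);
        seen++;
    }
    return n;
}

/* Pass 2 helper: fills starts[] with the start offsets of the records in
   mem[lo..hi), in insertion order. A block holds at most STRIDE records. */
static size_t block_starts(const unsigned char *mem, size_t lo, size_t hi,
                           size_t starts[STRIDE])
{
    size_t tmp[STRIDE], n = 0;
    while (hi > lo && n < STRIDE) {
        hi = record_start(mem, hi);
        tmp[n++] = hi;                       /* collected newest first */
    }
    for (size_t i = 0; i < n; i++)
        starts[i] = tmp[n - 1 - i];          /* reverse into forward order */
    return n;
}
```

To read everything in insertion order, iterate over the marks from the last one to the first, use marks[i + 1] (or 0 for the last mark) as lo and marks[i] as hi, and process the offsets returned by block_starts. The index then needs roughly one offset per STRIDE records, plus STRIDE temporary offsets.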

Try to 'reverse' the record order during a first fetch, and then make a second fetch over the reversed records using the same process (this means allocating the same amount of memory again for the reversed copy).

As the variable data has variable length, and the length value is in the last position, I see no way to do this fetching from left to right.

There is another way to find the end of a row with no additional memory.

All EventID values fall into a definite range, and could be sequential.
All Timestamp values fall into a definite range too (say, from 2009/09/09 through 2011/11/11).
The Length of one row and the EventID and Timestamp of the next row are adjacent and have a fixed total length of 16 bytes (4 for Length, 4 for EventID, and 8 for Timestamp).

Under these considerations you could write a function that searches for the end of a row, e.g.

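#include <stdint.h>
#include <string.h>

typedef unsigned char byte;
typedef uint32_t DWORD;
typedef uint64_t QWORD;

// Returns a pointer to the start of the next row, memBlockEnd if rowStart is
// the last row, or NULL if no plausible end of row is found.
byte* FindNextRow(byte* rowStart, byte* memBlockEnd,
                 DWORD minEventID, DWORD maxEventID,
                 QWORD minTimestamp, QWORD maxTimestamp)
{
  // ptr is a candidate position for the 'length' field of this row;
  // the first candidate is the start of 'data' (a row with empty data)
  for (byte* ptr = rowStart + 12; ptr + 4 <= memBlockEnd; ptr++) {

    DWORD length;
    memcpy(&length, ptr, sizeof length); // memcpy avoids unaligned reads

    // try to find a candidate for the 'length' field first:
    // its value must equal the distance from the start of 'data'
    if ((size_t)(ptr - (rowStart + 12)) != length)
      continue;

    // check if this is the last row
    if (ptr + 4 == memBlockEnd)
      return memBlockEnd;

    // a following row needs at least 12 header bytes and 4 length bytes
    if (memBlockEnd - (ptr + 4) < 16)
      continue;

    // then check 'EventID' and 'Timestamp' for the next row
    DWORD eventID;
    memcpy(&eventID, ptr + 4, sizeof eventID);
    if (eventID < minEventID || eventID > maxEventID)
      continue; // you might add a sequence check, e.g. eventID == EventID of this row + 1

    QWORD timestamp;
    memcpy(&timestamp, ptr + 8, sizeof timestamp);
    if (timestamp < minTimestamp || timestamp > maxTimestamp)
      continue; // you might add a sequence check, e.g. timestamp >= Timestamp of this row

    // this is the match
    return ptr + 4;
  }

  return NULL; // no candidate matched
}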

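WARNING: this does not guarantee correctness, because bytes inside the data can by coincidence look like a matching length followed by a plausible EventID and Timestamp. Still, you could try to find a workaround this way.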

Is allocating one pointer (on a 32-bit machine, usually 4 bytes) per message acceptable to you?

If it is, you could, starting from the end:

1. Read the length at current position - 4
2. Get the pointer to the first byte of the event ID with: current position - 4 - length - 12
3. Resize the pointer array if needed
4. Store that pointer in the array
5. Repeat from 1, using that pointer as the new current position

Of course, you would need to realloc() as the pointer array grows (no need to realloc every time, do it in chunks).

I am assuming you are treating the records as a char array, so the difference between the char pointers stored in contiguous elements (n and n-1) gives you the size of the entire message.

This wastes memory, which I know you want to avoid. But if you can't do what Opillect suggested (swapping the EventID and Length fields) because the fields have different sizes, this seems like a good way to do it.
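The steps above might look roughly like this in C. It is a sketch under assumptions: index_records and CHUNK are hypothetical names, the Length field is assumed to be a 4-byte value in host byte order, and the array is left in discovery order (newest record first), so you iterate it from the last element down to element 0 to get insertion order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { HEADER_SIZE = 12, LENGTH_SIZE = 4, CHUNK = 1024 };

/* Walks backward from `end` to `base`, storing a pointer to the first byte of
   each record. Returns a heap array (caller frees) and sets *count. Returns
   NULL with *count == 0 on allocation failure, on a length that does not fit
   in the buffer, or when there are no records. */
static const unsigned char **index_records(const unsigned char *base,
                                           const unsigned char *end,
                                           size_t *count)
{
    const unsigned char **ptrs = NULL;
    size_t n = 0, cap = 0;
    assert(count != NULL);
    *count = 0;
    while (end > base) {
        size_t avail = (size_t)(end - base);
        uint32_t len = 0;
        if (avail >= HEADER_SIZE + LENGTH_SIZE)
            memcpy(&len, end - LENGTH_SIZE, sizeof len);
        if (avail < HEADER_SIZE + LENGTH_SIZE ||
            len > avail - HEADER_SIZE - LENGTH_SIZE) {
            free(ptrs);
            return NULL;
        }
        if (n == cap) {                       /* grow in chunks, not per record */
            const unsigned char **grown =
                realloc(ptrs, (cap + CHUNK) * sizeof *ptrs);
            if (grown == NULL) {
                free(ptrs);
                return NULL;
            }
            ptrs = grown;
            cap += CHUNK;
        }
        end = end - LENGTH_SIZE - len - HEADER_SIZE;
        ptrs[n++] = end;
    }
    *count = n;
    return ptrs;
}
```

With the array in this order, ptrs[i - 1] - ptrs[i] is the full size of the record at ptrs[i], and the size of the newest record (ptrs[0]) is the original end pointer minus ptrs[0].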
