
Huffman Decoding Sub-Table

I've been trying to implement a Huffman decoder, and my initial attempt suffered from poor performance due to a sub-optimal choice of decoding algorithm.

I thought I'd try to implement Huffman decoding using table lookups. However, I got a bit stuck on generating the sub-tables and was hoping someone could point me in the right direction.

struct node
{
    node*               children; // 0 right, 1 left
    uint8_t             value;
    uint8_t             is_leaf;
};

struct entry
{
    uint8_t              next_table_index;
    std::vector<uint8_t> values;

    entry() : next_table_index(0){}
};

void build_tables(node* nodes, std::vector<std::array<entry, 256>>& tables, int table_index);
void unpack_tree(void* data, node* nodes);

std::vector<uint8_t, tbb::cache_aligned_allocator<uint8_t>> decode_huff(void* input)
{
    // Initial setup
    CACHE_ALIGN node                    nodes[512] = {};

    auto data = reinterpret_cast<unsigned long*>(input); 
    size_t table_size   = *(data++); // Size is first 32 bits.
    size_t result_size      = *(data++); // Data size is second 32 bits.

    unpack_tree(data, nodes);

    auto huffman_data = reinterpret_cast<long*>(input) + (table_size+32)/32; 
    size_t data_size = *(huffman_data++); // Size is first 32 bits.     
    auto huffman_data2  = reinterpret_cast<char*>(huffman_data);

    // Build tables

    std::vector<std::array<entry, 256>> tables(1);
    build_tables(nodes, tables, 0);

    // Decode

    uint8_t current_table_index = 0;

    std::vector<uint8_t, tbb::cache_aligned_allocator<uint8_t>> result; 
    while(result.size() < result_size)
    {
        auto& table  = tables[current_table_index];

        uint8_t key = *(huffman_data2++);
        auto& values = table[key].values;
        result.insert(result.end(), values.begin(), values.end());

        current_table_index = table[key].next_table_index;
    }

    result.resize(result_size);

    return result;
}

void build_tables(node* nodes, std::vector<std::array<entry, 256>>& tables, int table_index)
{
    for(int n = 0; n < 256; ++n)
    {
        auto current = nodes;

        for(int i = 0; i < 8; ++i)
        {
            current = current->children + ((n >> i) & 1);       
            if(current->is_leaf)
                tables[table_index][n].values.push_back(current->value);
        }

        if(!current->is_leaf)
        {
            if(current->value == 0)
            {
                current->value = tables.size();
                tables.push_back(std::array<entry, 256>());
                build_tables(current, tables, current->value);
            }

            tables[table_index][n].next_table_index = current->value;
        }
    }   
}

void unpack_tree(void* data, node* nodes)
{   
    node* nodes_end = nodes+1;      
    bit_reader table_reader(data);  
    unsigned char n_bits = ((table_reader.next_bit() << 2) | (table_reader.next_bit() << 1) | (table_reader.next_bit() << 0)) & 0x7; // First 3 bits are n_bits-1.

    // Unpack huffman-tree
    std::stack<node*> stack;
    stack.push(&nodes[0]);      // "nodes" is root
    while(!stack.empty())
    {
        node* ptr = stack.top();
        stack.pop();
        if(table_reader.next_bit())
        {
            ptr->is_leaf = 1;
            ptr->children = nodes[0].children;
            for(int n = n_bits; n >= 0; --n)
                ptr->value |= table_reader.next_bit() << n;
        }
        else
        {
            ptr->children = nodes_end;
            nodes_end += 2;

            stack.push(ptr->children+0);
            stack.push(ptr->children+1);
        }
    }   
}


First off, avoid all those vectors. You can have pointers into a single preallocated buffer, but you don't want the scenario where vector allocates these tiny, tiny buffers all over memory, and your cache footprint goes through the roof.
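A flat layout along those lines might look like this; `flat_entry` and `flat_tables` are hypothetical names, and the (offset, count) scheme is just one way to pack the per-entry output bytes into a single shared buffer:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical flattened layout: instead of one std::vector per table
// entry, every entry stores an (offset, count) pair into one shared
// byte pool that is filled once while the tables are built.
struct flat_entry
{
    uint32_t value_offset;     // start of this entry's output bytes in the pool
    uint8_t  value_count;      // number of output bytes (at most 8 per input byte)
    uint8_t  next_table_index; // internal state to continue from
};

struct flat_tables
{
    std::vector<uint8_t>    value_pool; // all output bytes, back to back
    std::vector<flat_entry> entries;    // table t, key k -> entries[t * 256 + k]
};
```

Decoding then reads `value_count` bytes starting at `value_pool.data() + value_offset`, so all table data sits in two contiguous allocations instead of thousands of tiny ones.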

Note also that the number of non-leaf states might be much less than 256. Indeed, it might be as low as 128. By assigning them low state IDs, we can avoid generating table entries for the entire set of state nodes (which may be as high as 511 nodes in total). After all, after consuming input, we'll never end up on a leaf node; if we do, we generate output, then head back to the root.

The first thing we should do, then, is reassign those states that correspond to internal nodes (ie, ones with pointers out to non-leaves) to low state numbers. You can use this to also reduce memory consumption for your state transition table.
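A minimal sketch of such a renumbering pass, reusing the question's `node` layout and storing the state ID in the otherwise-unused `value` field of internal nodes (`assign_state_ids` is a hypothetical helper):

```cpp
#include <cstdint>
#include <stack>

// Node layout from the question.
struct node
{
    node*   children; // 0 right, 1 left
    uint8_t value;    // symbol for leaves; state ID for internal nodes
    uint8_t is_leaf;
};

// Walk the tree once and give every internal node a small, consecutive
// state ID. Returns the number of internal states, so the transition
// table needs only that many rows instead of one per tree node.
// (With at most 256 symbols there are at most 255 internal nodes,
// so the ID fits in value's uint8_t.)
static int assign_state_ids(node* root)
{
    int next_id = 0;
    std::stack<node*> stack;
    stack.push(root);
    while (!stack.empty())
    {
        node* n = stack.top();
        stack.pop();
        if (n->is_leaf)
            continue;
        n->value = static_cast<uint8_t>(next_id++);
        stack.push(n->children + 0);
        stack.push(n->children + 1);
    }
    return next_id;
}
```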

Once we've assigned these low state numbers, we can go through each possible non-leaf state, and each possible input byte (ie, a doubly-nested for loop). Traverse the tree as you would for a bit-based decoding algorithm, and record the set of output bytes, the final node ID you end up on (which must not be a leaf!), and whether you hit an end-of-stream mark.
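That doubly-nested loop might look like the following sketch. It assumes internal nodes already carry small state IDs in `value`, that `states[s]` points at the internal node for state `s` (with `states[0]` the root), and it omits end-of-stream handling; the `table_entry` layout is hypothetical, not the question's `entry` struct:

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Node layout from the question.
struct node
{
    node*   children; // 0 right, 1 left
    uint8_t value;    // symbol for leaves; state ID for internal nodes
    uint8_t is_leaf;
};

// One entry per (state, input byte) pair.
struct table_entry
{
    std::vector<uint8_t> outputs;    // bytes emitted while consuming this key
    uint8_t              next_state; // internal state after all 8 bits
};

static void build_transition_table(node* const* states, int n_states,
                                   std::vector<std::array<table_entry, 256>>& tables)
{
    tables.resize(n_states);
    for (int s = 0; s < n_states; ++s)
    {
        for (int key = 0; key < 256; ++key)
        {
            const node* current = states[s];
            table_entry& e = tables[s][key];
            for (int bit = 0; bit < 8; ++bit) // same LSB-first order as the question
            {
                current = current->children + ((key >> bit) & 1);
                if (current->is_leaf)
                {
                    e.outputs.push_back(current->value);
                    current = states[0]; // emitted a symbol, restart at the root
                }
            }
            e.next_state = current->value; // guaranteed non-leaf here
        }
    }
}
```

Because the traversal restarts at the root as soon as it lands on a leaf, the recorded `next_state` is always an internal node, which is exactly the invariant the paragraph above relies on.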
