
Decoding problems with Lempel-Ziv-Welch algorithm

I have to implement the LZW algorithm, but I have run into trouble with the decoding part. I think the code is right because it works with an example I found somewhere on the web: if I initialize my dictionary as follows

m_dictionary.push_back("a");
m_dictionary.push_back("b");
m_dictionary.push_back("d");
m_dictionary.push_back("n");
m_dictionary.push_back("_");

and my input file contains the string banana_bandana, I get the following results:

compressed.txt: 1036045328

decompressed.txt: banana_bandana

But if I initialize the dictionary with all 256 ASCII characters, the decoding process fails miserably. I think the problem lies in the number of bits used for the codes, because when decoding I always read the input file char by char (8 bits) instead of the correct number of bits, I guess.

Below is the code of my implementation of this algorithm:

template <class T>
size_t toUnsigned(T t) {
  std::stringstream stream;
  stream << t;
  size_t x;
  stream >> x;
  return x;
}

bool LempelZivWelch::isInDictionary(const std::string& entry) {
  return (std::find(m_dictionary.begin(), m_dictionary.end(), entry) != m_dictionary.end());
}

void LempelZivWelch::initializeDictionary() {
  m_dictionary.clear();
  for (int i = 0; i < 256; ++i)
    m_dictionary.push_back(std::string(1, char(i)));
}

void LempelZivWelch::addEntry(std::string entry) {
  m_dictionary.push_back(entry);
}

size_t LempelZivWelch::encode(char *data, size_t dataSize) {    
  initializeDictionary();

  std::string s;
  char c;

  std::ofstream file;
  file.open("compressed.txt", std::ios::out | std::ios::binary);

  for (size_t i = 0; i < dataSize; ++i) {
    c = data[i];

    if(isInDictionary(s + c))
      s = s + c;
    else {
      for (size_t j = 0; j < m_dictionary.size(); ++j)
        if (m_dictionary[j] == s) {
          file << j;
          break;
        }

      addEntry(s + c);
      s = c;
    }
  }

  for (size_t j = 0; j < m_dictionary.size(); ++j)
    if (m_dictionary[j] == s) {
      file << j;
      break;
    }

  file.close();

  return dataSize;
}

size_t LempelZivWelch::decode(char *data, size_t dataSize) {    
  initializeDictionary();

  std::string entry;
  char c;
  size_t previousCode, currentCode;

  std::ofstream file;
  file.open("decompressed.txt", std::ios::out | std::ios::binary);

  previousCode = toUnsigned(data[0]);

  file << m_dictionary[previousCode];

  for (size_t i = 1; i < dataSize; ++i) {
    currentCode = toUnsigned(data[i]);

    entry = m_dictionary[currentCode];
    file << entry;
    c = entry[0];
    addEntry(m_dictionary[previousCode] + c);
    previousCode = currentCode;
  }

  file.close();

  return dataSize;
}

And this is the function that reads the input files:

void Compression::readFile(std::string filename) {
  std::ifstream file;
  file.open(filename.c_str(), std::ios::in | std::ios::binary | std::ios::ate);

  if (!file.is_open())
    exit(EXIT_FAILURE);

  m_dataSize = file.tellg();
  m_data = new char [m_dataSize];

  file.seekg(0, std::ios::beg);
  file.read(m_data, m_dataSize);
  file.close();
}

My guess is that the decoding problem resides in reading the input file as an array of chars and/or writing the indices to the compressed file as formatted size_t values.

Thanks in advance!


It looks like you are outputting the dictionary indices as ASCII-encoded decimal numbers. How is the decoder supposed to tell the sequence 1,2,3 from 12,3 or 1,23? Your banana_bandana example only works by luck: with a five-entry starting dictionary, every emitted index happens to fit in a single decimal digit (1 0 3 6 0 4 5 3 2 8, which is exactly your compressed.txt output), so reading one char per code succeeds. Once the dictionary starts with 256 entries, the indices no longer fit in one digit (or in 8 bits), and the char-by-char reading falls apart. You need to encode the codes unambiguously, either as fixed-width binary numbers (9, 10, 11 bits, or whatever) or with some sort of prefix-free code like Huffman coding.
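As a minimal sketch of the fixed-width option: instead of `file << j`, write each index as a 16-bit binary value, and read codes back two bytes at a time. The helper names below (`writeCode`, `readCodes`) are hypothetical, not from your code; 16 bits is wasteful compared to variable-width codes, but it is unambiguous and works as long as the dictionary stays under 65536 entries.

```cpp
#include <cstdint>
#include <fstream>
#include <vector>

// Write one dictionary index as a fixed-width 16-bit value
// (little-endian), so every code occupies exactly two bytes.
void writeCode(std::ofstream& out, uint16_t code) {
    unsigned char bytes[2] = {
        static_cast<unsigned char>(code & 0xFF),        // low byte
        static_cast<unsigned char>((code >> 8) & 0xFF)  // high byte
    };
    out.write(reinterpret_cast<char*>(bytes), 2);
}

// Read the whole compressed stream back as a sequence of 16-bit codes.
std::vector<uint16_t> readCodes(std::ifstream& in) {
    std::vector<uint16_t> codes;
    unsigned char bytes[2];
    while (in.read(reinterpret_cast<char*>(bytes), 2))
        codes.push_back(static_cast<uint16_t>(bytes[0]) |
                        (static_cast<uint16_t>(bytes[1]) << 8));
    return codes;
}
```

With this scheme, `decode` would iterate over the vector returned by `readCodes` instead of interpreting each raw char as a code.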

