
Decoding problems with Lempel-Ziv-Welch algorithm

I have to implement the LZW algorithm, but I have run into trouble with the decoding part. I think the code is right because it works with an example I found somewhere on the web: if I initialize my dictionary as follows

m_dictionary.push_back("a");
m_dictionary.push_back("b");
m_dictionary.push_back("d");
m_dictionary.push_back("n");
m_dictionary.push_back("_");

and my input file contains the string banana_bandana, I get the following results:

compressed.txt: 1036045328

decompressed.txt: banana_bandana

But if I initialize the dictionary with all 256 ASCII characters, the decoding process fails miserably. I think the problem lies in the number of bits used for the codes, because when decoding I always read the input file char by char (8 bits) instead of the correct number of bits, I guess.

Below is the code of my implementation of this algorithm:

template <class T>
size_t toUnsigned(T t) {
  std::stringstream stream;
  stream << t;
  size_t x;
  stream >> x;
  return x;
}

bool LempelZivWelch::isInDictionary(const std::string& entry) {
  return (std::find(m_dictionary.begin(), m_dictionary.end(), entry) != m_dictionary.end());
}

void LempelZivWelch::initializeDictionary() {
  m_dictionary.clear();
  for (int i = 0; i < 256; ++i)
    m_dictionary.push_back(std::string(1, char(i)));
}

void LempelZivWelch::addEntry(std::string entry) {
  m_dictionary.push_back(entry);
}

size_t LempelZivWelch::encode(char *data, size_t dataSize) {    
  initializeDictionary();

  std::string s;
  char c;

  std::ofstream file;
  file.open("compressed.txt", std::ios::out | std::ios::binary);

  for (size_t i = 0; i < dataSize; ++i) {
    c = data[i];

    if(isInDictionary(s + c))
      s = s + c;
    else {
      for (size_t j = 0; j < m_dictionary.size(); ++j)
        if (m_dictionary[j] == s) {
          file << j;
          break;
        }

      addEntry(s + c);
      s = c;
    }
  }

  for (size_t j = 0; j < m_dictionary.size(); ++j)
    if (m_dictionary[j] == s) {
      file << j;
      break;
    }

  file.close();

  return dataSize;
}

size_t LempelZivWelch::decode(char *data, size_t dataSize) {    
  initializeDictionary();

  std::string entry;
  char c;
  size_t previousCode, currentCode;

  std::ofstream file;
  file.open("decompressed.txt", std::ios::out | std::ios::binary);

  previousCode = toUnsigned(data[0]);

  file << m_dictionary[previousCode];

  for (size_t i = 1; i < dataSize; ++i) {
    currentCode = toUnsigned(data[i]);

    entry = m_dictionary[currentCode];
    file << entry;
    c = entry[0];
    addEntry(m_dictionary[previousCode] + c);
    previousCode = currentCode;
  }

  file.close();

  return dataSize;
}

And this is the function that reads the input files:

void Compression::readFile(std::string filename) {
  std::ifstream file;
  file.open(filename.c_str(), std::ios::in | std::ios::binary | std::ios::ate);

  if (!file.is_open())
    exit(EXIT_FAILURE);

  m_dataSize = file.tellg();
  m_data = new char [m_dataSize];

  file.seekg(0, std::ios::beg);
  file.read(m_data, m_dataSize);
  file.close();
}

My guess is that the decoding problem resides in reading the input file as an array of chars and/or writing the indices to the compressed file as formatted size_t values.

Thanks in advance!


It looks like you are outputting the dictionary indices as ASCII-encoded decimal numbers. How is the decoder supposed to tell the sequence 1,2,3 from 12,3 or 1,23? Your banana_bandana example only works by luck: with a five-entry starting dictionary, every emitted index happens to fit in a single decimal digit (1 0 3 6 0 4 5 3 2 8, which is exactly your compressed.txt output), so reading one char per code succeeds. Once the dictionary starts with 256 entries, the indices no longer fit in one digit (or in 8 bits), and the char-by-char reading falls apart. You need to encode the codes unambiguously, either as fixed-width binary numbers (9, 10, 11 bits, or whatever) or with some sort of prefix-free code like Huffman coding.
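As a minimal sketch of the fixed-width option: instead of `file << j`, write each index as a 16-bit binary value, and read codes back two bytes at a time. The helper names below (`writeCode`, `readCodes`) are hypothetical, not from your code; 16 bits is wasteful compared to variable-width codes, but it is unambiguous and works as long as the dictionary stays under 65536 entries.

```cpp
#include <cstdint>
#include <fstream>
#include <vector>

// Write one dictionary index as a fixed-width 16-bit value
// (little-endian), so every code occupies exactly two bytes.
void writeCode(std::ofstream& out, uint16_t code) {
    unsigned char bytes[2] = {
        static_cast<unsigned char>(code & 0xFF),        // low byte
        static_cast<unsigned char>((code >> 8) & 0xFF)  // high byte
    };
    out.write(reinterpret_cast<char*>(bytes), 2);
}

// Read the whole compressed stream back as a sequence of 16-bit codes.
std::vector<uint16_t> readCodes(std::ifstream& in) {
    std::vector<uint16_t> codes;
    unsigned char bytes[2];
    while (in.read(reinterpret_cast<char*>(bytes), 2))
        codes.push_back(static_cast<uint16_t>(bytes[0]) |
                        (static_cast<uint16_t>(bytes[1]) << 8));
    return codes;
}
```

With this scheme, `decode` would iterate over the vector returned by `readCodes` instead of interpreting each raw char as a code.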

