Encode/compress sequence of repeating integers
I have very long integer sequences that look like this (arbitrary length!):
0000000001110002220033333
Now I need some algorithm to convert this string into something compressed like
a9b3a3c3a2d5
Which means "a 9 times, then b 3 times, then a 3 times" and so on, where "a" stands for 0, "b" for 1, "c" for 2 and "d" for 3.
How would you do that? So far nothing suitable came to my mind, and I had no luck with google because I didn't really know what to search for. What is this kind of encoding / compression called?
PS: I am going to do the encoding with PHP, and the decoding in JavaScr开发者_开发技巧ipt.
Edit: Thank you all!
I ended up with this function for encoding:
protected function numStringToRle($s){
$rle = '';
$count = 1;
$len = strlen($s);
for($i = 0; $i < $len; $i++){
if($i != $len && isset($s[$i+1]) && $s[$i] == $s[$i+1]){
$count++;
} else {
$rle .= chr($s[$i] + 97).( $count == 1 ? '' : $count);
$count = 1;
}
}
return $rle;
}
And that for decoding:
var decodeCoords = function(str) {
str = str.replace(/(.)(\d+)/g, function(_, x, n) {
return new Array(parseInt(n, 10) + 1).join(x);
});
return str.
replace(/a/g, '0').
replace(/b/g, '1').
replace(/c/g, '2').
replace(/d/g, '3');
};
It is called Run Length Encoding
Basic encoder in PHP:
function numStringToRle($s){
$rle = '';
$count = 1;
$len = strlen($s);
for ( $i = 0; $i < $len; $i++ ){
if ( $i != $len && $s[$i] == $s[$i+1] ){
$count++;
}else{
$rle .= chr($s[$i] + 97).$count;
$count = 1;
}
}
return $rle;
}
Be warned it will preform badly issues with a string like
123456789123456789
If you were going to be handling a string that may have a lot of individual single characters you would be better to add some complexity and not write the length of the run if the length of the run is 1.
//change
$rle .= chr($s[$i] + 97).$count;
//to
$rle .= chr($s[$i] + 97).( $count == 1 ? '' : $count );
//or
$rle .= chr($s[$i] + 97)
if ( $count != 1 ){
$rle .= $count;
}
Here is a naive implementation of what you want.
$toEncode = '0000000001110002220033333';
$currentChar = '-1';
$length = strlen($toEncode);
$encoded = '';
$currentNbrChar = 0;
for($i = 0; $i < $length; $i++){
if($toEncode[$i] != $currentChar){
if($currentChar != '-1'){
$encoded .= chr(97 + $currentChar).$currentNbrChar;
}
$currentNbrChar = 0;
$currentChar = $toEncode[$i];
}
$currentNbrChar ++;
}
if($currentChar != '-1'){
$encoded .= chr(97 + $currentChar).$currentNbrChar;
}
echo $encoded;
Here's a shorter version:
function smush(str) {
return str.replace(/((.)\2*)/g, function(_, w, x) {
return x + w.length;
});
}
edit oh I see you want to encode with php; sorry I don't know that. Here's a decoder in a similar spirit:
function unsmush(str) {
return str.replace(/(.)(\d+)/g, function(_, x, n) {
return new Array(parseInt(n, 10) + 1).join(x);
});
}
Just FYI, you could probably gzip your data and the browse will automatically unzip it. For most implementations this is going to work better than RLE. But less fun obviously.
$str="0000000001110002220033333";
//$c will count the number of occurances.
$c=1;
$lastInt=substr($str,0,1);
$str=substr($str,1);
$resultStr='';
$loopEnd=strlen($str);
for($i=1; $i<=$loopEnd+1;$i++)
{
$nowInt=substr($str,0,1);
if($lastInt==$nowInt)
{
$c++;
$str=substr($str,1);
}
else
{
$char=chr((int)$lastInt + 97);
$resultStr=$resultStr.$char.$c;
$str=substr($str,1);
$c=1;
$lastInt=$nowInt;
}
}
// we use if condition since for loop will not take the last integer if it repeats.
if($c>1)
{
$char=chr((int)$lastInt + 97);
$resultStr=$resultStr.$char.$c;
}
echo $resultStr;
function compress( $str) {
$strArr = str_split($str.'0');
$count = 0;
$resStr = '';
$strCheck = $strArr[0];
foreach($strArr as $key => $value)
{
if($strCheck == $value)
{
$count++;
}
else
{
if($count == 1)
{
$strCheck = $value;
$resStr .= $strArr[$key-1];
$count=1;
}
elseif($count == 2)
{
$strCheck = $value;
$resStr .= $strArr[$key-1].$strArr[$key-1];
$count=1;
}
else
{
$strCheck = $value;
$resStr .= $strArr[$key-1].$count;
$count=1;
}
}
}
return $resStr;
}
精彩评论