开发者

How to sort an array of Roman numerals?

I have an array containing Roman numerals (as strings of course). Like this:

 $a = array('XIX', 'LII', 'V', 'MCCXCIV', 'III', 'XIII');

I'd like to sort them according to the numeric values of these numerals, so the results should be something like:

 $sorted_a = array('III', 'V', 'XIII', 'XIX', 'LII', 'MCCXCIV');

So my question is: what is the best way to sort an array of Roman numerals? I know how to use the array sorting functions of PHP, I'm interested in the logic that goes on inside the comparison function.

EDIT: For simplicity, I'm only looking for a way that deals with strings constructed of the basic numerals in a standard way (no CCCC for example):

I, V, X, L, C, D, M

TEST RESULTS

I took the time to extensively test all the code examples that were posted. Two tests were taken, one with a random array of 20 Roman numerals, and a second with an arra开发者_如何学Cy containing 4000 of those. Same machine, lot of iterations, an average time taken, and all this run several times. Of course this is nothing offical, just my own tests.

TEST WITH 20 NUMERALS:

  1. hakre, bazmegakapa - around 0.0005 s
  2. anemgyenge, Andrea, Dirk McQuickly - around 0.0010 s
  3. Joe Nelson - around 0.0050 s
  4. Rob Hruska - around 0.0100 s

TEST WITH 4000 NUMERALS:

  1. hakre, bazmegakapa - around 0.13 s
  2. anemgyenge - around 1.4 s
  3. Dirk McQuickly, Andrea - around 1.8 s
  4. Rob Hruska - around 2.8 s
  5. Joe Nelson - around 15 s (surprise, checked several more times)

I have a hard time awarding the bounty. hakre and I made the fastest versions, following the same route, but he made a variation of mine, which was previously based on borrible's idea. So I will accept hakre's solution, because that is the quickest and nicer than mine (IMO). But I will award the bounty to anemgyenge, because I love his version and a lot of effort seems to be put into it.


Picking your class to convert roman numbers to integers, a user-defined sort callback can handle this to sort the array:

$a = array('XIX', 'LII', 'V', 'MCCXCIV', 'III', 'XIII');

$bool = usort($a, function($a, $b) {
    return RomanNumber::Roman2Int($a) - RomanNumber::Roman2Int($b);
});    
var_dump($a);

So here you find the logic inside the comparison function: if both values are of the same weight, return 0. If the first is lower than the second, return < 0 (e.g. -1), otherwise the second is larger than the first so return > 0 (e.g. 1).

Naturally any other type of function that returns the decimal value for a roman number would work as well.

Edit:

As you commented, you do not want to run the conversion for each pair. That's fine, with a help of an additional array which contains all converted values, you can run the sort on the decimal values and use that sorting on the roman numbers as well (Demo):

$a = array('XIX', 'LII', 'V', 'MCCXCIV', 'III', 'XIII');
$b = array_map('RomanNumber::Roman2Int', $a);
array_multisort($b, $a);
var_dump($a);

array_multisort PHP Manual does most of the magic here.


function sortRomanNum($a, $b) {
    if($a == $b) return 0;

    $str = "0IVXLCDM";
    $len = 0;

    if(strlen($a) >= strlen($b)) {
        $len = strlen($a);
        $b .= str_repeat("0", $len - strlen($b));
    }
    else {
        $len = strlen($b);
        $a .= str_repeat("0", $len - strlen($a));
    }

    for($i = 0; $i < $len - 1; $i++) {
        $a1 = $a[$i]; $b1 = $b[$i]; $a2 = $a[$i+1]; $b2 = $b[$i+1];

        if( strpos($str, $a1.$b1.$a2) !== false ) return 1;
        if( strpos($str, $b1.$a1.$b2) !== false ) return -1;

        if($a1 != $b1) return strpos($str, $a1) > strpos($str, $b1) ? 1 : -1;
    }

    if($a[$i] != $b[$i]) return strpos($str, $a[$i]) > strpos($str, $b[$i]) ? 1 : -1;
}

Given two numbers (roman strings), $a and $b. If there are no substractions in the numbers (IV, IX, XC etc), then the solution would be trivial:

for all $i in $a and $b
    if $a[$i] > $b[$i] then return 1; //($a is greater then $b)
    if $a[$i] < $b[$i] then return 1; //($a is lower then $b)
return 0 //equality

Since there can be these special parts, the calculation is more complex. But the solution is to find the patterns:

a: IX | XC | CM
b: V  | L  | D

These are the only patterns which can mess up the trivial solution. If you find any of these, then $a will be greater then $b.

Note, that roman numbers don't include zeros, like the arabic ones. Therefore now we will use them (and basically put zeros where they are missing).

So here comes the function:

if $a == $b then return 0; //equality
create a string for ordering the roman numerals (strpos will give the right index)
define the length of the loop (take the longer string), and add zeros to the end of the shorter number
run the loop, and check:
    1. if the patterns above are found, return the comparision accordingly (1 or -1)
    2. otherwise do the trivial check (compare each numeral)
check the last numerals too.


Some people have suggested converting Roman numerals to integers, sorting, and mapping back. There is an easier way. All that we really need to do is compare any two arbitrary Roman numerals and let usort do the rest. Here is the code, and I will explain its design below.

$base = array( 'I' => 0, 'V' => 1, 'X' => 2, 'L' => 3,
               'C' => 4, 'D' => 5, 'M' => 6 ); 
function single($a) { global $base; return $base[$a]; }

function compare($a, $b) {
    global $base;
    if(strlen($a) == 0) { return true; }
    if(strlen($b) == 0) { return false; }
    $maxa = max(array_map('single', str_split($a)));
    $maxb = max(array_map('single', str_split($b)));
    if($maxa != $maxb) {
        return $maxa < $maxb;
    }
    if($base[$a[0]] != $base[$b[0]]) {
        return $base[$a[0]] < $base[$b[0]];
    }
    return compare(substr($a, 1), substr($b, 1));
}

$a = array('XIX', 'LII', 'V', 'MCCXCIV', 'III', 'XIII');
usort($a, compare);
print_r($a);

First we create a lookup array to assign a "magnitude" to single digit Roman numerals. Notice this isn't their decimal value, just numbers assigned in such a way that bigger numerals get bigger values. Then we create a helper function single used by some PHP functions to to retrieve the magnitudes.

OK, now to the meat of the algorithm. It is the compare function which sometimes has to call itself recursively when it needs to break a tie. For this reason, we start with some tests to see if it has reached terminal states in the recursion. Disregard that for now and look at the first interesting test. It checks to see if either numeral being compared has a digit in it that dwarfs any digits of the other. For instance, if one of them has X in it, and the other only has I and V, then the one with X wins. This relies on the convention that certain Roman numerals are not valid, like VV or VIIIII or IIIIIIIII. At least I have never seen them written that way, so I count them as invalid.

To make this check, we map the digits to magnitudes and compare maximums. Well, this test may not decide the issue. In that case it is safe to compare the first digits of each number, since we won't have to deal with confusing issues like V < IX where the first digits don't suggest the truth. These confusing situations were taken care of by comparing largest digits.

Finally, if the first digits are equal, strip them off and repeat. At some point one of the numerals will be reduced to an empty string, and those initial tests we were temporarily disregarding will take care of that.

This method has passed all the tests I threw at it, but let me know if you find a bug or optimizations.


There would seem to be three approaches, namely:

  • Convert the numbers, sort using a standard integer sort, and convert back. (Or keep the converted versions with the roman numerals and sort the structures, to avoid the double conversion.)
  • Write a sort function that takes the strings, at that point calls a conversion function and does the appropriate comparison.
  • Write a sort function that can compare Roman numerals directly, without necessary involving a full conversion. Since Roman numerals have their higher components first, (Ms then D/Cs. then L/Xs, then I/Vs) such a function might be able to short circuit early.

The first will obviously involve additional overhead for storage. The second will involve additional conversion overhead (since the same number may be converted many times). The third might involve some unnecessary conversion overhead (again, the same number may be converted several times) but save some work on the short circuiting. If storage overheads are not an issue, the first is likely to be the best.


I got quite interested in @borrible's 1st approach, so I decided I will give it a try:

function sortRomanArray($array) {
     $combined=array_combine($array, array_map('roman2int', $array));
     asort($combined);
     return array_keys($combined);
}

This basically converts all the Roman numerals in the array into integers using array_map() and a function called roman2int() (which can be any implementation). Then it creates an array where the keys are the Roman numerals and values are the integers. Then this array is sorted with asort() that preserves key associations, and the keys are returned as an array. This array will contain the sorted Roman numerals.

I like this method because it runs the conversion function only as much times as the size of the array (6 with my example array) and there is no need to convert back.

The conversion would run certainly much more if we put it in the comparison function (2 times for every comparison).


I think you'll have to either:

  1. Wrap the strings into a RomanNumeral class, that has a sorting method OR
  2. Write a method to calculate the value of each element in the array, and sort on that
  3. See if someone has already written a RomanNumeral class/library that does this - something like this

Either way, you'll need custom sorting code that calculates the value somewhere. Since prefixing characters in Roman Numerals can sometimes mean "subtract this value" as opposed to "add this value". This is fine, because as you've pointed out, what you're really doing is sorting by numeric value, so you'll have to tell the computer how to interpret the value.


  1. Convert the numeral to a decimal using this
  2. Compare the decimals

    function roman2dec($roman) {
        // see link above
    }
    
    function compare($a, $b) {
        return roman2dec($a) < $roman2dec($b) ? -1 : 1;
    }
    


The simplest solution is probably to first convert each numeral into a regular integer (in a new array), and then sort both arrays based on the integer array. Not sure if PHP contains a function for that, though. Alternatively, you can define a comparison function that converts two Roman numerals to integers and compares them. Writing a function that directly compares two Roman numerals without converting them to integers first will likely be cumbersome.


Let's say you make this "alphabet": I, IV, V, IX, X, XL, L, XC, C, CD, D, CM, M. Then you could sort the Roman numbers according to this 'alphabet'.

Maybe this will give someone new inspiration.

EDIT: got a working example. Not really fast, sorts 1000 Roman numbers in 1.3 secs

EDIT 2: added a check to avoid the 'notices', also optimized the code a little, runs a little faster, and about twice as fast than with a conversion to integer and than sort that (used PEAR Number_Roman package)

function sortromans($a, $b){
    $alphabet = array('M', 'CM', 'D', 'CD', 'C', 'XC', 'L', 'XL', 'X', 'IX', 'V', 'IV', 'I');
    $pos = 0;
    if ($a == $b) {
        return 0;
    }

    //compare the strings, position by position, as long as they are equal
    while(isset($a[$pos]) && isset($b[$pos]) && $a[$pos] === $b[$pos]){
        $pos++;
    }

    //if string is shorter than $pos, return value
    if(!isset($a[$pos])){
        return -1;
    } else if(!isset($b[$pos])){
        return 1;
    } else {

      //check the ´character´ at position $pos, and pass the array index to a variable
      foreach($alphabet as $i=>$ch){
            if(isset($a_index) && isset($b_index)){
         break;
        }
        $length = strlen($ch);
        if(!isset($a_index) && substr($a, $pos, $length) === $ch){
            $a_index = $i;
        }
        if(!isset($b_index) && substr($b, $pos, $length) === $ch){
            $b_index = $i;
        }
      }

    }

    return ($a_index > $b_index) ? -1 : 1;
}

$romans = array('III', 'IX', 'I', 'CM', 'LXII','IV');

usort($romans, "sortromans");

echo "<pre>";
print_r($romans);
echo "</pre>";


I think the best (see my comment) first solution is to use the standard usort PHP function with the help of a special roman compare function.

The following roman_compare function is very intuitive and do not use any kind of conversion. To keep it simple, it uses tail recursion.

function roman_start( $a )
{
    static $romans = array(
        'I'  => 1,    'V'  => 5,
        'X'  => 10,   'L'  => 50,
        'C'  => 100,  'D'  => 500,
        'M'  => 1000,
    );
    return $a[0] . ($romans[$a[0]] < $romans[$a[1]] ? $a[1] : '');
}

function roman_compare( $a, $b )
{
    static $romans = array(
        'I'  => 1,    'IV' => 4,   'V'  => 5,   'IX' => 9,
        'X'  => 10,   'XL' => 40,  'L'  => 50,  'XC' => 90,
        'C'  => 100,  'CD' => 400, 'D'  => 500, 'CM' => 900,
        'M'  => 1000,
    );
    $blockA = roman_start($a);
    $blockB = roman_start($b);
    if ($blockA != $blockB)
    {
        return $romans[$blockA] - $romans[$blockB];    
    }
    $compared = strlen($blockA);
    if (strlen($a) == $compared) //string ended
    {
        return 0;
    }
    return roman_compare(substr($a, $compared), substr($b, $compared));
}

Using the above functions, we can write

function array_equal( $a, $b )
{
    return count(array_diff_assoc($a, $b)) == 0 && count(array_diff_assoc($b, $a)) == 0;
}

$a        = array('XIX', 'LII', 'V', 'MCCXCIV', 'III', 'XIII');
$sorted_a = array('III', 'V', 'XIII', 'XIX', 'LII', 'MCCXCIV');

var_dump(array_equal($sorted_a, $a));
usort($a, 'roman_compare');
var_dump(array_equal($sorted_a, $a));

Running all the above code we get

bool(false)
bool(true)
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜