
How to array_merge a dynamic array based on the similarity of one of its values

Good day,

I am retrieving information from various websites using cURL and various parsing techniques. I wrote the code so that I can later add more websites to scan.

The information retrieved is as follows (please note that it may be inaccurate and may not reflect real prices or names):

Array
(
    [website1.com] => Array
        (
            [0] => Array
                (
                    [0] => 60" BRAVIA LX900 Series 3D HDTV
                    [1] => website1.com
                    [2] => 5299.99
                )
            [1] => Array
                (
                    [0] => 52" BRAVIA LX900 Series 3D HDTV
                    [1] => website1.com
                    [2] => 4499.99
                )
            [2] => Array
                (
                    [0] => 46" BRAVIA LX900 Series 3D HDTV
                    [1] => website1.com
                    [2] => 3699.99
                )
            [3] => Array
                (
                    [0] => 40" BRAVIA LX900 Series 3D HDTV
                    [1] => website1.com
                    [2] => 2999.99
                )
        )
    [website2.com] => Array
        (
            [0] => Array
                (
                    [0] => Sony 3D 60" LX900 HDTV BRAVIA
                    [1] => website2.com
                    [2] => 5400.99
                )
            [1] => Array
                (
                    [0] => Sony 3D 52" LX900 HDTV BRAVIA
                    [1] => website2.com
                    [2] => 4699.99
                )
            [2] => Array
                (
                    [0] => Sony 3D 46" LX900 HDTV BRAVIA
                    [1] => website2.com
                    [2] => 3899.99
                )
        )
)

The desired output is:

Array
(
    [0] => Array
        (
            [Name] => 60" BRAVIA LX900 Series 3D HDTV
            [website1.com] => 5299.99
            [website2.com] => 5400.99
        )
    [1] => Array
        (
            [Name] => 52" BRAVIA LX900 Series 3D HDTV
            [website1.com] => 4499.99
            [website2.com] => 4699.99
        )
    [2] => Array
        (
            [Name] => 46" BRAVIA LX900 Series 3D HDTV
            [website1.com] => 3699.99
            [website2.com] => 3899.99
        )
    [3] => Array
        (
            [Name] => 40" BRAVIA LX900 Series 3D HDTV
            [website1.com] => 2999.99
        )
)

Please note that the name may vary between websites, so using similar_text is required. Also, some products may not be listed on every website. I am aware that only one television name can be chosen for each product; I will use the one from the most relevant source (website1.com).

Here is the code I'm trying to make work.

<?php
    $_Retreived = array(
        "website1.com" => array(
            array('60" BRAVIA LX900 Series 3D HDTV', 'website1.com', 5299.99),
            array('52" BRAVIA LX900 Series 3D HDTV', 'website1.com', 4499.99),
            array('46" BRAVIA LX900 Series 3D HDTV', 'website1.com', 3699.99),
            array('40" BRAVIA LX900 Series 3D HDTV', 'website1.com', 2999.99)
        ),
        "website2.com" => array(
            array('Sony 3D 60" LX900 HDTV BRAVIA', 'website2.com', 5400.99),
            array('Sony 3D 52" LX900 HDTV BRAVIA', 'website2.com', 4699.99),
            array('Sony 3D 46" LX900 HDTV BRAVIA', 'website2.com', 3899.99),
        )
    );

    $_Prices = array();
    $_PricesTemp = array();
    $_Sites = array("website1.com", "website2.com");

    for($i = 0; $i < sizeOf($_Sites); $i++)
    {
        $_PricesTemp = array_merge($_PricesTemp, $_Retreived[ $_Sites[$i] ]);
    }

    /*
        print_r($_PricesTemp);

        Array
        (
            [0] => Array
                (
                    [0] => 60" BRAVIA LX900 Series 3D HDTV
                    [1] => website1.com
                    [2] => 5299.99
                )
            [1] => Array
                (
                    [0] => 52" BRAVIA LX900 Series 3D HDTV
                    [1] => website1.com
                    [2] => 4499.99
                )
            [2] => Array
                (
                    [0] => 46" BRAVIA LX900 Series 3D HDTV
                    [1] => website1.com
                    [2] => 3699.99
                )
            [3] => Array
                (
                    [0] => 40" BRAVIA LX900 Series 3D HDTV
                    [1] => website1.com
                    [2] => 2999.99
                )
            [4] => Array
                (
                    [0] => Sony 3D 60" LX900 HDTV BRAVIA
                    [1] => website2.com
                    [2] => 5400.99
                )
            [5] => Array
                (
                    [0] => Sony 3D 52" LX900 HDTV BRAVIA
                    [1] => website2.com
                    [2] => 4699.99
                )
            [6] => Array
                (
                    [0] => Sony 3D 46" LX900 HDTV BRAVIA
                    [1] => website2.com
                    [2] => 3899.99
                )
        )
    */

    foreach($_PricesTemp As $_KeyOne => $_EntryOne)
    {
        foreach(array_reverse($_PricesTemp, true) As $_KeyTwo => $_EntryTwo)
        {
            if ($_KeyOne != $_KeyTwo)
            {
                $_Percent = 0;

                similar_text(strtoupper($_EntryOne[0]), strtoupper($_EntryTwo[0]), $_Percent);

                if ($_Percent >= 90) //If names matches 90%+
                {
                    echo "Similar : <b>" . $_KeyOne . "</b> " . $_EntryOne[0] . " and <b>" . $_KeyTwo . "</b> " . $_EntryTwo[0] . " Percent : " . $_Percent . "<br />";

                    $_Prices[] = array();
                    $_Prices[ sizeOf($_Prices)-1 ]['Name'] = $_EntryOne[0]; //Use the product name of the most relevant website (website1.com)

                    foreach($_Sites As $_Site)
                    {
                        if (isset($_EntryOne[ 1 ]) && $_EntryOne[ 1 ] == $_Site) //Check if it contains price from website1.com
                        {
                            $_Prices[ sizeOf($_Prices)-1 ][ $_Site ] = $_EntryOne[ 2 ];
                        }
                        if (isset($_EntryTwo[ 1 ]) && $_EntryTwo[ 1 ] == $_Site) //Check if it contains price from website2.com
                        {
                            $_Prices[ sizeOf($_Prices)-1 ][ $_Site ] = $_EntryTwo[ 2 ];
                        }
                    }
                }
            }
        }
    }

    /*
        print_r($_Prices);

        Array
        (
            [0] => Array
                (
                    [Name] => 60" BRAVIA LX900 Series 3D HDTV
                    [website1.com] => 2999.99
                )
            [1] => Array
                (
                    [Name] => 60" BRAVIA LX900 Series 3D HDTV
                    [website1.com] => 3699.99
                )
            [2] => Array
                (
                    [Name] => 60" BRAVIA LX900 Series 3D HDTV
                    [website1.com] => 4499.99
                )
            [3] => Array
                (
                    [Name] => 52" BRAVIA LX900 Series 3D HDTV
                    [website1.com] => 2999.99
                )
            [4] => Array
                (
                    [Name] => 52" BRAVIA LX900 Series 3D HDTV
                    [website1.com] => 3699.99
                )
            [5] => Array
                (
                    [Name] => 52" BRAVIA LX900 Series 3D HDTV
                    [website1.com] => 5299.99
                )
            [6] => Array
                (
                    [Name] => 46" BRAVIA LX900 Series 3D HDTV
                    [website1.com] => 2999.99
                )
            [7] => Array
                (
                    [Name] => 46" BRAVIA LX900 Series 3D HDTV
                    [website1.com] => 4499.99
                )
            [8] => Array
                (
                    [Name] => 46" BRAVIA LX900 Series 3D HDTV
                    [website1.com] => 5299.99
                )
            [9] => Array
                (
                    [Name] => 40" BRAVIA LX900 Series 3D HDTV
                    [website1.com] => 3699.99
                )
            [10] => Array
                (
                    [Name] => 40" BRAVIA LX900 Series 3D HDTV
                    [website1.com] => 4499.99
                )
            [11] => Array
                (
                    [Name] => 40" BRAVIA LX900 Series 3D HDTV
                    [website1.com] => 5299.99
                )
            [12] => Array
                (
                    [Name] => Sony 3D 60" LX900 HDTV BRAVIA
                    [website2.com] => 3899.99
                )
            [13] => Array
                (
                    [Name] => Sony 3D 60" LX900 HDTV BRAVIA
                    [website2.com] => 4699.99
                )
            [14] => Array
                (
                    [Name] => Sony 3D 52" LX900 HDTV BRAVIA
                    [website2.com] => 3899.99
                )
            [15] => Array
                (
                    [Name] => Sony 3D 52" LX900 HDTV BRAVIA
                    [website2.com] => 5400.99
                )
            [16] => Array
                (
                    [Name] => Sony 3D 46" LX900 HDTV BRAVIA
                    [website2.com] => 4699.99
                )
            [17] => Array
                (
                    [Name] => Sony 3D 46" LX900 HDTV BRAVIA
                    [website2.com] => 5400.99
                )
        )
    */
?>

First of all, the code above isn't working. There must be a logical error somewhere that I can't put my finger on. Also, I don't believe the code will work in case I add a third website to the listing.

Any ideas guys? I have been on this since this morning.

Edit 2011-02-16:

I added a bounty to this question.


Try this gist, which is clearer: https://gist.github.com/835099

It produced your desired outcome for me.


A high-level overview of the approach:

  • create the end result array $items
  • loop through all the found items in all the websites
  • for each one, check if its name is similar enough to any of the existing item names in $items
  • if yes, add the price to that existing entry; if no, create a new entry and add the price there

Instead of similar_text() you should consider using levenshtein(), which serves a similar purpose in practice but is usually considerably faster.

Here's some (untested, on the spot) code:

$levThreshold = 3 ;

$_Prices = array() ;
foreach ($_Retreived as $website => $websiteItems) {
    //each website holds a list of array(name, website, price) entries
    foreach ($websiteItems as $item) {
        $currName = $item[0] ;
        $currPrice = $item[2] ;

        $foundItemKey = false ;

        //check current price structure. Get $priceData by reference
        //so we can modify it in the loop and keep the change instead
        //of modifying the loop copy.
        foreach ($_Prices as &$priceData) {

            if (isset($priceData[$website])) {
                //this entry already has a price from this website
                continue ;
            }

            //check if this is the item name we are looping over
            $lev = levenshtein($priceData['Name'], $currName) ;

            if ($lev < $levThreshold) {
                //item exists, add price and break
                $priceData[$website] = $currPrice ;
                $foundItemKey = true ;
                break ;
            }

        }
        //drop the reference so later code cannot modify $_Prices through it
        unset($priceData) ;

        //if we haven't found the item key, create a new one
        if (!$foundItemKey) {
            $newItem = array() ;
            $newItem['Name'] = $currName ;
            $newItem[$website] = $currPrice ;
            $_Prices[] = $newItem ;
        }
    }
}

$levThreshold is the minimum edit distance (the number of single-character insertions, deletions or substitutions) at which two names are treated as different products; anything below it counts as a match. You can tweak it accordingly.
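Before settling on a threshold, it may help to print the distances for a few pairs whose correct answer you already know. The following is only a rough sketch using the sample names from the question; the variable names are illustrative and not part of the code above.

```php
<?php
// Sample names taken from the question.
$site1Model60 = '60" BRAVIA LX900 Series 3D HDTV';
$site1Model52 = '52" BRAVIA LX900 Series 3D HDTV';
$site2Model60 = 'Sony 3D 60" LX900 HDTV BRAVIA';

$levThreshold = 3;

// Two different models whose names follow the same pattern.
$distDifferentModels = levenshtein($site1Model60, $site1Model52);

// The same model as it is named on two different websites.
$distSameModel = levenshtein($site1Model60, $site2Model60);

printf("different models, same naming: %d\n", $distDifferentModels);
printf("same model, different naming:  %d\n", $distSameModel);
printf("threshold:                     %d\n", $levThreshold);
```

The first pair differs only in the two digits of the screen size, so its distance is below a threshold of 3, while the reordered name from the second website is much further away. If your real data looks like this, the raw distance alone probably cannot separate the products, and the names would need some normalizing before comparison.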


The problem cannot be solved with similar_text alone. You want to match 60" BRAVIA LX900 Series 3D HDTV with Sony 3D 60" LX900 HDTV BRAVIA. However, 60" BRAVIA LX900 Series 3D HDTV is actually far more similar to 52" BRAVIA LX900 Series 3D HDTV: only two characters differ.
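This is easy to check directly. A small sketch, using the names from the question and the same strtoupper() normalization the question's code applies (the variable names are made up for the illustration):

```php
<?php
$site1Model60 = '60" BRAVIA LX900 Series 3D HDTV'; // website1.com
$site1Model52 = '52" BRAVIA LX900 Series 3D HDTV'; // website1.com, different model
$site2Model60 = 'Sony 3D 60" LX900 HDTV BRAVIA';   // website2.com, same model as the first

// similar_text() writes the similarity percentage into its third argument.
similar_text(strtoupper($site1Model60), strtoupper($site1Model52), $pctDifferentModel);
similar_text(strtoupper($site1Model60), strtoupper($site2Model60), $pctSameModel);

printf("60 inch vs 52 inch (different models): %.1f%%\n", $pctDifferentModel);
printf("60 inch on both sites (same model):    %.1f%%\n", $pctSameModel);
```

With these names the first percentage comes out above the 90% cut-off and the second below it, which is consistent with the output in the question: every "similar" pair it reports comes from a single website.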

I suspect you'll need a custom handler to match details specific to the products you are trying to match. E.g. for televisions you probably want to match size (xx"), and product family (BRAVIA LX900).
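As a rough illustration of what such a handler could look like, here is a sketch that builds a comparison key from the screen size and a model code and then groups prices by that key. productKey() and mergePrices() are hypothetical names, and both regular expressions are assumptions fitted to the sample names in the question; real listings will almost certainly need more rules.

```php
<?php
// Hypothetical helper: turns a product name into a key such as "60|LX900".
function productKey(string $name): ?string
{
    // Screen size: two or three digits directly followed by a double quote.
    if (!preg_match('/(\d{2,3})"/', $name, $size)) {
        return null;
    }
    // Model code: a word made of 1-4 letters followed by 2-5 digits (assumed pattern).
    if (!preg_match('/\b([A-Za-z]{1,4}\d{2,5})\b/', $name, $model)) {
        return null;
    }
    return $size[1] . '|' . strtoupper($model[1]);
}

// Groups array(name, website, price) entries by product key.
function mergePrices(array $retrieved): array
{
    $byKey = [];
    foreach ($retrieved as $site => $items) {
        foreach ($items as $item) {
            $key = productKey($item[0]);
            if ($key === null) {
                continue; // name could not be parsed; skipped in this sketch
            }
            if (!isset($byKey[$key])) {
                // The first site to list a product supplies its display name.
                $byKey[$key] = ['Name' => $item[0]];
            }
            $byKey[$key][$site] = $item[2];
        }
    }
    return array_values($byKey);
}

$retrieved = [
    'website1.com' => [
        ['60" BRAVIA LX900 Series 3D HDTV', 'website1.com', 5299.99],
        ['52" BRAVIA LX900 Series 3D HDTV', 'website1.com', 4499.99],
        ['46" BRAVIA LX900 Series 3D HDTV', 'website1.com', 3699.99],
        ['40" BRAVIA LX900 Series 3D HDTV', 'website1.com', 2999.99],
    ],
    'website2.com' => [
        ['Sony 3D 60" LX900 HDTV BRAVIA', 'website2.com', 5400.99],
        ['Sony 3D 52" LX900 HDTV BRAVIA', 'website2.com', 4699.99],
        ['Sony 3D 46" LX900 HDTV BRAVIA', 'website2.com', 3899.99],
    ],
];

$result = mergePrices($retrieved);
print_r($result);
```

Because the sites are processed in array order, listing the most relevant website first means its product names are the ones kept. Adding a third website only requires another entry in $retrieved.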

This doesn't give you a solution to your question, but it is, I fear, the answer.
