Performance of recursively searching for indices in large sets of data

2023-03-10 15:07 问答作者：

I came across a question today of search efficiency for large sets today and I've done by best to boil it down to the most basic case. I feel like this sort of thing probably relates to some classic problem or basic concept I'm missing, so a pointer to that would be great.

Suppose I have a table definition like

CREATE TABLE foo(
    id int,
    type bool,
    reference int,
    PRIMARY KEY(id),
    FOREIGN KEY(reference) REFERENCES foo(id),
    UNIQUE KEY(reference)
) Engine=InnoDB;

Populated with n rows where n/2 are randomly assigned type=1. Each row references another with its same type except for the first, which has reference=null.

Now we want to print all items with type 1. I assume that at some point, it will be faster to recursively call something like

function printFoo1($ref){
    if($ref==null)
        return;
    $q = 'SELECT id, reference FROM foo WHERE id='.$ref;
    $arr = mysql_fetch_array( mysql_query($q) );
    echo $arr[0];
    printFoo1($arr[1]);
}

As opposed to

function printFoo2($ref){
    $q = 'SELECT id FROM foo WHERE type=1';
    $res = mysql_query($q);
    while( $id = mysql_fetch_array($res) ){
        echo $id[0];
    }
}

The main point here being that function 1 searches for "id", which is indexed, whereas function 2 has to make n/2 comparisons that don't result in a hit, but that the overhead of multiple queries is going to be significantly greater than the single SELECT.

Is my 开发者_运维问答assumption correct? If so, how large of a data set would we need before function 1 outperforms function 2?

Your example is a bit difficult to parse, but ill start at the top:

Your first function does not return all of the elements with type = 1. It returns all of the elements that are dependent (based on references) to the element you pass in. From the PHP standpoint, since the link/handle is already open there is a non-trivial overhead from your function call with each successive request, not to mention the string concatenation incurring a cost with each execution of that line.

Typically it is better to use the second function styling because it only queries the database one time and will return the elements you are requesting without further work. It will come down to a profiler of course, to determine which is going to return faster, but from my tests the second is hands down the better choice:

This was executed with n = 5000 elements in the db (n/2 = 2500 type 1 and passing in reference = highest id with type = 1 from query of db).

printFoo1: 3.591840 seconds
printFoo2: 0.010340 seconds

It wouldn't really make sense for it to work any other way. If you were able to do what you propose that would make JOIN calls have to perform less efficient as well.

Code

$res = mysql_query('SELECT MAX( id ) as `MAX_ID` FROM `foo` WHERE `type` = 1', $link);
$res2 = mysql_fetch_assoc($res);

$id = $res2['MAX_ID'];

// cleanup result and free resources here

echo "printFoo1: ";
$start = microtime(true);
printFoo1($id);
echo microtime(true) - $start;

echo '<br />';

echo "printFoo2: ";
$start = microtime(true);
printFoo2();
echo microtime(true) - $start;

mysql_close($link);

All of this was tested on PHP 5.2.17 running on Linux

继续阅读：database php

Performance of recursively searching for indices in large sets of data

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？