PHP PDO: how does re-preparing a statement affect performance

2022-12-17 06:35 问答作者：

I'm writing a semi-simple database wrapper class and want to have a fetching method which would operate automagically: it should prepare each different statement only the first time around and just bind and execute the query on successive calls.

I guess the main question is: How does re-preparing the same MySql statement work, will PDO magically recognize the statement (so I don't have to) and cease the operation?

If not, I'm planning to achieve do this by generating a unique key for each different query and keep the prepared statements in a private array in the database object - under its unique key. I'm planning to obtain the array key in one of the following ways (none of which I like). In order of preference:

have the programmer pass an extra, always the s开发者_如何转开发ame parameter when calling the method - something along the lines of basename(__FILE__, ".php") . __LINE__ (this method would work only if our method is called within a loop - which is the case most of the time this functionality is needed)
have the programmer pass a totally random string (most likely generated beforehand) as an extra parameter
use the passed query itself to generate the key - getting the hash of the query or something similar
achieve the same as the first bullet (above) by calling debug_backtrace

Has anyone similar experience? Although the system I'm working for does deserve some attention to optimization (it's quite large and growing by the week), perhaps I'm worrying about nothing and there is no performance benefit in doing what I'm doing?

MySQL (like most DBMS) will cache execution plans for prepared statements, so if user A creates a plan for:

SELECT * FROM some_table WHERE a_col=:v1 AND b_col=:v2

(where v1 and v2 are bind vars) then sends values to be interpolated by the DBMS, then user B sends the same query (but with different values for interpolation) the DBMS does not have to regenerate the plan. i.e. it's the DBMS which finds the matching plan - not PDO.

However this means that each operation on the database requires at least 2 round trips (1st to present the query, the second to present the bind vars) as opposed to a single round trip for a query with literal values, then this introduces additional network costs. There is also a small cost involved in dereferencing (and maintaining) the query/plan cache.

The key question is whether this cost is greater than the cost of generating the plan in the first place.

While (in my experience) there definitely seems to be a performance benefit using prepared statements with Oracle, I'm not convinced that the same is true for MySQL - however, a lot will depend on the structure of your database and the complexity of the query (or more specifically, how many different options the optimizer can find for resolving the query).

Try measuring it yourself (hint: you might want to set the slow query threshold to 0 and write some code to convert literal values back into anonymous representations for the queries written to the logs).

Believe me, I've done this before and after building a cache of prepared statements the performance gain was very noticeable - see this question: Preparing SQL Statements with PDO.

An this was the code I came up after, with cached prepared statements:

function DB($query)
{
    static $db = null;
    static $result = array();

    if (is_null($db) === true)
    {
        $db = new PDO('sqlite:' . $query, null, null, array(PDO::ATTR_ERRMODE => PDO::ERRMODE_WARNING));
    }

    else if (is_a($db, 'PDO') === true)
    {
        $hash = md5($query);

        if (empty($result[$hash]) === true)
        {
            $result[$hash] = $db->prepare($query);
        }

        if (is_a($result[$hash], 'PDOStatement') === true)
        {
            if ($result[$hash]->execute(array_slice(func_get_args(), 1)) === true)
            {
                if (stripos($query, 'INSERT') === 0)
                {
                    return $db->lastInsertId();
                }

                else if (stripos($query, 'SELECT') === 0)
                {
                    return $result[$hash]->fetchAll(PDO::FETCH_ASSOC);
                }

                else if ((stripos($query, 'UPDATE') === 0) || (stripos($query, 'DELETE') === 0))
                {
                    return $result[$hash]->rowCount();
                }

                else if (stripos($query, 'REPLACE') === 0)
                {
                }

                return true;
            }
        }

        return false;
    }
}

Since I don't need to worry about collisions in queries, I've ended up using md5() instead of sha1().

OK, since I've been bashing methods of keying the queries for the cache, other than simply using the query string itself, I've done a naive benchmark. The following compares using the plain query string vs first creating the md5 hash:

$ php -v
$ PHP 5.3.0-3 with Suhosin-Patch (cli) (built: Aug 26 2009 08:01:52)
$ ...
$ php benchmark.php
$ PHP hashing: 0.19465494155884 [microtime]
$ MD5 hashing: 0.57781004905701 [microtime]
$ 799994

The code:

<?php
error_reporting(E_ALL);

$queries = array("SELECT",
                 "INSERT",
                 "UPDATE",
                 "DELETE",
                 );
$query_length = 256;
$num_queries  = 256;
$iter = 10000;

for ($i = 0; $i < $num_queries; $i++) {
    $q = implode('',
           array_map("chr",
             array_map("rand",
                       array_fill(0, $query_length, ord("a")),
                       array_fill(0, $query_length, ord("z")))));
    $queries[] = $q;
}

echo count($queries), "\n";

$cache = array();
$side_effect1 = 0;
$t = microtime(true);
for ($i = 0; $i < $iter; $i++) {
    foreach ($queries as $q) {
        if (!isset($cache[$q])) {
            $cache[$q] = $q;
        }
        else {
            $side_effect1++;
        }
    }
}
echo microtime(true) - $t, "\n";

$cache = array();
$side_effect2 = 0;
$t = microtime(true);
for ($i = 0; $i < $iter; $i++) {
    foreach ($queries as $q) {
        $md5 = md5($q);
        if (!isset($cache[$md5])) {
            $cache[$md5] = $q;
        }
        else {
            $side_effect2++;
        }
    }
}
echo microtime(true) - $t, "\n";

echo $side_effect1 + $side_effect2, "\n";

To my knowledge PDO does not reuse already prepared statements as it does not analyse the query by itself so it does not know if it is the same query.

If you want to create a cache of prepared queries, the simplest way imho would be to md5-hash the query string and generate a lookup table.

OTOH: How many queries are you executing (per minute)? If less than a few hundred then you only complicate the code, the performance gain will be minor.

Using a MD5 hash as a key you could eventually get two queries that result in the same MD5 hash. The probability is not high, but it could happen. Don't do it. Lossful hashing algorithms like MD5 is just ment as a way to tell if two objects are different with high certainty, but are not a safe means of identifying something.

继续阅读：pdo php prepared-statement

PHP PDO: how does re-preparing a statement affect performance

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？