开发者

PHP PDO: how does re-preparing a statement affect performance

I'm writing a semi-simple database wrapper class and want to have a fetching method which would operate automagically: it should prepare each different statement only the first time around and just bind and execute the query on successive calls.

I guess the main question is: How does re-preparing the same MySql statement work, will PDO magically recognize the statement (so I don't have to) and cease the operation?

If not, I'm planning to achieve do this by generating a unique key for each different query and keep the prepared statements in a private array in the database object - under its unique key. I'm planning to obtain the array key in one of the following ways (none of which I like). In order of preference:

  • have the programmer pass an extra, always the s开发者_如何转开发ame parameter when calling the method - something along the lines of basename(__FILE__, ".php") . __LINE__ (this method would work only if our method is called within a loop - which is the case most of the time this functionality is needed)
  • have the programmer pass a totally random string (most likely generated beforehand) as an extra parameter
  • use the passed query itself to generate the key - getting the hash of the query or something similar
  • achieve the same as the first bullet (above) by calling debug_backtrace

Has anyone similar experience? Although the system I'm working for does deserve some attention to optimization (it's quite large and growing by the week), perhaps I'm worrying about nothing and there is no performance benefit in doing what I'm doing?


MySQL (like most DBMS) will cache execution plans for prepared statements, so if user A creates a plan for:

SELECT * FROM some_table WHERE a_col=:v1 AND b_col=:v2

(where v1 and v2 are bind vars) then sends values to be interpolated by the DBMS, then user B sends the same query (but with different values for interpolation) the DBMS does not have to regenerate the plan. i.e. it's the DBMS which finds the matching plan - not PDO.

However this means that each operation on the database requires at least 2 round trips (1st to present the query, the second to present the bind vars) as opposed to a single round trip for a query with literal values, then this introduces additional network costs. There is also a small cost involved in dereferencing (and maintaining) the query/plan cache.

The key question is whether this cost is greater than the cost of generating the plan in the first place.

While (in my experience) there definitely seems to be a performance benefit using prepared statements with Oracle, I'm not convinced that the same is true for MySQL - however, a lot will depend on the structure of your database and the complexity of the query (or more specifically, how many different options the optimizer can find for resolving the query).

Try measuring it yourself (hint: you might want to set the slow query threshold to 0 and write some code to convert literal values back into anonymous representations for the queries written to the logs).


Believe me, I've done this before and after building a cache of prepared statements the performance gain was very noticeable - see this question: Preparing SQL Statements with PDO.

An this was the code I came up after, with cached prepared statements:

function DB($query)
{
    static $db = null;
    static $result = array();

    if (is_null($db) === true)
    {
        $db = new PDO('sqlite:' . $query, null, null, array(PDO::ATTR_ERRMODE => PDO::ERRMODE_WARNING));
    }

    else if (is_a($db, 'PDO') === true)
    {
        $hash = md5($query);

        if (empty($result[$hash]) === true)
        {
            $result[$hash] = $db->prepare($query);
        }

        if (is_a($result[$hash], 'PDOStatement') === true)
        {
            if ($result[$hash]->execute(array_slice(func_get_args(), 1)) === true)
            {
                if (stripos($query, 'INSERT') === 0)
                {
                    return $db->lastInsertId();
                }

                else if (stripos($query, 'SELECT') === 0)
                {
                    return $result[$hash]->fetchAll(PDO::FETCH_ASSOC);
                }

                else if ((stripos($query, 'UPDATE') === 0) || (stripos($query, 'DELETE') === 0))
                {
                    return $result[$hash]->rowCount();
                }

                else if (stripos($query, 'REPLACE') === 0)
                {
                }

                return true;
            }
        }

        return false;
    }
}

Since I don't need to worry about collisions in queries, I've ended up using md5() instead of sha1().


OK, since I've been bashing methods of keying the queries for the cache, other than simply using the query string itself, I've done a naive benchmark. The following compares using the plain query string vs first creating the md5 hash:

$ php -v
$ PHP 5.3.0-3 with Suhosin-Patch (cli) (built: Aug 26 2009 08:01:52)
$ ...
$ php benchmark.php
$ PHP hashing: 0.19465494155884 [microtime]
$ MD5 hashing: 0.57781004905701 [microtime]
$ 799994

The code:

<?php
error_reporting(E_ALL);

$queries = array("SELECT",
                 "INSERT",
                 "UPDATE",
                 "DELETE",
                 );
$query_length = 256;
$num_queries  = 256;
$iter = 10000;

for ($i = 0; $i < $num_queries; $i++) {
    $q = implode('',
           array_map("chr",
             array_map("rand",
                       array_fill(0, $query_length, ord("a")),
                       array_fill(0, $query_length, ord("z")))));
    $queries[] = $q;
}

echo count($queries), "\n";

$cache = array();
$side_effect1 = 0;
$t = microtime(true);
for ($i = 0; $i < $iter; $i++) {
    foreach ($queries as $q) {
        if (!isset($cache[$q])) {
            $cache[$q] = $q;
        }
        else {
            $side_effect1++;
        }
    }
}
echo microtime(true) - $t, "\n";

$cache = array();
$side_effect2 = 0;
$t = microtime(true);
for ($i = 0; $i < $iter; $i++) {
    foreach ($queries as $q) {
        $md5 = md5($q);
        if (!isset($cache[$md5])) {
            $cache[$md5] = $q;
        }
        else {
            $side_effect2++;
        }
    }
}
echo microtime(true) - $t, "\n";

echo $side_effect1 + $side_effect2, "\n";


To my knowledge PDO does not reuse already prepared statements as it does not analyse the query by itself so it does not know if it is the same query.

If you want to create a cache of prepared queries, the simplest way imho would be to md5-hash the query string and generate a lookup table.

OTOH: How many queries are you executing (per minute)? If less than a few hundred then you only complicate the code, the performance gain will be minor.


Using a MD5 hash as a key you could eventually get two queries that result in the same MD5 hash. The probability is not high, but it could happen. Don't do it. Lossful hashing algorithms like MD5 is just ment as a way to tell if two objects are different with high certainty, but are not a safe means of identifying something.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜