
Is MySQL database access speed limited primarily by the db, or by the language used to access it?

I need to update a large db quickly. It may be easier to code in a scripting language but I suspect a C program would do the update faster. Anybody know if there have been comparative speed tests?


It wouldn't. The update speed depends on:

  • database configuration (storage engine used, server settings)
  • hardware of the server, especially the HDD subsystem
  • network bandwidth between the source and target machines
  • amount of data transferred

I suspect you're worried about that last item: the amount of data transferred.

Any scripting language will be fast enough to deliver the data. If you have a large amount of data that you need to parse or transform quickly, then yes, C would definitely be the language of choice. But if you're just sending simple string data to the db, there's no point; that said, writing a simple C program for an UPDATE isn't difficult either, and complexity-wise it's roughly on par with using PHP's mysql_ functions.
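
For illustration, here's a minimal sketch of the kind of one-off UPDATE being described, using the MySQL C API (compile with -lmysqlclient). The host, credentials, and the `accounts` table are hypothetical placeholders:

    /* Minimal UPDATE via the MySQL C API; compile with -lmysqlclient.
       Host, credentials, and the `accounts` table are hypothetical. */
    #include <mysql.h>
    #include <stdio.h>

    int main(void)
    {
        MYSQL *conn = mysql_init(NULL);
        if (!mysql_real_connect(conn, "localhost", "user", "secret",
                                "mydb", 0, NULL, 0)) {
            fprintf(stderr, "connect failed: %s\n", mysql_error(conn));
            return 1;
        }

        if (mysql_query(conn, "UPDATE accounts SET balance = balance * 1.05 "
                              "WHERE region = 'EU'"))
            fprintf(stderr, "update failed: %s\n", mysql_error(conn));
        else
            printf("%llu rows updated\n",
                   (unsigned long long) mysql_affected_rows(conn));

        mysql_close(conn);
        return 0;
    }

As you can see, it's a handful of calls; the PHP equivalent with mysql_connect()/mysql_query() is nearly line-for-line the same.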


Are you concerned about speed because you're already dealing with a situation where speed is a problem, or are you just planning ahead?

I can say comfortably that DB interactions are generally constrained by IO, network bandwidth, memory, database traffic, SQL complexity, database configuration, indexing issues, and the quantity of data being selected far more than by the choice of a scripting language versus C.

When you run into bottlenecks, they'll almost always be solved by a better algorithm, smarter use of indexes, faster IO devices, more caching... those sorts of things (beginning with algorithms).

The fourth component of LAMP is a scripting language, after all. When fine-tuning, memcache becomes an option, as do persistent interpreters (such as mod_perl in a web environment).


The majority of the cost in database transactions lies on the database side. The cost of interpreting/compiling your SQL statement and evaluating the query execution plan is far greater than any difference attributable to the language that sent it.

It's rare for an application's CPU usage during database-intensive work to be a bigger factor than the database server's CPU use or the speed of that server's disks.

Unless your applications are long-running and don't wait on the database, I wouldn't worry about benchmarking them. If they do need benchmarking, you should do it yourself. Data use cases vary wildly and you need your own numbers.
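
If you do roll your own benchmark, the harness doesn't need to be elaborate. A rough sketch in C, timing repetitions of a representative query (the query text and an already-open connection are assumptions you'd fill in):

    /* Rough wall-clock timing of `reps` runs of a representative query.
       Assumes an already-open MYSQL *conn; error handling is minimal. */
    #include <mysql.h>
    #include <stdio.h>
    #include <time.h>

    double bench_query(MYSQL *conn, const char *sql, int reps)
    {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < reps; i++) {
            if (mysql_query(conn, sql)) {
                fprintf(stderr, "query failed: %s\n", mysql_error(conn));
                break;
            }
            MYSQL_RES *res = mysql_store_result(conn);  /* drain any result set */
            if (res)
                mysql_free_result(res);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    }

Run it against your own schema and data volumes; numbers from someone else's setup won't transfer.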


Since C is a lower-level language, it won't have the parsing/type-conversion overhead that the scripting languages do. A MySQL int can map directly onto a C int, whereas a PHP int has various metadata attached to it that needs to be populated/updated.
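
That direct mapping is visible in the prepared-statement API: a result column can be bound straight to a C int, with no string conversion in the client. A sketch, where the table `t` and its INT column `id` are invented for illustration:

    /* Bind a MySQL INT column directly to a C int via a prepared statement.
       Assumes an open MYSQL *conn; error checks omitted for brevity. */
    #include <mysql.h>
    #include <stdio.h>
    #include <string.h>

    void fetch_ids(MYSQL *conn)
    {
        MYSQL_STMT *stmt = mysql_stmt_init(conn);
        const char *sql = "SELECT id FROM t";
        mysql_stmt_prepare(stmt, sql, strlen(sql));

        int id;                               /* the INT lands here directly */
        MYSQL_BIND bind;
        memset(&bind, 0, sizeof(bind));
        bind.buffer_type = MYSQL_TYPE_LONG;   /* 32-bit integer */
        bind.buffer = &id;

        mysql_stmt_execute(stmt);
        mysql_stmt_bind_result(stmt, &bind);

        while (mysql_stmt_fetch(stmt) == 0)   /* 0 = row fetched */
            printf("%d\n", id);

        mysql_stmt_close(stmt);
    }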

On the other hand, if you need to do any text manipulation as part of this large update, any speed gains from C would probably be lost in hairpulling/debugging because of its poor string manipulation support versus what you could do with trivial ease in a scripting language like Perl or PHP.


I've heard speculation that the C API is faster, but I haven't seen any benchmarks. For performing large database operations quickly, regardless of programming language, use Stored Procedures: http://dev.mysql.com/tech-resources/articles/mysql-storedprocedures.html.

The speed comes from the fact that there is a reduced strain on the network.

From that link:

Stored procedures are fast! Well, we can't prove that for MySQL yet, and everyone's experience will vary. What we can say is that the MySQL server takes some advantage of caching, just as prepared statements do. There is no compilation, so an SQL stored procedure won't work as quickly as a procedure written with an external language such as C. The main speed gain comes from reduction of network traffic. If you have a repetitive task that requires checking, looping, multiple statements, and no user interaction, do it with a single call to a procedure that's stored on the server. Then there won't be messages going back and forth between server and client, for every step of the task.
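
To make the round-trip argument concrete, here's a hedged sketch of creating and calling a procedure from C; the procedure, table, and credentials are invented for illustration. Note that CLIENT_MULTI_RESULTS must be enabled when invoking stored procedures through the C API:

    /* Create and call a server-side procedure: the multi-statement work
       happens on the server, in one client round trip. Names hypothetical. */
    #include <mysql.h>
    #include <stdio.h>

    int main(void)
    {
        MYSQL *conn = mysql_init(NULL);
        if (!mysql_real_connect(conn, "localhost", "user", "secret", "mydb",
                                0, NULL, CLIENT_MULTI_RESULTS)) {
            fprintf(stderr, "connect failed: %s\n", mysql_error(conn));
            return 1;
        }

        /* No DELIMITER tricks needed: the C API sends the body as one statement. */
        mysql_query(conn, "DROP PROCEDURE IF EXISTS reprice");
        if (mysql_query(conn,
                "CREATE PROCEDURE reprice(IN pct INT) "
                "BEGIN "
                "  UPDATE products SET price = price * (100 + pct) / 100; "
                "END"))
            fprintf(stderr, "create failed: %s\n", mysql_error(conn));

        if (mysql_query(conn, "CALL reprice(5)"))   /* one network round trip */
            fprintf(stderr, "call failed: %s\n", mysql_error(conn));

        /* Drain any extra result statuses a CALL can produce. */
        while (mysql_next_result(conn) == 0) {
            MYSQL_RES *res = mysql_store_result(conn);
            if (res)
                mysql_free_result(res);
        }

        mysql_close(conn);
        return 0;
    }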


The C API will be marginally faster, for the simple reason that any other language (regardless of whether it's a "scripting language" or a fully-compiled language) will probably, at some level, be mapping from that language to the C API. Using the C API directly will obviously be a few dozen CPU cycles faster than performing a mapping operation and then using the C API.

But this is just spitting in the ocean. Even accessing main memory is an order of magnitude or two slower than CPU cycles on a modern machine and I/O operations (disk or network access) are several orders of magnitude slower still. There's no point in optimizing to make it a microsecond faster to send the query if it will still take half a second (or even multiple seconds, for queries which are complex or examine/return large amounts of data) to actually run the query.

Choose the language that you will be most productive in and don't worry about micro-optimizing language choice. Even if the language itself becomes a performance issue (which is extremely unlikely), your additional productivity will save more money than the cost of an additional server.


I have found that for large batches of data (gigabytes or more), it is commonly faster overall to dump the data from MySQL into a file or multiple files on an application machine. Then process it there (with your favourite tool, here: Perl) and then use LOAD DATA LOCAL INFILE to slurp it back into a fresh table while doing as little as possible in SQL. While doing that, you should (see the sketch after this list):

  • remove indexes from the table before LOAD (may not be necessary for MyISAM, but meh).

  • always, ALWAYS load the data in PK order!

  • add the indexes back after loading is done.
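
A sketch of the reload step from the client side, assuming the processed file is already sorted by primary key. The table and file names are made up, the server must permit local_infile, and DISABLE KEYS only defers non-unique index maintenance on MyISAM (for InnoDB you'd drop and re-create the secondary indexes instead):

    /* Reload step: defer index maintenance, LOAD DATA in PK order,
       rebuild indexes. Table/file names are hypothetical; the server
       must have local_infile enabled. */
    #include <mysql.h>
    #include <stdio.h>

    static int run(MYSQL *conn, const char *sql)
    {
        if (mysql_query(conn, sql)) {
            fprintf(stderr, "'%s' failed: %s\n", sql, mysql_error(conn));
            return -1;
        }
        return 0;
    }

    int main(void)
    {
        MYSQL *conn = mysql_init(NULL);
        unsigned int on = 1;
        mysql_options(conn, MYSQL_OPT_LOCAL_INFILE, &on);  /* allow LOCAL */

        if (!mysql_real_connect(conn, "localhost", "user", "secret",
                                "mydb", 0, NULL, 0)) {
            fprintf(stderr, "connect failed: %s\n", mysql_error(conn));
            return 1;
        }

        run(conn, "ALTER TABLE big_table DISABLE KEYS");       /* MyISAM only */
        run(conn, "LOAD DATA LOCAL INFILE '/tmp/processed.tsv' "
                  "INTO TABLE big_table");                     /* rows in PK order */
        run(conn, "ALTER TABLE big_table ENABLE KEYS");

        mysql_close(conn);
        return 0;
    }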

Another advantage is that it may be much easier to parallelize the processing on a cheap application machine with a bunch of fast-but-volatile disks rather than do concurrent writing to your expensive and non-scalable database master.

Either way, large datasets usually mean that the DB is the bottleneck.
