Which database for a web crawler, and how do I use MySQL in a distributed environment?
Which database engine should I use for a web crawler, InnoDB or MYiSAM? I have two PC's, each with 1TB hard drives. If one fills up, I'开发者_StackOverflow社区d like for it to save to the other PC automatically, but reads should go to the correct PC; how do I do that?
As for the first part of your question, it rather depends on you precise implementation. If you are going to have a single crawler limited by network bandwidth, then MYiSAM can be quicker. If you are using multiple crawlers then InnoDB will give you advantages such as transactions which may help.
AFAIK MySQL doesn't support the hardware configuration you are suggesting. If you need large storage you may wan tot look at MySQL Cluster.
MyISAM is the first choice, because you will have write only operations and crawlers -- even run in parallel -- will be configured -- I suppose -- to crawl different domains/urls. So you do not need to take care of access conflicts.
When writing a lot of data, especially text!, to Mysql avoid transactions, indexes, etc., because it will slow down MySQL drastically.
精彩评论