Document-oriented DBMS as primary DB and an RDBMS as secondary DB?
I'm having some performance issues with my MySQL database due to its normalization.
Most of my applications that use a database need to run some heavy nested queries, which in my case take a lot of time. Queries can take up to 2 seconds to run with indexes, and about 45 seconds without them.
A solution I came across a few months back was to use a faster, flatter document-based database (in my case Solr) as the primary database. As soon as something changed in the MySQL database, Solr was notified.
This worked really well. All queries against Solr took only about 3 ms.
The numbers look good, but I'm having some problems.
- Huge database
The MySQL database is about 200 MB; the Solr database contains about 1.4 GB of data. Each time I need to change a table/column, the database needs to be reindexed, which in this example took over 12 hours.
- Difficult to render both a Solr object and an Active Record (MySQL) object without repeating myself (keeping the code DRY).
The view relies on a certain object. It doesn't care whether the object itself is an Active Record object or a Solr object, as long as it can call a set of attributes on it. Like this:
```ruby
# Controller
@song = Song.first
# View
@song.artist.urls.first.service.name
```
The problem in my case is that the data returned from Solr is flat, like this:
```ruby
{
  id: 123,
  song: "Waterloo",
  artist: "ABBA",
  service_name: "Groveshark",
  urls: ["url1", "url2", "url3"]
}
```
This forces me to build an Active Record-like object that can be passed to the view.
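One way to avoid hand-building a full Active Record object is to wrap the flat Solr document in lightweight read-only value objects that expose the same method chain the view calls. This is a minimal sketch, assuming (hypothetically) that every URL in the document belongs to the single service named in `service_name`; `Service`, `Url`, `Artist`, `SongView` and `song_from_solr` are illustrative names, not part of Solr, Sunspot or Rails.

```ruby
# Hypothetical read-only value objects mirroring the associations the
# view walks: song -> artist -> urls -> service.
Service  = Struct.new(:name)
Url      = Struct.new(:address, :service)
Artist   = Struct.new(:name, :urls)
SongView = Struct.new(:id, :title, :artist)

# Builds a nested object graph from a flat Solr document.
# Assumption: all URLs share the one service named in :service_name.
def song_from_solr(doc)
  service = Service.new(doc[:service_name])
  urls    = Array(doc[:urls]).map { |address| Url.new(address, service) }
  artist  = Artist.new(doc[:artist], urls)
  SongView.new(doc[:id], doc[:song], artist)
end

doc = {
  id: 123,
  song: "Waterloo",
  artist: "ABBA",
  service_name: "Groveshark",
  urls: ["url1", "url2", "url3"]
}

song = song_from_solr(doc)
puts song.artist.urls.first.service.name # prints Groveshark
```

Because the view only calls methods, it can render either `Song.first` or `song_from_solr(doc)` unchanged, as long as both respond to the same chain.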
My question
Is there a better way to solve this problem? Some kind of super-duper fast, read-only primary database that can handle complex queries quickly would be nice.
Solr individual fields update
About reindexing everything on a schema change: Solr does not support updating individual fields yet, but there is a JIRA issue about this that's still unresolved. However, how often do you change the schema?
MongoDB
If you can live without an RDBMS (no joins, schema, transactions, or foreign key constraints), a document-based DB like MongoDB or CouchDB would be a perfect fit (here is a good comparison between them).
Why use MongoDB:
- data is in native format (you can use an object-document mapper like Mongoid directly in the views, so you don't need to adapt your records as you do with Solr)
- dynamic queries
- very good performance on non-full-text queries
- schema-less (no need for migrations)
- built-in, easy-to-set-up replication
Why use Solr:
- advanced, high-performance full-text search
Why use MySQL:
- joins, constraints, transactions
Solutions
So, the solutions (combinations) would be:
Use MongoDB + Solr
- but you would still need to reindex everything on a schema change
Use only MongoDB
- but drop support for advanced full-text search
Use MySQL in a master-slave configuration and balance reads across the slave(s) (using a plugin like Octopus) + Solr
- setup complexity
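The idea behind read/write splitting can be sketched as a router that sends writes to the master and spreads reads round-robin over the slaves. This is a hypothetical plain-Ruby illustration, not Octopus's API; `ReadWriteRouter` and the symbolic connection names are placeholders, and classifying statements with a regular expression is a simplification.

```ruby
# Hypothetical read/write splitter (NOT the Octopus API).
# Writes go to the master; reads rotate over the slaves.
class ReadWriteRouter
  WRITE_PATTERN = /\A\s*(INSERT|UPDATE|DELETE|REPLACE|CREATE|ALTER|DROP)\b/i

  def initialize(master:, slaves:)
    @master = master
    @slaves = slaves
    @next_slave = 0
  end

  # Returns the connection that should run the given SQL statement.
  # Falls back to the master when there are no slaves.
  def connection_for(sql)
    return @master if sql.match?(WRITE_PATTERN) || @slaves.empty?

    slave = @slaves[@next_slave % @slaves.size]
    @next_slave += 1
    slave
  end
end

router = ReadWriteRouter.new(master: :master, slaves: [:slave1, :slave2])
router.connection_for("SELECT * FROM songs")         # => :slave1
router.connection_for("SELECT * FROM artists")       # => :slave2
router.connection_for("UPDATE songs SET name = 'x'") # => :master
router.connection_for("SELECT * FROM urls")          # => :slave1
```

One caveat: MySQL replication is asynchronous by default, so a read sent to a slave right after a write may see stale data; routing reads that follow a write (or anything inside a transaction) to the master avoids that.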
Keep current setup, denormalize data in MySQL
- messy
Solr reindexing slowness
The MySQL database is about 200 MB; the Solr database contains about 1.4 GB of data. Each time I need to change a table/column, the database needs to be reindexed, which in this example took over 12 hours.
Reindexing a 200 MB database in Solr SHOULD NOT take 12 hours! Most probably you also have other issues, like:
MySQL:
- N+1 queries
- missing indexes
Solr:
- committing after each request: this is the default setup if you use a plugin like Sunspot, but it's a performance killer in production
From http://outoftime.github.com/pivotal-sunspot-presentation.html:
- By default, Sunspot::Rails commits at the end of every request that updates the Solr index. Turn that off.
- Use Solr's autoCommit functionality. That's configured in solr/conf/solrconfig.xml
- Be glad for assumed inconsistency. Don't use search where results need to be up-to-the-second.
- other setup issues (http://wiki.apache.org/solr/SolrPerformanceFactors#Indexing_Performance)
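The N+1 issue on the MySQL side looks like this: loading N songs and then their artists one by one costs 1 + N queries, while collecting the ids and fetching them together costs one extra query. In Active Record that batching is what eager loading does (for example `Song.includes(artist: { urls: :service })`, assuming the associations are named as in the view chain). The sketch below only simulates it with a hypothetical in-memory `FakeDB` that counts queries; it does not use Active Record.

```ruby
# Hypothetical in-memory "database" that counts how many queries run.
class FakeDB
  attr_reader :query_count

  def initialize(artists)
    @artists = artists # { artist_id => name }
    @query_count = 0
  end

  # One query per call, comparable to Artist.find(id).
  def find_artist(id)
    @query_count += 1
    @artists.fetch(id)
  end

  # One query for many ids, comparable to Artist.where(id: ids).
  def find_artists(ids)
    @query_count += 1
    @artists.slice(*ids)
  end
end

# Sample data (hypothetical); the query that loaded these songs is not counted.
songs = [
  { title: "Waterloo", artist_id: 1 },
  { title: "Mamma Mia", artist_id: 1 },
  { title: "Example Song", artist_id: 2 }
]
artists = { 1 => "ABBA", 2 => "Example Artist" }

# N+1 style: one artist lookup per song.
naive_db = FakeDB.new(artists)
naive = songs.map { |s| naive_db.find_artist(s[:artist_id]) }

# Eager-loading style: collect the ids, then fetch all artists at once.
eager_db = FakeDB.new(artists)
by_id = eager_db.find_artists(songs.map { |s| s[:artist_id] }.uniq)
eager = songs.map { |s| by_id.fetch(s[:artist_id]) }

puts naive_db.query_count # prints 3
puts eager_db.query_count # prints 1
```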
Look at the logs for more details.
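Why per-request commits hurt bulk indexing can be shown with a counter: indexing 1,000 documents with a commit after each one issues 1,000 commits, while buffering and committing every 500 documents issues two. `BatchingIndexer` is a hypothetical stand-in that makes no Solr calls; in a real setup, Solr's `autoCommit` (in `solrconfig.xml`) or Sunspot's batching would play this role, and in sunspot_rails the per-request commit is, as far as I know, controlled by the `auto_commit_after_request` option in `config/sunspot.yml` (check the docs for your version).

```ruby
# Hypothetical indexer that buffers documents and commits in batches,
# in the spirit of Solr's autoCommit maxDocs setting. No real Solr calls.
class BatchingIndexer
  attr_reader :commits, :indexed

  def initialize(batch_size)
    @batch_size = batch_size
    @buffer = []
    @indexed = 0
    @commits = 0
  end

  # Buffers a document and flushes once the batch is full.
  def add(doc)
    @buffer << doc
    flush if @buffer.size >= @batch_size
  end

  # "Sends" buffered documents and issues a single commit.
  def flush
    return if @buffer.empty?

    @indexed += @buffer.size # a real client would post the documents here
    @buffer.clear
    @commits += 1
  end
end

docs = (1..1000).map { |i| { id: i } }

per_doc = BatchingIndexer.new(1) # like committing after every update
docs.each { |d| per_doc.add(d) }
per_doc.flush

batched = BatchingIndexer.new(500)
docs.each { |d| batched.add(d) }
batched.flush

puts per_doc.commits # prints 1000
puts batched.commits # prints 2
```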
Instead of pushing your data into Solr to flatten the records, why don't you just create a separate table in your MySQL database that is optimized for read-only access?
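A read-optimized table amounts to precomputing the join once, when data changes, so that each read is a single-row lookup. The sketch below models the normalized tables and a denormalized `song_reads` table as in-memory Ruby hashes; the table and column names are hypothetical, and in MySQL the rebuild step would typically be an `INSERT ... SELECT` with joins, triggered from a model callback or a background job (also an assumption).

```ruby
# Hypothetical normalized data standing in for the MySQL tables.
SERVICES = { 1 => { name: "Groveshark" } }
ARTISTS  = { 10 => { name: "ABBA" } }
SONGS    = { 123 => { title: "Waterloo", artist_id: 10 } }
URLS     = [
  { artist_id: 10, service_id: 1, address: "url1" },
  { artist_id: 10, service_id: 1, address: "url2" }
]

# Rebuilds one row of the denormalized, read-only "song_reads" table
# by doing the joins once, at write time.
def build_read_row(song_id)
  song   = SONGS.fetch(song_id)
  artist = ARTISTS.fetch(song[:artist_id])
  urls   = URLS.select { |u| u[:artist_id] == song[:artist_id] }
  {
    song_id: song_id,
    title: song[:title],
    artist_name: artist[:name],
    urls: urls.map { |u| u[:address] },
    service_names: urls.map { |u| SERVICES.fetch(u[:service_id])[:name] }.uniq
  }
end

SONG_READS = {}
SONGS.each_key { |id| SONG_READS[id] = build_read_row(id) }

# A read is now one key lookup instead of a four-table join.
puts SONG_READS[123][:artist_name] # prints ABBA
```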
Also, you seem to contradict yourself:
The view is relying on a certain object. It doesn't care if the object itself is an Active Record object or a Solr object, as long as it can call a set of attributes on it.
The problem in my case is that the data being returned from Solr is flat... This forces me to build a fake Active Record object that can be rendered by the view.