Document-oriented dbms as primary db and a RDBMS db as secondary db?

2023-04-12 09:42

I'm having some performance issues with a MySQL database due to its normalization.

Most of my applications that use a database need to run heavily nested queries, which in my case take a lot of time. Queries can take up to 2 seconds to run with indexes, and about 45 seconds without them.

A solution I came across a few months back was to use a faster, more linear document-based database, in my case Solr, as the primary database. As soon as something changed in the MySQL database, Solr was notified.

This worked really well. All queries against the Solr database took only about 3 ms.

The numbers look good, but I'm having some problems.

Huge database

The MySQL database is about 200 MB, while the Solr database contains about 1.4 GB of data. Each time I need to change a table or column, the database needs to be reindexed, which in this example took over 12 hours.

Difficult to render both a Solr object and an Active Record (MySQL) object without getting wet (that is, without breaking DRY)

The view relies on a certain object. It doesn't care whether the object itself is an Active Record object or a Solr object, as long as it can call a set of attributes on it.

Like this.

# Controller
@song = Song.first

# View
@song.artist.urls.first.service.name

The problem in my case is that the data returned from Solr is flat, like this:

{
  id: 123,
  song: "Waterloo",
  artist: "ABBA",
  service_name: "Groveshark",
  urls: ["url1", "url2", "url3"]
}

This forces me to build an Active Record object that can be passed to the view.
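To make the mismatch concrete, here is a minimal sketch of a wrapper that turns the flat Solr hash into nested objects answering the same call chain the view uses. The struct names and `song_from_solr` are hypothetical, and the sketch assumes every URL in a document belongs to the single `service_name`; it is not the code from the application in question.

```ruby
# Hypothetical presenter structs; none of these names come from the original app.
Service = Struct.new(:name)
Url     = Struct.new(:address, :service)
Artist  = Struct.new(:name, :urls)
SongDoc = Struct.new(:id, :title, :artist)

# Builds a nested object graph from one flat Solr document.
# Assumption: every URL in the document shares the single service_name.
def song_from_solr(doc)
  service = Service.new(doc.fetch(:service_name))
  urls    = doc.fetch(:urls, []).map { |address| Url.new(address, service) }
  SongDoc.new(doc.fetch(:id), doc.fetch(:song), Artist.new(doc.fetch(:artist), urls))
end

# The flat document shown above.
doc = {
  id: 123,
  song: "Waterloo",
  artist: "ABBA",
  service_name: "Groveshark",
  urls: ["url1", "url2", "url3"]
}

song = song_from_solr(doc)

# Same call chain as in the view.
puts song.artist.urls.first.service.name
```

Plain structs keep the wrapper read-only in spirit and avoid instantiating real Active Record models just to satisfy the view.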

My question

Is there a better way to solve this problem? Some kind of super-duper fast, read-only primary database that can handle complex queries quickly would be nice.

Solr individual fields update

On reindexing everything after a schema change: Solr does not support updating individual fields yet, but there is a JIRA issue about this that is still unresolved. How often do you change the schema, though?

MongoDB

If you can live without an RDBMS (without joins, a schema, transactions, or foreign key constraints), a document-based DB like MongoDB or CouchDB would be a perfect fit (here is a good comparison between them).

Why use MongoDB:

data is in its native format (you can use an object mapper like Mongoid directly in the views, so you don't need to adapt your records as you do with Solr)
dynamic queries
very good performance on queries that are not full-text searches
schema-less (no need for migrations)
built-in replication that is easy to set up
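As an illustration of the "native format" point, a document store can keep the nested shape the view expects instead of a flattened one. The sketch below uses a plain Ruby hash as a stand-in for such a document; the field names are assumptions, and no MongoDB or Mongoid API is involved.

```ruby
# Hypothetical nested document, shaped like the associations in the question.
song_document = {
  id: 123,
  title: "Waterloo",
  artist: {
    name: "ABBA",
    urls: [
      { address: "url1", service: { name: "Groveshark" } },
      { address: "url2", service: { name: "Groveshark" } }
    ]
  }
}

# Hash#dig walks nested hashes and arrays, mirroring the view's call chain
# song.artist.urls.first.service.name without any flattening step.
service_name = song_document.dig(:artist, :urls, 0, :service, :name)
puts service_name
```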

Why use Solr:

advanced, very performant full-text search

Why use MySQL:

joins, constraints, transactions

Solutions

So, the solutions (combinations) would be:

Use MongoDB + Solr
- but you would still need to reindex all on schema change
Use only MongoDB
- but drop support for advanced full-text search
Use MySQL in a master-slave configuration and balance reads across the slave(s) (using a plugin like Octopus) + Solr
- setup complexity
Keep current setup, denormalize data in MySQL
- messy

Solr reindexing slowness

The MySQL database is about 200 MB, while the Solr database contains about 1.4 GB of data. Each time I need to change a table or column, the database needs to be reindexed, which in this example took over 12 hours.

Reindexing a 200 MB database in Solr SHOULD NOT take 12 hours! Most probably you also have other issues, such as:

MySQL:

n+1 issue
indexes
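The n+1 issue can be pictured with a plain-Ruby simulation. `FakeDb` is a hypothetical in-memory stand-in that only counts queries, and the sample rows are made up for illustration; in Rails the usual remedy is eager loading (for example with `includes`), which this sketch imitates by fetching all referenced artists in one call.

```ruby
# Hypothetical in-memory "database" that counts how many queries it receives.
class FakeDb
  attr_reader :query_count

  def initialize(songs, artists)
    @songs = songs
    @artists = artists
    @query_count = 0
  end

  def all_songs
    @query_count += 1
    @songs
  end

  def artist_by_id(id)
    @query_count += 1
    @artists.find { |artist| artist[:id] == id }
  end

  def artists_by_ids(ids)
    @query_count += 1
    @artists.select { |artist| ids.include?(artist[:id]) }
  end
end

# Illustrative sample data only.
artists = [{ id: 1, name: "ABBA" }, { id: 2, name: "Queen" }]
songs = [
  { id: 10, title: "Waterloo", artist_id: 1 },
  { id: 11, title: "Fernando", artist_id: 1 },
  { id: 12, title: "Bohemian Rhapsody", artist_id: 2 }
]

# N+1: one query for the songs, then one more query per song.
naive_db = FakeDb.new(songs, artists)
naive_names = naive_db.all_songs.map { |song| naive_db.artist_by_id(song[:artist_id])[:name] }

# Batched: one query for the songs, one query for all referenced artists.
batched_db = FakeDb.new(songs, artists)
loaded_songs = batched_db.all_songs
artist_ids = loaded_songs.map { |song| song[:artist_id] }.uniq
artists_by_id = batched_db.artists_by_ids(artist_ids).map { |artist| [artist[:id], artist] }.to_h
batched_names = loaded_songs.map { |song| artists_by_id[song[:artist_id]][:name] }

puts "naive: #{naive_db.query_count} queries, batched: #{batched_db.query_count} queries"
```

With three songs the naive loop issues four queries and the batched version two; the gap grows linearly with the number of rows, which matters a lot during a full reindex.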

SOLR:

commit after each request: this is the default setup if you use a plugin like Sunspot, but it's a performance killer in production

From http://outoftime.github.com/pivotal-sunspot-presentation.html:

By default, Sunspot::Rails commits at the end of every request that updates the Solr index. Turn that off.

Use Solr's autoCommit functionality. That's configured in solr/conf/solrconfig.xml

Be glad for assumed inconsistency. Don't use search where results need to be up-to-the-second.

other setup issues (http://wiki.apache.org/solr/SolrPerformanceFactors#Indexing_Performance)

Look at the logs for more details.
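The cost of committing after every request can be sketched with a toy index that only counts commits. `FakeIndex` is hypothetical and is not the Sunspot or Solr API; the batch size of 250 is an arbitrary assumption chosen for the example.

```ruby
# Hypothetical stand-in for a search index; it only tracks commits.
class FakeIndex
  attr_reader :commit_count, :visible_docs

  def initialize
    @pending = []
    @visible_docs = []
    @commit_count = 0
  end

  # Adding a document buffers it; it is not searchable yet.
  def add(doc)
    @pending << doc
  end

  # A commit makes pending documents searchable.
  # In a real engine this is the expensive step.
  def commit
    @visible_docs.concat(@pending)
    @pending.clear
    @commit_count += 1
  end
end

docs = (1..1000).map { |i| { id: i } }

# Commit after every single document (the per-request pattern).
per_doc = FakeIndex.new
docs.each do |doc|
  per_doc.add(doc)
  per_doc.commit
end

# Commit once per batch of 250 documents.
batched = FakeIndex.new
docs.each_slice(250) do |slice|
  slice.each { |doc| batched.add(doc) }
  batched.commit
end

puts "per-document commits: #{per_doc.commit_count}, batched commits: #{batched.commit_count}"
```

Both runs end with the same documents visible; only the number of commits differs, which is the trade-off that Solr's autoCommit setting makes for you.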

Instead of pushing your data into Solr to flatten the records, why don't you just create a separate table in your MySQL database that is optimized for read-only access?
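A rough sketch of what filling such a table could involve: flatten each nested record into one row per song and URL, then write those rows to the read-only table. The nested hash, the column names, and `denormalized_rows` are all hypothetical; the actual write to MySQL is left out.

```ruby
# Hypothetical nested record, shaped like the associations in the question.
song = {
  id: 123,
  title: "Waterloo",
  artist: {
    name: "ABBA",
    urls: [
      { address: "url1", service: { name: "Groveshark" } },
      { address: "url2", service: { name: "Groveshark" } }
    ]
  }
}

# Produces one denormalized row per (song, url) pair, ready to be written
# to a read-optimized table. Column names are illustrative.
def denormalized_rows(song)
  song[:artist][:urls].map do |url|
    {
      song_id: song[:id],
      song_title: song[:title],
      artist_name: song[:artist][:name],
      url: url[:address],
      service_name: url[:service][:name]
    }
  end
end

rows = denormalized_rows(song)
rows.each { |row| puts row.inspect }
```

Reads against such a table need no joins, at the price of rebuilding the affected rows whenever the normalized source data changes.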

Also, you seem to contradict yourself:

The view relies on a certain object. It doesn't care whether the object itself is an Active Record object or a Solr object, as long as it can call a set of attributes on it.

The problem in my case is that the data being returned from Solr is flat... This forces me to build a fake active record object that can be rendered by the view.
