Converting a legacy EAV schema to Mongo or Couch

Let's say I have a legacy application whose previous developers decided, for various reasons, that it needed an arbitrarily flexible schema, so they reinvented the Entity-Attribute-Value model yet again. What they were really building was a document repository, something tools like Mongo or Couch would be a much better fit for today, but those tools either weren't available or weren't known to the earlier teams.

To stay competitive, let's say we need to build more powerful methods for querying and analyzing information in our system. Based on the sheer number and variety of attributes, it seems like map/reduce is a better fit for our set of problems than gradually refactoring the system into a more relational schema.

The original source database has millions of documents, but only a small number of distinct document types. There are some commonalities across the distinct document types.

What's an effective strategy for doing a migration from a massive EAV implementation in, say, MySQL, to a document-oriented store like Mongo or Couch?

I can certainly imagine a naive approach to this, sketched below, but I'd really like to see a tutorial or war story from someone who has already tackled this type of problem.
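Roughly: stream the EAV rows ordered by entity, fold each entity's attributes into one JSON document, and bulk-load the documents in batches. This is only a minimal sketch, targeting CouchDB's `_bulk_docs` endpoint; the table names (`entities`, `attributes`, `entity_attribute_values`), the batch size, and the `_id` scheme are all placeholders, and a Mongo version would be analogous.

```ruby
require 'mysql2'
require 'json'
require 'net/http'
require 'uri'

mysql = Mysql2::Client.new(host: 'localhost', username: 'app',
                           password: 'secret', database: 'legacy_eav')
bulk_docs = URI('http://localhost:5984/documents/_bulk_docs')

batch = []
doc = nil
last_entity_id = nil

# Send the accumulated documents to CouchDB in one bulk request.
flush = lambda do
  next if batch.empty?
  Net::HTTP.post(bulk_docs, { docs: batch }.to_json,
                 'Content-Type' => 'application/json')
  batch.clear
end

# Stream the EAV triples ordered by entity so each document can be
# assembled in a single pass, without one query per entity.
rows = mysql.query(<<~SQL, stream: true, cache_rows: false)
  SELECT e.id, e.doc_type, a.name, v.value
  FROM entities e
  JOIN entity_attribute_values v ON v.entity_id = e.id
  JOIN attributes a              ON a.id = v.attribute_id
  ORDER BY e.id
SQL

rows.each do |row|
  if row['id'] != last_entity_id
    batch << doc if doc
    flush.call if batch.size >= 1_000
    doc = { '_id' => "entity-#{row['id']}", 'type' => row['doc_type'] }
    last_entity_id = row['id']
  end
  doc[row['name']] = row['value']
end
batch << doc if doc
flush.call
```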

What were some strategies for doing this kind of conversion that worked well? What lessons did you learn? What pitfalls should I avoid? How did you deal with legacy apps that still expect to be able to interact with the existing database?


My first usage of Couch was after I had written a Ruby and Postgres web crawler (directed crawl of mp3 blogs to build a recommendation engine).

The relational schema got deeply gnarly as I tried to record ID3 metadata, audio signatures, etc., and then to detect overlaps and otherwise deduplicate. It worked, but it was slow. So slow that I started caching my JSON API rows onto the corresponding primary ActiveRecord objects as blob fields.
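That hack looked something like this; a from-memory sketch rather than the original code, with the model, column, and helper names made up for illustration:

```ruby
# app/models/track.rb (inside the Rails app)
# Assumes a `cached_payload` TEXT column added in a migration.
class Track < ActiveRecord::Base
  # Build the API payload once, then stash its JSON on the record so
  # subsequent reads skip the join-heavy relational work.
  def api_payload
    if cached_payload
      JSON.parse(cached_payload)
    else
      data = build_payload_from_relations   # the slow, join-heavy path
      update_column(:cached_payload, JSON.generate(data))
      data
    end
  end
end
```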

I had a choice: dig in and learn Postgres performance tuning, or move to a horizontal approach. So I used Nutch and Hadoop to spider the web, with the PipeMapper to parse pages with Ruby and Hpricot. That let me reuse all my parser code and just change it from saving into a normalized database to saving JSON. I wrote a little library, called CouchRest, to handle the JSON and the REST URL endpoints, and used it to save the Hpricot results into CouchDB.
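The conversion boiled down to something like the following. This is a minimal sketch, not the real crawler code: it talks straight to CouchDB's HTTP API rather than going through CouchRest, and the database name and extracted fields are made up. CouchRest is essentially a thin wrapper around this kind of call.

```ruby
require 'hpricot'
require 'json'
require 'net/http'
require 'time'
require 'uri'

COUCH_DB = URI('http://localhost:5984/pages')

# Parse a fetched page with Hpricot and save the result as one JSON
# document instead of a set of normalized rows.
def save_page(url, html)
  page  = Hpricot(html)
  title = page.at('title')

  doc = {
    'url'        => url,
    'title'      => title && title.inner_text,
    'mp3_links'  => (page / 'a').map { |a| a['href'] }.compact
                                .map(&:to_s)
                                .grep(/\.mp3\z/i),
    'fetched_at' => Time.now.utc.iso8601
  }

  # POST /db creates the document and lets CouchDB assign the revision.
  Net::HTTP.post(COUCH_DB, doc.to_json, 'Content-Type' => 'application/json')
end
```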

For that project I just ran Couch on a single EC2 node, with a small six-node Hadoop cluster populating it. It was only when I got around to building the browsing interface for the spidered data that I really got a good feel for the query capabilities.
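For flavor, the kind of map/reduce view that drove that interface looks roughly like this. It's purely illustrative: the database name, the design-document name, and the assumption that every document carries a type field are mine, not the original project's.

```ruby
require 'json'
require 'net/http'
require 'uri'

# A design document whose view counts documents per "type" field,
# which is handy for browsing a heterogeneous document set.
design_doc = {
  '_id'   => '_design/stats',
  'views' => {
    'by_type' => {
      'map'    => 'function(doc) { if (doc.type) { emit(doc.type, 1); } }',
      'reduce' => '_count'
    }
  }
}

db = URI('http://localhost:5984/pages')
Net::HTTP.post(db, design_doc.to_json, 'Content-Type' => 'application/json')

# Then GET /pages/_design/stats/_view/by_type?group=true returns one
# row per document type with its count.
```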

It turned out to be flexible and especially well suited to OLTP applications. I quickly started using it in all my projects, and eventually founded a company around the technology with two of its creators.
