Gem that allows for data access using sharded MySQL databases while maintaining the usage of ActiveRecord

This is a relatively complex problem that I am thinking about, so please suggest edits or comment on any parts that are unclear. I will update and iterate based on your comments.

I am thinking of developing a Rails gem that simplifies the use of sharded tables, even when most of your data is stored in relational databases. I believe this is similar to the approach used at Quora and FriendFeed when they hit a wall scaling with traditional MySQL, where most of the potential solutions either require a massive migration (NoSQL) or are just really painful (sticking with relational completely).

  • http://bret.appspot.com/entry/how-friendfeed-uses-mysql
  • http://www.quora.com/When-Adam-DAngelo-says-partition-your-data-at-the-application-level-what-exactly-does-he-mean?q=application+layer+quora+adam+

Essentially, how can we continue using MySQL for the many things it is really good at, while still allowing parts of the system to scale? This would let someone who got started with MySQL/ActiveRecord but hit a scaling roadblock easily scale the parts of the database where that makes sense.

We are using Ruby on Rails on a sharded database and storing JSON blobs in it. Since we cannot do joins, we create tables for the relationships between entities.

For example, we have 10 different types of entities. Entities can be linked to each other through big (sharded) relationship tables.

The tables are extremely simple. The index is (Id1, Id2..., type), and the data is stored in the JSON blob.

  • Id, type, {json data}
  • Id1, Id2, type, {json data}
  • Id1, Id2, Id3, type, {json data}
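
To make the layout above a bit more concrete, here is a minimal Ruby sketch of how such rows might be keyed and routed. Everything in it is an assumption for illustration, not our actual gem: the `ShardedStore` class, the CRC32 routing on the first id, and the in-memory hashes that stand in for the MySQL shards are all hypothetical.

```ruby
require "json"
require "zlib"

# Hypothetical sketch: each "shard" is an in-memory Hash standing in for one
# MySQL table with a composite key and a JSON blob column.
class ShardedStore
  def initialize(shard_count)
    @shards = Array.new(shard_count) { {} }
  end

  # Route by the first id so all rows for one entity live on the same shard.
  # (CRC32 modulo shard count is an assumed routing scheme.)
  def shard_index(id1)
    Zlib.crc32(id1.to_s) % @shards.size
  end

  # ids is [id1], [id1, id2] or [id1, id2, id3]; the key mirrors the
  # (Id1, Id2..., type) index described above.
  def put(ids, type, data)
    @shards[shard_index(ids.first)][[ids, type]] = JSON.generate(data)
  end

  # Returns the parsed JSON blob, or nil when no such row exists.
  def get(ids, type)
    blob = @shards[shard_index(ids.first)][[ids, type]]
    blob && JSON.parse(blob)
  end

  # All rows of a given type that start at id1 (a single-shard scan).
  def related(id1, type)
    @shards[shard_index(id1)]
      .select { |(ids, t), _| ids.first == id1 && t == type }
      .map { |(ids, _), blob| [ids, JSON.parse(blob)] }
  end
end

store = ShardedStore.new(4)
store.put([1], "user", { "name" => "Ann" })
store.put([1, 2], "follows", { "since" => 2011 })
store.put([1, 3], "follows", { "since" => 2012 })
```

Routing on the first id keeps "everything linked from entity 1" on one shard, so `store.related(1, "follows")` never has to fan out. The trade-off is that the reverse lookup (who links to entity 2) would need a second row keyed the other way around.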

We have put a lot of work into creating higher-level interfaces for storing a range of data sets for relational data.

For any given type, you can define a storage type: value, unweighted list, weighted list, or weighted list with GUIDs.

We have higher-level interfaces for each of them: querying, sorting, timestamp comparison, intersections, etc.
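
As a rough idea of what one of these interfaces could look like, here is a small Ruby sketch of a weighted list with sorting, a timestamp-style comparison, and intersection. The `WeightedList` class and its method names are hypothetical, and the data is kept in memory here; a real version would read and write rows in the sharded tables.

```ruby
# Hypothetical sketch of one storage type: a weighted list.
class WeightedList
  def initialize
    @weights = {} # member id => numeric weight (for example, a timestamp)
  end

  # Adds or updates a member; returns self so calls can be chained.
  def add(member, weight)
    @weights[member] = weight
    self
  end

  # Members ordered by weight, heaviest first, optionally limited.
  # Ties are broken by member id so the order is deterministic.
  def top(limit = nil)
    sorted = @weights.sort_by { |member, weight| [-weight, member] }.map(&:first)
    limit ? sorted.first(limit) : sorted
  end

  # Members whose weight is strictly greater than the threshold,
  # e.g. "everything newer than this timestamp".
  def since(threshold)
    @weights.select { |_, weight| weight > threshold }.keys
  end

  # Members present in both lists.
  def intersection(other)
    members & other.members
  end

  def members
    @weights.keys
  end
end

a = WeightedList.new.add(10, 3).add(11, 7).add(12, 5)
b = WeightedList.new.add(12, 1).add(13, 2).add(10, 9)
a.top(2)          # the two heaviest members of a
a.intersection(b) # members that appear in both a and b
```

An unweighted list would be the same shape with the weight dropped, and a plain value would be a single row, so the other storage types could share most of this interface.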

That way, if someone realizes that they need to scale a specific part of the database, they can keep most of their infrastructure and move only the tables they need into this sharded database.

What are your thoughts? As mentioned above, I would love to know what you folks think.


Scalability is a tough nut to crack. My background includes two years as a sales engineer for BEA Systems, back when all they sold was the TUXEDO middleware (TUXEDO == Transactions for Unix, Extended for Distributed Operations). TUXEDO is still the king of the TPC-C benchmark on Unix platforms.

Scaling with respect to a database is not so much about the database itself; it's about how you access that database. For example, if you establish a connection to a database and you want that single connection to scale, make that connection always access the same table. The problem with today's infrastructures (RoR included) is that when they open generic connections, those connections access many tables in the database.

So if you want to make a database CONNECTION scale, make that connection focus the database engine on as few database resources as possible. If you can manage to create a 'focused' connection, that ONLY accesses one table, and one table index, for example, it will scale much better than a connection that accesses EVERY table in the database and every index defined for all those tables.
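
One way to picture a 'focused' connection is a small registry that hands out a dedicated connection per table. The Ruby below is only a sketch under assumptions: `FocusedConnections` and `FakeConnection` are hypothetical names, and `FakeConnection` merely records which tables it was asked about instead of talking to a real database.

```ruby
# Hypothetical stand-in for a real database handle.
class FakeConnection
  attr_reader :tables_touched

  def initialize
    @tables_touched = []
  end

  # Records the table it was used for and echoes the statement back.
  def query(table, sql)
    @tables_touched |= [table]
    "#{table}: #{sql}"
  end
end

# Hands out one dedicated connection per table, so each connection only ever
# touches a single table.
class FocusedConnections
  def initialize(&factory)
    @factory = factory # block that opens a new connection
    @by_table = {}
    @lock = Mutex.new
  end

  # Returns the connection reserved for this table, opening it on first use.
  def for_table(table)
    @lock.synchronize { @by_table[table] ||= @factory.call }
  end

  def query(table, sql)
    for_table(table).query(table, sql)
  end
end

conns = FocusedConnections.new { FakeConnection.new }
conns.query("users", "SELECT 1")
conns.query("edges", "SELECT 1")
conns.query("users", "SELECT 2")
```

After these calls the connection for "users" has only ever seen the users table, and likewise for "edges". Whether that actually scales better depends on the database engine and the workload, so treat this as an illustration of the idea rather than a measured result.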
