Are there any libraries or components that handle storage and fast retrieval of user-generated content?

2022-12-18 10:21

Considering the case of having a large and active user base where each user wants to store a profile picture and some additional images or other artifacts, are there any libraries or frameworks that allow for easy storage and query of such data?

A reference implementation would be Facebook's Haystack Photo Infrastructure.

The following characteristics are important:

The data store should scale well: adding resources should be transparent to the application using the store (a similar question had an answer referring to LinkedIn's Voldemort).
The ability to add some metadata alongside the data being stored.
Metadata can be queried with good performance (e.g. stored in a configurable index such as Lucene/Solr).
Quick key-based access and some intermediate caching layer.

Any recommendations for libraries or frameworks that can be easily integrated into a Java web application are welcome.

Update: thank you for the first few answers. I have to go into more detail on what type of answers I expect. Tobu's answer, although not Java-related, is very good (just voted up). It is possible to implement a solution with a combination of file system access and a DB, with some layer of caching in between, but I consider that a waste of time if someone more qualified than me has already designed, implemented and run a better solution. Something built on an underlying DB or JCR implementation is a good fit, but implementing the rest of the infrastructure myself is not what I want to do.

MogileFS is what LiveJournal uses. Not particularly Java though.

We've had good experiences with the media repository from Fedora Commons (http://www.fedora-commons.org/), which allows you to store media assets alongside their associated metadata. We did not have any problems with scalability or customization, nor was it difficult to exchange the underlying storage layer for a triple store (if this is needed in your case). If you need to index your data using Solr, you can use a predefined metadata field ("RELS-EXT") to store XML-based data.

I feel your requirements are pretty close to what a database provides. Just make sure the table design corresponds to your needs (for example, you could keep the big data, such as images, in a separate table from the metadata).

All your requirements would be covered, including the caching layer in the database (and you could add a further caching layer in your application as needed, which would probably also be used for the rest of your application).
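To make the "additional caching layer in your application" idea concrete, here is a minimal sketch of a read-through LRU cache sitting in front of a slower store. The class name `CachedBlobStore`, the `loader` function and the `maxEntries` limit are made up for illustration; the loader stands in for whatever actually fetches the image bytes (a database query, a file read), and a real deployment would more likely use an existing cache library.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Read-through LRU cache in front of a slower blob store (hypothetical names). */
class CachedBlobStore {
    // Stands in for the real lookup, e.g. a database query or a file read.
    private final Function<String, byte[]> loader;
    private final Map<String, byte[]> cache;
    private int backendReads = 0;

    CachedBlobStore(Function<String, byte[]> loader, final int maxEntries) {
        this.loader = loader;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                // Evict the least recently used entry once the limit is exceeded.
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value, loading it from the backing store on a miss. */
    synchronized byte[] get(String key) {
        byte[] value = cache.get(key);
        if (value == null) {
            value = loader.apply(key);
            backendReads++;
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    /** Number of times the backing store was consulted (useful for checking hit rates). */
    synchronized int backendReads() {
        return backendReads;
    }
}
```

With a `Map<String, byte[]>` named `db` as a stand-in backend, `new CachedBlobStore(db::get, 1000)` would serve repeated requests for the same key from memory and only go back to `db` after that key has been evicted.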

Apache Jackrabbit is a fully conforming implementation of the Content Repository for Java Technology API (JCR, specified in JSR 170 and 283). However, it has some performance issues (at least in the two-year-old version I use); the best way to overcome them is to replicate static images to a web server (using WebDAV, davfs and rsync).

It depends on the quantification of "large and active user base"...

80% of websites could simply use a NoSQL schema-free approach like y_serial:

y_serial.py module :: warehouse Python objects with SQLite

"Serialization + persistance :: in a few lines of code, compress and annotate Python objects into SQLite; then later retrieve them chronologically by keywords without any SQL. Most useful "standard" module for a database to store schema-less data."

http://yserial.sourceforge.net

If the photos and artifacts per user are under 2M compressed, performance should be good.

For the remaining 20% of use cases, one can easily import the data from y_serial into Cassandra, which is now adopted by Facebook, Digg, and Twitter.
