Recommendation system data collection methodology

I am building a recommendation system in my application and will probably use Apache Mahout. I have to collect a large dataset, which will be gathered over a period of time. Which is less expensive: collecting it in some sort of log file, or collecting it in a database and exporting it when I need it?


Mahout's recommender code can read directly from a database or a file -- if the data is reasonably formatted. It won't read general log files; they need to be translated into simple CSV or TSV. But it can read just about any table that contains users/items/preferences.
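To illustrate the translation step, here is a minimal sketch in Python that turns log lines into user,item,preference CSV rows. The log format (`timestamp user=<id> item=<id> rating=<value>`) and the function names are made up for the example; your own logs will need their own parsing.

```python
import csv
import io


def log_line_to_row(line):
    """Parse one line of the assumed log format.

    Returns (user_id, item_id, preference), or None when the line
    does not describe a preference event.
    """
    # Collect every key=value token on the line into a dict.
    fields = dict(part.split("=", 1) for part in line.split() if "=" in part)
    try:
        return int(fields["user"]), int(fields["item"]), float(fields["rating"])
    except (KeyError, ValueError):
        return None


def logs_to_csv(lines, out):
    """Write one CSV row per preference event; return the number of rows written."""
    writer = csv.writer(out, lineterminator="\n")
    written = 0
    for line in lines:
        row = log_line_to_row(line)
        if row is not None:
            writer.writerow(row)
            written += 1
    return written


# Hypothetical sample data: two preference events and one unrelated request.
sample_log = [
    "2011-03-01T10:00:00 user=17 item=204 rating=4.0",
    "2011-03-01T10:00:05 GET /index.html",
    "2011-03-01T10:01:12 user=17 item=310 rating=2.5",
]
buffer = io.StringIO()
count = logs_to_csv(sample_log, buffer)
print(buffer.getvalue(), end="")
```

Lines that don't carry all three fields are skipped, so the output contains only the user/item/preference triples.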

If you're already putting your data into a database table, I'd say leave it there and don't duplicate it or export it needlessly. You will probably want to have Mahout suck all that into memory, if possible.
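For the database route, a table with just three columns is enough. The sketch below uses Python's built-in `sqlite3` only to show the shape of such a table and what "load it all into memory" amounts to; the table and column names are assumptions, and this is not Mahout's own database code.

```python
import sqlite3
from collections import defaultdict

# In-memory database for the example; a real application would use its own DB.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE preferences (
        user_id    INTEGER NOT NULL,
        item_id    INTEGER NOT NULL,
        preference REAL    NOT NULL,
        PRIMARY KEY (user_id, item_id)
    )
    """
)


def record_preference(conn, user_id, item_id, preference):
    """Store a preference, overwriting any earlier value for the same user/item."""
    conn.execute(
        "INSERT OR REPLACE INTO preferences VALUES (?, ?, ?)",
        (user_id, item_id, preference),
    )
    conn.commit()


def load_all(conn):
    """Read the whole table into a {user_id: {item_id: preference}} dict."""
    prefs = defaultdict(dict)
    query = "SELECT user_id, item_id, preference FROM preferences"
    for user_id, item_id, preference in conn.execute(query):
        prefs[user_id][item_id] = preference
    return dict(prefs)


record_preference(conn, 17, 204, 4.0)
record_preference(conn, 17, 310, 2.5)
record_preference(conn, 17, 204, 4.5)  # updated rating replaces the old one
in_memory = load_all(conn)
print(in_memory)
```

Keeping one row per user/item pair means the table stays small and there is nothing to clean up before the recommender reads it.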

If you're not already storing this data, and want to choose a simple and efficient representation, then I'd suggest you extract the user/item/preference information and store it in simple CSV files, compressed with gzip. These can be used easily with Mahout too and will be simpler and more compact than full log files or a database.
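As a rough sketch of the gzip-compressed CSV option, the following Python writes a batch of user,item,preference rows to a `.csv.gz` file and reads it back. The file name, the helper names, and the sample rows are assumptions for illustration.

```python
import csv
import gzip
import os
import tempfile


def write_preferences(path, rows):
    """Write (user_id, item_id, preference) rows as gzip-compressed CSV."""
    with gzip.open(path, "wt", newline="", encoding="utf-8") as handle:
        csv.writer(handle, lineterminator="\n").writerows(rows)


def read_preferences(path):
    """Read the compressed CSV back into a list of typed tuples."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        return [(int(u), int(i), float(p)) for u, i, p in csv.reader(handle)]


# Hypothetical batch of collected preferences.
rows = [(17, 204, 4.0), (17, 310, 2.5), (42, 204, 5.0)]
path = os.path.join(tempfile.mkdtemp(), "preferences.csv.gz")
write_preferences(path, rows)
loaded = read_preferences(path)
print(loaded)
```

Writing one such file per collection period (per day, say) keeps each file small and avoids rewriting old data.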
