Running a Disco map-reduce job on data stored in Discodex
I have a large amount of static data that needs to offer random access. Since I'm using Disco to digest it, I'm using the very impressive-looking Discodex (key, value) store on top of the Disco Distributed File System (DDFS). However, Disco's documentation is rather sparse, so I can't figure out how to use my Discodex indices as input to a Disco job.
Is this even possible? If so, how do I do this?
Alternatively, am I thinking about this incorrectly? Would it be better to just store that data as a text file on DDFS?
Never mind, it appears that what I'm trying to do isn't really how Discodex is meant to be used. It might be possible, but it is far better to simply use semantic DDFS tags to refer to blobs of data and feed those tags to the job (see the sketch below).
The correct use case for Discodex is to store indices constructed by a Disco map-reduce job when those indices do not need to be the input of another map-reduce job.
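In case it helps anyone, here is a minimal sketch of the DDFS-tag approach, modelled on the standard Disco word-count example. The tag name data:mytag and the counting logic are just placeholders; the data is assumed to have been pushed beforehand with something like ddfs push data:mytag ./myfile.txt.

    from disco.core import Job, result_iterator

    def map(line, params):
        # the default line reader feeds each line of the tagged blobs to map
        for word in line.split():
            yield word, 1

    def reduce(iter, params):
        # group identical words and sum their counts
        from disco.util import kvgroup
        for word, counts in kvgroup(sorted(iter)):
            yield word, sum(counts)

    if __name__ == '__main__':
        # 'tag://data:mytag' points the job at the blobs tagged in DDFS
        job = Job().run(input=['tag://data:mytag'], map=map, reduce=reduce)
        for word, count in result_iterator(job.wait(show=False)):
            print(word, count)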
You could also use DiscoDB to store the output of one job, then use it as input to another job. The DiscoDB tutorial has a good example.
http://discoproject.org/doc/howto/discodb.html
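For reference, a minimal sketch of the discodb Python API itself (not taken from the tutorial verbatim; the keys and values here are made up):

    from discodb import DiscoDB, Q

    # build an immutable DiscoDB; each key maps to a set of values
    db = DiscoDB({'pet': ['dog', 'cat'], 'farm': ['cow', 'horse']})

    # random access by key returns an iterator over that key's values
    print(list(db['pet']))

    # boolean queries over keys are also supported
    print(list(db.query(Q.parse('pet | farm'))))

    # serialize and load back, e.g. to ship the db through DDFS or a job result
    data = db.dumps()
    db2 = DiscoDB.loads(data)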