Does Cloudera Mountable HDFS provide deduplication?
I'm looking at running an HDFS-based storage cluster and, as a simple way to access it, using the Mountable HDFS system from the Cloudera release.
My first question: will this provide automatic deduplication of data?
My second question: if deduplication is done, then when all users delete the files that contain a given deduplicated block, is the block actually deleted from storage, or only the index/reference for that user?
Lastly, would this method include the RainStor compression methods?
Thanks for your input
No, HDFS does not include data deduplication.
The architecture is mainly focused on making optimal use of sequential write/read patterns, so it pretty much works against deduplication: every deduplication approach I am aware of introduces a certain amount of random I/O.
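To make the random I/O point concrete, here is a minimal sketch of how a generic block-level deduplicating store tends to work. This is not HDFS and not any Cloudera API: the `DedupStore` class, its methods, and the tiny `BLOCK_SIZE` are all hypothetical names made up for illustration. It also shows the usual reference-counting answer to the second question, under the assumption that a store deduplicates at all.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size, for illustration only


class DedupStore:
    """Toy content-addressed block store (hypothetical, not an HDFS API)."""

    def __init__(self):
        self.blocks = {}    # digest -> block bytes
        self.refcount = {}  # digest -> number of references to that block
        self.files = {}     # file name -> ordered list of block digests

    def write(self, name, data):
        if name in self.files:
            self.delete(name)  # release the old blocks before overwriting
        digests = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # Every block needs an index lookup by hash; on disk these
            # lookups land in effectively random places.
            if digest not in self.blocks:
                self.blocks[digest] = block
            self.refcount[digest] = self.refcount.get(digest, 0) + 1
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        # A file is reassembled from blocks that may be scattered anywhere.
        return b"".join(self.blocks[d] for d in self.files[name])

    def delete(self, name):
        for digest in self.files.pop(name):
            self.refcount[digest] -= 1
            if self.refcount[digest] == 0:
                # Only the last reference going away frees the block itself.
                del self.refcount[digest]
                del self.blocks[digest]


store = DedupStore()
store.write("a.txt", b"AAAABBBB")
store.write("b.txt", b"AAAACCCC")
print(len(store.blocks))  # 3 unique blocks stored for 4 logical blocks
store.delete("a.txt")
print(len(store.blocks))  # 2: the shared block survives, still used by b.txt
```

In a design like this, deleting a file only drops references, and a shared block is removed from storage once no file points to it anymore. The per-block hash lookups and scattered block locations are what break the long sequential reads and writes that HDFS is built around.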