Fastest access of a file using Hadoop
I need fastest access to a single file, several copies of which are stored in many systems using Hadoop. I also need to finding the ping t开发者_高级运维ime for each file in a sorted manner. How should I approach learning hadoop to accomplish this task? Please help fast.I have very less time.
If you need faster access to a file just increase the replication factor to that file using setrep command. This might not increase the file throughput proportionally, because of your current hardware limitations.
The ls command is not giving the access time for the directories and the files, it's showing the modification time only. Use the Offline Image Viewer to dump the contents of hdfs fsimage files to human-readable formats. Below is the command using the Indented option.
bin/hdfs oiv -i fsimagedemo -p Indented -o fsimage.txt
A sample o/p from the fsimage.txt, look for the ACCESS_TIME column.
INODE
INODE_PATH = /user/praveensripati/input/sample.txt
REPLICATION = 1
MODIFICATION_TIME = 2011-10-03 12:53
ACCESS_TIME = 2011-10-03 16:26
BLOCK_SIZE = 67108864
BLOCKS [NUM_BLOCKS = 1]
BLOCK
BLOCK_ID = -5226219854944388285
NUM_BYTES = 529
GENERATION_STAMP = 1005
NS_QUOTA = -1
DS_QUOTA = -1
PERMISSIONS
USER_NAME = praveensripati
GROUP_NAME = supergroup
PERMISSION_STRING = rw-r--r--
To get the ping time in a sorted manner, you need to write a shell script or some other program to extract the INODE_PATH and ACCESS_TIME for each of the INODE section and then sort them based on the ACCESS_TIME. You can also use Pig as shown here.
How should I approach learning hadoop to accomplish this task? Please help fast.I have very less time.
If you want to learn Hadoop in a day or two it's not possible. Here are some videos and articles to start with.
精彩评论