
FileInputStream for a generic file system

I have a file that contains Java serialized objects such as Vector. I have stored this file on the Hadoop Distributed File System (HDFS). Now I intend to read this file (using the readObject method) in one of the map tasks. I suppose

FileInputStream in = new FileInputStream("hdfs/path/to/file");

won't work, since the file is stored on HDFS. So I thought of using the org.apache.hadoop.fs.FileSystem class. Unfortunately, it has no method that returns a FileInputStream; all it has is a method that returns an FSDataInputStream. But I want an input stream that can read serialized Java objects such as Vector from the file, rather than just the primitive data types that FSDataInputStream reads.

Please help!


FileInputStream doesn't give you the facility to read serialized objects directly; you need to wrap it in an ObjectInputStream. You can do the same with FSDataInputStream: just wrap it in an ObjectInputStream and then read your objects from it.

In other words, if you have a fileSystem of type org.apache.hadoop.fs.FileSystem, just use:

ObjectInputStream in = new ObjectInputStream(fileSystem.open(path));
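A minimal, self-contained sketch of the same wrapping, using an in-memory byte array in place of the HDFS file so it runs without a cluster (the class name and the roundTrip helper are illustrative, not part of any Hadoop API):

```java
import java.io.*;
import java.util.Vector;

public class ObjectStreamDemo {

    // Serializes v to a byte array and reads it back through an
    // ObjectInputStream wrapped around a plain InputStream. The same
    // wrapping works on the FSDataInputStream returned by
    // fileSystem.open(path), since FSDataInputStream is an InputStream.
    static Vector<String> roundTrip(Vector<String> v)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(v);
        }

        // The byte array stands in for the file stored on HDFS.
        InputStream raw = new ByteArrayInputStream(bytes.toByteArray());
        try (ObjectInputStream in = new ObjectInputStream(raw)) {
            @SuppressWarnings("unchecked")
            Vector<String> restored = (Vector<String>) in.readObject();
            return restored;
        }
    }

    public static void main(String[] args) throws Exception {
        Vector<String> original = new Vector<>();
        original.add("alpha");
        original.add("beta");
        System.out.println(roundTrip(original)); // prints [alpha, beta]
    }
}
```

In the map task you would replace the ByteArrayInputStream with fileSystem.open(path) and keep everything else the same.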


You need to convert the FSDataInputStream like this (Scala code):

val hadoopConf = new org.apache.hadoop.conf.Configuration()
val hdfs = org.apache.hadoop.fs.FileSystem.get(new java.net.URI("hdfs://nameserv"), hadoopConf)

val in = hdfs.open(new org.apache.hadoop.fs.Path("hdfs://nameserv/somepath/myfile")).asInstanceOf[java.io.InputStream]
