FileInputStream for a generic FileSystem
I have a file that contains Java serialized objects such as Vector. I have stored this file on the Hadoop Distributed File System (HDFS). Now I intend to read this file (using the method readObject) in one of the map tasks. I suppose
FileInputStream in = new FileInputStream("hdfs/path/to/file");
won't work, as the file is stored on HDFS. So I thought of using the org.apache.hadoop.fs.FileSystem class. Unfortunately it does not have any method that returns a FileInputStream. All it has is a method that returns an FSDataInputStream, but I want an input stream that can read serialized Java objects like Vector from a file, rather than just the primitive data types that FSDataInputStream handles.
Please help!
FileInputStream doesn't give you the facility to read serialized objects directly either. You need to wrap it in an ObjectInputStream. You can do exactly the same with FSDataInputStream: wrap it in an ObjectInputStream and then you can read your objects from it.
In other words, if you have fileSystem of type org.apache.hadoop.fs.FileSystem, just use:
ObjectInputStream in = new ObjectInputStream(fileSystem.open(path));
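This works because FSDataInputStream extends java.io.InputStream, which is all ObjectInputStream needs. Below is a minimal, self-contained sketch of the pattern; it uses a ByteArrayInputStream as a stand-in for the stream returned by fileSystem.open(path), since the wrapping code is identical for any InputStream (the stand-in is only there so the example runs without a Hadoop cluster):

```java
import java.io.*;
import java.util.Vector;

public class ReadSerializedVector {
    public static void main(String[] args) throws Exception {
        // Produce some serialized data, as the file on HDFS would contain.
        Vector<String> original = new Vector<>();
        original.add("hello");
        original.add("world");

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }

        // Stand-in for fileSystem.open(path): FSDataInputStream is also
        // a java.io.InputStream, so in a real job you would pass the
        // opened HDFS stream here instead.
        InputStream rawStream = new ByteArrayInputStream(bytes.toByteArray());

        // Wrap the raw stream so serialized objects can be read back.
        try (ObjectInputStream in = new ObjectInputStream(rawStream)) {
            @SuppressWarnings("unchecked")
            Vector<String> restored = (Vector<String>) in.readObject();
            System.out.println(restored); // prints [hello, world]
        }
    }
}
```

In a map task you would replace the ByteArrayInputStream with fileSystem.open(new Path("hdfs/path/to/file")) and keep the ObjectInputStream wrapping unchanged.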
In Scala, opening the file and wrapping the stream looks like this:
val hadoopConf = new org.apache.hadoop.conf.Configuration()
val hdfs = org.apache.hadoop.fs.FileSystem.get(new java.net.URI("hdfs://nameserv"), hadoopConf)
// FSDataInputStream already is a java.io.InputStream, so no cast is needed;
// wrap it in an ObjectInputStream to read the serialized objects.
val in = new java.io.ObjectInputStream(
  hdfs.open(new org.apache.hadoop.fs.Path("hdfs://nameserv/somepath/myfile")))