开发者

java library to find the mime type from file content [duplicate]

This question already has answers here: How to get a file's Media Type (MIME type)? (27 answers) Closed 9 years ago.

I am searching for a java library which tells you the mime type by looking at the file content(byte array). I found this project using jmimemagic and it no longer supports newer file types (eg. MS word docx forma开发者_运维百科t) as it is inactive now (from 2006).


Maybe useful for someone, who needs the most used office formats as well (and does not use Apache Tika):

public class MimeTypeUtils {

    private static final Map<String, String> fileExtensionMap;

    static {
        fileExtensionMap = new HashMap<String, String>();
        // MS Office
        fileExtensionMap.put("doc", "application/msword");
        fileExtensionMap.put("dot", "application/msword");
        fileExtensionMap.put("docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
        fileExtensionMap.put("dotx", "application/vnd.openxmlformats-officedocument.wordprocessingml.template");
        fileExtensionMap.put("docm", "application/vnd.ms-word.document.macroEnabled.12");
        fileExtensionMap.put("dotm", "application/vnd.ms-word.template.macroEnabled.12");
        fileExtensionMap.put("xls", "application/vnd.ms-excel");
        fileExtensionMap.put("xlt", "application/vnd.ms-excel");
        fileExtensionMap.put("xla", "application/vnd.ms-excel");
        fileExtensionMap.put("xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
        fileExtensionMap.put("xltx", "application/vnd.openxmlformats-officedocument.spreadsheetml.template");
        fileExtensionMap.put("xlsm", "application/vnd.ms-excel.sheet.macroEnabled.12");
        fileExtensionMap.put("xltm", "application/vnd.ms-excel.template.macroEnabled.12");
        fileExtensionMap.put("xlam", "application/vnd.ms-excel.addin.macroEnabled.12");
        fileExtensionMap.put("xlsb", "application/vnd.ms-excel.sheet.binary.macroEnabled.12");
        fileExtensionMap.put("ppt", "application/vnd.ms-powerpoint");
        fileExtensionMap.put("pot", "application/vnd.ms-powerpoint");
        fileExtensionMap.put("pps", "application/vnd.ms-powerpoint");
        fileExtensionMap.put("ppa", "application/vnd.ms-powerpoint");
        fileExtensionMap.put("pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation");
        fileExtensionMap.put("potx", "application/vnd.openxmlformats-officedocument.presentationml.template");
        fileExtensionMap.put("ppsx", "application/vnd.openxmlformats-officedocument.presentationml.slideshow");
        fileExtensionMap.put("ppam", "application/vnd.ms-powerpoint.addin.macroEnabled.12");
        fileExtensionMap.put("pptm", "application/vnd.ms-powerpoint.presentation.macroEnabled.12");
        fileExtensionMap.put("potm", "application/vnd.ms-powerpoint.presentation.macroEnabled.12");
        fileExtensionMap.put("ppsm", "application/vnd.ms-powerpoint.slideshow.macroEnabled.12");
        // Open Office
        fileExtensionMap.put("odt", "application/vnd.oasis.opendocument.text");
        fileExtensionMap.put("ott", "application/vnd.oasis.opendocument.text-template");
        fileExtensionMap.put("oth", "application/vnd.oasis.opendocument.text-web");
        fileExtensionMap.put("odm", "application/vnd.oasis.opendocument.text-master");
        fileExtensionMap.put("odg", "application/vnd.oasis.opendocument.graphics");
        fileExtensionMap.put("otg", "application/vnd.oasis.opendocument.graphics-template");
        fileExtensionMap.put("odp", "application/vnd.oasis.opendocument.presentation");
        fileExtensionMap.put("otp", "application/vnd.oasis.opendocument.presentation-template");
        fileExtensionMap.put("ods", "application/vnd.oasis.opendocument.spreadsheet");
        fileExtensionMap.put("ots", "application/vnd.oasis.opendocument.spreadsheet-template");
        fileExtensionMap.put("odc", "application/vnd.oasis.opendocument.chart");
        fileExtensionMap.put("odf", "application/vnd.oasis.opendocument.formula");
        fileExtensionMap.put("odb", "application/vnd.oasis.opendocument.database");
        fileExtensionMap.put("odi", "application/vnd.oasis.opendocument.image");
        fileExtensionMap.put("oxt", "application/vnd.openofficeorg.extension");
    }

    public static String getContentTypeByFileName(String fileName) {
        // 1. first use java's buildin utils
        FileNameMap mimeTypes = URLConnection.getFileNameMap();
        String contentType = mimeTypes.getContentTypeFor(fileName);
        // 2. nothing found -> lookup our in extension map to find types like ".doc" or ".docx"
        if (!StringUtils.hasText(contentType)) {
            String extension = FilenameUtils.getExtension(fileName);
            contentType = fileExtensionMap.get(extension);
        }
        return contentType;
    }
}


Use Apache tika for content detection. Please find the link below. http://tika.apache.org/0.8/detection.html. We have so many jar dependencies which you can find when you build tika using maven

        ByteArrayInputStream bai = new ByteArrayInputStream(pByte);
        ContentHandler contenthandler = new BodyContentHandler();
        Metadata metadata = new Metadata();
        Parser parser = new AutoDetectParser();
        try {
              parser.parse(bai, contenthandler, metadata);

        } catch (IOException e) {
              // TODO Auto-generated catch block
              e.printStackTrace();
        } catch (SAXException e) {
              // TODO Auto-generated catch block
              e.printStackTrace();
        } catch (TikaException e) {
              // TODO Auto-generated catch block
              e.printStackTrace();
        }           
        System.out.println("Mime: " + metadata.get(Metadata.CONTENT_TYPE));
        return metadata.get(Metadata.CONTENT_TYPE);


I use javax.activation.MimetypesFileTypeMap. It starts with a small set: $JRE_HOME/lib/content-types.properties, but you can add you own. Create a file mime.types in the format shown in MimetypesFileTypeMap's javadoc (I started with a large list from the net, massaged it, and added types I found missing). Now you can add that in your code by opening your mime.types file and adding its contents to your map. However the easier solution is to add your mime.types file to the META-INF of your jar. java.activation will pick that up automagically.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜