Determining the mime type of a file

How can I determine the mime-type of a file (in OCaml)?

I am trying to set the language for a GtkSourceView control, but to do that, I need to first determine the language. The only way I can see of doing this is using the mime-type - there is a function that will return the correct language as follows:

GSourceView.source_languages_manager#get_language_from_mime_type : string -> source_language option

I really don't want to hard code the language into my source. If it isn't possible to determine the mime-type in OCaml (and I haven't yet found a way, after searching through the documentation), is there perhaps another way I can determine the source language?


After studying the source code of gedit, which includes this functionality, I have discovered a method in glib which will do this for me. This answer provides an example use of the g_file_info_get_content_type() method. There is also the g_content_type_get_mime_type() method, which is also available in glib.

Unfortunately, there is no wrapping available for these functions yet, which means I may have to generate my own wrapping for them.


Most languages lack this, so I would be very surprised to find it in OCaml. Apache does it with a mime.types file - you can look there for hints. This is the most usual way - a huge table which maps extensions into mimetypes. You can implement it in OCaml easily:

let mimetype_of_extension = function
    | "txt" | "log" -> "text/plain"
    | "html" | "htm" -> "text/html"
    | "zip" -> "application/zip"
...

Another way is to look at the file contents, but then you basically need to know about the various file formats.
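To make the content-based approach concrete, here is a minimal sketch that checks the first bytes of a file against a few well-known signatures (ZIP, PDF, PNG). The names starts_with, mimetype_of_header and read_header are made up for this illustration, and the table is nowhere near complete; a real implementation needs far more formats.

```ocaml
(* True if [s] begins with [prefix]. *)
let starts_with prefix s =
  String.length s >= String.length prefix
  && String.sub s 0 (String.length prefix) = prefix

(* Guess a mime type from the leading bytes of a file.
   Only three signatures are listed; this is an illustration,
   not a complete detector. *)
let mimetype_of_header header =
  if starts_with "PK\003\004" header then Some "application/zip"
  else if starts_with "%PDF-" header then Some "application/pdf"
  else if starts_with "\137PNG\r\n\026\n" header then Some "image/png"
  else None

(* Read at most [n] bytes from the start of the file at [path]. *)
let read_header path n =
  let ic = open_in_bin path in
  let len = min n (in_channel_length ic) in
  let header = really_input_string ic len in
  close_in ic;
  header
```

Usage would be something like mimetype_of_header (read_header "some_file" 16). Note that this tells you nothing useful about plain-text source files, which have no signature.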

That said, it does not help you much, since source files of all languages are normally treated as text/plain. They are not distinguishable by mimetype; and thus I really have no idea what your get_language_from_mime_type function does.

However, filename extensions of various source files are more-or-less standardised, so if you know the extension, you will know the language. Getting the extension is as simple as ripping whatever follows the last period from the filename.

let extension_of_filename filename =
    (* String.rindex raises Not_found if the filename contains no period. *)
    let pos = (String.rindex filename '.') + 1 in
    (* String.sub copies everything after the last period. *)
    String.sub filename pos (String.length filename - pos);;

Well, okay, simple in any language except Brainfuck and OCaml, at least. After that, it's easy - "c" is a C program, as is "h"; "ml" is OCaml; etc.
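Putting the two steps together might look roughly like the sketch below. The function names and the language identifiers on the right-hand side ("c", "ocaml", "python") are assumptions for illustration only; check which ids your GtkSourceView installation actually uses. It returns an option so that filenames without a period, or with an unknown extension, don't raise.

```ocaml
(* Hypothetical extension table: the language ids are assumptions,
   not names confirmed against GtkSourceView. *)
let language_of_extension = function
  | "c" | "h" -> Some "c"
  | "ml" | "mli" -> Some "ocaml"
  | "py" -> Some "python"
  | _ -> None

(* Take whatever follows the last period, lowercase it,
   and look it up in the table above. *)
let language_of_filename filename =
  match String.rindex_opt filename '.' with
  | None -> None
  | Some i ->
    let ext = String.sub filename (i + 1) (String.length filename - i - 1) in
    language_of_extension (String.lowercase_ascii ext)
```

For example, language_of_filename "main.c" gives Some "c", while a name such as "README" gives None.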


In GTK, you can wrap the functions you have already found.

It is also not hard to parse /etc/mime.types - it's a simple whitespace-separated file. I believe both Ocsigen and Ocamlnet contain code to do this, but I don't know off-hand if they make it easy to access (e.g. a function exposed by the Ocamlnet netstring library).
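As a rough sketch of the parsing, assuming the usual layout of one mime type per line followed by its extensions, with '#' starting a comment: the names fields, parse_line and load_mime_types are invented here and are not taken from Ocsigen or Ocamlnet.

```ocaml
(* Split a line on spaces and tabs, dropping empty fields. *)
let fields line =
  String.map (fun c -> if c = '\t' then ' ' else c) line
  |> String.split_on_char ' '
  |> List.filter (fun s -> s <> "")

(* Turn one line into (extension, mime type) pairs.
   Assumes everything after '#' is a comment. *)
let parse_line line =
  let line =
    match String.index_opt line '#' with
    | Some i -> String.sub line 0 i
    | None -> line
  in
  match fields line with
  | [] -> []
  | mime :: exts -> List.map (fun ext -> (ext, mime)) exts

(* Load the whole file into a table keyed by extension. *)
let load_mime_types path =
  let ic = open_in path in
  let table = Hashtbl.create 512 in
  (try
     while true do
       List.iter
         (fun (ext, mime) -> Hashtbl.replace table ext mime)
         (parse_line (input_line ic))
     done
   with End_of_file -> ());
  close_in ic;
  table
```

With that, Hashtbl.find_opt (load_mime_types "/etc/mime.types") "html" should return the type registered for html on your system, if the file exists and lists it.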


This is probably not the best method for determining the type of source code (using /etc/mime.types is best for that IMO), but there are also OCaml bindings for libmagic that you could use.
