Proper way to differentiate pst and dbx files in bash shell
I want to identify the file format of the input file given to my shell script: whether it is a .pst or a .dbx file. I checked "How to check the extension of a filename in a bash script?". That question deals with txt files, and two methods are given there:
check if the extension is txt
check if the MIME type is application/text
etc.
I tried file -ib <filename> on a .pst and a .dbx file, and it showed application/octet-stream for both. However, if I just run file <filename>, then I get this for the dbx file -
file1.dbx: Microsoft Outlook Express DBX File Message database
and this for the pst file -
file2.pst: Microsoft Outlook binary email folder (Outlook >=2003)
So, my questions are -
Is it better to use MIME type detection every time, given that the output can be anything and we need a proper check?
How do I apply a MIME type check in this case, where both files return "application/octet-stream"?
Update
I didn't want to do extension-based detection because, on a Unix system, we can't be sure that a .dbx file truly is a dbx file. file <filename> returns a line that contains the correct information about the file (e.g. "Microsoft Outlook Express DBX File Message database"), which means the file command is able to identify the file type properly. Then why does it not give the correct information with the file -ib <filename> command?
Will parsing the string output of file <filename> be fine? Is it advisable, assuming I only need to identify a narrow set of data storage files of the Outlook family (MS Outlook Express, MS Office Outlook 2003, 2007, 2010, etc.)? A small text identifier like application/dbx which could be compared would be all I need.
The file command relies on having a file type detection database which includes rules for the file types that you expect to encounter. It may not be possible to recognize these file types if the file content doesn't have a unique code near the beginning of the file.
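Since the plain description output already distinguishes the two formats in the question, one option is to match on it. The following is only a minimal sketch: the function names classify_description and classify_file are hypothetical, and the patterns assume the description strings quoted in the question, which other versions of file may word differently.

```shell
#!/usr/bin/env bash

# Map a description string (as printed by `file -b`) to a short identifier.
# ASSUMPTION: the patterns come from the two descriptions quoted in the
# question; a different version of `file` may use different wording.
classify_description() {
    case "$1" in
        *"Outlook Express DBX"*)         echo "dbx" ;;
        *"Outlook binary email folder"*) echo "pst" ;;
        *)                               echo "unknown" ;;
    esac
}

# Run `file` on a path and classify its description.
# -b (brief) omits the leading "filename: " prefix from the output.
classify_file() {
    classify_description "$(file -b -- "$1")"
}

# When the script is called with an argument, classify that file.
if [ "$#" -gt 0 ]; then
    classify_file "$1"
fi
```

The script prints dbx, pst, or unknown, which gives the small comparable identifier asked for without depending on the MIME output.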
Note that the -i option to emit MIME types actually uses a separate "magic" numbers file to recognize file types rather than translating long descriptions to MIME types. It is quite possible for these two databases to be out of sync. If your application really needs to recognize these two file types, I suggest that you look at the Linux source code for "file" to see how it recognizes them and then code this recognition algorithm right into your app.
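Coding the recognition directly could look roughly like the sketch below, which compares the first four bytes of the file with a known signature. The signatures used here ("!BDN" for pst and the bytes CF AD 12 FE for dbx) are assumptions based on commonly documented values and should be checked against the rules in the "file" source before relying on them; classify_by_signature is a hypothetical name.

```shell
#!/usr/bin/env bash

# Classify a file by the first four bytes of its content.
# ASSUMPTION: the signatures below are the commonly documented ones
# (pst: ASCII "!BDN", dbx: CF AD 12 FE); verify them against the magic
# rules shipped with `file` before depending on this.
classify_by_signature() {
    local sig
    # Read four bytes and print them as one lowercase hex string.
    sig=$(head -c 4 -- "$1" | od -An -tx1 | tr -d ' \n')
    case "$sig" in
        2142444e) echo "pst" ;;      # "!BDN"
        cfad12fe) echo "dbx" ;;
        *)        echo "unknown" ;;
    esac
}

# When the script is called with an argument, classify that file.
if [ "$#" -gt 0 ]; then
    classify_by_signature "$1"
fi
```

This checks only the leading signature, so it cannot tell Outlook versions apart or detect a truncated or corrupted file.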
If you want to do the equivalent of DOS file type detection, then strip the extension off the filename (everything after the last period) and look up that string in your own table where you define the types that you need.
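The extension lookup described above can be sketched as follows. The function name type_from_extension and the two-entry table are illustrative assumptions; extend the table with whatever types you need.

```shell
#!/usr/bin/env bash

# Look up a file type from the filename extension only (no content check).
type_from_extension() {
    local name ext
    name=${1##*/}                      # drop any leading directory part
    case "$name" in
        *.*) ext=${name##*.} ;;        # everything after the last period
        *)   echo "unknown"; return ;; # no extension at all
    esac
    # Compare case-insensitively by lowercasing the extension.
    ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
    case "$ext" in
        pst) echo "pst" ;;
        dbx) echo "dbx" ;;
        *)   echo "unknown" ;;
    esac
}

# When the script is called with an argument, classify that name.
if [ "$#" -gt 0 ]; then
    type_from_extension "$1"
fi
```

As the question notes, this trusts the name alone, so a renamed file will be misclassified.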