Proper way to differentiate pst and dbx files in bash shell

2023-02-07 16:32 问答作者：

I want to identify the file-format of the input file given to my shell script - whether a .pst or a .dbx file. I checked How to check the extension of a filename in a bash script?. That one deals with txt files and two methods are given there -

check if the extension is txt
check if the mime type is application/text etc.

I tried file -ib <filename> on a .pst and a .dbx file and it showed application/octet-stream for both. However, if I just do file <filename>, then I get

this for the dbx file -

file1.dbx: Microsoft Outlook Express DBX File Message database

and this for the pst file -

file2.pst: Microsoft Outlook binary email folder (Outlook >=2003)

So, my questions are -

is it better to use mime type detection everytime when the output can be anything and we need a proper check?
How to apply mime type check in this case - both returning "application/octet-stream"?

开发者_开发问答

Update

I didn't want to do an extension based detection because it seems we just can't be sure on a Unix system, that a .dbx file truly is a dbx file. Since file <filename> returns a line which contains the correct information of the file (e.g. "Microsoft Outlook Express DBX File Message database"). That means the file command is able to identify the file type properly. Then why does it not get the correct information in file -ib <filename> command?

Will parsing the string output of file <filename> be fine? Is it advisable assuming I only need to identify a narrow set of data storage files of outlook family (MS Outlook Express, MS Office Outlook 2003,2007,2010 etc.). A small text identifier like application/dbx which could be compared would be all I need.

The file command relies on having a file type detection database which includes rules for the file types that you expect to encounter. It may not be possible to recognize these file types if the file content doesn't have a unique code near the beginning of the file.

Note that the -i option to emit mime types actually uses a separate "magic" numbers file to recognize file types rather than translating long descriptions to file types. It is quite possible for these two databases to be out of sync. If your application really needs to recognize these two file types I suggest that you look at the Linux source code for "file" to see how they recognize them and then code this recognition algorithm right into your app.

If you want to do the equivalent of DOS file type detection, then strip the extension off the filename (everything after the last period) and look up that string in your own table where you define the types that you need.

继续阅读：bash detection file-format

Proper way to differentiate pst and dbx files in bash shell

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？