How to check if a file contains plain text?
I have a folder full of files and I want to search for a string inside them. The issue is that some files may be zip, exe, ogg, etc. Can I somehow check what kind of file it is, so that I only open and search through txt, PHP, etc. files? I can't rely on the file extension.
Use Python's mimetypes library:
import mimetypes

if mimetypes.guess_type('full path to document here')[0] == 'text/plain':
    # file is plaintext
    pass
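One caveat worth checking before relying on this: mimetypes.guess_type works from the file name (its extension) and never opens the file, so it may not help when extensions cannot be trusted. The sketch below illustrates that behaviour; the helper name and the sample file names are made up for illustration, and none of the files need to exist.

```python
import mimetypes


def guessed_as_plain_text(name):
    """Return True if the *name* alone maps to text/plain (hypothetical helper)."""
    mime, _encoding = mimetypes.guess_type(name)
    return mime == 'text/plain'


# guess_type only inspects the name, so a zip archive renamed to .txt
# would still be reported as plain text, and a text file without an
# extension would not be recognised at all.
for name in ['notes.txt', 'README', 'renamed_archive.txt']:
    print(name, '->', mimetypes.guess_type(name)[0])
```

For content-based detection, the libmagic and byte-inspection approaches in the other answers are probably a better fit.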
You can use the Python interface to libmagic to identify file formats.
>>> import magic
>>> f = magic.Magic(mime=True)
>>> f.from_file('testdata/test.txt')
'text/plain'
For more examples, see the repo.
Try something like this:
def is_binary_file(filepathname):
    # Bytes that normally occur in text: a few control characters
    # plus everything from 0x20 to 0xff except DEL (0x7f).
    textchars = bytearray([7, 8, 9, 10, 12, 13, 27]) + bytearray(range(0x20, 0x7f)) + bytearray(range(0x80, 0x100))
    # Only the first 1024 bytes are inspected.
    with open(filepathname, 'rb') as f:
        chunk = f.read(1024)
    # Removing every text byte leaves data only if the chunk holds non-text bytes.
    return bool(chunk.translate(None, textchars))
Use the function like this:
is_binary_file('<your file path name>')
This returns True if the file looks binary and False if it looks like text. It should be easy to adapt this to your needs, e.g. by writing an is_text_file function. I leave that up to you.
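One way the is_text_file variant could look, combined with the folder search from the question, is sketched below. It reuses the byte table from the answer above; the names find_in_text_files, needle and sample_size are assumptions made for this example, and the check is a heuristic on the first bytes only, not a guarantee.

```python
import os

# Same byte table as in the answer: common control characters plus
# 0x20-0xff except DEL (0x7f).
_TEXT_CHARS = (bytes([7, 8, 9, 10, 12, 13, 27])
               + bytes(range(0x20, 0x7f))
               + bytes(range(0x80, 0x100)))


def is_text_file(path, sample_size=1024):
    """Heuristic: True if the first sample_size bytes contain only text bytes."""
    with open(path, 'rb') as f:
        chunk = f.read(sample_size)
    return not chunk.translate(None, _TEXT_CHARS)


def find_in_text_files(folder, needle):
    """Yield (path, line_number, line) for each line containing needle."""
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            try:
                if not is_text_file(path):
                    continue  # skip zip, exe, ogg and other binary-looking files
                # errors='replace' keeps the search going on odd encodings.
                with open(path, 'r', errors='replace') as f:
                    for lineno, line in enumerate(f, 1):
                        if needle in line:
                            yield path, lineno, line.rstrip('\n')
            except OSError:
                continue  # unreadable file: ignore it and move on
```

Calling list(find_in_text_files('some/folder', 'search string')) would collect all matches. Note that an empty file counts as text under this heuristic.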
If you're on Linux, you can parse the output of the file command-line tool.
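A minimal sketch of that idea, assuming the file utility is on the PATH and supports the --brief and --mime-type options (as the common implementation shipped with Linux distributions does). The function names are hypothetical.

```python
import shutil
import subprocess


def file_mime_type(path):
    """Return the MIME type reported by `file`, or None if `file` is not installed."""
    if shutil.which('file') is None:
        return None
    result = subprocess.run(
        ['file', '--brief', '--mime-type', '--', path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def looks_like_text(path):
    """True if `file` reports a text/* MIME type for path."""
    mime = file_mime_type(path)
    return mime is not None and mime.startswith('text/')
```

Checking only for a text/ prefix is a simplification: some textual formats may be reported under application/ instead, so the accepted types might need extending for your files.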