How can I compare the content of two files of different types?
I have some documents in MHTML format and some in PDF format. I want to know whether the content of an MHTML document and a PDF document is the same. How can I compare them?
You will need an MHTML parser as well as a PDF parser library. Then you traverse both documents in parallel and compare the contents. Note that this is definitely non-trivial to do, as you will have to build a mapping system between elements in the different file formats.
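As a rough starting point, here is a minimal sketch of the parsing step in Python. It assumes you only care about the visible text, not the element structure. An MHTML file is a MIME multipart message, so the standard-library email package can read it. The names mhtml_to_text, pdf_to_text and _TextCollector are made up for this example, and the PDF half assumes the third-party pypdf package is installed.

```python
import email
from email import policy
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def mhtml_to_text(raw_bytes):
    """Return the text of the first text/html part, or "" if there is none."""
    message = email.message_from_bytes(raw_bytes, policy=policy.default)
    for part in message.walk():
        if part.get_content_type() == "text/html":
            collector = _TextCollector()
            # get_content() undoes the transfer encoding and charset.
            collector.feed(part.get_content())
            collector.close()
            # Joining with a space keeps adjacent block elements apart, but it
            # also splits a word that is interrupted by an inline tag.
            return " ".join(" ".join(collector.chunks).split())
    return ""


def pdf_to_text(path):
    """Return the text of all pages. Assumption: pypdf is installed."""
    from pypdf import PdfReader  # imported lazily; third-party dependency

    reader = PdfReader(path)
    pages = [(page.extract_text() or "") for page in reader.pages]
    return " ".join(" ".join(pages).split())
```

This throws away all structure (headings, tables, images), which is exactly the part that makes a real element-by-element comparison hard. PDF text extraction can also return words in a different order than they appear on the page.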
If you want to take into account that content can be written in different ways (e.g. tables vs. tabs) and still look exactly the same to the user, things get very complicated quickly.
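One way to soften the layout problem, if a text-only comparison is enough for you, is to normalize both texts before comparing them and to compute a similarity score instead of testing for strict equality. The sketch below uses only the Python standard library. The function names normalize and similarity are made up for this example, and what counts as "similar enough" is something you would have to decide for your own documents.

```python
import difflib
import re
import unicodedata


def normalize(text):
    """Reduce text to a canonical form so layout differences are ignored."""
    # NFKC folds compatibility characters, e.g. ligatures and
    # non-breaking spaces, into their plain equivalents.
    text = unicodedata.normalize("NFKC", text)
    # Tabs, newlines and runs of spaces all become a single space.
    return re.sub(r"\s+", " ", text).strip()


def similarity(text_a, text_b):
    """Return a word-level similarity ratio between 0.0 and 1.0."""
    words_a = normalize(text_a).split()
    words_b = normalize(text_b).split()
    matcher = difflib.SequenceMatcher(None, words_a, words_b, autojunk=False)
    return matcher.ratio()


if __name__ == "__main__":
    # A tab-separated row and a space-separated row compare as identical.
    print(similarity("Name\tAge\nAlice\t30", "Name  Age Alice 30"))
```

This only hides whitespace differences. It will not notice that a table was turned into a list, or that two columns were read in a different order, so a low score does not necessarily mean the content differs.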
My gut feeling from the way you are asking your questions is that this project is way larger and more complex than you are ready for.