LoadIFilter() fails on all PDFs (but MS's filtdump.exe doesn't)
I'm trying to write a C# utility that mimics the behavior of filtdump.exe from the Windows Search SDK (since filtdump doesn't appear to be redistributable itself). I'm running into a combination of contradictory or nonexistent documentation and technical problems that I can't seem to track down. I'm hoping someone can help eliminate one or the other of those hurdles...
According to MSDN, filtdump uses ILoadFilter::LoadIFilter to load its IFilter. I contend that MSDN is lying, since it also claims ILoadFilter::LoadIFilter only exists on Windows 7, but filtdump works fine on earlier OSes. Process Monitor indicates that it's actually calling LoadIFilter() from query.dll, so that's what I'm doing:
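using System;
using System.Runtime.InteropServices;

public static class NativeMethods
{
    // From Windows SDK v7.1, NTQuery.h:
    // HRESULT LoadIFilter(PCWSTR pwcsPath, IUnknown *pUnkOuter, void **ppIUnk);
    [DllImport("query.dll", CharSet = CharSet.Unicode)]
    public static extern int LoadIFilter(
        string pwcsPath,
        // pUnkOuter is IUnknown* (not IUnknown**), so it is passed by value; null means no aggregation.
        [MarshalAs(UnmanagedType.IUnknown)] object pUnkOuter,
        // IFilter is the usual [ComImport] declaration of the IFilter interface (not shown here).
        out IFilter ppIUnk);
}

public static class Program
{
    public static void Main(string[] args)
    {
        IFilter filter;
        int result = NativeMethods.LoadIFilter(args[0], null, out filter);
        if (result != 0) // 0 == S_OK
        {
            // Print the HRESULT as hex (e.g. 0x80004005 = E_FAIL) instead of as a negative int.
            Console.WriteLine("Failed to load an IFilter for {0}: 0x{1:X8}", args[0], result);
            return;
        }
        // ... pull chunks and text out of `filter` here ...
    }
}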
For the most part, this application and filtdump give me the same results: they can both open and extract text from plain-text files, Word documents, and Outlook emails, and both fail on the same set of other documents that have no IFilter. However, PDFs are giving me a problem. Filtdump manages to open and extract the text from most of the PDFs I've thrown at it, but every single PDF I try with my own application gives me an HRESULT of 0x80004005 (E_FAIL).
This is the same error as in this question, but I'm getting it on every PDF and filtdump is not, so I know that the IFilter works on at least some documents. Has anyone done this kind of thing with PDFs before and can see what I'm doing wrong?
You may want to see this blog post. In short, v10 of Adobe's PDF filter uses a whitelist of applications allowed to use the filter, including Microsoft's diagnostic tools like filtdump.exe, supposedly as a “security measure”.
LoadIFilter fails because the Adobe PDF filter is marked as STA, while a C# application's main thread is MTA by default (unless Main is marked [STAThread]), which is why it cannot load the PDF filter. Try making your application STA and then loading the PDF filter.
Ajax
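The answer above suggests making the application STA. The simplest way is `[STAThread]` on `Main`; where that isn't an option, a minimal sketch is to run the filter work on a dedicated thread whose apartment is set to STA before it starts. `StaRunner.RunOnStaThread` is a hypothetical helper, not part of any library. Since the filter object would then live in that thread's apartment, it is probably safest to call `NativeMethods.LoadIFilter` and extract all the text inside the delegate rather than handing the `IFilter` back to another thread.

```csharp
#nullable enable
using System;
using System.Threading;

public static class StaRunner
{
    // Runs `work` on a new thread that is placed in a single-threaded apartment (STA)
    // before it starts, waits for it to finish, and returns its result.
    public static T RunOnStaThread<T>(Func<T> work)
    {
        T result = default!;
        Exception? error = null;

        var thread = new Thread(() =>
        {
            try { result = work(); }
            catch (Exception ex) { error = ex; } // rethrown on the calling thread below
        });

        // Must be called before Start(). Returns false where COM apartments are not
        // supported (e.g. .NET on Linux); the work then runs on an ordinary thread.
        thread.TrySetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();

        if (error != null)
            throw new InvalidOperationException("Work on the STA thread failed.", error);
        return result;
    }
}
```

For example, `string text = StaRunner.RunOnStaThread(() => ExtractText(path));`, where `ExtractText` stands for your own (hypothetical) routine that calls `LoadIFilter` and reads the chunks.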
I also expect filtdump is using the old Win32 LoadIFilter call which was available from Windows 2000.
I've seen the same problem as yours solved by running the calling process in a job object: https://stackoverflow.com/a/8841476/1111659.
I also got a similar problem with Reader 10.1.5 installed, although the Win32 LoadIFilter() returned E_NOTIMPL rather than E_FAIL.
It seems Adobe broke the standard Win32 LoadIFilter() call by removing the ability to load the content into the IFilter via the IStorage interface's Load method, while the object still reports that interface as available via QueryInterface.
For that problem, on Windows 7 and later you can create the FilterRegistration object, which implements ILoadFilter, and then call ILoadFilter::LoadIFilter() to create the filter COM object. Then query the filter for IPersistStream and call Load() on it with an IStream containing the file content.
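The last step above feeds the filter an `IStream` holding the file content. .NET declares `System.Runtime.InteropServices.ComTypes.IStream` but, as far as I know, ships no public implementation of it over a `System.IO.Stream`, so here is a minimal read-only adapter sketch. `ManagedIStream` is a hypothetical name; it implements `Read`, `Seek`, and `Stat` and rejects writes. Obtaining `ILoadFilter` and `IPersistStream` themselves still needs COM interop declarations that are not shown here.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using System.Runtime.InteropServices.ComTypes;
using STATSTG = System.Runtime.InteropServices.ComTypes.STATSTG;

// Hypothetical read-only adapter exposing a seekable System.IO.Stream as a COM IStream.
public sealed class ManagedIStream : IStream
{
    private const int STGTY_STREAM = 2; // STATSTG.type value meaning "stream object"
    private readonly Stream _inner;

    public ManagedIStream(Stream inner)
    {
        if (inner == null) throw new ArgumentNullException(nameof(inner));
        if (!inner.CanRead || !inner.CanSeek)
            throw new ArgumentException("Stream must be readable and seekable.", nameof(inner));
        _inner = inner;
    }

    public void Read(byte[] pv, int cb, IntPtr pcbRead)
    {
        // Keep reading until cb bytes are copied or the stream ends,
        // so a short count is only reported at end of stream.
        int total = 0;
        while (total < cb)
        {
            int n = _inner.Read(pv, total, cb - total);
            if (n == 0) break;
            total += n;
        }
        if (pcbRead != IntPtr.Zero) Marshal.WriteInt32(pcbRead, total);
    }

    public void Seek(long dlibMove, int dwOrigin, IntPtr plibNewPosition)
    {
        // STREAM_SEEK_SET/CUR/END (0/1/2) have the same values as SeekOrigin.
        long position = _inner.Seek(dlibMove, (SeekOrigin)dwOrigin);
        if (plibNewPosition != IntPtr.Zero) Marshal.WriteInt64(plibNewPosition, position);
    }

    public void Stat(out STATSTG pstatstg, int grfStatFlag)
    {
        // Only the type and size are filled in; no name is returned.
        pstatstg = new STATSTG { type = STGTY_STREAM, cbSize = _inner.Length };
    }

    // Nothing to commit or roll back for a read-only stream.
    public void Commit(int grfCommitFlags) { }
    public void Revert() { }

    // Everything below is unsupported in this read-only sketch.
    public void Write(byte[] pv, int cb, IntPtr pcbWritten) => throw new NotSupportedException();
    public void SetSize(long libNewSize) => throw new NotSupportedException();
    public void CopyTo(IStream pstm, long cb, IntPtr pcbRead, IntPtr pcbWritten) => throw new NotSupportedException();
    public void Clone(out IStream ppstm) => throw new NotSupportedException();
    public void LockRegion(long libOffset, long cb, int dwLockType) => throw new NotSupportedException();
    public void UnlockRegion(long libOffset, long cb, int dwLockType) => throw new NotSupportedException();
}
```

An instance wrapping `File.OpenRead(path)` would be what you pass to `IPersistStream.Load`; the underlying `FileStream` has to stay open for as long as the filter is in use.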
On older versions of Windows, you need to look up the filter's CLSID in the registry first, or store the Adobe filter's CLSID as a configuration value if you want to keep it fixed.
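The registry lookup mentioned above can be sketched as follows, assuming the lookup order commonly documented for Windows Search filter handlers: the extension's `PersistentHandler` key (falling back to extension → ProgID → CLSID → `PersistentHandler`), then the handler's `PersistentAddinsRegistered` entry keyed by IID_IFilter. `FilterLookup.FindFilterClsid` and its `readDefault` parameter are hypothetical; reading the registry through a delegate keeps the logic testable away from a real registry.

```csharp
#nullable enable
using System;

public static class FilterLookup
{
    // IID_IFilter: the subkey under PersistentAddinsRegistered whose default
    // value is the CLSID of the IFilter implementation for a persistent handler.
    private const string IID_IFilter = "{89BCB740-6119-101A-BCB7-00DD010655AF}";

    // Resolves an extension such as ".pdf" to the CLSID string of its registered IFilter,
    // or null if none is found. readDefault(path) must return the default value of
    // HKEY_CLASSES_ROOT\path, or null if that key or value does not exist.
    public static string? FindFilterClsid(string extension, Func<string, string?> readDefault)
    {
        // 1. HKCR\.ext\PersistentHandler -> persistent handler CLSID.
        string? handler = readDefault(extension + @"\PersistentHandler");

        // 2. Fallback: HKCR\.ext -> ProgID, HKCR\ProgID\CLSID -> CLSID,
        //    HKCR\CLSID\{clsid}\PersistentHandler -> persistent handler CLSID.
        if (handler == null)
        {
            string? progId = readDefault(extension);
            string? clsid = progId == null ? null : readDefault(progId + @"\CLSID");
            handler = clsid == null ? null : readDefault(@"CLSID\" + clsid + @"\PersistentHandler");
        }
        if (handler == null) return null;

        // 3. HKCR\CLSID\{handler}\PersistentAddinsRegistered\{IID_IFilter} -> filter CLSID.
        return readDefault(@"CLSID\" + handler + @"\PersistentAddinsRegistered\" + IID_IFilter);
    }
}
```

On Windows, `readDefault` could be something like `path => { using (var key = Registry.ClassesRoot.OpenSubKey(path)) return key?.GetValue(null) as string; }` (from `Microsoft.Win32`). The resulting CLSID can be instantiated with `Activator.CreateInstance(Type.GetTypeFromCLSID(new Guid(clsid)))`, and the object still has to be loaded with the file content as described above.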