Need to search for Social Security numbers in thousands of documents (.doc, .docx, .pdf) in C#
What is the best way to access the documents (opening them and reading only the text) so that the search is faster? I have already tried using the Microsoft Office Word object model: I create a Word application and open each file with it to get the text. I can't even use threading, because either I create only one Word application, which doesn't help with threading, or I create a Word application in each thread, which the system can't handle. How would you suggest I proceed?
Thanks in advance
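For illustration, here is a minimal sketch of the matching side, assuming the text extraction is done by something other than Word automation. `SsnScanner`, `FindSsnCandidates`, `ScanAll` and the `extractText` delegate are hypothetical names, and the regex is only a format check for the 3-2-4 digit layout, not a validation of real SSNs. Once each file can be turned into text without a shared Word instance, the matching itself is thread-safe and parallelizes with `Parallel.ForEach`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

public static class SsnScanner
{
    // Candidate SSN pattern (hypothetical, format only): 3-2-4 digits with the
    // same separator (dash, space or none) in both places, e.g. 123-45-6789,
    // 123 45 6789 or 123456789. It does not check area/group/serial rules.
    private static readonly Regex SsnPattern =
        new Regex(@"\b\d{3}([- ]?)\d{2}\1\d{4}\b", RegexOptions.Compiled);

    // Returns every SSN-like string found in the given text, in order.
    public static List<string> FindSsnCandidates(string text)
    {
        return SsnPattern.Matches(text)
                         .Cast<Match>()
                         .Select(m => m.Value)
                         .ToList();
    }

    // Scans many files in parallel and returns only the files with hits.
    // extractText is a hypothetical delegate that turns a path into plain text
    // (for example via IFilter, a PDF library or the Open XML SDK). It must be
    // safe to call from several threads at once, which a single shared Word
    // automation instance is not.
    public static IDictionary<string, List<string>> ScanAll(
        IEnumerable<string> paths, Func<string, string> extractText)
    {
        var results = new ConcurrentDictionary<string, List<string>>();
        Parallel.ForEach(paths, path =>
        {
            var hits = FindSsnCandidates(extractText(path));
            if (hits.Count > 0)
                results[path] = hits;
        });
        return results;
    }
}
```

For example, `SsnScanner.ScanAll(Directory.EnumerateFiles(root, "*.*", SearchOption.AllDirectories), MyExtractor)` would return the files containing at least one candidate, where `MyExtractor` stands for whatever text-extraction routine you plug in.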
Ah... go back to reading the documentation of your operating system. For quite some time (i.e. many, many years) Windows has had an indexing and search system that a lot of file types can hook into, provided you install the proper filters (IFilters), which are downloadable from Microsoft, Adobe, etc.
This builds a full-text index that you can then query through an API (for example, the Windows Search OLE DB provider). That is a LOT more efficient when you repeatedly search a large number of documents.
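A minimal sketch of querying that index from C#, assuming Windows Search is running, the document folders are included in the indexed locations, and the filters for .doc/.docx/.pdf are installed. It uses the Windows Search OLE DB provider (`Search.CollatorDSO`) through `System.Data.OleDb`, which is built into .NET Framework and available as a NuGet package on newer .NET. The class and method names are hypothetical, and the exact `SCOPE`/`CONTAINS` syntax is worth checking against the Windows Search SQL documentation.

```csharp
using System.Collections.Generic;
using System.Data.OleDb;

public static class WindowsSearchQuery
{
    // Connection string for the Windows Search OLE DB provider.
    private const string ConnectionString =
        "Provider=Search.CollatorDSO;Extended Properties='Application=Windows';";

    // Builds a Windows Search SQL query returning the paths of indexed files
    // under 'folder' whose contents contain 'phrase' as an exact phrase.
    public static string BuildQuery(string folder, string phrase)
    {
        // SCOPE takes a file: URL with forward slashes.
        string scope = "file:" + folder.Replace('\\', '/');
        // Single quotes delimit SQL string literals, so double them; double
        // quotes delimit the phrase inside CONTAINS, so drop any in the input.
        string safeScope = scope.Replace("'", "''");
        string safePhrase = phrase.Replace("'", "''").Replace("\"", "");
        return "SELECT System.ItemPathDisplay FROM SystemIndex " +
               $"WHERE SCOPE='{safeScope}' AND CONTAINS('\"{safePhrase}\"')";
    }

    // Runs the query against the local index (Windows only).
    public static List<string> FindFiles(string folder, string phrase)
    {
        var paths = new List<string>();
        using (var connection = new OleDbConnection(ConnectionString))
        using (var command = new OleDbCommand(BuildQuery(folder, phrase), connection))
        {
            connection.Open();
            using (OleDbDataReader reader = command.ExecuteReader())
            {
                // Each row holds one matching file's display path.
                while (reader.Read())
                    paths.Add(reader.GetString(0));
            }
        }
        return paths;
    }
}
```

Note that a full-text index answers "which files contain this term?" queries, such as looking up one specific number; it does not do regular-expression matching, so finding every SSN-shaped string still means reading the extracted text.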