Extract hyperlinks from .doc
Is there any way to extract hyperlinks from a .doc file? I have a bunch of hyperlinks in a .doc that I need to import into my database.
I have tried converting doc to HTML, but hyperlinks are not transferred.
Regards, Mladen
We had a similar issue and ended up using a third-party component called Aspose.Words. You can find it here: http://www.aspose.com
It's available for .NET and Java.
You could try importing the file into OpenOffice and see whether hyperlinks are transferred. OpenDocument is just a ZIP file with XML inside, very easy to parse once you've got the hang of it.
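If the hyperlinks do survive the conversion, pulling them out of the resulting .odt could look roughly like the sketch below. It assumes the links are stored as text:a elements with an xlink:href attribute inside content.xml, which is how OpenDocument text files normally represent them. The function name extract_odt_links is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard OpenDocument / XLink namespace URIs.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"


def extract_odt_links(path):
    """Return (url, link text) pairs for every text:a element in content.xml.

    `path` may be a file name or a binary file-like object.
    """
    with zipfile.ZipFile(path) as odt:
        root = ET.fromstring(odt.read("content.xml"))

    links = []
    for anchor in root.iter(f"{{{TEXT_NS}}}a"):
        href = anchor.get(f"{{{XLINK_NS}}}href")
        if href:
            # itertext() also picks up text inside nested spans.
            links.append((href, "".join(anchor.itertext())))
    return links
```

Calling extract_odt_links("converted.odt") (the file name is only a placeholder) gives you a list of pairs that can go straight into a database insert.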
I did the following: I opened the .doc file in Office XP, published it as a blog post, and then saved that post as a filtered web page. That gives you clean HTML that you can parse with ease.
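Once you have the filtered HTML, collecting the links is a small job. Here is a minimal sketch using only Python's standard html.parser module. It assumes the hyperlinks end up as ordinary a tags with an href attribute; LinkCollector and extract_html_links are names invented for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from HTMLParser.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip named anchors that have no target
                self.links.append(href)


def extract_html_links(html_text):
    """Return all href values found in the given HTML string, in order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

Read the saved page into a string (check which encoding Word used when it wrote the file) and pass it to extract_html_links.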
I realise this is some months after your initial question, but you can also extract hyperlinks from a .doc file through Word Automation. The API exposes Hyperlink objects that you can read easily.
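As a rough sketch of the automation route, here it is driven from Python. It assumes Windows with Word installed and the pywin32 package available. The Hyperlinks collection and its Address and TextToDisplay properties come from the Word object model, while the function names are invented for this example.

```python
def collect_hyperlinks(doc):
    """Return (address, display text) pairs from a Word Document object."""
    return [(link.Address, link.TextToDisplay) for link in doc.Hyperlinks]


def extract_doc_links(path):
    """Open `path` (use an absolute path) in Word and return its hyperlinks."""
    # Imported here so the module still loads where pywin32 is not installed.
    import win32com.client

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        doc = word.Documents.Open(path, ReadOnly=True)
        try:
            return collect_hyperlinks(doc)
        finally:
            doc.Close(False)  # close without saving changes
    finally:
        word.Quit()
```

Links that point at a bookmark inside the document may have an empty Address; in that case the target is in the SubAddress property instead.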