
Extract hyperlinks from .doc

Is there any way to extract hyperlinks from a .doc file? I have a bunch of hyperlinks in a .doc that I need to import into my database.

I have tried converting the .doc to HTML, but the hyperlinks are not carried over.

Regards, Mladen


We had a similar issue and ended up using a third-party component called Aspose.Words. You can find it here: http://www.aspose.com

It's available for .NET and Java.


You could try importing the file into OpenOffice and see whether the hyperlinks are transferred. An OpenDocument file is just a ZIP archive with XML inside, which is very easy to parse once you've got the hang of it.
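
If the hyperlinks do survive the import and you save the result as .odt, something along these lines might pull them out. This is only a sketch: it assumes the links are stored as text:a elements with an xlink:href attribute inside content.xml, which is how OpenDocument text files normally store them. The function name extract_odt_links is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace-qualified names used by OpenDocument text files.
TEXT_A = "{urn:oasis:names:tc:opendocument:xmlns:text:1.0}a"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def extract_odt_links(path):
    """Return (url, link text) pairs found in an .odt file (hypothetical helper)."""
    # An .odt file is a ZIP archive; the document body lives in content.xml.
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("content.xml"))

    links = []
    for anchor in root.iter(TEXT_A):
        href = anchor.get(XLINK_HREF)
        if href:
            # Join the text of the anchor and any nested spans.
            links.append((href, "".join(anchor.itertext())))
    return links
```

Calling extract_odt_links("links.odt") (the file name is just a placeholder) gives you a list of tuples that you can insert into the database.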


Here is what I did: I opened the .doc file in Office XP, published it as a blog, and then saved that blog as a filtered web page. That gives you clean HTML which you can parse with ease.
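
Once you have the filtered HTML, collecting the links could look roughly like this. It is a minimal sketch that assumes the hyperlinks end up as ordinary a tags with an href attribute; LinkCollector and extract_html_links are names invented for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # Skip named anchors that carry no href.
            if href:
                self.links.append(href)


def extract_html_links(html_text):
    """Return all hyperlink targets found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

Read the saved page into a string and pass it to extract_html_links. Word's filtered HTML is usually not UTF-8, so you may need to pick the encoding when opening the file.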


I realise this is some months after your initial question; however, you can also extract hyperlinks from a .doc file through Word Automation. The API exposes Hyperlink objects (via the document's Hyperlinks collection) that you can easily read.
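
As a rough illustration of the automation route, here is a sketch in Python. It assumes Windows with Word installed and the pywin32 package available; the function names are made up for this example. Hyperlinks, Address and TextToDisplay are members of the Word object model.

```python
def collect_hyperlinks(document):
    """Return (address, display text) pairs from a Word Document object."""
    # Document.Hyperlinks is a collection of Hyperlink objects.
    return [(link.Address, link.TextToDisplay) for link in document.Hyperlinks]


def extract_doc_links(path):
    """Open a .doc in Word via COM and return its hyperlinks.

    Assumes Windows, an installed copy of Word, and pywin32.
    `path` should be an absolute path.
    """
    import win32com.client  # imported here so the helper above works without it

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        document = word.Documents.Open(path, ReadOnly=True)
        try:
            return collect_hyperlinks(document)
        finally:
            document.Close(False)  # close without saving changes
    finally:
        word.Quit()
```

Links that point to a bookmark inside the document have an empty Address; their target is in SubAddress, so you may want to read that as well.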
