Has anybody ever tried to screen scrape data from sites built with SharePoint?

Or at least, could anybody point me to docs about its crazy proprietary URL parameters and HTML field name obfuscation? I can only suppose this is caused by SharePoint...

The main problem is that, given a start page built with SharePoint, I can't recreate a form post with a programmatic client because:

  • field names vary: they are appended with some sort of id, hash, whatever (I think session-wise? Not sure)
  • tracing HTTP traffic on my side, I see the HTTP request is packed with strange parameters like __REQUESTDIGEST, __VIEWSTATE, and many others

Is this an intentional protection device put up by SharePoint? What is the underlying architecture, and which objects are involved (script callbacks, ... )?

(BTW, I'm not doing anything evil, just trying to extract public government data from a website).

Thanks.


SharePoint is nothing more than an ASP.NET application; it is built entirely on top of ASP.NET 2.0. That said, __VIEWSTATE is just a hidden field that holds the page's view state information.

As for __REQUESTDIGEST, that one is an intentional protection: it carries a security validation token called the FormDigest.

And finally, to answer your question: you will not be able to guess the field names and such unless you have control over the application's source code. The field names look obfuscated because those controls are not handwritten but generated by the ASP.NET engine and parser. The mechanism that produces such names is called a naming container.
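That said, the generated names and the hidden fields are all present in the HTML the server sends, so a client can read them from a freshly fetched page instead of guessing them. Here is a minimal sketch of that idea in Python, using only the standard library. The page fragment, the `$txtQuery` suffix, and the helper names are hypothetical, and whether the server accepts the replayed post still depends on the digest being fresh and on anything the page's scripts add.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs of every <input> element on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        name = attr.get("name")
        if name:
            self.fields[name] = attr.get("value") or ""


def extract_form_fields(html):
    """Return a dict of input names to values found in the given HTML."""
    parser = FormFieldCollector()
    parser.feed(html)
    return parser.fields


def find_field(fields, suffix):
    """Return the first full field name that ends with the given suffix."""
    for name in fields:
        if name.endswith(suffix):
            return name
    return None


# Hypothetical fragment of a fetched page; real pages carry more fields.
SAMPLE_PAGE = """
<form method="post" action="default.aspx">
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTA4" />
  <input type="hidden" name="__REQUESTDIGEST" value="0xABC,01 Jan 2010" />
  <input type="text" name="ctl00$PlaceHolderMain$ctl01$txtQuery" value="" />
</form>
"""

fields = extract_form_fields(SAMPLE_PAGE)

# Match on the stable tail of the generated name rather than the whole name.
query_name = find_field(fields, "$txtQuery")
fields[query_name] = "budget 2010"

# Body for the follow-up POST; hidden fields are echoed back unchanged.
post_body = urlencode(fields).encode("ascii")
```

The point of matching on the suffix is that the prefix added by the naming containers can change when the page layout changes, while the ID the developer gave the control usually stays at the end.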

One suggestion: rather than scraping the screen data, try alternative approaches. Each list in SharePoint has a built-in XML feed, so try to consume that; or, if you have access to the site, try to retrieve the information using Export to Excel, etc.
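To give an idea of how little code the feed route needs, here is a rough sketch that pulls titles and links out of an RSS 2.0 document with Python's standard library. The sample feed below is made up, and I am assuming the list feed is plain RSS 2.0; on the installations I have in mind it is served from something like `_layouts/listfeed.aspx?List=<list GUID>`, but check the RSS link on the actual list page.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content, standing in for what the list feed URL returns.
SAMPLE_FEED = """
<rss version="2.0">
  <channel>
    <title>Public Reports</title>
    <item>
      <title>Report A</title>
      <link>http://example.gov/Lists/Reports/DispForm.aspx?ID=1</link>
    </item>
    <item>
      <title>Report B</title>
      <link>http://example.gov/Lists/Reports/DispForm.aspx?ID=2</link>
    </item>
  </channel>
</rss>
"""


def parse_feed_items(xml_text):
    """Return a list of {'title', 'link'} dicts, one per <item> in the feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return items


items = parse_feed_items(SAMPLE_FEED)
for entry in items:
    print(entry["title"], entry["link"])
```

In real use, the text passed to `parse_feed_items` would come from fetching the feed URL, for example with `urllib.request.urlopen`.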


In addition to RSS, SharePoint also has a Web Services interface that you can use to get at and interact with data stored in SharePoint in a programmatic way.
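As a rough illustration, the sketch below builds (but does not send) a SOAP request for the `GetListItems` operation of the Lists web service, which as far as I know lives at `_vti_bin/Lists.asmx` under the site URL. The site URL and list name are hypothetical, the function name is mine, and a real call may need authentication and extra parameters such as a view name or row limit.

```python
import urllib.request
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SP_NS = "http://schemas.microsoft.com/sharepoint/soap/"
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def build_get_list_items_request(site_url, list_name):
    """Build a urllib Request for GetListItems without sending it."""
    envelope = (
        '<?xml version="1.0" encoding="utf-8"?>'
        f'<soap:Envelope xmlns:soap="{SOAP_NS}">'
        "<soap:Body>"
        f'<GetListItems xmlns="{SP_NS}">'
        # escape() guards against &, < and > in the list name.
        f"<listName>{escape(list_name)}</listName>"
        "</GetListItems>"
        "</soap:Body>"
        "</soap:Envelope>"
    )
    return urllib.request.Request(
        site_url.rstrip("/") + "/_vti_bin/Lists.asmx",
        data=envelope.encode("utf-8"),
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": SP_NS + "GetListItems",
        },
    )


# Hypothetical site and list, for illustration only.
request = build_get_list_items_request(
    "http://example.gov/sites/data", "Reports & Stats"
)
```

Passing `request` to `urllib.request.urlopen` would perform the call; the response is a SOAP envelope whose rows can be walked with `xml.etree.ElementTree` in much the same way as a feed.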
