Use cURL or wget with HTTP POST to reach search results after the first page

EDIT: I've got a much more specific idea of what I'm looking for now so I'm re-writing the whole question.

My overall goal is to get to the search results after the first page (from within a script) on the webpage http://www.ncbi.nlm.nih.gov/images. Using the Firefox extension "Tamper Data", I have inspected the requests sent by my browser and found that I am able to modify the HTTP POST request to get to any page of the results.

Now I would like to do this within a script. I've tried both

wget --post-data 'var1=foo&var2=bar&var3=...' http://www.ncbi.nlm.nih.gov/images

and

curl --data 'var1=foo&var2=bar&var3=...' http://www.ncbi.nlm.nih.gov/images

I've also tried making the initial request to http://www.ncbi.nlm.nih.gov/images?term=INSERTSEARCHTERMHERE and saving a cookie, then loading that cookie on the next request, this time with POST data indicating the page number. It doesn't work. Whenever I send the request to the first URL, I get either the home page for image search or a page titled "Images - Error encountered" with no search results. If I send it to the second URL (replacing INSERTSEARCHTERMHERE with my actual search term), I always get the first page of the results, even though the POST data includes a variable asking for a different page. There seem to be two, maybe three, variables denoting the page number:

EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_Pager.cPage=14
EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_Pager.CurrPage=14

and in Tamper Data this is always the current page (the one I was on when I made the request for a new page):

EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_Pager.cPage=1

(Yes, there are two variables in the POST data with the same name - I don't know what that is about...??)
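For reference, curl sends the string given to --data exactly as written, so a field name that appears twice is transmitted twice. A minimal sketch of building the three pager fields follows. The helper name pager_fields is hypothetical, and the reading of the fields (first cPage/CurrPage pair is the requested page, trailing cPage is the page the request was made from) is an assumption based only on the Tamper Data capture.

```shell
#!/bin/sh
# Prefix shared by the pager fields, copied from the captured request.
PAGER='EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_Pager'

# pager_fields TARGET CURRENT   (hypothetical helper)
# Prints the three pager fields in the same relative order as the capture.
# Assumption: the first cPage/CurrPage pair names the requested page and
# the trailing cPage names the page the request was made from.
pager_fields() {
    printf '%s.cPage=%s&%s.CurrPage=%s&%s.cPage=%s\n' \
        "$PAGER" "$1" "$PAGER" "$1" "$PAGER" "$2"
}

# Example: ask for page 14 while currently on page 1.
pager_fields 14 1
```

The printed string can be spliced into the rest of the POST body; the duplicate cPage name is kept as-is rather than merged.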

So how can I use cURL or wget within a script to get to all of the pages of the search results? Thanks for your help! (and thanks to the commenters for helping me clarify the question!)
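To pin down the two-step cookie attempt described above, here is a rough sketch of it as a shell function. It only restates what was tried and is not a confirmed fix. The function name fetch_page is hypothetical, and the choice to POST to the bare /images URL is an assumption; the search term must already be URL-encoded.

```shell
#!/bin/sh
BASE_URL='http://www.ncbi.nlm.nih.gov/images'

# fetch_page TERM POST_DATA OUTFILE   (hypothetical helper)
# Step 1: GET the search URL and store any cookies the server sets.
# Step 2: POST the form data with those cookies and save the response.
# Assumption: the POST goes to the bare /images URL.
fetch_page() {
    jar=$(mktemp) || return 1
    # -c writes received cookies to the jar; the page body is discarded.
    curl -s -c "$jar" -o /dev/null "$BASE_URL?term=$1" || { rm -f "$jar"; return 1; }
    # -b sends the stored cookies back; --data makes this a POST.
    curl -s -b "$jar" -c "$jar" --data "$2" -o "$3" "$BASE_URL"
    status=$?
    rm -f "$jar"
    return "$status"
}

# Usage (performs real network requests):
#   fetch_page drug "$post_data" page14.html
```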

Additional info: There are a ton of POST fields, and I am sending all of them. I copied this out of what Tamper Data recorded:

EntrezSystem2.PEntrez.ImagesDb.Images_SearchBar.SearchResourceList=images&EntrezSystem2.PEntrez.ImagesDb.Images_SearchBar.Term=drug&EntrezSystem2.PEntrez.ImagesDb.Images_SearchBar.CurrDb=images&EntrezSystem2.PEntrez.ImagesDb.Entrez_PageController.PreviousPageName=results&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.sPresentation=docsum&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.sPageSize=20&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.FileFormat=docsum&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.LastPresentation=docsum&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.Presentation=docsum&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.PageSize=20&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.LastPageSize=20&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.Format=&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.LastFormat=&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_Pager.cPage=14&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_Pager.CurrPage=14&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_ResultsController.ResultCount=38231&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_ResultsController.RunLastQuery=&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_Pager.cPage=1&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.sPresentation2=docsum&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.sPageSize2=20&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_MultiItemSupl.Discovery_SearchDetails.SearchDetailsTerm=drug%5BAll+Fields%5D&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.HistoryDisplay.Cmd=PageChanged&EntrezSystem2.PEntrez.DbConnector.Db=images&EntrezSystem2.PEntrez.DbConnector.LastDb=images&EntrezSystem2.PEntrez.DbConnector.Term=drug&EntrezSystem2.PEntrez.DbConnector.LastTabCmd=&EntrezSystem2.PEntrez.DbConnector.LastQueryKey=1&EntrezSystem2.PEntrez.DbConnector.IdsFromResult=&EntrezSystem2.PEntrez.DbConnector.LastIdsFromResult=&EntrezSystem2.PEntrez.DbConnector.LinkName=&EntrezSystem2.PEntrez.DbConnector.LinkReadableName=&EntrezSystem2.PEntrez.DbConnector.LinkSrcDb=&EntrezSystem2.PEntrez.DbConnector.Cmd=PageChanged&EntrezSystem2.PEntrez.DbConnector.TabCmd=&EntrezSystem2.PEntrez.DbConnector.QueryKey=&p%24a=EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_Pager.cPage&p%24l=EntrezSystem2&p%24st=images
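Since the recorded string is long, one way to reuse it for other pages is to rewrite only the page numbers with sed and leave every other field untouched. This is a sketch under assumptions: the helper name set_page is hypothetical, it changes the first cPage=N and the CurrPage=N values to the requested page, and it leaves the later cPage=N (the "current page" field) alone. Whether the server accepts that combination is not confirmed.

```shell
#!/bin/sh
# set_page POST_DATA TARGET_PAGE   (hypothetical helper)
# Rewrites the first "Entrez_Pager.cPage=N" and the "Entrez_Pager.CurrPage=N"
# in a recorded POST string to TARGET_PAGE. Without the g flag, sed replaces
# only the first match, so the later cPage=N field is left unchanged.
# The "p%24a=...Entrez_Pager.cPage" field has no "=N" suffix and is not matched.
set_page() {
    printf '%s\n' "$1" | sed \
        -e "s/Entrez_Pager\.cPage=[0-9]*/Entrez_Pager.cPage=$2/" \
        -e "s/Entrez_Pager\.CurrPage=[0-9]*/Entrez_Pager.CurrPage=$2/"
}

# Usage sketch (recorded holds the full captured string; performs real requests):
#   for page in 2 3 4; do
#       curl -s --data "$(set_page "$recorded" "$page")" \
#            -o "page$page.html" http://www.ncbi.nlm.nih.gov/images
#   done
```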
