Extracting HTML tables from a website

2023-03-04 16:07

I am trying to use the XML and RCurl packages to read some HTML tables from the following URL: http://www.nse-india.com/marketinfo/equities/cmquote.jsp?key=SBINEQN&symbol=SBIN&flag=0&series=EQ#

Here is the code I am using:

library(RCurl)
library(XML)
# Send a user agent with every RCurl request
options(RCurlOptions = list(useragent = "R"))
url <- "http://www.nse-india.com/marketinfo/equities/cmquote.jsp?key=SBINEQN&symbol=SBIN&flag=0&series=EQ#"
# Download the page and parse the returned HTML text
wp <- getURLContent(url)
doc <- htmlParse(wp, asText = TRUE)
docName(doc) <- url
# Read every table in the document into a list of data frames
tmp <- readHTMLTable(doc)
## Required tables
tmp[[13]]
tmp[[14]]

If you look at the tables, the values from the webpage have not been parsed. I guess this is due to some JavaScript evaluation happening on the fly. If I use the "Save page as" option in Google Chrome (it does not work in Mozilla), save the page, and then run the above code on the saved file, I am able to read in the values.

But is there a workaround so that I can read the tables on the fly? It would be great if you could help.

Regards,

It looks like they're building the page with JavaScript by requesting http://www.nse-india.com/marketinfo/equities/ajaxGetQuote.jsp?symbol=SBIN&series=EQ and parsing the values out of the string it returns. Maybe you could grab that data and parse it yourself instead of scraping the page itself.

Looks like you'll have to build a request with the proper referrer headers using cURL, though. As you can see, you can't just hit that ajaxGetQuote page with a bare request.

You can probably read the appropriate headers to put in by using the Web Inspector in Chrome or Safari, or by using Firebug in Firefox.
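
As a rough sketch of that idea in R, something like the following might work. It assumes the only header the server checks is Referer and that the quote page URL from the question is an acceptable value; neither is confirmed here, so compare against what the browser actually sends. The helper names quote_headers and fetch_quote are made up for illustration, and the format of the returned string is not known, so the sketch only prints the start of it.

```r
# URLs taken from the question and from the AJAX endpoint mentioned above.
page_url <- "http://www.nse-india.com/marketinfo/equities/cmquote.jsp?key=SBINEQN&symbol=SBIN&flag=0&series=EQ"
ajax_url <- "http://www.nse-india.com/marketinfo/equities/ajaxGetQuote.jsp?symbol=SBIN&series=EQ"

# Hypothetical helper: the extra request headers to send.
# Assumption: Referer alone is enough; add more names here if it is not.
quote_headers <- function(referer) {
  c(Referer = referer)
}

# Hypothetical helper: request the AJAX endpoint with the headers above.
# RCurl turns the named character vector into "Name: value" header lines.
fetch_quote <- function(ajax_url, referer) {
  RCurl::getURL(ajax_url,
                httpheader = quote_headers(referer),
                useragent = "R")
}

# Only hit the network when run by hand in an interactive session.
if (interactive()) {
  raw <- fetch_quote(ajax_url, page_url)
  # Look at the start of the raw string before deciding how to split it.
  cat(substr(raw, 1, 500), "\n")
}
```

If the response still comes back empty, the request is probably missing another header or a cookie that the browser sends along.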
