How to read a specific number from an HTML page [closed]
For example, if I wanted to assign the index value from this page: http://ca.finance.yahoo.com/q;_ylt=Agfc5O8HHTlOLgX.q6V4HEtyzJpG;_ylu=X3oDMTFkdnZqMHBkBHBvcwMyBHNlYwN5ZmlNYXJrZXRTdW1tYXJ5RnJvbnRwYWdlBHNsawNzcHRzeA--?s=^GSPTSE
to a variable, how can I do that?
I am very new to programming, so I would really appreciate it if you explained every line. My point isn't just to get it done; I want to understand it.
Thank you very much in advance!
If you look at the source code of the web page, you find that the index number is within a span tag which has a unique id: <span id="yfs_l10_^gsptse">13,702.33</span>.
This means that you can scrape the page and then single out that individual tag.
You need to start by connecting to the host and downloading the page. The way in which you do this depends on which language you're using. There are plenty of tutorials around - just search for "[language] web scraping".
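As a rough illustration of the download step, here is a minimal sketch assuming Python 3 and only its standard library (the question doesn't name a language, so that choice is mine). The function name download_page, the User-Agent string, and the 10-second timeout are all made up for the example.

```python
from urllib.request import Request, urlopen


def download_page(url, timeout=10):
    """Fetch a URL and return its body as text."""
    # Some sites reject requests without a browser-like User-Agent header,
    # so we send a generic one (an arbitrary choice for this sketch).
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    # urlopen connects to the host and returns a file-like response object.
    with urlopen(request, timeout=timeout) as response:
        raw_bytes = response.read()  # the page arrives as raw bytes
        # Use the encoding the server declared; fall back to UTF-8 if none.
        charset = response.headers.get_content_charset() or "utf-8"
    # Turn the bytes into a normal Python string.
    return raw_bytes.decode(charset, errors="replace")


# Usage (needs a network connection):
# html = download_page("http://example.com/")
```

The result is just one long string containing the page's HTML source, the same text you see with "view source" in a browser.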
Then you need to build a Document Object Model (a tree of the page's tags) from the HTML source code. Again, this depends on the language: it's easy in some and difficult in others. Once you've done that, search for the tag with an id of yfs_l10_^gsptse and grab its content.
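To make the "find the tag and grab the content" step concrete, here is a small sketch, again assuming Python 3. It uses the standard library's html.parser instead of building a full DOM, which is a simpler stand-in that is enough for one tag. The names IdTextFinder and text_by_id are hypothetical, the sample string just copies the span quoted above (the live page may look different), and the code assumes the span holds plain text with no nested tags.

```python
from html.parser import HTMLParser


class IdTextFinder(HTMLParser):
    """Collects the text inside the first tag whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.inside = False  # True while we are between <tag id=...> and </tag>
        self.found = False   # True once the target tag has been seen
        self.parts = []      # pieces of text collected from inside the tag

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, e.g. [("id", "yfs_l10_^gsptse")]
        if not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.inside = True

    def handle_endtag(self, tag):
        # Assumption: no nested tags, so the first closing tag ends the target.
        self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.parts.append(data)


def text_by_id(html, target_id):
    """Return the text of the tag with the given id, or None if it is absent."""
    finder = IdTextFinder(target_id)
    finder.feed(html)   # the parser walks the HTML and calls the handlers above
    finder.close()
    return "".join(finder.parts) if finder.found else None


# Stand-in for the downloaded page, using the span quoted in this answer.
sample = '<html><body><span id="yfs_l10_^gsptse">13,702.33</span></body></html>'
raw = text_by_id(sample, "yfs_l10_^gsptse")   # the text "13,702.33"
index_value = float(raw.replace(",", ""))     # drop the comma, convert to a number
print(index_value)
```

The value scraped from the page is text, so the last step removes the thousands separator and converts it to a float before storing it in the variable.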
Hope that helps - obviously there's a lot I haven't said, but it depends what language you want to use.