real time stock quotes, StreamReader performance optimization

2022-12-26 19:22 问答作者：

I am working on a program that extracts real time quote for 900+ stocks from a website. I use HttpWebRequest to send HTTP request to the site and store the response to a stream and open a stream using the following code:

HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream stream = response.GetResponseStream ();
StreamReader reader = new  StreamReader( stream )

the size of the received HTML is large (5000+ lines), so it takes a long time to parse it and extract the price. For 900 files, It takes about 6 mins for parsing and extracting. Which my boss isn't happy with, he told me he'd want the whole process to be done in TWO mins.

I've identified the part开发者_StackOverflow社区 of the program that takes most of time to finish is parsing and extracting. I've tried to optimize the code to make it faster, the following is what I have now after some optimization:

// skip lines at the top
for(int i=0;i<1500;++i) 
  reader.ReadLine();

// read the line that contains the price 
string theLine = reader.ReadLine();  

// ... extract the price from the line

now it takes about 4 mins to process all the files, there is still a significant gap to what my boss's expecting. So I am wondering, is there other way that I can further speed up the parsing and extracting and have everything done within 2 mins?

I was doing HTML screen scraping for a while with stock quotes but I found that Yahoo offers a great simple web service that is much better that loading websites.

http://www.gummy-stuff.org/Yahoo-data.htm

With this service you can request up to 100 stock quotes in a single request and it returns a csv formatted response with one line for every symbol. You can set what columns you want returned in the query string of the request. I built a small program that would query the service once a day for every stock in the stock market to get prices. It seemed to work well for me and was way faster than hitting websites for the data.

An example querystring would be http://finance.yahoo.com/d/quotes.csv?s=GE&f=nkqwxyr1l9t5p4

Which returns text of

"GENERAL ELEC CO",32.98,"Jun 26","21.30 - 32.98","NYSE",2.66,"Jul 25",28.55,"Jul 3","-0.21%"

for(int i=0;i<1500;++i) 
  reader.ReadLine();

this particulary is not good. ReadLine reads all line and stores it somewhere, but no one uses it. Extra work for GC. Read byte-by-byte and catch \D \A.

Then don't use StreamReader at all! It is fat overhead, read from stream.

Hard to see how this is possible, StreamReader is blindingly fast compared to HttpWebRequest. Some basic assumptions: say you are downloading 900 files with 5000 lines, 100 chars each in 6 minutes. That means you need to download 900 x 5000 x 100 = 450 Megabytes. In 6 minutes, that requires a bandwidth of 450E6 / 6 / 60 * 8 = 10 Mbps.

What do you have? 10 Mbps is about typical for high-speed Internet service, although you need a server that can sustain this. To get it down to 2 seconds, you'll need to upgrade your service to 30 Mbps. Your boss can fix that.

About the speed improvement you saw: watch out for the cache.

If you really need to have real-time data fast then you should subscribe to the data feeds rather than scrape them off a site.

Alternatively, isn't there some token that you can search for to find the field/data pair(s) you need.

4 minutes sounds ridiculously long for reading in 900 files.

real time stock quotes, StreamReader performance optimization

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？