SAS memory usage and sorting
I'm curious about how SAS uses memory when sorting, and why it seems to be so inefficient.
I have a quad-core Xeon with 8GB of RAM and a 3GB dataset. Why, at any given time during a standard PROC SORT, is a mere 120MB of RAM being used, with a meager 15-20% CPU utilization? It seems like something horribly inefficient is going on with the procedure.
My expectation is that, since I have the memory available, it would load the entire dataset and then proceed to obliterate every available CPU cycle. But only 15%? It's a stunning waste of available resources, and it bothers me. It seems to be constantly going back and forth to the disk, which is painfully slow.
Is there some magical setting that says "SAS, you can utilize everything to go faster" that I'm missing?
64-bit OS running 64-bit SAS, btw.
You might check your MEMSIZE and SORTSIZE settings. More discussion about sort performance is here.
The thing with sorting is that it's generally not the sort itself that takes the time; it's reading the data set in and writing it out again. Sorting is comparatively quick. So with a 3GB data set, significant time is spent just waiting for the disk to supply all of the data. SAS can overlap sorting parts of the data with reading more of it in, but the job is still likely to be I/O bound.
That said, MEMSIZE and SORTSIZE will at least let you make maximum use of your available memory. You want SAS to read the entire data set in, sort it in one go, and then write it out again. With less memory, or if MEMSIZE/SORTSIZE are not suitably configured, it will sort the data set in chunks and then have to merge those chunks. You really want to avoid a "multi-pass sort" if at all possible, as it will roughly double the time it takes: it has to go through the whole data set sorting chunks, then go through all the data again merging those chunks. I think you get hints in the SAS log as to whether it is doing a multi-pass sort or not.
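To make the single-pass vs. multi-pass distinction concrete, here is a minimal sketch of a generic external merge sort in Python. This is not SAS's implementation: the in-memory list stands in for the data set on disk, and `sort_size` is a hypothetical stand-in for a SORTSIZE-style memory limit, measured in records rather than bytes. It also assumes all sorted chunks ("runs") can be merged in a single step; a real external sort with very many runs may need additional merge passes.

```python
import heapq
from typing import List, Tuple


def external_sort(records: List[int], sort_size: int) -> Tuple[List[int], int]:
    """Sort `records` holding at most `sort_size` records in "memory" at once.

    Returns the sorted records and the number of full passes over the data.
    `sort_size` is a hypothetical analog of a SORTSIZE-style limit.
    """
    if sort_size <= 0:
        raise ValueError("sort_size must be positive")

    # Everything fits in the sort buffer: read once, sort in memory, write once.
    if len(records) <= sort_size:
        return sorted(records), 1

    # Pass 1: read the data in chunks that fit, sort each chunk, and "write"
    # it out as a sorted run (kept in a list here instead of a temp file).
    runs: List[List[int]] = []
    for start in range(0, len(records), sort_size):
        runs.append(sorted(records[start:start + sort_size]))

    # Pass 2: read every run back and k-way merge them into the final output.
    merged = list(heapq.merge(*runs))
    return merged, 2


if __name__ == "__main__":
    data = [5, 3, 9, 1, 7, 2, 8]
    print(external_sort(data, 100))  # enough memory: one pass
    print(external_sort(data, 3))    # chunks of 3 records: sort runs, then merge
```

The output is identical in both cases; only the number of trips through the whole data set changes, which is what dominates the run time when the job is I/O bound.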
In general, that's not how SAS works. SAS keeps your data on your disk drives and only reads a small portion of it at a time. To me, that's the advantage of SAS: I use SAS for stuff that can't fit in RAM.
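The "read a small portion at a time" model can be illustrated with a short Python sketch. It is only an analogy, not how SAS's I/O layer actually works: the function name `count_records` and the 64 KB `block_size` default are hypothetical choices. It walks a stream of newline-terminated records and holds only one fixed-size block in memory, so its memory use stays the same whether the input is 3KB or 3GB.

```python
import io
from typing import BinaryIO, Tuple


def count_records(stream: BinaryIO, block_size: int = 64 * 1024) -> Tuple[int, int]:
    """Count newline-terminated records and total bytes in `stream`.

    Only one block of at most `block_size` bytes is held in memory at a time.
    """
    records = 0
    total_bytes = 0
    while True:
        block = stream.read(block_size)  # read a small, fixed-size portion
        if not block:                    # empty bytes means end of stream
            break
        records += block.count(b"\n")
        total_bytes += len(block)
    return records, total_bytes


if __name__ == "__main__":
    # An in-memory stream stands in for a large file on disk.
    print(count_records(io.BytesIO(b"a\nb\nc\n"), block_size=2))
```

With a real file you would pass `open(path, "rb")`; the trade-off is the one described above: constant memory use, with speed bounded by how fast the disk can deliver blocks.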
You might be interested in Stata, R, or another package that keeps your data in RAM. It's pretty easy to move back and forth between the programs, even within the same project.