Programming in the era of SSD
I am wondering how the oncoming SSD technology affects (mostly system) programming. Tons of questions arise, but here are some of the most obvious ones:
- Can the speed of disk access be considered anywhere near memory speed?
- If not, is that just a temporary state, or are there fundamental reasons why SSDs will never be as fast as RAM?
- Are B-Trees (and their cousins) still relevant?
- If so, are there any adjustments or modifications of B-Trees (B+-Trees, R-Trees, etc.) made for SSD? If not, are there any other data structures crafted for SSD?
It is true that SSDs eliminate the seek time issue for reading, but writing efficiently on them is quite tricky. We have been doing some research into these issues while looking for the best way to use SSDs for the Acunu storage core.
You might find these interesting:
- Log file systems and SSDs – made for each other?
- Why theory fails for SSDs
- Current flash-based SSDs are not nearly as fast as main-memory DRAM. Will non-volatile memory technology eventually perform as well as DRAM? Someday. There are many promising technologies under development.
- One bottleneck in SSD performance is the SATA interface. As the technology improves, SSDs will be connected into the DRAM or PCIe bus.
- B-trees are still relevant, as long as memory access is performed in blocks. Even DRAM is accessed in blocks, and popular blocks are cached in the CPU. Although difficult to implement, a B-tree designed to operate in DRAM can outperform other kinds of volatile search trees. The performance benefit will not likely be apparent until the tree has millions of entries in it, however.
- B-trees implemented for SSDs benefit from improvements in block allocation. Current generation flash SSDs prefer sequentially ordered writes. As the B-tree grows (or changes), new blocks should be allocated in sequential order to get the best write performance. Log-based storage formats should do well, but I've not seen any implementations that scale. As the performance gap between sequentially and randomly ordered writes narrows, allocation order will become less important.
- RAM doesn't have to remember state after reset/reboot. I highly doubt SSD will ever be as fast as RAM.
- B-Trees are still very much relevant as you still try to minimize the disk reads.
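The suggestion above, that new B-tree blocks be allocated in sequential order, can be illustrated with a small append-only block store. This is only a sketch under stated assumptions: the class name `AppendOnlyBlockStore`, the 4096-byte block size, and the copy-on-write update policy are all hypothetical choices for illustration, not something taken from any answer in this thread or from a particular storage engine.

```python
import os

BLOCK_SIZE = 4096  # assumed block size; real devices and file systems vary


class AppendOnlyBlockStore:
    """Hypothetical store that allocates blocks strictly in sequential order.

    A changed tree node is never overwritten in place. Its new version is
    written to the next free block at the end of the file, so the device
    only ever sees sequentially ordered writes.
    """

    def __init__(self, path):
        # Open for reading and writing, creating the file if it is missing.
        mode = "r+b" if os.path.exists(path) else "w+b"
        self._f = open(path, mode)
        self._f.seek(0, os.SEEK_END)
        self._next_block = self._f.tell() // BLOCK_SIZE

    def write_block(self, payload: bytes) -> int:
        """Write payload into the next sequential block and return its number."""
        if len(payload) > BLOCK_SIZE:
            raise ValueError("payload larger than one block")
        block_no = self._next_block
        self._f.seek(block_no * BLOCK_SIZE)
        # Pad with zero bytes so every block has the same size on disk.
        self._f.write(payload.ljust(BLOCK_SIZE, b"\x00"))
        self._next_block += 1
        return block_no

    def read_block(self, block_no: int) -> bytes:
        """Read one whole block; reads may be random, which SSDs handle well."""
        self._f.flush()
        self._f.seek(block_no * BLOCK_SIZE)
        return self._f.read(BLOCK_SIZE)

    def close(self):
        self._f.close()
```

With this layout, updating a node means calling `write_block` again and pointing the parent at the returned block number. The old copy stays in the file as garbage, so a real design would also need some form of compaction, which this sketch leaves out.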
One factor comes readily to mind...
There has been a growing trend towards treating hard drives as if they are tape drives, due to the high relative cost of making heads move between widely separated tracks. This has led to efforts to optimise data access patterns so that the head can move smoothly across the surface rather than seeking randomly.
SSDs practically eliminate the seek penalty, so we can go back to not worrying so much about the layout of data on disk. (More accurately, we have a different set of worries, due to wear-levelling concerns).
While the seek times of SSDs are better than those of HDDs by an order of magnitude or two, they are still significant compared to RAM. This means that issues related to seek times are not as bad, but they are still there. The throughput is still much lower than that of RAM. Apart from the storage technology, the connections matter. RAM is physically very close to the CPU and other components on the motherboard and uses a special bus. Mass-storage devices don't have this advantage. There are battery-backed packages of RAM modules that can act as an ultra-fast HDD substitute, but if they attach via SATA, SCSI or another typical disk interface, they are still slower than system RAM.
This means that B-trees are still significant, and for high performance you still need to take care of what is in RAM and what is in permanent storage. Due to the whole architecture and physical limitations (non-volatile writes will probably always tend to be slower than volatile ones), I think this gap may become smaller, but I doubt it will be completely gone in any foreseeable future. Even if you look at "RAM", you really don't have a single speed there, but several levels of faster and faster (but smaller and more expensive) caches. So at least some differences are here to stay.
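To make the point about B-trees concrete, here is a rough back-of-the-envelope sketch of how many levels (and therefore block reads on a cold lookup) a tree needs. The figures are assumptions chosen for illustration: 4096-byte blocks, 16-byte entries (an 8-byte key plus an 8-byte pointer), and fully packed nodes. The helper `levels_needed` is hypothetical and ignores details such as partially filled nodes.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes
ENTRY_SIZE = 16    # assumed: 8-byte key plus 8-byte child pointer


def levels_needed(num_keys: int, fanout: int) -> int:
    """Smallest number of levels such that fanout ** levels >= num_keys."""
    if num_keys < 1 or fanout < 2:
        raise ValueError("need num_keys >= 1 and fanout >= 2")
    levels, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


fanout = BLOCK_SIZE // ENTRY_SIZE  # 256 entries fit in one block

# One billion keys, one block read per level on a cold lookup.
print(levels_needed(10**9, fanout))  # 4 levels with block-sized nodes
print(levels_needed(10**9, 2))       # 30 levels for a binary tree
```

Even when each access is far cheaper than an HDD seek, four block reads versus thirty dependent reads is a large difference, which is one reason the block-oriented layout keeps paying off on SSDs.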
I tested build time on an SSD and a RAM disk, and the SSD was a little faster. My coworker got the same result with an entirely different setup: build time was 9 minutes on HDD, 3 min 30 sec on the RAM disk, and 3 min 0 sec on SSD.