Writing text files - performance wise?
We are about to start a new project which involves, at the end of the process, writing some 5,000 files of various sizes. All files are regular text files, and I wonder what the best way to write them is.
I was thinking of using file templates (pre-loaded into memory) or direct file streams.
If someone has experience with this, I would appreciate it if you could share it. Thanks.
I would suggest writing a prototype to check in advance whether you can meet the performance requirements with the approach you have in mind. But don't forget that hard disks are sometimes hard to evaluate (although their name probably doesn't come from this fact :-) ): they have caches, and their performance can vary widely depending on background processes, fragmentation, the filesystem, etc.
A rule of thumb is to reduce the number of file writes. It is usually fastest to write everything to a memory buffer first and then write that buffer to the disk. (A very bad way would be to write char by char.)
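As a minimal sketch of that rule of thumb in C#: the content is assembled in a `StringBuilder` and handed to the file system in one call. The class and method names (`BufferedFileWriter`, `WriteAllAtOnce`) are made up for illustration, and this assumes each file fits comfortably in memory.

```csharp
using System.IO;
using System.Text;

public static class BufferedFileWriter
{
    // Hypothetical helper: builds the whole file content in memory,
    // then writes it to disk with a single call.
    public static void WriteAllAtOnce(string path, string[] lines)
    {
        var buffer = new StringBuilder();
        foreach (string line in lines)
        {
            buffer.Append(line).Append('\n');
        }

        // File.WriteAllText opens the file, writes the string, and closes it.
        // UTF8Encoding(false) writes UTF-8 without a byte order mark.
        File.WriteAllText(path, buffer.ToString(), new UTF8Encoding(false));
    }
}
```

Whether this actually beats a plain `StreamWriter` (which already buffers internally) is something to measure on the target machine.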
Depending on the filesystem it might also be faster to write one big file instead of many small ones, so maybe creating a ZIP archive might be an alternative.
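If one archive is acceptable instead of thousands of separate files, the idea could look roughly like this with `System.IO.Compression`. This is only a sketch: `ZipBundleWriter` and `WriteBundle` are hypothetical names, and the dictionary of entry name to content stands in for however the project produces its text.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class ZipBundleWriter
{
    // Hypothetical helper: writes every (entry name, content) pair
    // into a single ZIP file instead of one file per entry.
    public static void WriteBundle(string zipPath, IDictionary<string, string> files)
    {
        using (FileStream stream = new FileStream(zipPath, FileMode.Create))
        using (ZipArchive archive = new ZipArchive(stream, ZipArchiveMode.Create))
        {
            foreach (KeyValuePair<string, string> file in files)
            {
                // In Create mode only one entry may be open at a time,
                // so each entry is written and closed before the next one.
                ZipArchiveEntry entry = archive.CreateEntry(file.Key, CompressionLevel.Fastest);
                using (StreamWriter writer = new StreamWriter(entry.Open()))
                {
                    writer.Write(file.Value);
                }
            }
        }
    }
}
```

The trade-off is that whoever consumes the output has to read from the archive, so this only helps if the downstream process can deal with that.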
On Windows there is the native multimedia file I/O (MMIO) API, which can be faster than the standard I/O mechanisms in several cases, even if your content is not "multimedia" (http://home.roadrunner.com/~jgglatt/tech/mmio.htm).
The curious thing is that only you know what the "best way" is.
For example, writing a big file in small chunks can be an acceptable solution, since you do not consume much memory and simply write slowly. Bad: long I/O operations. Good: low memory use.
Or collect the data in big chunks and write each chunk in a single operation. Bad: you need much more memory. Good: it follows the commonly suggested pattern of open, read/write, close in the shortest time possible.
Or use memory-mapped files: you work through a (usually constant) pointer into the file, which balances acceptable performance against low memory consumption. This is usually a very good choice, if not the only possible one, for very big files, such as in multimedia file processing.
Choice is up to you.
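For the memory-mapped option, a rough sketch in .NET might look like the following. It assumes the final size of the file is known before writing, which the mapping requires; `MappedFileWriter` and `WriteMapped` are hypothetical names. For small text files this is probably overkill and is shown only to make the option concrete.

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

public static class MappedFileWriter
{
    // Hypothetical helper: writes the content through a memory-mapped view.
    public static void WriteMapped(string path, string content)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(content);

        // A mapping with zero capacity is not allowed for a new file,
        // so an empty file is written the ordinary way.
        if (bytes.Length == 0)
        {
            File.WriteAllBytes(path, bytes);
            return;
        }

        // The capacity fixes the file size up front.
        using (MemoryMappedFile mapped = MemoryMappedFile.CreateFromFile(path, FileMode.Create, null, bytes.Length))
        using (MemoryMappedViewAccessor view = mapped.CreateViewAccessor(0, bytes.Length))
        {
            view.WriteArray(0, bytes, 0, bytes.Length);
        }
    }
}
```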
For further reading on in-depth performance analysis, I would suggest an excellent source such as Rico Mariani's blog.
If you use the standard .NET libraries and do something like this (inside a try/catch block):
    using (StreamWriter writer = new StreamWriter("filenumber1.txt"))
    {
        writer.Write("This is a test");     // Write without a newline
        writer.WriteLine("This is a test"); // Write with a trailing newline
    }
Performance should be reasonable. When writing to the file, just keep the strings at a decent size (read and write in chunks if you have to) to avoid memory issues. For example, if the data that makes up the file is 10 GB, writing the strings in chunks would be necessary.
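Writing in chunks could be sketched like this: the caller supplies the pieces lazily, so the full content never has to be in memory at once. The names `ChunkedFileWriter` and `WriteChunks` and the 64 KB buffer size are assumptions for illustration, not recommended values.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ChunkedFileWriter
{
    // Hypothetical helper: streams the chunks to disk one at a time.
    // bufferSize is the StreamWriter's internal buffer in characters.
    public static void WriteChunks(string path, IEnumerable<string> chunks, int bufferSize = 64 * 1024)
    {
        using (StreamWriter writer = new StreamWriter(path, false, new UTF8Encoding(false), bufferSize))
        {
            foreach (string chunk in chunks)
            {
                // Each chunk goes into the writer's buffer, which is
                // flushed to the file as it fills up and on dispose.
                writer.Write(chunk);
            }
        }
    }
}
```

If `chunks` comes from an iterator method (one using `yield return`), only one chunk is alive at a time.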
I once had to read thousands of blobs from a database and push them out to distribution servers on the file system. My initial approach was a single read and write at a time. That was OK; then I switched to a multi-threaded approach and got a decent performance gain.
I would do the single-operation approach first and do some performance runs. If it takes X amount of time and everyone is happy, done. If you need to make it Y, implement the multi-threaded approach.
Just a note: I would make the number of threads configurable so the performance can be dialed in. With too many threads it slows down. You need to find the sweet spot, and that usually depends on the hardware.
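One way to get a configurable thread count in .NET, as a sketch only, is `Parallel.ForEach` with `ParallelOptions.MaxDegreeOfParallelism`. The names `ParallelFileWriter` and `WriteAll` are hypothetical, and in a real project `maxThreads` would presumably come from a configuration file.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class ParallelFileWriter
{
    // Hypothetical helper: writes each (path, content) pair using at most
    // maxThreads concurrent operations. maxThreads must be positive,
    // or -1 to let the runtime decide.
    public static void WriteAll(IDictionary<string, string> files, int maxThreads)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };

        // Each file is independent, so no locking is needed as long as
        // every path appears only once.
        Parallel.ForEach(files, options, file => File.WriteAllText(file.Key, file.Value));
    }
}
```

Running it with `maxThreads` set to 1 gives the single-operation baseline, which makes it easy to compare timings for different settings on the same hardware.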
With that much writing to disk, I would look more at the disk layout (RAID, etc.), because saving a few CPU cycles may not help as much as having a faster disk subsystem.