C# - Time taken to load a file
I am finding that "loading" a file into memory can take very different amounts of time, even when my machine appears not to be doing much else. I have attached some code to illustrate the issue; the output follows it.
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.IO;
using System.Runtime.InteropServices;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            // Load the same file five times in a row and print the timings.
            LoadFileUnman();
            LoadFileUnman();
            LoadFileUnman();
            LoadFileUnman();
            LoadFileUnman();
            Console.WriteLine("Done");
        }

        public unsafe bool LoadFileUnman()
        {
            string filename = @"C:\DataFile.BNF";
            var fileStream = new FileStream(filename,
                                            FileMode.Open,
                                            FileAccess.Read,
                                            FileShare.Read,
                                            16 * 1024,
                                            FileOptions.SequentialScan);
            if (fileStream == null)
            {
                Console.WriteLine("Could not open file");
                return true;
            }

            Int64 length = fileStream.Length;
            Console.WriteLine("File length: " + length.ToString("#,###"));

            UnmanagedMemoryStream GlobalMS;
            IntPtr GlobalBuffer;
            try
            {
                // Allocate an unmanaged buffer large enough for the whole file.
                IntPtr myp = new IntPtr(length);
                GlobalBuffer = Marshal.AllocHGlobal(myp);
            }
            catch (Exception er)
            {
                Console.WriteLine("Could not allocate memory: " + er.Message);
                return true;
            }

            unsafe
            {
                byte* pBytes = (byte*)GlobalBuffer.ToPointer();
                GlobalMS = new UnmanagedMemoryStream(pBytes, (long)length, (long)length, FileAccess.ReadWrite);

                // Only the copy from the file stream into the unmanaged buffer is timed.
                DateTime befDT = DateTime.Now;
                fileStream.CopyTo(GlobalMS);
                Console.WriteLine("Load took: " + DateTime.Now.Subtract(befDT).TotalMilliseconds.ToString("#,###") + "ms");

                GlobalMS.Seek(0, SeekOrigin.Begin);
            }

            GlobalMS.Close();
            fileStream.Close();
            // Note: GlobalBuffer is never released with Marshal.FreeHGlobal, so each call leaks the whole buffer.
            return false;
        }
    }
}
Here is the output. The timings differ even more when I use bigger files (10 GB): sometimes a load takes a few seconds, sometimes up to a minute.
File length: 178,782,404
Load took: 5,125ms
File length: 178,782,404
Load took: 156ms
File length: 178,782,404
Load took: 172ms
File length: 178,782,404
Load took: 141ms
File length: 178,782,404
Load took: 1,891ms
Can anyone tell me why it is so variable, and whether there is anything I can do about it?
EDIT 1
From the comments I have had, it seems worth highlighting that what I need is a way to fix the variability of the load, NOT the overall speed. I can increase the speed by optimising in various ways (and I have), but it is the difference between consecutive load times that is the issue.
EDIT 2
Here are the services I am running (screenshot attached). I would be grateful if anyone notices any that might cause me problems.
It depends on many factors, such as what else your PC is doing at the time, the fragmentation of the disk, whether memory is (almost) full, etc.
There really isn't much you can do except optimize your environment:
- Get fast hard disks.
- Optimize the hard disks regularly (i.e., defragment).
- Reduce the load on the PC -- remove any unneeded software, services.
- Increase memory if your footprint gets above 75%; a quick way to check available memory programmatically is sketched after this list.
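If you want to keep an eye on that programmatically rather than through Task Manager, a minimal sketch using the standard "Memory / Available MBytes" performance counter could look like this (the counter names are the stock Windows ones; adapt as needed):

using System;
using System.Diagnostics;

class MemoryCheck
{
    static void Main()
    {
        // Instantaneous counter: how much physical memory is currently free.
        using (var available = new PerformanceCounter("Memory", "Available MBytes"))
        {
            Console.WriteLine("Available physical memory: {0:#,###} MB", available.NextValue());
        }
    }
}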
If the files you read are copies, then you can read them from a RAM disk -- so you may have a background process that copies the files into a RAM disk, and then your program can read them from there. That is also significantly faster than reading from disk.
See also http://www.softperfect.com/products/ramdisk/ for RAM disk software.
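As a rough illustration of the staging step, something along the lines of the sketch below could work; the R:\ drive letter and file names are just placeholders for wherever your RAM-disk software mounts the volume:

using System;
using System.IO;

class RamDiskCopy
{
    static void Main()
    {
        string source = @"C:\DataFile.BNF";   // file on the physical disk
        string ramCopy = @"R:\DataFile.BNF";  // same file staged on the RAM disk (placeholder path)

        // Background copy step: stage the file onto the RAM disk once.
        File.Copy(source, ramCopy, overwrite: true);

        // Subsequent reads come from RAM rather than the physical disk,
        // so their timing should be far less variable.
        byte[] data = File.ReadAllBytes(ramCopy);
        Console.WriteLine("Read {0:#,###} bytes from the RAM disk", data.Length);
    }
}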
EDIT: From your image I notice the following, which may impact performance (note this list is non-exhaustive, so there may be other services that I didn't notice that cause delays):
- Google Software Updater - Not sure, but it may cause delays
- Goto My PC - Are you sure nobody is logging into the machine and doing stuff that slows down your PC?
- LiveShare P2P Server - Again, if there are people connecting to your PC to download stuff, that would cause perf variability
- SQL Server Express - If it's being queried, it can cause serious variability.
Things to consider:
- Disk caching. Windows will use much of the available memory to cache files you have read. This gives you an initial penalty on the first read, then high speed. Anything else that gets loaded may evict your file from the cache, and memory allocations may do the same (once you have allocated enough memory, Windows will drop the cached file). See the sketch after this list for a way to observe the warm-cache effect.
- To put your data in memory, Windows first needs to free that memory up. This takes time, because (in the case of a 10 GB file and less RAM than that) it may have to page other data out to disk.
- When you free up memory, Windows has to clear it so that it is ready for reuse; in the case of a big file this involves the disk as well.
- Windows buffers write operations, and freeing up a lot of memory queues a lot of wiping; this is not done immediately, if I recall correctly.
- Anything else going on on the disk will affect the result a LOT when you are talking about milliseconds. A seek alone eats a handful of milliseconds, so any small write operation while you are testing at this scale will affect the outcome (the test simply isn't valid in its current form).
- Various "normal" factors such as disk fragmentation.
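To see the caching effect from the first bullet in isolation, a small sketch like the one below can help: it times the same sequential read several times with Stopwatch (a more reliable timer than DateTime.Now). The file path is assumed to be the same test file as in the question; the first pass is typically the cold, disk-bound one.

using System;
using System.Diagnostics;
using System.IO;

class CacheTiming
{
    static void Main()
    {
        string filename = @"C:\DataFile.BNF";   // same test file as in the question

        for (int i = 0; i < 5; i++)
        {
            var sw = Stopwatch.StartNew();
            using (var fs = new FileStream(filename, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, 16 * 1024, FileOptions.SequentialScan))
            {
                var buffer = new byte[16 * 1024];
                // Read the file end to end; the data itself is discarded, we only care about timing.
                while (fs.Read(buffer, 0, buffer.Length) > 0) { }
            }
            sw.Stop();
            // Later passes are usually served from the Windows file cache and come back
            // much faster, until something else evicts the file from memory.
            Console.WriteLine("Pass {0}: {1:#,###} ms", i + 1, sw.ElapsedMilliseconds);
        }
    }
}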
It would be interesting to see the results if you ran that more than 5 times.
Some additional info:
An I/O-bound process waiting for disk is boosted in priority so that it can handle the data immediately; most operating systems do this as part of their scheduler architecture. This means that a moderately busy system should usually not have a big impact on the process in question, unless they share some slow device. Disk is a slow device, but it's easy to forget that memory is a relatively slow device too and should be shared with care.
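One thing you could additionally try, purely as a suggestion on top of the above (the scheduler already boosts I/O-bound processes on its own), is raising your own process's priority class so that competing CPU-bound work interferes less. It does not change how the disk itself is scheduled, so it only helps if part of the variability is CPU-related. A minimal sketch:

using System;
using System.Diagnostics;

class PriorityBoost
{
    static void Main()
    {
        var proc = Process.GetCurrentProcess();
        Console.WriteLine("Current priority: " + proc.PriorityClass);
        proc.PriorityClass = ProcessPriorityClass.High;   // AboveNormal is a gentler choice
        Console.WriteLine("New priority: " + proc.PriorityClass);
    }
}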
For parallelism (assuming you are writing server software): my MSSQL server has the DB/log spread over effectively 28 disks, and the machine contains several boards with several CPUs, each with separate bus access to separate memory, plus some cross-connections. MSSQL uses this to allocate parts of the DB to the memory closest to each CPU, and searches run in parallel on all CPUs with their closest memory (see NUMA). My point is that there is hardware designed specifically to boost scenarios like this.
The first time you instantiate the buffer, the OS has to search for free memory. For a 10 GB file it is clear that the space has to be found by going to disk, hence the huge delay. When you run the task again, the memory is often still available because it has not yet been reclaimed.
You can probably verify this by placing a GC.Collect() after each LoadFileUnman() call within the button handler, as in the sketch below.
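A sketch of the button handler from the question with that suggestion applied (note that GC.Collect() only reclaims managed memory; the buffer obtained from Marshal.AllocHGlobal is unmanaged and would still need Marshal.FreeHGlobal inside LoadFileUnman()):

private void button1_Click(object sender, EventArgs e)
{
    for (int i = 0; i < 5; i++)
    {
        LoadFileUnman();
        GC.Collect();                   // force a collection between runs
        GC.WaitForPendingFinalizers();  // optionally let finalizers run as well
    }
    Console.WriteLine("Done");
}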
Check out http://social.technet.microsoft.com/Forums/en/winservergen/thread/09c80df1-4bd4-4400-bcaf-cec892a0626a
The Windows system is doing things behind the scenes, which makes it 'impossible' to control or test what is really happening. Windows has its own layer of buffering on top of everything else: a FileStream flush does not flush the data to disk, it merely hands the data to the OS, which writes it out whenever it decides to.
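For the write side, if you are on .NET 4 or later, the FileStream.Flush(true) overload asks Windows to push its cached data for that file through to the device, and FileOptions.WriteThrough tells the OS to bypass its write cache for that handle altogether. A minimal sketch (the output path is just a placeholder):

using System.IO;
using System.Text;

class WriteThroughDemo
{
    static void Main()
    {
        // WriteThrough: the OS should not buffer writes for this handle.
        using (var fs = new FileStream(@"C:\Output.BNF", FileMode.Create, FileAccess.Write,
                                       FileShare.None, 16 * 1024, FileOptions.WriteThrough))
        {
            byte[] payload = Encoding.ASCII.GetBytes("some data");
            fs.Write(payload, 0, payload.Length);
            fs.Flush(true);   // flush through the Windows cache to the disk, not just to the OS
        }
    }
}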
Open Resource Monitor (it can be started from Task Manager) and you may well see a system process reading and writing the same file as your application.
All I want is the best sequential read and write speed for large files, but thanks to 'smart' system behaviour like this, along with the 'excellent' MS documentation, I am really stuck. I guess I'll do the same as everyone else: whatever works... Sad thing.