Should variable declarations always be placed outside of a loop?
Is it better to declare a variable used in a loop outside of the loop rather then inside? Sometimes I see examples where a variable is declared inside the loop. Does this effectively cause the program to allocate memory for a new variable each time the loop runs? Or is .NET smart enough to know that it's really the same variable.
For example see the code below from this answer.
public static void CopyStream(Stream input, Stream output)
{
byte[] buff开发者_运维问答er = new byte[32768];
while (true)
{
int read = input.Read (buffer, 0, buffer.Length);
if (read <= 0)
return;
output.Write (buffer, 0, read);
}
}
Would this modified version be any more efficent?
public static void CopyStream(Stream input, Stream output)
{
int read; //OUTSIDE LOOP
byte[] buffer = new byte[32768];
while (true)
{
read = input.Read (buffer, 0, buffer.Length);
if (read <= 0)
return;
output.Write (buffer, 0, read);
}
}
No, it wouldn't be more efficient. However, I'd rewrite it this way which happens to declare it outside the loop anyway:
byte[] buffer = new byte[32768];
int read;
while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
{
output.Write(buffer, 0, read);
}
I'm not generally a fan of using side-effects in conditions, but effectively the Read
method is giving you two bits of data: whether or not you've reached the end of the stream, and how much you've read. The while loop is now saying, "While we've managed to read some data... copy it."
It's a little bit like using int.TryParse
:
if (int.TryParse(text, out value))
{
// Use value
}
Again you're using a side-effect of calling the method in the condition. As I say, I don't make a habit out of doing this except for this particular pattern, when you're dealing with a method returning two bits of data.
The same thing comes up reading lines from a TextReader
:
string line;
while ((line = reader.ReadLine()) != null)
{
...
}
To go back to your original question: if a variable is going to be initialized in every iteration of a loop and it's only used within the body of the loop, I'd almost always declare it within the loop. One minor exception here is if the variable is being captured by an anonymous function - at that point it will make a difference in behaviour, and I'd pick whichever form gave me the desired behaviour... but that's almost always the "declare inside" form anyway.
EDIT: When it comes to scoping, the code above does indeed leave the variable in a larger scope than it needs to be... but I believe it makes the loop clearer. You can always address this by introducing a new scope if you care to:
{
int read;
while (...)
{
}
}
In the unlikely environment that doesn't help you with this, it would still be a micro-optimization. Factors like clarity and proper scoping is much more important than the edge case where this might just make next to no difference.
You should give your variables proper scope without thinking about performance. Of course, complex initializations are a different beast, so if something should only be initialized once but is only used within a loop, you'd still want to declare it outside.
I am going to agree with most of these other answers with a caveat.
If you are using lambada expressions you must be careful with capturing variables.
static void Main(string[] args)
{
var a = Enumerable.Range(1, 3);
var b = a.GetEnumerator();
int x;
while(b.MoveNext())
{
x = b.Current;
Task.Factory.StartNew(() => Console.WriteLine(x));
}
Console.ReadLine();
}
will give the result
3
3
3
Where
static void Main(string[] args)
{
var a = Enumerable.Range(1, 3);
var b = a.GetEnumerator();
while(b.MoveNext())
{
int x = b.Current;
Task.Factory.StartNew(() => Console.WriteLine(x));
}
Console.ReadLine();
}
will give the result
1
2
3
or some order there of. This is because when the task finally starts it will check the current value of it's reference to x. in the first example all 3 loops pointed at the same reference, where in the second example they all pointed at different references.
As is the case with lots of simple optimizations like this, the compiler takes care of it for you. If you try both of these and look at the assemblies' IL in ildasm you can see that they both declare a single int32 read variable, although it does reorder the declarations:
.locals init ([0] int32 read,
[1] uint8[] buffer,
[2] bool CS$4$0000)
.locals init ([0] uint8[] buffer,
[1] int32 read,
[2] bool CS$4$0000)
I've generally preferred the latter as a matter of personal habit because, even if .NET is smart enough, other environments in which I might work later may not be smart enough. It could be nothing more than compiling down to an extra line of code inside the loop to re-initialize the variable, but it's still overhead.
Even if they're identical for all measurable purposes in any given example, I would say the latter has less of a chance of causing problems in the long run.
It really doesn't matter, and if I was reviewing the code for that particular example, I wouldn't care either way.
However, be aware that the two can mean very different things if you end up capturing the 'read' variable in a closure.
See this excellent post from Eric Lippert where this issue comes up regarding foreach loops - http://blogs.msdn.com/b/ericlippert/archive/2009/11/12/closing-over-the-loop-variable-considered-harmful.aspx
精彩评论