开发者

VBA I/O Performance

I'd like to know if there is a performance difference between those two codes :

Open strFile For Output As #fNum
For var1 = 1 to UBound(strvar1)
  For var2 = 1 to UBound(strvar2)
     For var3 = 1 to UBound(strvar3)
        For var4 = 1 to UBound(strvar4)
          Print #fNum texte
        Next var4
     Next var3
   Next var2
Next var1
Close #fNum

And

For var1 = 1 to UBound(strvar1)
  For var2 = 1 to UBound(strvar2)
     For var3 = 1 to UBound(strvar3)
        For var4 = 1 to UBound(strvar4)
         开发者_开发知识库 texteTotal = texteTotal  +  texte
        Next var4
     Next var3
   Next var2
Next var1

Open strFile For Output As #fNum
    Print #fNum texteTotal 
Close #fNum

In case, the loops are pretty big ?


You'll have to try it, because it depends on the size of texte.

Each time you do texteTotal = texteTotal + texte, vba makes a fresh copy of textTotal. As textTotal gets larger and larger, your loop will slow down.

You also run the risk of creating a string larger than vba can handle.

So:

If you are writing to a network drive, and texte is a single character, the second approach will probably be better.

If you are writing to a fast local disc, and texte is 64kb, and the arrays are 1M entries each, the first approach will be better.


Since you said that texte and texteTotal are strings, I have a couple of suggestions:

1. Always concatenate strings with the & operator.

    In VBScript, there are two ways to concatenate (add together) two string variables: the & operator and the + operator. The + operator is normally used to add together two numeric values, but is retained for backwards compatibility with older versions of BASIC that did not have the & operator for strings. Because the & operator is available in VBScript, it is recommended that you always prefer using it to concatenate strings and reserve the + for adding together numeric values. This won't necessarily provide any speed increase, but it eliminates any ambiguity in your code and makes clear your original intention. This is especially important in VBScript where you're working with Variants. When you use the + operator on a string that may contain numeric values, you have no way of knowing whether it will add the two numeric values together or combine the two strings.

2. Remember that string concatenation in VBScript has tremendous overhead and is very inefficient.

    Unlike VB.NET and Java, VBScript does not have a StringBuilder class to aid in the creation of large strings. Instead, whenever you repeatedly add things to the end of a string variable, VB will repeatedly copy that variable over and over again. When you're building a string in a loop like you do in the above code, this can really degrade performance as VB constantly allocates space for the new string variable and performs a copy. With each iteration of the loop, the concatenation becomes slower and slower (in geek-speak, you're dealing with an n² algorithm, where n = the number of concatenations). The problem gets even worse if the string exceeds 64K in size. VB can store small strings in a 64K cache, but if a string becomes larger than the cache, performance drops even more. This kind of thing is of course invisible to the programmer for simplicity, but if you're concerned about optimization, it's something to understand is happening in the background.

    In light of the above information, let's revisit the two code samples that you posted. You said that `texte` is "not very big but there are hundreds [of] thousands [of] lines." That means you may easily run out of space in the 64K string cache, and eventually you may even run out of space in the RAM allocated to the script. The limit varies, but you can eventually get "Out of Memory" errors, depending on how large your string grows. Even if you're not anywhere near that point now, it's worth considering for the future. If someone goes back later to add functionality to the script, will they remember or bother to change the string concatenation algorithm? Probably not.

    To prevent any "Out of Memory" errors from cropping up, you can simply stop keeping the text string in RAM and write it directly to a file instead. This makes even more sense in your case because that's what you're eventually going to do with the string anyway! Why waste CPU cycles and space in memory by continually allocating and reallocating new string variables for each iteration of the loop when you could just write the value to the file and forget about it? I'd say that your first code sample is the simplest and preferred method to accomplish what you want.

    The only time that I would consider the second method is if you were dealing with file I/O that was extremely inefficient, such as a disk being accessed over a network connection. As long as you're writing to a local disk, you're not going to have any performance problems here. The other concern, pointed out by @astander, is that the first method leaves the file you're writing to open for a long period of time, placing a lock on that resource. While this is an important concern, I think its implication for your application is minimal, as I assume that you're creating and writing to your own private file that no other application is expected to be able to access.

    However, despite my recommendation, the first method is still not the most optimized and efficient way to concatenate strings. The best way would be a VBScript implementation of a StringBuilder class that stores the string in memory in an array of bytes as it is created. Periodically, as you add text to the string, the concatenation class would allocate more space to the array to hold the additional text. This would be much faster than the native VBScript implementation of string concatenation because it would perform these reallocations far less often. Additionally, as the string you're building in memory grows larger, your StringBuilder class could flush the contents of the string in memory to the file, and start over with an empty string. You could use something like Francesco Balena's CString class (http://www.vbcode.com/asp/showsn.asp?theID=415), or Microsoft's example (complete with some benchmarks and a further explanation) available here: http://support.microsoft.com/kb/170964.


I think the biggest difference would be the period for which you have the file open.

In the second case I would assume that it will be open for a shorter period of time, which is better as you should only ever lock resources for the smallest period of time required.


Cody Thank you very much for your time. Here are a few more information: The Code Sample #1 is the current production code. To be more precise, this is one part of the whole process from

1)getting information from a DB #1

2)calculation with mathematics formulas (matrices, vectors) , N= BIG

3)copy results to txt files ( 10 k lignes + ) each.

4)psql queries to insert to Databases

5)restitution I am wondering if the copy to txt files is really necessary and costly compared to psql insertions? I like your idea of building a custom string class, do you think it can overtake the I/O performance?


Ran into this issue recently, where I was writing large amounts of text (~100k lines) to a network file. As each Print command creates I/O activity, the process of writing the file was terribly slow. However, creating a large string by concatenating new lines to it proved to be very slow as well, as explained in the other answers.

I solved this problem by writing the individual lines to a buffer array, then joining this array into a string, which is then written to the file at once. Based on your example, it would be something like:

Dim buffer() as Variant
Dim i as Long
i = 1
ReDim buffer(1 to Ubound(strvar1) * Ubound(strvar2) * Ubound(strvar3) * Ubound(strvar4)
For var1 = 1 to UBound(strvar1) 
  For var2 = 1 to UBound(strvar2)
     For var3 = 1 to UBound(strvar3)
        For var4 = 1 to UBound(strvar4)
          buffer(i) = texte
          i = i + 1
        Next var4
     Next var3
   Next var2
Next var1
Open strFile For Output As #fNum
Print #fNum Join(buffer, vbCrLf)
Close #fNum

This prevents both the overhead of the incremental concatenations (the Join function scales linear with the amount of lines, instead of exponential as the concatenation does), and the I/O overhead of writing many lines individually to a network file.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜