Why is printf in F# so slow?
I've just been really surprised by how slow printf from F# is. I have a number of C# programs that process large data files and write out a number of CSV files. I originally started by using fprintf writer "%s,%d,%f,%f,%f,%s" thinking that that would be simple and reasonably efficient.
However, after a while I was getting a bit fed up with waiting for the files to process. (I've got 4 GB XML files to go through and write out entries from them.)
When I ran my applications through a profiler, I was amazed to see printf as being one of the really slow methods.
I changed the code to not use printf and now performance is so much better. Printf performance was killing my overall application performance.
To give an example, my original code is:
fprintf sectorWriter "\"%s\",%f,%f,%d,%d,\"%s\",\"%s\",\"%s\",%d,%d,%d,%d,\"%s\",%d,%d,%d,%d,%s,%d"
    sector.Label sector.Longitude sector.Latitude sector.RNCId sector.CellId
    siteName sector.Switch sector.Technology (int sector.Azimuth) sector.PrimaryScramblingCode
    (int sector.FrequencyBand) (int sector.Height) sector.PatternName (int sector.Beamwidth)
    (int sector.ElectricalTilt) (int sector.MechanicalTilt) (int (sector.ElectricalTilt + sector.MechanicalTilt))
    sector.SectorType (int sector.Radius)
And I've changed it to be the following
seq {
    yield sector.Label; yield string sector.Longitude; yield string sector.Latitude; yield string sector.RNCId; yield string sector.CellId;
    yield siteName; yield sector.Switch; yield sector.Technology; yield string (int sector.Azimuth); yield string sector.PrimaryScramblingCode;
    yield string (int sector.FrequencyBand); yield string (int sector.Height); yield sector.PatternName; yield string (int sector.Beamwidth);
    yield string (int sector.ElectricalTilt); yield string (int sector.MechanicalTilt);
    yield string (int (sector.ElectricalTilt + sector.MechanicalTilt));
    yield sector.SectorType; yield string (int sector.Radius)
}
|> writeCSV sectorWriter
Helper functions:
let writeDelimited delimiter (writer:TextWriter) (values:seq<string>) =
    values
    |> Seq.fold (fun (s:string) v -> if s.Length = 0 then v else s + delimiter + v) ""
    |> writer.WriteLine
let writeCSV (writer:TextWriter) (values:seq<string>) = writeDelimited "," writer values
I'm writing out files with about 30,000 rows. Nothing special.
I am not sure how much it matters, but...
Inspecting the code for printf:
https://github.com/fsharp/fsharp/blob/master/src/fsharp/FSharp.Core/printf.fs
I see
// The general technique used this file is to interpret
// a format string and use reflection to construct a function value that matches
// the specification of the format string.
and I think the word 'reflection' probably answers the question.
printf is great for writing simple type-safe output, but if you want good perf in an inner loop, you might want to use a lower-level .NET API to write output. I haven't done my own benchmarking to see.
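As a minimal sketch of what a lower-level approach could look like (writeCsvLine and demoJoin are hypothetical names, not from the thread, and the fields are assumed to be already converted to strings), String.Join and TextWriter.WriteLine avoid the format string entirely:

```fsharp
open System
open System.IO

// Hypothetical helper: join pre-converted fields with String.Join and
// emit the row with a single WriteLine call, with no format string involved.
let writeCsvLine (writer: TextWriter) (fields: seq<string>) =
    writer.WriteLine(String.Join(",", fields))

// Small demo against an in-memory writer; returns everything written so far.
let demoJoin () =
    use writer = new StringWriter()
    writeCsvLine writer [ "\"A1\""; string 1.5; string 42 ]
    writer.ToString()
```

Calling demoJoin () should give the three fields separated by commas, followed by the platform's line terminator. Whether this is fast enough for a given workload is something only a profiler run can confirm.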
TextWriter already buffers its output. I recommend using Write to output each value, one at a time, instead of formatting an entire line and passing it to WriteLine. On my laptop, writing 100,000 lines takes nearly a minute using your function, while, using the following function, it runs in half a second.
let writeRow (writer:TextWriter) siteName (sector:Sector) =
    // Write one value followed by its delimiter; no intermediate line string is built.
    let inline write (value:'a) (delim:char) =
        writer.Write(value)
        writer.Write(delim)
    // Wrap a string field in double quotes.
    let inline quote s = "\"" + s + "\""
    write (quote sector.Label) ','
    write sector.Longitude ','
    write sector.Latitude ','
    write sector.RNCId ','
    write sector.CellId ','
    write (quote siteName) ','
    write (quote sector.Switch) ','
    write (quote sector.Technology) ','
    write (int sector.Azimuth) ','
    write sector.PrimaryScramblingCode ','
    write (int sector.FrequencyBand) ','
    write (int sector.Height) ','
    write (quote sector.PatternName) ','
    write (int sector.Beamwidth) ','
    write (int sector.ElectricalTilt) ','
    write (int sector.MechanicalTilt) ','
    write (int (sector.ElectricalTilt + sector.MechanicalTilt)) ','
    write sector.SectorType ','
    write (int sector.Radius) '\n'
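For a self-contained version that can be pasted into F# Interactive, here is a sketch with a cut-down, hypothetical SectorLite record (the real Sector type is not shown in the question) that calls the typed Write overloads directly instead of going through a generic helper. The names SectorLite, writeRowLite and demoRow are made up for illustration, and the invariant-culture StringWriter is only there to make the demo output predictable:

```fsharp
open System.Globalization
open System.IO

// Hypothetical, cut-down stand-in for the Sector type from the question.
type SectorLite = { Label: string; Longitude: float; CellId: int }

// Wrap a string field in double quotes.
let quote (s: string) = "\"" + s + "\""

// Write each value with its own Write call; no line string is assembled first.
let writeRowLite (writer: TextWriter) (sector: SectorLite) =
    writer.Write(quote sector.Label)
    writer.Write(',')
    writer.Write(sector.Longitude)
    writer.Write(',')
    writer.Write(sector.CellId)
    writer.Write('\n')

// Demo against an in-memory writer that formats numbers with the invariant culture.
let demoRow () =
    use writer = new StringWriter(CultureInfo.InvariantCulture)
    writeRowLite writer { Label = "A1"; Longitude = 1.5; CellId = 42 }
    writer.ToString()
```

Here demoRow () returns the single line "A1",1.5,42 terminated by '\n'. For real files the same function can be handed a StreamWriter, which is also a TextWriter.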
Now that a preview of F# 3.1 has been released, the performance of printf is claimed to have improved by up to 40x. You might want to have a look at this:
F# 3.1 Compiler/Library Additions
Printf performance
The F# 3.1 core library sees improved performance of the printf family of functions for type-safe formatting. For example, printing using the following format string now runs up to 40x faster (though your exact mileage may vary):
sprintf "%d: %d, %x %X %d %d %s"
No changes in your code are needed to take advantage of this improved performance, though you do need to be using the F# 3.1 FSharp.Core.dll runtime component.
EDIT: This answer is only valid for simple format strings, like "%s" or "%d". See comments below.
It is also interesting to note that if you partially apply fprintf to the writer and the format string once, and then reuse the resulting function, the reflection will only be carried out once. Sample:
let w = new System.IO.StringWriter() :> System.IO.TextWriter
// Partially applied once: the format string is processed a single time.
let printer = fprintf w "%d"
// Fully applied on every call: the format string is processed each time.
let printer2 d = fprintf w "%d" d
let print1() =
    for i = 1 to 100000 do
        printer 2
let print2() =
    for i = 1 to 100000 do
        printer2 2
let time f =
    let sw = System.Diagnostics.Stopwatch()
    sw.Start()
    f()
    printfn "%s" (sw.ElapsedMilliseconds.ToString())
time print1
time print2
print1 takes 48 ms on my machine while print2 takes 1158 ms.