Fast and simple way to concatenate binary files in PowerShell
What's the best way of concatenating binary files using PowerShell? I'd prefer a one-liner that's simple to remember and fast to execute.
The best I've come up with is:
gc -Encoding Byte -Path ".\File1.bin",".\File2.bin" | sc -Encoding Byte new.bin
This seems to work OK, but it's terribly slow with large files.
The approach you're taking is the way I would do it in PowerShell. However, you should use the -ReadCount parameter to improve performance. You can also take advantage of positional parameters to shorten this even further:
gc File1.bin,File2.bin -Encoding Byte -Read 512 | sc new.bin -Encoding Byte
Editor's note: In the cross-platform PowerShell (Core) edition (version 6 and up), -AsByteStream must now be used instead of -Encoding Byte; also, the sc alias for the Set-Content cmdlet has been removed.
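For reference, here is a sketch of what the same one-liner might look like under those notes in PowerShell 7, keeping the same placeholder file names:
# PowerShell 7 sketch: -AsByteStream replaces -Encoding Byte, and sc is spelled out as Set-Content
Get-Content File1.bin, File2.bin -AsByteStream -ReadCount 512 |
    Set-Content new.bin -AsByteStream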
Regarding the use of the -ReadCount parameter, I did a blog post on this a while ago that folks might find useful - Optimizing Performance of Get Content for Large Files.
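If you want to check what -ReadCount buys you on your own files, a quick comparison with Measure-Command is one option; the file names and read counts below are only placeholders (Windows PowerShell 5.x syntax):
# Time the same concatenation with a few different -ReadCount values (placeholder names)
foreach ($rc in 1, 512, 4KB, 64KB) {
    $t = Measure-Command {
        gc File1.bin, File2.bin -Encoding Byte -ReadCount $rc | sc "new_$rc.bin" -Encoding Byte
    }
    "{0,8}: {1:n1} s" -f $rc, $t.TotalSeconds
}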
It's not PowerShell, but if you have PowerShell you also have the command prompt:
copy /b 1.bin+2.bin 3.bin
As Keith Hill pointed out, if you really need to run it from inside Powershell, you can use:
cmd /c copy /b 1.bin+2.bin 3.bin
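If you have more than two input files, one option is to build the plus-joined argument in PowerShell first; this is a rough sketch that assumes the file names contain no spaces:
# Rough sketch: concatenate every .bin in the current folder, in name order (no spaces assumed)
$parts = (Get-ChildItem .\*.bin | Sort-Object Name | ForEach-Object { $_.Name }) -join '+'
cmd /c copy /b $parts combined.bin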
I had a similar problem recently, where I wanted to append two large (2GB) files into a single file (4GB).
I tried adjusting the -ReadCount parameter for Get-Content, but I couldn't get it to improve performance for these large files.
I went with the following solution:
function Join-File (
    [parameter(Position=0,Mandatory=$true,ValueFromPipeline=$true)]
    [string[]] $Path,
    [parameter(Position=1,Mandatory=$true)]
    [string] $Destination
)
{
    Write-Verbose "Join-File: Open Destination $Destination"
    $OutFile = [System.IO.File]::Create($Destination)
    foreach ( $File in $Path ) {
        Write-Verbose "  Join-File: Open Source $File"
        $InFile = [System.IO.File]::OpenRead($File)
        $InFile.CopyTo($OutFile)
        $InFile.Dispose()
    }
    $OutFile.Dispose()
    Write-Verbose "Join-File: finished"
}
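A usage sketch, reusing the file names from the question:
# Concatenate two inputs into new.bin, with verbose progress output
Join-File -Path .\File1.bin, .\File2.bin -Destination .\new.bin -Verbose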
Performance:
cmd.exe /c copy file1+file2 file3   - around 5 seconds (best)
gc file1,file2 | sc file3           - around 1100 seconds (yuck)
Join-File File1,File2 File3         - around 16 seconds (OK)
Performance is very much dependent on the buffer size used, and the defaults are fairly small. When concatenating two 2 GB files I'd use a buffer size of about 256 KB. Going larger might sometimes fail; going smaller, you'll get less throughput than your drive is capable of.
With gc that'd be with -ReadCount, not simply -Read (PowerShell 5.0):
gc -ReadCount 256KB -Path $infile -Encoding Byte | ...
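Spelled out end to end (Windows PowerShell 5.x syntax; the file names are placeholders), that pipeline might look like:
# 256 KB read chunks, byte encoding on both ends (placeholder file names)
gc -ReadCount 256KB -Path .\File1.bin, .\File2.bin -Encoding Byte |
    sc -Path .\new.bin -Encoding Byte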
I also found Add-Content, going file by file, to be better for a large number of small files, because piping even a moderate amount of data (200 MB) left my machine out of memory, with PowerShell frozen and the CPU at full load.
However, Add-Content randomly fails every so often across a few hundred files with an error about the destination file being in use, so I added a while loop and a try/catch:
# Empty the destination file first
sc -Path "$path\video.ts" -Value @() -Encoding Byte
$tsfiles | foreach {
    while ($true) {
        try {
            # -ReadCount 0 because the files are smaller than 256KB
            gc -ReadCount 0 -Path "$path\$_" -Encoding Byte |
                Add-Content -Path "$path\video.ts" -Encoding Byte -ErrorAction Stop
            break
        } catch {
            # Destination was still in use; retry the same file
        }
    }
}
Using a file stream is much faster still. You cannot specify a buffer size with [System.IO.File]::Open, but you can with new [System.IO.FileStream], like so:
# $path = "C:\"
$ins = @("a.ts", "b.ts")
$outfile = "$path\out.mp4"

$out = New-Object -TypeName "System.IO.FileStream" -ArgumentList @(
    $outfile,
    [System.IO.FileMode]::Create,
    [System.IO.FileAccess]::Write,
    [System.IO.FileShare]::None,
    256KB,
    [System.IO.FileOptions]::None)
try {
    foreach ($in in $ins) {
        $fs = New-Object -TypeName "System.IO.FileStream" -ArgumentList @(
            "$path\$in",
            [System.IO.FileMode]::Open,
            [System.IO.FileAccess]::Read,
            [System.IO.FileShare]::Read,
            256KB,
            [System.IO.FileOptions]::SequentialScan)
        try {
            $fs.CopyTo($out)
        } finally {
            $fs.Dispose()
        }
    }
} finally {
    $out.Dispose()
}