How to transpose data in PowerShell
I have a file that looks like this:
a,1
b,2
c,3
a,4
b,5
c,6
(... repeats for 1,000s of lines)
How can I transpose it into this?
a,b,c
1,2,3
4,5,6
Thanks
Here's a brute-force one-liner from hell that will do it:
PS> Get-Content foo.txt |
Foreach -Begin {$names=@();$values=@();$hdr=$false;$OFS=',';
function output { if (!$hdr) {"$names"; $global:hdr=$true}
"$values";
$global:names=@();$global:values=@()}}
-Process {$n,$v = $_ -split ',';
if ($names -contains $n) {output};
$names+=$n; $values+=$v }
-End {output}
a,b,c
1,2,3
4,5,6
It's not what I'd call elegant, but it should get you by. Entered as a single line, it runs as-is. If you split it across lines the way it is shown above, you will need to put a back-tick after the closing curly brace of both the Begin and Process scriptblocks so that PowerShell treats the next line as a continuation. This script requires PowerShell 2.0 or later, as it relies on the -split operator introduced in that version.
This approach makes heavy use of the Foreach-Object cmdlet. Normally when you use Foreach-Object (alias is Foreach) in the pipeline you specify just one scriptblock like so:
Get-Process | Foreach {$_.HandleCount}
That prints out the handle count for each process. This usage of Foreach-Object uses the -Process scriptblock implicitly, which means it executes once for each object it receives from the pipeline. Now what if we want to total up the handles across all processes? Ignore the fact that you could just use Measure-Object HandleCount -Sum to do this; I'll show you how Foreach-Object can do it. As you see in the original solution to this problem, Foreach can also take a Begin scriptblock that is executed once, before the first object is processed, and an End scriptblock that executes when there are no more objects in the pipeline. Here's how you can total the handle count using Foreach-Object:
gps | Foreach -Begin {$sum=0} -Process {$sum += $_.HandleCount } -End {$sum}
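To make the order of execution easier to see, here is a small sketch that feeds three numbers through the same three scriptblocks. The variable names ($log, $total) are just made up for the illustration.

```powershell
# Record when each scriptblock runs: Begin once, Process per item, End once.
$log = 1..3 | ForEach-Object -Begin { 'begin' } -Process { "process $_" } -End { 'end' }
$log    # begin, process 1, process 2, process 3, end

# Same pattern as the handle-count example, on plain numbers.
$total = 1..4 | ForEach-Object -Begin { $sum = 0 } -Process { $sum += $_ } -End { $sum }
$total  # 10
```

Only the End scriptblock emits anything in the second command, so the pipeline returns a single value.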
Relating this back to the problem solution, in the Begin scriptblock I initialize some variables to hold the array of names and values as well as a bool ($hdr) that tells me whether or not the header has been output (we only want to output it once). The next mildly mind blowing thing is that I also declare a function (output) in the Begin scriptblock that I call from both the Process and End scriptblocks to output the current set of data stored in $names and $values.
The only other trick is that the Process scriptblock uses the -contains operator to see if the current line's field name has already been seen. If so, it outputs the current names and values and resets those arrays to empty. Either way, it then stashes the name and value in the appropriate arrays so they can be output later.
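If the one-liner is hard to follow, the same logic can be sketched as a pipeline function. This is not the original command, just a more readable rearrangement of it: the function name Convert-PairsToRows and the $headerWritten variable are invented here, and it assumes every input line is a single name,value pair.

```powershell
# Hypothetical helper: same Begin/Process/End idea, written as a function.
function Convert-PairsToRows {
    param(
        [Parameter(ValueFromPipeline = $true)]
        [string]$Line
    )
    begin {
        $names = @()
        $values = @()
        $headerWritten = $false
    }
    process {
        # Split into at most two parts: field name and value.
        $name, $value = $Line -split ',', 2
        if ($names -contains $name) {
            # A repeated name means a new record starts: flush the current one.
            if (-not $headerWritten) { $names -join ','; $headerWritten = $true }
            $values -join ','
            $names = @()
            $values = @()
        }
        $names += $name
        $values += $value
    }
    end {
        # Flush whatever is left after the last input line.
        if ($names.Count -gt 0) {
            if (-not $headerWritten) { $names -join ',' }
            $values -join ','
        }
    }
}

$result = 'a,1', 'b,2', 'c,3', 'a,4', 'b,5', 'c,6' | Convert-PairsToRows
$result
```

With a real file you would pipe Get-Content foo.txt into the function instead of the literal strings. Because the flush logic is inlined rather than put in a nested function, no global: or script: modifiers are needed in this version.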
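BTW the reason the output function needs the global: specifier on the variables is that PowerShell takes a "copy-on-write" approach when a nested scope assigns to a variable defined outside its scope: the assignment creates a new local variable and leaves the outer one untouched. When we really want that modification to occur at the higher scope, we have to tell PowerShell so by using a modifier like global: or script:. The global: modifier fits here because the command is typed at the prompt, where the Foreach scriptblocks run in the global scope.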
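Here is a small sketch of that scoping behavior. The function names are made up for the demonstration, and it uses script: rather than global: so that it behaves the same whether it is saved in a script file or typed at the prompt.

```powershell
$count = 0

# Hypothetical helpers that try to increment the outer $count.
function Set-CountLocally       { $count = $count + 1 }          # writes a new local copy
function Set-CountInScriptScope { $script:count = $count + 1 }   # writes the outer variable

Set-CountLocally
$afterLocal = $count     # still 0: only the function's local copy changed

Set-CountInScriptScope
$afterScript = $count    # now 1: the script: modifier reached the outer variable
```

Both functions can read the outer $count, since reads fall through to the parent scope. Only the assignment differs.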