Powershell, File system provider, Get-ChildItem filtering... where are the official docs?
As mentioned in another question, if you try to do a Get-ChildItem -filter ... command you are more limited than if you used -include instead of -filter. I'd like to read the official docs for the file system provider's filtering syntax, but after a half hour of searching I still haven't found them. Anyone know where to look?
tl;dr: -Filter uses .NET's implementation of FsRtlIsNameInExpression, which is documented on MSDN along with basic pattern matching info. The algorithm is unintuitive for compatibility reasons, and you should probably avoid using this feature. Additionally, .NET has numerous bugs in its implementation.
-Filter does not use the filtering system provided by PowerShell; that is, it does not use the wildcard syntax described by Get-Help about_Wildcards. Rather, the filter is handled the way the Windows API would handle it, so filtering works much as it does in other programs that use the Windows API, such as cmd.exe.
More precisely, PowerShell uses a FsRtlIsNameInExpression-like algorithm for -Filter pattern matching. The algorithm is based on old MS-DOS behavior, so it's riddled with caveats that are preserved for legacy purposes. It's typically said to have three common special characters. The exact behavior is complex, but it's more or less like the following:
- * : Matches any number of characters (zero-inclusive)
- ? : Matches exactly one character, excluding the last period in a name
- . : If the last period in a pattern, anchors to the last period in the filename, or the end of the filename if it doesn't have a period; can also match a literal period
Just to make things more complicated, Windows added three additional special characters that behave exactly the same as the old MS-DOS special characters did. The original special characters have slightly different behavior now to account for more flexible filesystems.
- " is equivalent to MS-DOS . (DOS_DOT and ANSI_DOS_DOT in ntifs.h)
- < is equivalent to MS-DOS ? (DOS_QM and ANSI_DOS_QM in ntifs.h)
- > is equivalent to MS-DOS * (DOS_STAR and ANSI_DOS_STAR in ntifs.h)
Quite a few sources seem to reverse < and >. Frighteningly, Microsoft confuses them in their .NET implementation, which means they are also reversed in PowerShell. Additionally, all three compatibility wildcards are inaccessible from -Filter, as System.IO.Path mistakenly treats ", <, and > as invalid, non-wildcard characters. (It allows ., *, and ?.) This contributes to the notion that -Filter is incomplete, unstable, and buggy. You can see .NET's (buggy) implementation of the algorithm on GitHub.
This is additionally complicated by the algorithm's support for 8.3 compatibility filenames, otherwise known as "short" filenames. (You've probably seen them before; they look something like SOMETH~1.TXT.) A file matches the pattern if either its full filename or its short filename matches. FrankFranchise has more information about this caveat in his answer.
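To make the either-name rule concrete, here is a small Python sketch (Python rather than PowerShell, so it runs anywhere). It is only a simplified model: it uses Python's fnmatch rules for * and ?, not the real FsRtlIsNameInExpression algorithm, and it writes the short names without the padding shown in the directory listing further down in this answer. The function names are made up for illustration.

```python
from fnmatch import fnmatchcase

# (long name, 8.3 short name) pairs, copied from the example directory
# listing in this answer, with the column padding removed (an assumption).
FILES = [
    ("Apple1.txt", "APPLE1.TXT"),
    ("Banana", "BANANA"),
    ("Something.txt", "SOMETH~1.TXT"),
    ("SomethingElse.txt", "SOMETH~2.TXT"),
    ("TXT.exe", "TXT.EXE"),
    ("TXT.eexe", "TXT~1.EEX"),
    ("Wildcard.txt", "WILDCARD.TXT"),
]


def matches(pattern, long_name, short_name):
    """True if the pattern matches the long name OR the short name."""
    # Compare in uppercase so the match is case-insensitive.
    p = pattern.upper()
    return fnmatchcase(long_name.upper(), p) or fnmatchcase(short_name.upper(), p)


def filter_names(pattern):
    """Return the long names a filter would report under this model."""
    return [long for long, short in FILES if matches(pattern, long, short)]


for pattern in ("*.txt", "*.eex", "*~1*"):
    print(pattern, "->", filter_names(pattern))
```

In this model, *.eex reports TXT.eexe even though the long name does not end in .eex, because the short name TXT~1.EEX does. That is the kind of surprise the 8.3 caveat is about.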
The previously-linked MSDN article on FsRtlIsNameInExpression has the most up-to-date documentation on Windows filename pattern matching, but it's not particularly verbose. For a more thorough explanation of how matching used to work on MS-DOS and how this affects modern matching, this MSDN blog article is the best source I've found. Here's the basic idea:
- Every filename was exactly 11 bytes.
- The first 8 bytes stored the body of the filename, right-padded with spaces
- The last 3 bytes stored the extension, right-padded with spaces
- Letters were converted to uppercase
- Letters, numbers, spaces, and some symbols matched only themselves
- ? matched any single character, except spaces in the extension
- . would fill the remainder of the first 8 bytes with spaces, then advance to the 9th byte (the start of the extension)
- * would fill the remainder of the current section (body or extension) with question marks, then advance to the next section (or the end of the pattern)
The transformations would look like this:
                        11
User           12345678901
------------   -----------
ABC.TXT      > ABC     TXT
WILDCARD.TXT > WILDCARDTXT
ABC.???      > ABC     ???
*.*          > ???????????
*.           > ????????
ABC.         > ABC
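The expansion rules above can be sketched in a few lines of Python. This is my own reading of the rules as listed here, not MS-DOS source code: the function name is hypothetical, and inputs the rules don't cover (extra characters after a *, more than one dot) are handled by simply dropping whatever no longer fits.

```python
def expand_dos_pattern(pattern: str) -> str:
    """Expand a typed name or pattern into the 11-byte MS-DOS form."""
    body, ext = [], []
    in_ext = False
    for ch in pattern.upper():  # letters were converted to uppercase
        section, size = (ext, 3) if in_ext else (body, 8)
        if ch == ".":
            # '.' ends the body; the body is space-padded on return.
            in_ext = True
        elif ch == "*":
            # '*' fills the rest of the current section with '?'.
            # The section is then full, so anything typed before the
            # next '.' is dropped (an assumption for unspecified input).
            section.extend("?" * (size - len(section)))
        elif len(section) < size:
            section.append(ch)
    # Right-pad both sections with spaces: 8 bytes + 3 bytes = 11 bytes.
    return "".join(body).ljust(8) + "".join(ext).ljust(3)


for typed in ("ABC.TXT", "WILDCARD.TXT", "ABC.???", "*.*", "*.", "ABC."):
    # Brackets make the trailing spaces visible.
    print(f"{typed:<12} > [{expand_dos_pattern(typed)}]")
```

The loop prints the same six inputs as the table, with brackets around each result so the space padding is visible.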
Extrapolating this to work with modern-day filesystems is an unintuitive process at best. For example, take a directory such as the following:
Name                Compat Name
-----------------------------------------------
Apple1.txt          APPLE1  .TXT
Banana              BANANA  .
Something.txt       SOMETH~1.TXT
SomethingElse.txt   SOMETH~2.TXT
TXT.exe             TXT     .EXE
TXT.eexe            TXT~1   .EEX
Wildcard.txt        WILDCARD.TXT
I've done quite a bit of testing of these wildcards on Windows 10 and have gotten very inconsistent results, especially with DOS_DOT ("). If you test these on your own from the command prompt, you'll likely need to escape them (e.g., dir ^>^"^> in cmd.exe to emulate MS-DOS *.*).
Pattern           Matches
---------------   -------
*.*               (everything)
<"<               (everything)
*                 (everything)
<                 Banana
.                 (everything)
"                 (everything)
*.                Banana
<"                Banana
*g.txt            Something.txt
<g.txt            Something.txt
<g"txt            (nothing)
*1.txt            Apple1.txt, Something.txt
<1.txt            Apple1.txt, Something.txt
<1"txt            (nothing)
*xe               TXT.eexe, TXT.exe
<xe               (nothing)
*exe              TXT.eexe, TXT.exe
<exe              TXT.exe
??????.???        Apple1.txt, Asdf.tx, Banana, TXT.eexe, TXT.exe
>>>>>>.>>>        Apple1.txt, Asdf.tx, TXT.eexe, TXT.exe
>>>>>>">>>        Banana
????????.???      (everything)
>>>>>>>>.>>>      (everything except Banana)
>>>>>>>>">>>      Banana
???????????.???   (everything)
>>>>>>>>>>>.>>>   (everything except Banana)
>>>>>>>>>>>">>>   Banana
??????            Banana
>>>>>>            Banana
???????????       Banana
>>>>>>>>>>>       Banana
????????????      Banana
????              (nothing)
>>>>              (nothing)
Banana??.         Banana
Banana>>.         Banana
Banana>>"         Banana
Banana????.       Banana
Banana>>>>.       Banana
Banana>>>>"       Banana
Banana.           Banana
Banana"           Banana
*txt              Apple1.txt, Something.txt, SomethingElse.txt, Wildcard.txt
<txt              Apple1.txt, Something.txt, SomethingElse.txt, Wildcard.txt
*t                Apple1.txt, Something.txt, SomethingElse.txt, Wildcard.txt
<t                (nothing)
*txt*             Apple1.txt, Something.txt, SomethingElse.txt, TXT.eexe, TXT.exe, Wildcard.txt
<txt<             Apple1.txt, Something.txt, SomethingElse.txt, Wildcard.txt
*txt<             Apple1.txt, Something.txt, SomethingElse.txt, Wildcard.txt
<txt*             Apple1.txt, Something.txt, SomethingElse.txt, TXT.eexe, TXT.exe, Wildcard.txt
Note: As of writing, WINE's matching algorithm yields significantly different results when testing these "gotchas". Tested with WINE 1.9.6.
As you can see, the backwards-compatible MS-DOS wildcards are obscure and buggy. Even Microsoft has implemented them incorrectly at least once, and it's unclear whether their current behavior in Windows is intentional. The behavior of " seems completely random, and I expected the results of the last two tests to be swapped.
There is almost nothing on -filter.
There is a little bit when you do Get-Help Get-ChildItem -full, but I'm sure you've seen it. There is a post on the PowerShell blog, as well. Neither gives examples.
The best example I could find is this one, which simply demonstrates that the filter is a string that the provider uses to return a subset of what it would otherwise return. It doesn't even demonstrate -filter directly; it simply uses it. Still, it gives a slightly better glimpse than the other links.
However, because the provider is doing the filtering before the results get back to the cmdlet, there are certain caveats. For example, if I want to recursively find all files and directories that begin with "test", I would not want to start with this:
Get-ChildItem -filter 'test*' -recurse
This would filter all results in the current directory before returning anything for the recursion. If I had a directory that began with "test", it would recurse that directory (since the provider would return it to the cmdlet), but no others.
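Whatever the tool, one way to sidestep that caveat is to enumerate the whole tree first and filter on the leaf name afterwards, so the filter can never prune the recursion. A minimal sketch of that idea in Python (not PowerShell; the function name and the sample directory layout are made up for illustration):

```python
import os
import tempfile
from fnmatch import fnmatchcase


def find_by_name(root, pattern):
    """Walk everything under root, then keep entries whose leaf name matches."""
    pattern = pattern.lower()  # compare in lowercase for a case-insensitive match
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # The walk never consults the pattern, so every directory is visited.
        for name in dirnames + filenames:
            if fnmatchcase(name.lower(), pattern):
                hits.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(hits)


# Hypothetical layout: the matches sit below a directory that does not match.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "docs", "testdata"))
    open(os.path.join(root, "docs", "test_notes.txt"), "w").close()
    open(os.path.join(root, "readme.txt"), "w").close()
    found = find_by_name(root, "test*")
    print(found)
```

Here both matches live under docs, which does not itself start with "test", and they are still found because filtering happens only after enumeration.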
As the example shows, it can address properties in some providers. In the FileSystem provider, you may only be able to use wildcard matching strings on the directory's or file's name (the leaf, not the fully qualified path).
To follow up on what Zenexer mentioned, you should see about the same results that you would see using the same filters with cmd.exe. This includes things you might not expect like 8.3 short file names. You can test this yourself.
Create some example files with PowerShell
md filtertest | cd
(1..1000) | % { New-item -Name ("aaaaa{0:D6}.txt" -f $_) -ItemType File }
Now open up a cmd prompt and run
dir /x
dir aaab*
The first command shows the 8.3 short-names. The second matches some files, even though there is no 'b' character in any of the normal names, because those files contain a 'b' in the short-name.
Now you can flip back to PowerShell and run ls -Filter aaab* to see the same files again. The -Filter string is passed to the WinAPI, which matches against those files with 'b' in the 8.3 short-names, just like dir in cmd.exe. So beware of unexpected results when using -Filter: you might be matching against the 8.3 short-name.
This is all assuming that 8.3 short-names are enabled on your computer.
They are in the same place as the docs for all the cmdlets. At the prompt, type:
Get-Help Get-ChildItem
If that doesn't tell you enough, then:
Get-Help Get-ChildItem -Detailed
Or if you really want to dig in then:
Get-Help Get-ChildItem -Full
EDIT: While -Detail works fine since PS automagically disambiguates parameter names, it never hurts to have it right :)