Is .( ever legal in C# or VB.Net?
Can the sequence .( ever appear in C# or VB.Net code?
I'm reasonably certain that the answer is no, but I'd like to make sure.
The only places that . appears in the grammar are:
real-literal:
decimal-digits . decimal-digits ...
. decimal-digits ...
namespace-or-type-name:
namespace-or-type-name . identifier ...
member-access:
primary-expression . identifier ...
predefined-type . identifier ...
qualified-alias-member . identifier ...
base-access:
base . identifier
unbound-type-name:
unbound-type-name . identifier
qualified-identifier:
qualified-identifier . identifier
member-name:
interface-type . identifier
indexer-declarator:
type interface-type . this
(The ... means I have elided the remainder of the production rule.) In none of these cases is .( valid, because . is always followed by digits, a valid identifier, or the keyword this.
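The check described above can be sketched in Python. This is only an illustration: the PRODUCTIONS table is transcribed by hand from the list in the question (with the elided "..." tails dropped), and symbols_after_dot is a hypothetical helper, not part of any compiler tooling.

```python
# Productions from the question, each alternative written as a list of symbols.
# Assumption: the elided "..." remainders are omitted because they come after
# the symbol that directly follows "." and so do not affect the result.
PRODUCTIONS = {
    "real-literal": [
        ["decimal-digits", ".", "decimal-digits"],
        [".", "decimal-digits"],
    ],
    "namespace-or-type-name": [["namespace-or-type-name", ".", "identifier"]],
    "member-access": [
        ["primary-expression", ".", "identifier"],
        ["predefined-type", ".", "identifier"],
        ["qualified-alias-member", ".", "identifier"],
    ],
    "base-access": [["base", ".", "identifier"]],
    "unbound-type-name": [["unbound-type-name", ".", "identifier"]],
    "qualified-identifier": [["qualified-identifier", ".", "identifier"]],
    "member-name": [["interface-type", ".", "identifier"]],
    "indexer-declarator": [["type", "interface-type", ".", "this"]],
}


def symbols_after_dot(productions):
    """Collect every symbol that appears directly after "." in any alternative."""
    found = set()
    for alternatives in productions.values():
        for alt in alternatives:
            # Stop one short of the end so alt[i + 1] always exists.
            for i, symbol in enumerate(alt[:-1]):
                if symbol == ".":
                    found.add(alt[i + 1])
    return found


if __name__ == "__main__":
    print(sorted(symbols_after_dot(PRODUCTIONS)))
```

None of the collected symbols can begin with (, which is the argument made above.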
In C#, a #region segment allows any characters to follow it:
#region foo.(
// this is perfectly legal C#
#endregion
Note that in VB.Net this is not a concern because the region label has to be a valid string literal, so it has quotes:
#Region "foo.("
' quotes required
#End Region
It's also legal after #error and #warning, which have no VB equivalent.
The biggest concern, though, is that you can have any arbitrary code inside of an #if block. In C#:
#if false
foo.( perfectly legal
#endif
In VB.Net:
#If False Then
foo.( perfectly legal
#End If
It's actually worse than that, because the VB version allows arbitrary expressions so you can't know if some code is actually VB unless you evaluate the expressions. In other words, parsing alone is not sufficient -- you have to evaluate too.
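For a tool that searches C# source for .(, this means every hit has to be classified before it can be reported. Below is a minimal Python sketch of that idea. find_dot_paren is a hypothetical helper, and it makes loud simplifications: it only recognises the literal condition false, it ignores #elif and #else, and it does not skip comments or string literals.

```python
import re

# A preprocessor line: optional whitespace, '#', directive name, then the rest.
DIRECTIVE = re.compile(r"^\s*#\s*(\w+)(.*)$")


def find_dot_paren(source):
    """Return a list of (line_number, context) for every line containing '.('.

    context is 'directive' for a hit on a preprocessor line, 'inactive' for a
    hit inside an `#if false` block, and 'code' for anything else.
    """
    hits = []
    stack = []  # one bool per open #if: True when that block is inactive
    for number, line in enumerate(source.splitlines(), start=1):
        match = DIRECTIVE.match(line)
        if match:
            name, rest = match.group(1), match.group(2).strip()
            if name == "if":
                # Assumption: no expression evaluation, only the literal `false`.
                stack.append(rest == "false")
            elif name == "endif" and stack:
                stack.pop()
            if ".(" in line:
                hits.append((number, "directive"))
        elif ".(" in line:
            hits.append((number, "inactive" if any(stack) else "code"))
    return hits


if __name__ == "__main__":
    sample = "#region foo.(\n#endregion\n#if false\nfoo.( perfectly legal\n#endif\n"
    print(find_dot_paren(sample))
```

Only hits tagged 'code' would point at a real syntax problem. Handling real conditions would require evaluating the #if expressions, as noted above.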
That said, analyzing the grammar in the C# Language Specification Version 4.0, Appendix B, the . character appears in the following lines:
real-literal:
decimal-digits . decimal-digits exponent-part_opt real-type-suffix_opt
. decimal-digits exponent-part_opt real-type-suffix_opt
operator-or-punctuator: one of
{ } [ ] ( ) . , : ;
namespace-or-type-name:
namespace-or-type-name . identifier type-argument-list_opt
member-access:
primary-expression . identifier type-argument-list_opt
predefined-type . identifier type-argument-list_opt
qualified-alias-member . identifier
base-access:
base . identifier
unbound-type-name:
unbound-type-name . identifier generic-dimension-specifier_opt
qualified-identifier:
qualified-identifier . identifier
member-name:
interface-type . identifier
indexer-declarator:
type interface-type . this [ formal-parameter-list ]
Since a . is always followed by a decimal digit, an identifier, or a this token, the only way to have a .( sequence is to allow multiple operator-or-punctuator symbols next to each other. Looking up operator-or-punctuator, we see:
token: operator-or-punctuator
Since token is only used in lexical analysis, there's nothing to suggest that a . followed by a ( is legal in regular code.
Of course that still leaves comments, literals, etc. which I leave out because you already know about those.
No reference to the grammar and completely unscientific, but here's my guess: .( is not legal in C# (can't speak for VB.NET).
Outside of comments and string literals, I think . can only appear as:
- The member access operator, which must be followed by an identifier. Since identifiers may not begin with (, this is a no-go.
- A decimal point in real literals, which must be followed by a digit. ( is not a digit.
Finally, the . operator is not overloadable, so foo.(bar) won't work either.
Having perused the VB reference, I'm now confident that the answer for VB is no.
VB uses the character . for only three things: inside floating-point number literals, for member access, and for nested name access.
Leaving aside XML literals, the only thing that may ever appear after a member access is an IdentifierOrKeyword (§1.105.6). Identifiers are very well-defined and they may only start with letters, underscores or, in the case of an escaped identifier, the character [.
The same goes for nested name access (and, for completeness' sake, also in With blocks and field initialisers).
As for floating-point literals, the point there must be followed by at least one more digit (§1.6.3).
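The character-level rule from the answers so far can be written down as a rough Python sketch. may_follow_dot and IDENTIFIER_PREFIX are hypothetical names, and the rule is deliberately simplified: it assumes identifiers start with a letter or underscore, adds the @ prefix of C# verbatim identifiers and the [ of VB escaped identifiers, and ignores whitespace, comments, Unicode escapes and VB XML literals.

```python
# Extra characters (besides letters) that may start an identifier.
# Assumption: '@' covers C# verbatim identifiers such as foo.@class,
# and '[' covers VB escaped identifiers such as foo.[Class].
IDENTIFIER_PREFIX = {"csharp": "_@", "vb": "_["}


def may_follow_dot(ch, language="csharp"):
    """Rough check of whether `ch` can be the character right after '.'."""
    if len(ch) != 1:
        raise ValueError("expected a single character")
    is_digit = ch in "0123456789"          # decimal point in a literal
    starts_identifier = ch.isalpha() or ch in IDENTIFIER_PREFIX[language]
    return is_digit or starts_identifier


if __name__ == "__main__":
    for candidate in "x5_(":
        print(repr(candidate), may_follow_dot(candidate))
```

Under these assumptions ( is rejected for both languages, which matches the conclusion reached above.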
On this page http://blogs.msdn.com/b/lucian/archive/2010/04/19/grammar.aspx I put a copy of the complete grammar for C#4 and VB10 in machine-readable format (EBNF & ANTLR) and human-readable (HTML). The HTML version includes the computed "may-follows" set for each token.
According to this, the "may-follows" set of PERIOD does not include LPAREN in either C#4 or VB10.
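To give an idea of how a "may-follows" set can be computed, here is a small Python sketch using the textbook FIRST/FOLLOW fixed-point construction. It is not the code behind the published grammars: GRAMMAR is a made-up toy expression grammar with no empty productions, and the names first_sets, follow_sets and may_follow are assumptions for illustration only.

```python
# Toy grammar (hypothetical, far smaller than C#4 or VB10). Upper-case names
# are tokens; REAL is a whole literal, so its decimal point is never a PERIOD.
GRAMMAR = {
    "expr": [["primary"], ["expr", "PERIOD", "IDENT"],
             ["expr", "LPAREN", "args", "RPAREN"]],
    "primary": [["IDENT"], ["REAL"], ["LPAREN", "expr", "RPAREN"]],
    "args": [["expr"], ["args", "COMMA", "expr"]],
}


def is_terminal(symbol):
    return symbol not in GRAMMAR


def first_sets():
    """FIRST sets for every nonterminal (valid because nothing derives empty)."""
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for alt in alternatives:
                head = alt[0]
                new = {head} if is_terminal(head) else first[head]
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def follow_sets(start="expr"):
    """FOLLOW sets for every nonterminal, with EOF after the start symbol."""
    first = first_sets()
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add("EOF")
    changed = True
    while changed:
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for alt in alternatives:
                for i, symbol in enumerate(alt):
                    if is_terminal(symbol):
                        continue
                    if i + 1 < len(alt):
                        nxt = alt[i + 1]
                        new = {nxt} if is_terminal(nxt) else first[nxt]
                    else:
                        new = follow[nt]
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow


def may_follow(token, start="expr"):
    """Tokens that can appear immediately after `token` in some sentence."""
    first, follow = first_sets(), follow_sets(start)
    result = set()
    for nt, alternatives in GRAMMAR.items():
        for alt in alternatives:
            for i, symbol in enumerate(alt):
                if symbol != token:
                    continue
                if i + 1 < len(alt):
                    nxt = alt[i + 1]
                    result |= {nxt} if is_terminal(nxt) else first[nxt]
                else:
                    result |= follow[nt]
    return result


if __name__ == "__main__":
    print(sorted(may_follow("PERIOD")))
```

In this toy grammar LPAREN may follow IDENT but not PERIOD, which mirrors the result reported for the real grammars.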
Unfortunately the grammars aren't quite complete. Nevertheless, within the VB/C# teams, these grammars are what we start with for a lot of our analysis. For instance...
VB10 introduced "single-line statement lambdas" of the form "Sub() STMT". A lambda itself is an expression, and may appear in a list e.g. "Dim array = {Sub() STMT1, Sub() STMT2}". We had to be aware of ambiguities about what came after an expression and what came after a statement. For instance, "Dim x = Sub() Dim y = 5, z = 3" is ambiguous because the "z=3" might be part of the first OR the second Dim.
VB10 introduced the "implicit line continuation" feature, which is more or less analogous to allowing C# to include SEMICOLON anywhere in the source code. We had to figure out whether this would introduce any ambiguities. That's equivalent to asking whether a prefix of any sentence in the language is also itself a valid sentence. That's equivalent to figuring out whether the intersection of two context-free languages is empty. It's an undecidable problem in general, but not in the case of the VB grammar, which we were able to decide with additional "human insight" into the algorithm.
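The prefix question can at least be illustrated on a finite sample. The Python sketch below is not the analysis the team ran. It only checks a hand-written list of token sequences, and proper_prefix_pairs is a hypothetical helper. The general problem over a whole grammar remains undecidable, as stated above.

```python
def proper_prefix_pairs(sentences):
    """Return (shorter, longer) pairs where shorter is a proper prefix of longer."""
    sentences = [tuple(s) for s in sentences]
    return [
        (a, b)
        for a in sentences
        for b in sentences
        if len(a) < len(b) and b[:len(a)] == a
    ]


if __name__ == "__main__":
    # Hand-written sample sentences, given as token lists (an assumption made
    # for this example, not output from a real VB lexer).
    sample = [
        ["Dim", "y", "=", "5"],
        ["Dim", "y", "=", "5", ",", "z", "=", "3"],
        ["Return", "x"],
    ]
    for shorter, longer in proper_prefix_pairs(sample):
        print(" ".join(shorter), "is a prefix of", " ".join(longer))
```

Each reported pair marks a place where a statement could end early, so a parser with optional terminators would need a rule to decide between the two readings.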
I don't think they'll add anything like this to C#; it would just look plain wrong.
I am not at all sure about VB.Net, though. Just by looking at how they did generics, it seems that the VB.Net team doesn't have this "not looking weird" attitude.
So, if you build any kind of tool that should work with future versions of those languages, better watch out for VB.Net...