
Why do programming languages not allow spaces in identifiers?

This may seem like a dumb question, but still I don't know the answer.

Why do programming languages not allow spaces in names (for instance, method names)?

I understand it is to facilitate (or even allow) parsing, and that at some point it would be impossible to parse anything if spaces were allowed.

Nowadays we are so used to it that the norm is not to see spaces.

For instance:

 object.saveData( data );
 object.save_data( data )
 object.SaveData( data );
 [object saveData:data];

etc.

Could be written as:

 object.save data( data )  // looks ugly, but that's the "natural" way.

If it is only for parsing, I guess the identifier could be whatever lies between the '.' and the '('. Of course, procedural languages wouldn't be able to use this because there is no '.', but OO languages could.

I wonder if parsing is the only reason, and if it is, how important it is. (I assume it is important, and that it would be impossible to do it otherwise, unless all programming language designers just... forgot the option.)

EDIT

I agree that allowing spaces in identifiers in general (as in the FORTRAN example) is a bad idea. Narrowing it down to OO languages, and specifically to method names, I don't see (which is not to say there isn't one) a reason why it should be that way. After all, the '.' and the first '(' could be used as delimiters.

And forget the saveData method; consider this one:

key.ToString().StartsWith("TextBox")

as:

key.to string().starts with("textbox");
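
To check that the narrower rule in the edit is at least mechanically parseable, here is a minimal Python sketch. It assumes the receiver is a plain identifier, treats everything between a '.' and the next '(' as the method name (spaces included), and assumes arguments contain no parentheses; parse_chain, RECEIVER and CALL are hypothetical names. It says nothing about the readability or keyword concerns raised in the answers.

```python
import re
from typing import List, Tuple

# Hypothetical rule: a receiver identifier followed by ".name(args)" calls,
# where "name" is everything between the '.' and the next '(' (spaces allowed).
RECEIVER = re.compile(r"\s*([A-Za-z_]\w*)")
# Assumption for brevity: arguments never contain '(' or ')'.
CALL = re.compile(r"\.\s*([^.()]+?)\s*\(([^()]*)\)")


def parse_chain(expr: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split 'recv.method one(args).method two(args)' into a receiver and calls."""
    m = RECEIVER.match(expr)
    if m is None:
        raise SyntaxError("expected a receiver identifier")
    receiver, pos = m.group(1), m.end()
    calls: List[Tuple[str, str]] = []
    while True:
        m = CALL.match(expr, pos)
        if m is None:
            break
        # Collapse runs of spaces so "starts  with" and "starts with" agree.
        name = " ".join(m.group(1).split())
        calls.append((name, m.group(2)))
        pos = m.end()
    if expr[pos:].strip() not in ("", ";"):
        raise SyntaxError(f"unexpected text at {pos}: {expr[pos:]!r}")
    return receiver, calls


print(parse_chain('key.to string().starts with("textbox");'))
print(parse_chain('key.ToString().StartsWith("TextBox")'))
```

Both spellings yield the receiver key followed by two calls; the spaced version just gets the method names "to string" and "starts with".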


Be cause i twoul d makepa rsing suc hcode reallydif ficult.


I used an implementation of ALGOL (c. 1978) which, extremely annoyingly, required quoting of what are now known as reserved words, and allowed spaces in identifiers:

  "proc" filter = ("proc" ("int") "bool" p, "list" l) "list":
     "if" l "is" "nil" "then" "nil"
     "elif" p(hd(l)) "then" cons(hd(l), filter(p,tl(l)))
     "else" filter(p, tl(l))
     "fi";

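Quote stropping is what makes spaces in identifiers tractable in that example: a keyword is always quoted, so any unquoted run of words can be read as a single identifier. Below is a minimal Python sketch of such a lexer under stated assumptions: only double-quote stropping, lowercase keywords, single-character punctuation, no string literals, and the words of one identifier joined with a single space. It is not a model of the 1978 implementation the answer describes; strop_tokens and TOKEN are hypothetical names.

```python
import re
from typing import List, Tuple

# Assumed quote-stropping rules (not taken from any particular ALGOL compiler):
# keywords are written as "word", identifiers are unquoted runs of words that may
# contain spaces, and every other non-blank character is one punctuation token.
TOKEN = re.compile(
    r'"(?P<KEYWORD>[a-z]+)"'
    r"|(?P<NUMBER>\d+(?:\.\d+)?)"
    r"|(?P<IDENT>[A-Za-z][A-Za-z0-9]*(?: +[A-Za-z0-9]+)*)"
    r'|(?P<PUNCT>[^\sA-Za-z0-9"])'
    r"|(?P<SKIP>\s+)"
)


def strop_tokens(src: str) -> List[Tuple[str, str]]:
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected {src[pos]!r} at {pos}")
        kind = m.lastgroup
        if kind != "SKIP":
            text = m.group(kind)
            if kind == "IDENT":
                text = " ".join(text.split())  # one space between identifier words
            tokens.append((kind, text))
        pos = m.end()
    return tokens


print(strop_tokens('"int" max value = 42;'))
print(strop_tokens('"if" l "is" "nil" "then" "nil"'))
```

Because "int" is quoted, the lexer never has to guess whether max value is a keyword followed by a name or one multi-word name.
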
Also, FORTRAN (the capitalized form means F77 or earlier) was more or less insensitive to spaces, so this could be written:

  799 S = FLO AT F (I A+I B+I C) / 2 . 0
      A  R E  A = SQ R T ( S *(S - F L O ATF(IA)) * (S - FLOATF(IB)) *
     +     (S - F LOA TF (I C)))

which was syntactically identical to

  799 S = FLOATF (IA + IB + IC) / 2.0
      AREA = SQRT( S * (S - FLOATF(IA)) * (S - FLOATF(IB)) *
     +     (S - FLOATF(IC)))
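
The two versions are identical because blanks in the statement field carried no meaning. A simplified Python sketch of that normalization, assuming classic fixed-form layout (columns 1 to 5 for the label, a non-blank, non-zero character in column 6 for a continuation line, statement text from column 7) and ignoring comment lines, the column-72 limit and Hollerith constants; fixed_form_statements is a hypothetical name:

```python
from typing import List, Tuple


def fixed_form_statements(lines: List[str]) -> List[Tuple[str, str]]:
    """Return (label, statement) pairs with all blanks removed from the statement."""
    statements: List[Tuple[str, str]] = []
    for line in lines:
        label = line[:5].strip()
        is_continuation = len(line) > 5 and line[5] not in " 0"
        text = "".join(line[6:].split())  # blanks carry no meaning: drop them all
        if is_continuation and statements:
            prev_label, prev_text = statements[-1]
            statements[-1] = (prev_label, prev_text + text)
        else:
            statements.append((label, text))
    return statements


spaced = [
    "  799 S = FLO AT F (I A+I B+I C) / 2 . 0",
    "      A  R E  A = SQ R T ( S *(S - F L O ATF(IA)) * (S - FLOATF(IB)) *",
    "     +     (S - F LOA TF (I C)))",
]
tidy = [
    "  799 S = FLOATF (IA + IB + IC) / 2.0",
    "      AREA = SQRT( S * (S - FLOATF(IA)) * (S - FLOATF(IB)) *",
    "     +     (S - FLOATF(IC)))",
]
print(fixed_form_statements(spaced) == fixed_form_statements(tidy))
```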

With that kind of history of abuse, why make parsing difficult for humans, let alone for computers?


Yes, it's the parsing, both human and computer. Code is easier to read and easier to parse if you can safely assume that whitespace is never part of a name. Otherwise, you can end up with ambiguous statements, statements where it's not clear how the pieces go together, statements that are hard to read, and so on.


Such a change would make for an ambiguous language in the best of cases. For example, in a C99-like language:

if not foo(int x) {
    ...
}

is that equivalent to:

  1. A function definition of foo that returns a value of type ifnot:

    ifnot foo(int x) {
        ...
    }
    
  2. A call to a function called notfoo with a variable named intx:

    if notfoo(intx) {
        ...
    }
    
  3. A negated call to a function called foo (using C99's not from <iso646.h>, which means !):

    if not foo(intx) {
        ...
    }
    

This is just a small sample of the ambiguities you might run into.
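
One way to see how quickly such ambiguities grow: if spaces may appear inside identifiers, a run of n words can be grouped into names in 2^(n-1) ways, since each of the n-1 gaps either separates two names or belongs to one. A small Python sketch that enumerates the groupings; groupings is a hypothetical helper, and it ignores keywords and types entirely (a real grammar would rule some groupings out and read others in several ways):

```python
from typing import List


def groupings(words: List[str]) -> List[List[str]]:
    """Every way to split a run of words into consecutive multi-word names."""
    if not words:
        return [[]]
    result: List[List[str]] = []
    for i in range(1, len(words) + 1):
        head = " ".join(words[:i])          # the first name takes words[0:i]
        for rest in groupings(words[i:]):   # the remaining words, grouped recursively
            result.append([head] + rest)
    return result


for g in groupings(["if", "not", "foo"]):
    print(g)
```

For the three words in the example this prints four groupings: ['if', 'not', 'foo'], ['if', 'not foo'], ['if not', 'foo'] and ['if not foo'].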

Update: I just noticed that, of course, in a C99-like language the condition of an if statement would be enclosed in parentheses. Extra punctuation can resolve ambiguities if whitespace no longer separates names, but your language will end up having lots of extra punctuation wherever you would normally have used whitespace.


Before the interpreter or compiler can build a parse tree, it must perform lexical analysis, turning the stream of characters into a stream of tokens. Consider how you would want the following parsed:

a = 1.2423 / (4343.23 * 2332.2);

Then consider how your rule above would apply to it. It is hard to know how to split it into tokens without understanding what the tokens mean, and it would be really hard to build a parser that did lexical analysis at the same time.
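
A conventional lexer can tokenize that line without knowing what any token means, because whitespace is only ever a separator. The Python sketch below (make_lexer and the token classes are illustrative, not taken from any particular compiler) builds a small regex-based lexer, then a hypothetical variant whose identifier pattern also accepts spaces:

```python
import re
from typing import Callable, List, Tuple

Token = Tuple[str, str]


def make_lexer(ident_pattern: str) -> Callable[[str], List[Token]]:
    """Build a regex lexer: at each position the first matching pattern wins,
    and whitespace between tokens is skipped."""
    spec = [
        ("NUMBER", r"\d+(?:\.\d+)?"),
        ("IDENT", ident_pattern),
        ("OP", r"[=+\-*/();]"),
        ("SKIP", r"\s+"),
    ]
    master = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in spec))

    def tokenize(src: str) -> List[Token]:
        tokens: List[Token] = []
        pos = 0
        while pos < len(src):
            m = master.match(src, pos)
            if m is None:
                raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
            if m.lastgroup != "SKIP":
                tokens.append((m.lastgroup, m.group()))
            pos = m.end()
        return tokens

    return tokenize


standard = make_lexer(r"[A-Za-z_]\w*")
# Hypothetical variant: identifiers may contain spaces.
spacey = make_lexer(r"[A-Za-z_][\w ]*")

print(standard("a = 1.2423 / (4343.23 * 2332.2);"))
print(standard("return x + 1;"))
print(spacey("return x + 1;"))
```

On the arithmetic line the variant only differs by keeping the blank after a, but on return x + 1 it swallows the keyword, the name and the trailing blank into one "identifier", because a regex cannot know that return is meant to be a keyword.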


There are a few languages that allow spaces in identifiers. Nearly all languages constrain the set of characters in identifiers because parsing is easier that way and because most programmers are accustomed to the compact, no-whitespace style.

I don't think there's any other real reason.


Check out Stroustrup's classic "Generalizing Overloading for C++2000", a tongue-in-cheek proposal that includes overloading whitespace.


We were allowed to put spaces in filenames back in the 1960s, and computers still don't handle them very well (everything used to break, then most things; now it's just a few things, but they still break).

We simply can't wait another 50 years before our code will work again. :-)

(And what everyone else said, of course. In English, we use spaces and punctuation to separate words. The same is true for computer languages, except that computer parsers define "words" in a slightly different sense.)


Using a space as part of an identifier makes parsing really murky (is that a syntactic space or part of an identifier?), but the same sort of "natural reading" behavior can be achieved with keyword arguments:

 object.save(data: something, atomically: true)
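
In Python, for example, keyword-only parameters give that reading at the call site. A minimal sketch; the Store class and its save method are hypothetical stand-ins for object.save in the answer:

```python
from typing import Any, List, Tuple


class Store:
    """Hypothetical stand-in for the `object` in the answer."""

    def __init__(self) -> None:
        self.saved: List[Tuple[Any, bool]] = []

    def save(self, *, data: Any, atomically: bool = False) -> None:
        # The bare '*' makes both parameters keyword-only, so every call
        # has to spell them out: store.save(data=..., atomically=True).
        self.saved.append((data, atomically))


store = Store()
store.save(data={"id": 1}, atomically=True)
print(store.saved)
```

Calling store.save({"id": 1}, True) positionally raises a TypeError, so the descriptive names are always visible where the method is used.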


The TikZ language for creating graphics in LaTeX allows whitespace in parameter names (also known as 'keys'). For instance, you see things like

\shade[
  top color=yellow!70,
  bottom color=red!70,
  shading angle={45},
] (0,0) rectangle (2,1);

In this restricted setting of a comma-separated list of key-value pairs, there's no parsing difficulty. In fact, I think it's much easier to read than the alternatives like topColor, top_color or topcolor.
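
This works because, inside the brackets, the only structural characters are ',', '=' and braces, so a space can never be mistaken for a separator. A simplified Python sketch of parsing such a list, assuming top-level commas separate entries, the first '=' separates key from value, and one pair of outer braces around a value is stripped; parse_keyvals is a hypothetical helper, not how pgfkeys is actually implemented:

```python
def parse_keyvals(text: str) -> dict:
    """Split 'key=value, ...' at top-level commas; keys may contain spaces."""
    items, current, depth = [], [], 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == "," and depth == 0:
            items.append("".join(current))  # a top-level comma ends an entry
            current = []
        else:
            current.append(ch)
    items.append("".join(current))

    result = {}
    for item in items:
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma, as in the example above
        key, _, value = item.partition("=")
        key = " ".join(key.split())  # collapse runs of spaces inside the key
        value = value.strip()
        if value.startswith("{") and value.endswith("}"):
            value = value[1:-1]
        result[key] = value
    return result


print(parse_keyvals("top color=yellow!70,\n  bottom color=red!70,\n  shading angle={45},\n"))
```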
