Walter Bright's use of the word "redundancy"... or 'The heck does that mean?'

Posted 2023-01-12 12:09

So I'm reading this interview with Walter Bright about the D language in Bitwise (http://www.bitwisemag.com/copy/programming/d/interview/d_programming_language.html), and I come across this really interesting quote about language parsing:

From a theoretical perspective, however, being able to generate a good diagnostic requires that there be redundancy in the syntax. The redundancy is used to make a guess at what was intended, and the more redundancy, the more likely that guess will be correct. It's like the English language - if we misspell a wrod now and then, or if a word missing, the redundancy enables us to correctly guess the meaning. If there is no redundancy in a language, then any random sequence of characters is a valid program.

And now I'm trying to figure out what the heck he means when he says "redundancy".

I can barely wrap my head around the last part, where he mentions that it is possible to have a language in which "any random sequence of characters is a valid program." I was taught that there are three kinds of errors: syntactic, run-time, and semantic. Are there languages in which the only possible errors are semantic? Is assembly like that? What about machine code?

I'll focus on why (I think) Walter Bright thinks redundancy is good. Let's take XML as an example. This snippet:

<foo>...</foo>

has redundancy: the closing tag is redundant, as we can see if we use S-Expressions instead:

(foo ...)

It's shorter, and the programmer doesn't have to type foo more often than necessary to make sense of that snippet. Less redundancy. But it has downsides, as an example from http://www.prescod.net/xml/sexprs.html shows:

(document author: "paul@prescod.net"
    (para "This is a paragraph " (footnote "(better than the one under there)" ".")
    (para "Ha! I made you say \"underwear\"."))


<document author="paul@prescod.net">
<para>This is a paragraph <footnote>(just a little one).</para>
<para>Ha! I made you say "underwear".</para>
</document>

In both, the closing delimiter for footnote (the end tag or the closing paren) is missing. The XML version is plainly invalid as soon as the parser sees </para>. The S-Expression one is only invalid by the end of the document, and only if you don't have an unneeded closing paren somewhere else. So redundancy does help, in some cases, to understand what the writer meant (and to point out errors in their way of expressing it).
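To make the difference concrete, here is a minimal Python sketch. The helpers `check_tags` and `check_parens` are hypothetical names made up for this illustration, not part of any real XML or Lisp parser, and the inputs are simplified versions of the example above (no attributes, no quoted strings). The point is only that named closing tags let the checker say which element was left open, and where, while anonymous parens only allow counting.

```python
import re

def check_tags(text):
    """Return a message for the first tag mismatch, or None if tags balance."""
    stack = []
    for m in re.finditer(r"<(/?)(\w+)[^>]*>", text):
        closing, name = m.group(1), m.group(2)
        if not closing:
            stack.append(name)
        elif not stack:
            return f"offset {m.start()}: unexpected </{name}>"
        elif stack[-1] != name:
            # The redundant name in the closing tag tells us what was forgotten.
            return f"offset {m.start()}: expected </{stack[-1]}> but found </{name}>"
        else:
            stack.pop()
    if stack:
        return f"end of input: <{stack[-1]}> never closed"
    return None

def check_parens(text):
    """Parens carry no names, so an imbalance is all that can be reported.

    Simplification: parens inside quoted strings are not handled.
    """
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return f"offset {i}: unexpected ')'"
    if depth:
        return f"end of input: {depth} unclosed '('"
    return None

xml_doc = "<doc><para>text <footnote>note</para><para>more</para></doc>"
sexpr_doc = "(doc (para text (footnote note) (para more))"

print(check_tags(xml_doc))      # names the unclosed <footnote> at the </para>
print(check_parens(sexpr_doc))  # end of input: 1 unclosed '('
```

The tag checker stops at the first `</para>`, right next to the mistake. The paren checker has to read the whole document, and even then it cannot say which form was left open.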

Assembly language (most assembly languages, anyway) is not like that at all -- they have quite a rigid syntax, and most random strings would be diagnosed as errors.

Machine code is a lot closer. Since there's no translation from "source" to "object" code involved, all errors are semantic, not syntactic. Most processors do have various inputs they'd reject (e.g., execute a "bad opcode" trap/interrupt). You could argue that in some cases this would be syntactic (e.g., an opcode that wasn't recognized at all) where others are semantic (e.g., a set of operands that weren't allowed for that instruction).

For those who remember it, TECO was famous (notorious?) for assigning some meaning to almost any possible input, so it was pretty much the same way. An interesting challenge was to figure out what would happen if you typed in (for one example) your name.
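As a rough illustration of the "any random sequence of characters is a valid program" end of the spectrum, here is a toy sketch in Python. Both languages are invented for this example (they are not TECO or any real instruction set): `run_dense` gives every character a meaning, so nothing can be rejected, while `run_wordy` spends extra characters on keywords and can therefore detect a typo and guess what was meant.

```python
import difflib

def run_dense(program):
    """Toy language with no redundancy: every character is an instruction."""
    acc = 0
    for ch in program:
        op = ord(ch) % 4          # every possible character maps to some operation
        if op == 0:
            acc += 1
        elif op == 1:
            acc -= 1
        elif op == 2:
            acc *= 2
        else:
            acc = -acc
    return acc

# The same four operations, but spelled as words (a redundant encoding).
OPS = {
    "inc": lambda acc: acc + 1,
    "dec": lambda acc: acc - 1,
    "dbl": lambda acc: acc * 2,
    "neg": lambda acc: -acc,
}

def run_wordy(program):
    """Toy language with redundancy: most random strings are not instructions."""
    acc = 0
    for index, word in enumerate(program.split(), 1):
        if word not in OPS:
            # Because valid words are sparse, a near miss is probably a typo.
            guess = difflib.get_close_matches(word, list(OPS), n=1)
            hint = f"; did you mean '{guess[0]}'?" if guess else ""
            raise SyntaxError(f"word {index}: unknown instruction '{word}'{hint}")
        acc = OPS[word](acc)
    return acc

print(run_dense("Walter Bright"))  # runs without complaint, like typing your name
print(run_wordy("inc inc dbl"))    # 4
```

In the dense language a typo silently becomes a different program. In the wordy one, `run_wordy("incc dbl")` raises a `SyntaxError` that suggests `inc`, which is the kind of diagnostic the quote is about.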

nglsh nclds ll srts of xtr ltrs t mk it ezr t read

Well, to use an example from C# (since I don't know D): if you have a class with an abstract method, the class itself must be marked abstract:

public abstract class MyClass
{
    public abstract void MyFunc();
}

Now, it would be trivial for the compiler to automatically mark MyClass as abstract (that is the way C++ handles it), but in C#, you must do it explicitly, so that your intentions are clear.

Similarly with virtual methods. In C++, if a method is declared virtual in a base class, it is automatically virtual in all derived classes. In C#, the overriding method must nevertheless be explicitly marked override, so there is no confusion about what you wanted.

I think he was talking about syntactical structures in the language and how they can be interpreted. As an example, consider the humble "if" statement, rendered in several languages.

In bash (shell script), it looks like this:

if [ cond ]; then
  stmts;
elif [ other_cond ]; then
  other_stmts;
else
  other_other_stmts;
fi

In C (with single statements, no curly braces):

if (cond)
  stmt;
else if (other_cond)
  other_stmt;
else
  other_other_stmt;

You can see that in bash, there is a lot more syntactical structure to the if statement than there is in C. In fact, all control structures in bash have their own closing delimiters (e.g. if/then/fi, for/do/done, case/in/esac,...), whereas in C the curly brace is used everywhere. These unique delimiters disambiguate the meaning of the code, and thereby provide context from which the interpreter/compiler can diagnose error conditions and report them to the user.

There is, however, a tradeoff. Programmers generally prefer terse syntax (a la C, Lisp, etc.) to verbose syntax (a la Pascal, Ada, etc.). However, they also prefer descriptive error messages containing line/column numbers and suggested resolutions. These goals are of course at odds with each other--you can't have your cake and eat it too (at least, while keeping the internal implementation of the compiler/interpreter simple).

It means that the syntax contains more information than necessary to encode a working program. An example is function prototypes. As K&R C shows us, they're redundant because the compiler can just let the caller push whatever arguments it wants, then let the function pop the arguments it expects. But C++ and other languages mandate them, because they help the compiler check that you're calling the function the right way.

Another example is the requirement to declare variables before using them. Some languages have this, while others don't. It is clearly redundant, but it often helps prevent errors (e.g. misspelling a name, or using a variable that has been removed).
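A small sketch of how a declaration requirement catches a misspelling, written in Python against a made-up mini-language (the `var NAME` and `NAME = ...` statement forms and the `check_declared` helper are assumptions for illustration only):

```python
def check_declared(lines):
    """Report assignments to names that were never declared with 'var'."""
    declared = set()
    errors = []
    for lineno, line in enumerate(lines, 1):
        words = line.split()
        if not words:
            continue
        if words[0] == "var" and len(words) == 2:
            declared.add(words[1])          # the redundant declaration
        elif len(words) >= 3 and words[1] == "=":
            if words[0] not in declared:
                errors.append(f"line {lineno}: '{words[0]}' assigned but never declared")
    return errors

program = [
    "var total",
    "total = 0",
    "totl = total + 1",   # typo: meant 'total'
]
print(check_declared(program))
# ["line 3: 'totl' assigned but never declared"]
```

Without the `var` line, the checker would have nothing to compare against, and `totl` would simply be treated as a new variable.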

I think a better example of redundancy is something like int a[10] =. At this point, the compiler knows what should come next, an int array initializer, and can come up with an appropriate error message if what follows isn't an int array initializer. If the language syntax said that anything could follow int a[10], it would be a lot harder for the compiler to figure out problems with one.
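Here is a hedged sketch of that idea in Python: a tiny hand-written parser for one declaration form, `int NAME[N] = { ... };`. It is not how any real C compiler is structured. The function name and the error wording are invented for the example, and characters the tokenizer does not recognize are simply skipped. Because the grammar fixes what may follow `=`, the parser can say exactly what it expected.

```python
import re

def parse_int_array_decl(src):
    """Parse 'int NAME [ N ] = { n, n, ... } ;' and return the initializer values."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[\[\]{}=,;]", src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else "end of input"

    def expect(matches, description):
        nonlocal pos
        tok = peek()
        if pos >= len(tokens) or not matches(tok):
            raise SyntaxError(f"expected {description}, got '{tok}'")
        pos += 1
        return tok

    def literal(text):
        return lambda tok: tok == text

    expect(literal("int"), "'int'")
    expect(str.isidentifier, "a variable name")
    expect(literal("["), "'['")
    expect(str.isdigit, "an array size")
    expect(literal("]"), "']'")
    expect(literal("="), "'='")
    # After 'int a[10] =' only one thing is allowed, so the message can be specific.
    expect(literal("{"), "'{' to start an int array initializer")
    values = [int(expect(str.isdigit, "an integer"))]
    while peek() == ",":
        pos += 1
        values.append(int(expect(str.isdigit, "an integer")))
    expect(literal("}"), "'}'")
    expect(literal(";"), "';'")
    return values

print(parse_int_array_decl("int a[3] = {1, 2, 3};"))  # [1, 2, 3]
```

Calling `parse_int_array_decl("int a[10] = 5;")` raises `SyntaxError: expected '{' to start an int array initializer, got '5'`.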

"...then any random sequence of characters is a valid program."

Although not quite "any random sequence is valid", consider Perl and Regular Expressions. Their very short syntax makes it easier for invalid characters to still pass syntactic and semantic analysis.
