Relation between programming languages
I was wondering about the following questions:
- What does it mean "some language is a subset/superset of another"? Can it be defined in mathematics? Is it related to the subset/superset concept in elementary set theory?
- Are almost all existing languages implemented/written in some small number of low-level languages? For example, are most languages written in C? Is C++ written in C?
- Is there some relation between the implementation relation and the concept of subset/superset of languages?
- In terms of language features, some languages have more than others. In some cases, one language has all the features of another. For example, does C++ have all the features of C?
- Is there some relation between the subset/superset relation in terms of the set of features and the subset/superset relation between languages?
- Are there other aspects that characterize relation between languages?
Thanks and regards!
What does it mean "some language is a subset/superset of another"?
Syntactically a language A is a subset of a language B if every program that is valid in language A is also valid in language B. Semantically it is a subset if it is a syntactic subset and each valid A program also exhibits the same behavior in language B.
Can it be defined in mathematics? Is it related to the subset/superset concept in elementary set theory?
Syntactic subset: If P_A is the set of all valid programs in language A and P_B is the set of all valid programs in language B, then language A is a syntactic subset of language B exactly if P_A is a subset of P_B.
Semantic subset: Let A(p) be a function which describes the behavior of program p in language A, and B(p) describe the behavior of program p in language B. A is a subset of B if and only if for all p for which A(p) is defined, B(p) is also defined and A(p) = B(p).
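As a rough illustration of why the semantic condition is stricter than the syntactic one, take A = C and B = C++ (my choice of example, not part of the definition). The function below is syntactically valid in both languages, but the two languages give it different behavior, so A(p) = B(p) fails for a program built around it. The name char_constant_size is made up for this sketch.

```c
#include <stddef.h>

/* In C, a character constant such as 'a' has type int, so this
   returns sizeof(int). Compiled as C++, 'a' has type char and the
   very same expression evaluates to 1. */
size_t char_constant_size(void)
{
    return sizeof('a');
}
```

Compiled as C this returns sizeof(int), which is 4 on most current platforms; compiled as C++ it returns 1.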
Are almost all existing languages implemented/written in some small number of low-level languages?
This depends on your definition of "almost all" of course, but I'm inclined to say no. Many compilers and interpreters are written in C and C++ (simply because a lot of software in general is implemented in C and C++), but far from all of them.
For example, are most languages written in C? Is C++ written in C?
As has been pointed out in comments already, C++ is a language, not a piece of software. g++, the GNU C++ compiler, was written in C for a long time (GCC's own sources have since been moved to C++), but there are also C++ compilers which are written in different languages (probably).
In terms of language features, some languages have more than some other. In some cases, some has all the features of some other, for example, does C++ have all the features of C?
Yes (unless you count simplicity as a feature).
Is there some relation between the subset/superset relation in terms of the set of features and the subset/superset relation between languages?
If a language is a superset of another language, the set of that language's features will also have to be a superset of the other language's features (again unless you count simplicity or things like "the language does not allow X" as a feature).
This is not applicable in the other direction however (i.e. just because A's features are a superset of B's features, A does not have to be a superset of B).
I wanted to pick up this:
Are almost all existing languages implemented/written in some small number of low-level languages? For example, are most languages written in C? Is C++ written in C?
As far as I know, in practice almost all languages that originated after C start out written in C, due to C's overwhelming popularity during a certain period, until they are ready to implement their own compilers. Most languages that compile to native code implement themselves; that is, modern C++ compilers are written in C++. This is achieved by compiling the new compiler with a previous version of the compiler that is known to be good: the LKG or "Last Known Good" compiler. I know for a fact that the Visual C++ compiler is done this way, and I recall that there are Haskell IDEs which are also done like this, and even PROLOG. The original C++ compiler was written in C, but since C++ became a powerful general-purpose language in its own right, people wrote C++ compilers in it.
Of course, this process is impossible for languages which do not compile to native code: they must always have some underlying interpreter or virtual machine to execute their code, and that cannot be written in the language itself, so managed or interpreted languages can never make native languages obsolete.
Is there some relation between the implementation relation and the concept of subset/superset of languages?
Yes, there is. If you're implementing C#, why ditch C++'s many years of good experience in making polymorphic function calls fast? The easiest thing to do would be to just fall back on that implementation, and it's my understanding that in C# running on the .NET Framework this is indeed what happens: they use an implementation basically taken straight from C++. If you're implementing a language feature that already exists in a certain language, you're losing out on experience and innovation if you roll a new implementation from scratch. Of course, this is different if those implementations are proprietary or something, but in general it holds.
Are there other aspects that characterize relation between languages?
Yes, there are. The most obvious one is syntactic: consider the syntactic relations between C, C++, C#, and Java, even though Java and C# are clearly not supersets of C. Then consider the approach to major issues in software development. For example, Java and C# are both statically typed, garbage collected, virtual-machine based languages. Then you could consider design mistakes. In my opinion, shared design mistakes are one of the biggest hints that two languages are much more closely related than they really should be. Here, you can consider Java and C# again. Covariant arrays are broken. A Giraffe[] is not an Animal[], but both Java and C# allow the conversion, so storing some other kind of Animal through the converted reference only fails at run time. This is a clear design mistake, yet both languages have it, which is a sign that they are much too closely related.
Of course, C++ is in a bit of a unique position here. I don't know of any language that directly succeeds another language quite like that, and C/C++ is the closest thing you'll ever find to a language superset. The C++ Standard committee is still standardising features in C++ purely to keep compatibility with C99.
There is a strict definition for formal languages - a language L1 is a subset of language L2 if and only if every well-formed formula of L1 is a well-formed formula of L2.
In the case of programming languages a "well-formed formula" means a syntactically valid program, and you may or may not want your definition of "subset" to say not only that a valid program of L1 is also a valid program of L2, but also that it has the same semantics in L2 as in L1. Since C and C++ both have a semantic notion of undefined behavior, you would also then say that for L1 to be a subset of L2, it is only necessary that every syntactically valid program with defined behavior is valid in L2 with the same defined behavior - it's not required that every program with UB in L1 also has UB in L2. Formal languages don't define semantics, just grammar, which is why this isn't part of the first definition.
C++ is not truly a superset of C. It's very easy to write valid C programs which are not valid C++ programs, perhaps the most obvious way being that C++ reserves some keywords that are not reserved in C, so a valid C program using new as a variable name isn't valid C++. In practice, people talk about a slightly looser notion of a language being a superset, and might say that C++ is "almost" a superset of C, to mean that a great many valid C programs are also valid C++. Of course loose notions can lead to errors (both of communication and of programming).
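For a concrete case, here is a small sketch of a function that is valid C but which a C++ compiler should reject. The function name sum_to is made up for illustration. It uses new and class as ordinary identifiers, and it also relies on the implicit conversion from void * that C allows for the result of malloc but C++ does not.

```c
#include <stdlib.h>

/* Valid C, not valid C++: 'new' and 'class' are plain identifiers in C
   but reserved keywords in C++, and the void * returned by malloc is
   converted to int * without a cast, which only C permits. */
int sum_to(int n)
{
    int *new = malloc(sizeof *new);   /* heap-allocated accumulator */
    int class;                        /* loop counter */
    int result;

    if (new == NULL) {
        return -1;                    /* allocation failed */
    }
    *new = 0;
    for (class = 1; class <= n; ++class) {
        *new += class;
    }
    result = *new;
    free(new);
    return result;
}
```

Calling sum_to(4) from a C program returns 10; feeding the same file to a C++ compiler should fail on the first declaration.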
A proper definition of subset is important when you are trying to change a language (to create a new version) while maintaining so-called "backward compatibility". For your new version to be truly backward-compatible with the old one, an implementation of the new one should run every program from the old one exactly as before (at least, as far as the language defines its meaning), because this means users can update to the new version and all their old programs will still work (at least, assuming they relied only on guaranteed behavior). The same applies with, say, a library API, except that then you aren't worried about the whole language, you're just worried about interaction with your interface.
- Although the term and general concept come from set theory (and if you defined a programming language as a number of sets you could take the term literally and see subset/superset relationships between some of these sets), for all practical purposes the definition is much more informal: Language L1 is a superset of language L2 if programs valid in L2 are valid in L1 as well.
- Don't confuse languages with language implementations. C++ is just an abstract specification, but it has been implemented in numerous ways - probably in C first, nowadays probably in C++. But basically, yes - since the first implementation of L can't be written in L, you have to write it in something else. That something else is usually a widely-used, mature language. In the case of interpreters/VMs, it's usually C or C++ for raw speed and to have control over memory management.
- There's very rarely "more" or "less", there's always "different". C++ is built on top of C, so of course it has most of its features. But even in that case, we don't have a real superset relation or simply "more" features - C++ doesn't have all C features (anymore, at least) - just think of C99, with variable length arrays as a concrete example. To be a full superset of another language, a language of course needs to support all of that language. In that case one could indeed talk about it having "more" features, I suppose.
- Countless. Take your pick, use your imagination. Few of them are useful or interesting though.
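To make the variable length array point above concrete, here is a minimal C99 sketch (the function name sum_of_squares is invented for the example). Standard C++ has no variable length arrays, although some C++ compilers accept them as an extension, and C11 made them an optional feature, so treat this as an illustration rather than portable code.

```c
#include <stddef.h>

/* C99 variable length array: the length of 'squares' is only known at
   run time. Standard C++ requires a constant expression here. */
long sum_of_squares(size_t n)
{
    long total = 0;

    if (n == 0) {
        return 0;                 /* a zero-length VLA would be undefined behavior */
    }

    long squares[n];              /* the VLA itself */

    for (size_t i = 0; i < n; ++i) {
        squares[i] = (long)(i + 1) * (long)(i + 1);
    }
    for (size_t i = 0; i < n; ++i) {
        total += squares[i];
    }
    return total;
}
```

For instance, sum_of_squares(3) adds 1 + 4 + 9 and returns 14.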