开发者

A more fundamental reason Java does not include operator overloading (at least for assignment)?

Referring to a ~2-year old discussion of the fact that there is no operator overloading in Java ( Why doesn't Java offer operator overloading? ), and coming from many intense C++ years myself to Java, I wonder whether there is a more fundamental reason that operator overloading is not part of the Java language, at least in the case of assignment, than the highest-rated answer in that link states near the bottom of the answer (namely, that it was James Gosling's personal choice).

Specifically, consider assignment.

// C++
#include <iostream>

class MyClass
{
public:
    int x;
    MyClass(const int _x) : x(_x) {}
    MyClass & operator=(const MyClass & rhs) {x=rhs.x; return *this;}
};

int main()
{
    MyClass myObj1(1), myObj2(2);
    MyClass & myRef = myObj1;
    myRef = myObj2;

    std::cout << "myObj1.x = " << myObj1.x << std::endl;
    std::cout << "myObj2.x = " << myObj2.x << std::endl;

    return 0;
}

The output is:

myObj1.x = 2
myObj2.x = 2

In Java, however, the line myRef = myObj2 (assuming the declaration of myRef in the previous line was myClass myRef = myObj1, as Java requires, since all such variables are automatically Java-style 'references') behaves very differently - it would not cause myObj1.x to change and the output would be

myObj1.x = 1
myObj2.x = 2

This difference between C++ and Java leads me to think that the absence of operator overloading in Java, at least in the case of assignment, is not a 'matter of personal choice' on the part of James Gosling, but rather a fundamental necessity given Java's syntax that treats all object variables as references (i.e. MyClass myRef = myObj1 defines myRef to be a Java-style reference). I say this because if assignment in Java causes the left-hand side reference to refer to a different object, rather than allowing the possibility that the object itself change its value, then it would seem that there is no possibility of providing an overloaded assignment operator.

In other words - it's not simply a 'choice', and there's not even the possibility of 'holding your breath' with the hope that it will ever be introduced, as the aforementioned high-rated answer also states (near the end). Quoting: "The reasons for not adding them now could be a mix of internal politics, allergy to the feature, distrust of developers (you know, the saboteur ones), compatibility with the previous JVMs, time to write a correct specification, etc.. So don't hold your breath waiting for this feature.". <-- So this isn't correct, at least for the assignment operator: the reason there's no operator overloading (at least for assignment) is instead fundamental to the nature of Java.

Is this a correct assessment on my part?

ADDENDUM

Assuming the assignment operator is a special case, then my follow-up question is: Are there any other operators, or more generally any other language features, that would by necessity be affected in a开发者_StackOverflow similar way as the assignment operator? I would like to know how 'deep' the difference goes between Java and C++ regarding variables-as-values/references. i.e., in C++, variable tokens represent values (and note, even if the variable token was declared initially as a reference, it's still treated as a value essentially wherever it's used), whereas in Java, variable tokens represent honest-to-goodness references wherever the token is later used.


There is a big misconception when talking about similarities and differences between Java and C++, that arises in your question. C++ references and Java references are not the same. In Java a reference is a resettable proxy to the real object, while in C++ a reference is an alias to the object. To put it in C++ terms, a Java references is a garbage collected pointer not a reference. Now, going back to your example, to write equivalent code in C++ and Java you would have to use pointers:

int main() {
   type a(1), b(2);
   type *pa = &a, *pb = &b;
   pa = pb;
   // a is still 1, b is still 2, pa == pb == &b
}

Now the examples are the same: the assignment operator is being applied to the pointers to the objects, and in that particular case you cannot overload the operator in C++ either. It is important to note that operator overloading can be easily abused, and that is a good reason to avoid it in the first place. Now if you add the two different types of entities: objects and references, things become more messy to think about.

If you were allowed to overload operator= for a particular object in Java, then you would not be able to have multiple references to the same object, and the language would be crippled:

Type a = new Type(1);
Type b = new Type(2);
a = b;                 // dispatched to Type.operator=( Type )??
a.foo();
a = new Type(3);       // do you want to copy Type(3) into a, or work with a new object?

That in turn would make the type unusable in the language: containers store references, and they reassign them (even the first time just when an object is created), functions don't really use pass-by-reference semantics, but rather pass-by-value the references (which is a completely different issue, again, the difference is void foo( type* ) versus void foo( type& ): the proxy entity is copied, you cannot modify the reference passed in by the caller.

The problem is that the language is trying really hard to hide the fact that a and the object that a refers to are not the same thing (same happens in C#), and that in turn means that you cannot explicitly state that one operation is to be applied to the reference/referent, that is resolved by the language. The outcome of that design is that any operation that can be applied to references can never be applied to the objects themselves.

As of the rest of the operators, the decision is most probably arbitrary, because the language hides the reference/object difference, it could have been designed such that a+b was translated into type* operator+( type*, type* ) by the compiler. Since you cannot use arithmetic then there would be no problem, as the compiler would recognize that a+b is an operation that must be applied to the objects (it does not make sense with references). But then it could be considered a little awkward that you can overload +, but you cannot overload =, ==, !=...

That is the path that C# took, where assignment cannot be overloaded for reference types. Interestingly in C# there are value types, and the set of operators that can be overloaded for reference and value types are different. Not having coded C# in large projects, I cannot really tell whether that potential source of confusion is such or if people are just used to it (but if you search SO, you will find that a few people do ask why X cannot be overloaded in C# for reference types where X is one of the operations that can be applied to the reference itself.


That doesn't explain why they couldn't have allowed overloading of other operators like + or -. Considering James Gosling designed the Java language, and he said it was his personal choice, which he explains in more detail at the link provided in the question you linked, I think that's your answer:

There are some things that I kind of feel torn about, like operator overloading. I left out operator overloading as a fairly personal choice because I had seen too many people abuse it in C++. I've spent a lot of time in the past five to six years surveying people about operator overloading and it's really fascinating, because you get the community broken into three pieces: Probably about 20 to 30 percent of the population think of operator overloading as the spawn of the devil; somebody has done something with operator overloading that has just really ticked them off, because they've used like + for list insertion and it makes life really, really confusing. A lot of that problem stems from the fact that there are only about half a dozen operators you can sensibly overload, and yet there are thousands or millions of operators that people would like to define -- so you have to pick, and often the choices conflict with your sense of intuition. Then there's a community of about 10 percent that have actually used operator overloading appropriately and who really care about it, and for whom it's actually really important; this is almost exclusively people who do numerical work, where the notation is very important to appealing to people's intuition, because they come into it with an intuition about what the + means, and the ability to say "a + b" where a and b are complex numbers or matrices or something really does make sense. You get kind of shaky when you get to things like multiply because there are actually multiple kinds of multiplication operators -- there's vector product, and dot product, which are fundamentally very different. And yet there's only one operator, so what do you do? And there's no operator for square-root. Those two camps are the poles, and then there's this mush in the middle of 60-odd percent who really couldn't care much either way. The camp of people that think that operator overloading is a bad idea has been, simply from my informal statistical sampling, significantly larger and certainly more vocal than the numerical guys. So, given the way that things have gone today where some features in the language are voted on by the community -- it's not just like some little standards committee, it really is large-scale -- it would be pretty hard to get operator overloading in. And yet it leaves this one community of fairly important folks kind of totally shut out. It's a flavor of the tragedy of the commons problem.

UPDATE: Re: your addendum, the other assignment operators +=, -=, etc. would also be affected. You also can't write a swap function like void swap(int *a, int *b);. and other stuff.


Is this a correct assessment on my part?

The lack of operator in general is a "personal choice". C#, which is a very similar language, does allow operator overloading. But you still can't overload assignment. What would that even do in a reference-semantics language?

Are there any other operators, or more generally any other language features, that would by necessity be affected in a similar way as the assignment operator? I would like to know how 'deep' the difference goes between Java and C++ regarding variables-as-values/references.

The most obvious is copying. In a reference-semantics language, clone() isn't that common, and isn't needed at all for immutable types like String. But in C++, where the default assignment semantics are based around copying, copy constructors are very common. And automatically generated if you don't define one.

A more subtle difference is that it's a lot harder for a reference-semantics language to support RAII than a value-semantics language, because object lifetime is harder to track. Raymond Chen has a good explanation.


The reason why operator overloading is abused in C++ language is because it's too complex feature. Here's some aspects of it which makes it complex:

  1. expressions are a tree
  2. operator overloading is the interface/documentation for those expressions
  3. interfaces are basically invisible feature in c++
  4. free functions/static functions/friend functions are a big mess in C++
  5. function prototypes are already complex feature
  6. choice of the syntax for operator overloading is less than ideal
  7. there is no other comparable api in c++ language
  8. user-defined types/function names are handled differently than built-in types/function names in function prototypes
  9. it uses advanced math, like the operator<<(ostream&, ostream & (*fptr)(ostream &));
  10. even the simplest examples of it uses polymorphism
  11. It's the only c++ feature that has 2d array in it
  12. this-pointer is invisible and whether your operators are member functions or outside the class is important choice for programmers

Because of these complexity, very small number of programmers actually understand how it works. I'm probably missing many important aspects of it, but the list above is good indication that it is very complex feature.

Update: some explanation about the #4: the argument pretty much is as follows:

class A { friend void f(); }; class B { friend void f(); }
void f() { /* use both A and B members inside this function */ }

With static functions, you can do this:

class A { static void f(); }; void f() { /* use only class A here */ }

And with free functions:

class A { }; void f() { /* you have no special access to any classes */ }

Update#2: The #10, the example I was thinking looks like this in stdlib:

  ostream &operator<<(ostream &o, std::string s) { ... } // inside stdlib
  int main() { std::cout << "Hello World" << std::endl; }

Now the polymorphism in this example happens because you can choose between std::cout and std::ofstream and std::stringstream. This is possible because operator<< first parameter takes a reference to ostream. This is normal runtime polymorphism in this example.

Update #3: About the prototypes still. The real interaction between operator overloading and prototypes is because the overloaded operators becomes part of the class' interface. This brings us to the 2d array thing, because inside the compiler the class interface is a 2d data structure which has quite complex data in it, including booleans, types, function names. The rule #4 is needed so that you can choose when your operators are inside this 2d data structure and when they're outside of it. Rule #8 deals with the booleans stored in the 2d data structure. Rule #7 is because class' interface is used to represent elements of an expression tree.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜