Class vs data structure
In object-oriented programming, a custom class (such as a Person class with a name, a list of addresses, etc.) holds data and can include collection objects too. A data structure is also used to hold data. So, is a class conceptually considered an advanced data structure? And in the design of efficient systems (in the object-oriented world and in large systems), are classes treated like data structures, with algorithmic analysis applied to class designs for greater efficiency (at companies like Google or Facebook)?
I recommend reading chapter 6 of Clean Code, "Objects and Data Structures". The whole chapter is about this. If you don't want to buy the book, you can read a summary, which can be found here.
According to that, you can use classes efficiently in two different ways. This phenomenon is called data/object anti-symmetry. Depending on your goals, you have to decide whether your classes will follow the open/closed principle (OCP) or not.
If they follow the OCP, they will be polymorphic, and their instances will be used as objects. They hide data and implementation behind a common interface, and it is easy to add a new type that implements that interface as well. Most design patterns fulfill the OCP, for example MVC, IoC, every wrapper, adapter, etc.
If they don't follow the OCP, they won't be polymorphic, and their instances will be used as data structures. They expose data, and that data is manipulated by other classes. This is the typical approach in procedural programming as well. There are several examples that don't use the OCP, for example DTOs, exceptions, config objects, the visitor pattern, etc.
A typical pattern where you should consider fulfilling the OCP and moving the code to a lower abstraction level:
class Manipulator {
    void doSomething(Object dataStructure) {
        if (dataStructure instanceof MyType1) {
            // doSomething implementation 1
        } else if (dataStructure instanceof MyType2) {
            // doSomething implementation 2
        }
        // ...
    }

    void doSomethingElse(Object dataStructure) {
        if (dataStructure instanceof MyType1) {
            // doSomethingElse implementation 1
        } else if (dataStructure instanceof MyType2) {
            // doSomethingElse implementation 2
        }
        // ...
    }
}

class MyType1 {}
class MyType2 {}
// If you want to add a new type, every method of the Manipulator has to change.
Fix: move the implementation to a lower abstraction level and fulfill the OCP:
interface MyType {
    void doSomething();
    void doSomethingElse();
}

class MyType1 implements MyType {
    public void doSomething() {
        // doSomething implementation 1
    }

    public void doSomethingElse() {
        // doSomethingElse implementation 1
    }
}

class MyType2 implements MyType {
    public void doSomething() {
        // doSomething implementation 2
    }

    public void doSomethingElse() {
        // doSomethingElse implementation 2
    }
}

// The newly added type: no existing class has to change.
class MyType3 implements MyType {
    public void doSomething() {
        // doSomething implementation 3
    }

    public void doSomethingElse() {
        // doSomethingElse implementation 3
    }
}
A typical pattern where you should consider violating the OCP and moving the code to a higher abstraction level:
interface MyType {
    void doSomething();
    void doSomethingElse();
    // If you want to add a new method here, every class that implements
    // this interface has to be modified.
}

class MyType1 implements MyType {
    public void doSomething() {
        // doSomething implementation 1
    }

    public void doSomethingElse() {
        // doSomethingElse implementation 1
    }
}

class MyType2 implements MyType {
    public void doSomething() {
        // doSomething implementation 2
    }

    public void doSomethingElse() {
        // doSomethingElse implementation 2
    }
}
or
interface MyType {
    void doSomething();
    void doSomethingElse();
}

class MyType1 implements MyType {
    public void doSomething() {
        // doSomething implementation 1
    }

    public void doSomethingElse() {
        // doSomethingElse implementation 1
    }
}

class MyType2 implements MyType {
    public void doSomething() {
        // doSomething implementation 2
    }

    public void doSomethingElse() {
        // doSomethingElse implementation 2
    }
}

// Adding a new type for which one or more of the methods are meaningless.
class MyType3 implements MyType {
    public void doSomething() {
        throw new UnsupportedOperationException("Not implemented, because it does not make any sense.");
    }

    public void doSomethingElse() {
        // doSomethingElse implementation 3
    }
}
Fix: move the implementation to a higher abstraction level and violate the OCP:
class Manipulator {
    void doSomething(Object dataStructure) {
        if (dataStructure instanceof MyType1) {
            // doSomething implementation 1
        } else if (dataStructure instanceof MyType2) {
            // doSomething implementation 2
        }
        // ...
    }

    void doSomethingElse(Object dataStructure) {
        if (dataStructure instanceof MyType1) {
            // doSomethingElse implementation 1
        } else if (dataStructure instanceof MyType2) {
            // doSomethingElse implementation 2
        }
        // ...
    }

    // The newly added method: no existing type has to change.
    void doAnotherThing(Object dataStructure) {
        if (dataStructure instanceof MyType1) {
            // doAnotherThing implementation 1
        } else if (dataStructure instanceof MyType2) {
            // doAnotherThing implementation 2
        }
        // ...
    }
}

class MyType1 {}
class MyType2 {}
or splitting up the classes into subclasses.
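One possible reading of "splitting up" is to break the wide interface into narrower ones, so that a type only implements the methods that make sense for it. The sketch below assumes that reading; the interface names CanDoSomething and CanDoSomethingElse are hypothetical, and the methods return strings only so that the behavior is observable.

```java
// Hypothetical interface names; a sketch of splitting MyType into narrower interfaces.
interface CanDoSomething {
    String doSomething();
}

interface CanDoSomethingElse {
    String doSomethingElse();
}

// A type for which both operations are meaningful implements both interfaces.
class MyType1 implements CanDoSomething, CanDoSomethingElse {
    public String doSomething() {
        return "doSomething 1";
    }

    public String doSomethingElse() {
        return "doSomethingElse 1";
    }
}

// MyType3 implements only what is meaningful for it,
// so no method has to throw an exception.
class MyType3 implements CanDoSomethingElse {
    public String doSomethingElse() {
        return "doSomethingElse 3";
    }
}
```

With this split, code that needs doSomething() asks for a CanDoSomething, and the compiler rejects a MyType3 there instead of failing at runtime.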
Once there are more than one or two methods, people usually follow the OCP, because repeating the same if-else chain in every method is not DRY enough.
I don't recommend mixed classes that partially fulfill and partially violate the OCP, because the code becomes very hard to maintain. You should decide in each situation which approach to follow. This is usually an easy decision, and if you make a mistake, you can still refactor your code later.
Whether a custom class is a data structure depends on whom you ask. At the very least, the "yes" people would acknowledge that it's a user-defined data structure which is more domain-specific and less established than data structures such as arrays, linked lists or binary trees. For this answer, I consider them distinct.
While it's easy to apply Big O algorithm analysis to data structures, it's a little more complex for classes since they wrap many of these structures, as well as other instances of other classes... but a lot of operations on class instances can be broken down into primitive operations on data structures and represented in terms of Big O. As a programmer, you can endeavour to make your classes more efficient by avoiding unnecessary copying of members and ensuring that method invocations don't go through too many layers. And of course, using performant algorithms in your methods goes without saying, but that's not OOP specific. However, functionality, design and clarity should not be sacrificed in favour of performance unless necessary. And premature optimisation is the devil yada yada yada.
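As a rough illustration of breaking class operations down into operations on the wrapped data structures, here is a minimal sketch. The AddressBook class and its method names are hypothetical and not from the question; the complexity comments rely on the documented behavior of java.util.ArrayList and java.util.HashMap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class for illustration: each method's cost follows
// from the data structure operations it delegates to.
class AddressBook {
    private final List<String> names = new ArrayList<>();
    private final Map<String, String> cityByName = new HashMap<>();

    // Amortized O(1): one ArrayList append plus one HashMap put.
    void add(String name, String city) {
        names.add(name);
        cityByName.put(name, city);
    }

    // O(n): ArrayList.contains scans the list linearly.
    boolean containsByScan(String name) {
        return names.contains(name);
    }

    // Expected O(1): HashMap lookup by key.
    boolean containsByHash(String name) {
        return cityByName.containsKey(name);
    }

    // Expected O(1): HashMap lookup by key; returns null for unknown names.
    String cityOf(String name) {
        return cityByName.get(name);
    }
}
```

Both contains methods return the same answer, so the choice between them is purely a question of which underlying structure the class consults.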
I'm certain that some academic, somewhere, has attempted to formulate a metric for quantifying class performance or even a calculus for classes and their operations, but I haven't come across it yet. However, there exists QA research like this which measures dependencies between classes in a project... one could possibly argue that there's a correlation between the number of dependencies and the layering of method invocations (and therefore lower class performance). But if someone has researched this, I'm sure you could find a more relevant metric which doesn't require sweeping inferences.
I would say that conceptually a class is NOT a data structure. A class represents, well, a class of objects, and objects are abstract entities (in the English meaning of the word, not the C++ or C# meaning).
I'd say classes and objects are like the theory behind the practice, and the practice is the implementation of objects using methods and data. The data may be simple or complex (the so-called advanced data structure).
A class is simply a collection of data and methods which can act on that data. You can use a class to implement a data structure, but they are different things.
Take the Linked List for example. You can implement a Linked List data structure using a class, and in some languages this is the cleanest and most obvious way of doing it. It is not the only way to implement a Linked List, but it might be the best depending on the language.
A Linked List however, has nothing to do with being a class. A Linked List is instead a way of representing data as separate nodes where each node is linked to the next in some fashion.
A data structure is a conceptual way of modeling data, each different data structure having different properties and use cases. A class is a syntactic way that some languages offer to group data and methods.
Classes often lend themselves to being used to implement data structures, but it would be incorrect to say that a class == a data structure.
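To make the distinction concrete, here is a minimal sketch of a singly linked list implemented with a class. The names IntLinkedList and Node are hypothetical. The data structure is the chain of linked nodes; the class is only the language construct that packages it.

```java
// Hypothetical names; a minimal singly linked list of ints.
class IntLinkedList {
    // Each node holds a value and a link to the next node.
    private static final class Node {
        final int value;
        Node next;

        Node(int value) {
            this.value = value;
        }
    }

    private Node head;
    private Node tail;
    private int size;

    // O(1): append a new node after the current tail.
    void addLast(int value) {
        Node node = new Node(value);
        if (head == null) {
            head = node;
        } else {
            tail.next = node;
        }
        tail = node;
        size++;
    }

    // O(n): follow the links from the head up to the requested index.
    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        Node current = head;
        for (int i = 0; i < index; i++) {
            current = current.next;
        }
        return current.value;
    }

    int size() {
        return size;
    }
}
```

The same linked structure could also be built without classes, for example in C with structs and pointers, which is why the list and the class are different things.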
A class describes a model/concept/type and defines the possible behavior and possible states of that type (in your example, a Person can have a name, address, etc.).
A data structure is some type that can be used to organize and group, uh... data, in some way. For example, vectors and linked lists are data structures that can be used for storing data in an ordered way.
You can have a class that represents a data structure, like std::vector in C++, or java.util.ArrayList in Java.
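A minimal sketch of how the two fit together, using the Person example from the question: Person is a class that models a concept, and it uses java.util.ArrayList, a class that represents a data structure, to hold its addresses. The field and method names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical Person class: models a concept and delegates
// storage of the addresses to a data structure class.
class Person {
    private final String name;
    private final List<String> addresses = new ArrayList<>();

    Person(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    void addAddress(String address) {
        addresses.add(address);
    }

    // Read-only view, so callers cannot modify the internal list.
    List<String> getAddresses() {
        return Collections.unmodifiableList(addresses);
    }
}
```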
Simply put, a class can be viewed as a syntactical tool provided by a given programming language, say Java, that bundles data and methods together for use in the implementation of concepts or objects in a program or application.
With a class you are able to implement a software component that is a representation of an idea or object in the real world. You do this by capturing the object's properties as member variables and its behavior or operations as the methods of the class.
Data structures, on the other hand, are basically models for handling data (array, linked list, binary search tree). A class is often used to implement a data structure because it can capture both the state and the behavior of that structure.
The two are therefore distinct in this sense.