Classes and Objects

What is a class?

The fundamental building block of OO software.

A class defines a data type, much like a struct does in C. In a computer science sense, a type consists of both a set of states and a set of operations which transition between those states. Thus int is a type because it has both a set of states and operations like i + j or i++, etc. In exactly the same way, a class provides a set of (usually public) operations, and a set of (usually non-public) data bits representing the abstract values that instances of the type can have.

You can imagine that int is a class that has member functions called operator++, etc. (int isn’t really a class, but the basic analogy is this: a class is a type, much like int is a type.)

Note: a C programmer can think of a class as a C struct whose members default to private. But if that’s all you think of a class, then you probably need to experience a personal paradigm shift.
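To make the "states plus operations" idea concrete, here is a minimal sketch. The class name Counter and all of its members are invented for this illustration and do not come from any library: the public member functions are the operations, and the private data member is the state.

```cpp
#include <cassert>

// Hypothetical example type: Counter and its members are assumptions
// made for illustration only.
class Counter {
public:
    // Operations: the only way to move between states.
    void increment() { ++count_; }
    void reset() { count_ = 0; }
    int value() const { return count_; }

private:
    // State: hidden from users of the type.
    int count_ = 0;
};

// Small demonstration: each object carries its own state.
inline int counter_demo()
{
    Counter a;
    Counter b;
    a.increment();
    a.increment();
    b.increment();
    return a.value() * 10 + b.value();   // a holds 2, b holds 1
}
```

Code that uses Counter can call increment(), reset(), and value(), but it cannot name count_, so the representation is free to change later.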

What is an object?

A region of storage with associated semantics.

After the declaration int i; we say that “i is an object of type int.” In OO/C++, “object” usually means “an instance of a class.” Thus a class defines the behavior of possibly many objects (instances).

When is an interface “good”?

When it provides a simplified view of a chunk of software, and it is expressed in the vocabulary of a user (where a “chunk” is normally a class or a tight group of classes, and a “user” is another developer rather than the ultimate customer).

  • The “simplified view” means unnecessary details are intentionally hidden. This reduces the user’s defect-rate.
  • The “vocabulary of users” means users don’t need to learn a new set of words and concepts. This reduces the user’s learning curve.

What is encapsulation?

Preventing unauthorized access to some piece of information or functionality.

The key money-saving insight is to separate the volatile part of some chunk of software from the stable part. Encapsulation puts a firewall around the chunk, which prevents other chunks from accessing the volatile parts; other chunks can only access the stable parts. This prevents the other chunks from breaking if (when!) the volatile parts are changed. In the context of OO software, a “chunk” is normally a class or a tight group of classes.

The “volatile parts” are the implementation details. If the chunk is a single class, the volatile part is normally encapsulated using the private and/or protected keywords. If the chunk is a tight group of classes, encapsulation can be used to deny access to entire classes in that group. Inheritance can also be used as a form of encapsulation.

The “stable parts” are the interfaces. A good interface provides a simplified view in the vocabulary of a user, and is designed from the outside-in (here a “user” means another developer, not the end-user who buys the completed application). If the chunk is a single class, the interface is simply the class’s public member functions and friend functions. If the chunk is a tight group of classes, the interface can include several of the classes in the chunk.

Designing a clean interface and separating that interface from its implementation merely allows users to use the interface. But encapsulating (putting “in a capsule”) the implementation forces users to use the interface.
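Here is one way the stable/volatile split might look for a single class. Temperature is a hypothetical class made up for this sketch. The point is only that the representation (the volatile part) sits behind private, while callers see a small interface phrased in their own vocabulary.

```cpp
#include <cassert>

// Hypothetical class used only for illustration.
class Temperature {
public:
    // Stable part: callers depend only on these functions.
    static Temperature from_celsius(double c) { return Temperature(c + 273.15); }
    double celsius() const { return kelvin_ - 273.15; }
    double fahrenheit() const { return celsius() * 9.0 / 5.0 + 32.0; }

private:
    // Volatile part: the representation. An earlier version could have
    // stored Celsius directly; switching to Kelvin breaks no caller,
    // because no caller is able to name kelvin_.
    explicit Temperature(double k) : kelvin_(k) {}
    double kelvin_;
};
```

Because kelvin_ is private, the compiler rejects any outside code that tries to depend on it, which is the "forces users to use the interface" part.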

How does C++ help with the tradeoff of safety vs. usability?

In C, encapsulation was accomplished by making things static in a compilation unit or module. This prevented another module from accessing the static stuff. (By the way, C++98 deprecated static at file scope in favor of unnamed namespaces. C++11 removed that deprecation, but an unnamed namespace is still the idiomatic C++ way to get the same effect.)

Unfortunately this approach doesn’t support multiple instances of the data, since there is no direct support for making multiple instances of a module’s static data. If multiple instances were needed in C, programmers typically used a struct. But unfortunately C structs don’t support encapsulation. This exacerbates the tradeoff between safety (information hiding) and usability (multiple instances).

In C++, you can have both multiple instances and encapsulation via a class. The public part of a class contains the class’s interface, which normally consists of the class’s public member functions and its friend functions. The private and/or protected parts of a class contain the class’s implementation, which is typically where the data lives.

The end result is like an “encapsulated struct.” This reduces the tradeoff between safety (information hiding) and usability (multiple instances).
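The contrast can be sketched as follows. All names here (module_total, module_add, module_get, Accumulator) are invented for the example. The first half mimics the C technique of hiding one piece of state in a translation unit; the second half gets the same hiding with as many instances as you like.

```cpp
#include <cassert>

// C-style approach, shown only to illustrate the technique: the state
// is hidden inside one translation unit, but there is exactly one of it.
static int module_total = 0;                  // internal linkage
void module_add(int n) { module_total += n; }
int module_get() { return module_total; }

// C++ approach: the same hiding, but every object has its own state.
class Accumulator {
public:
    void add(int n) { total_ += n; }
    int get() const { return total_; }

private:
    int total_ = 0;
};
```

Two Accumulator objects never interfere with each other, whereas every caller of module_add() shares the single module_total.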

How can I prevent other programmers from violating encapsulation by seeing the private parts of my class?

Not worth the effort — encapsulation is for code, not people.

It doesn’t violate encapsulation for a programmer to see the private and/or protected parts of your class, so long as they don’t write code that somehow depends on what they saw. In other words, encapsulation doesn’t prevent people from knowing about the inside of a class; it prevents the code they write from becoming dependent on the insides of the class. Your company doesn’t have to pay a “maintenance cost” to maintain the gray matter between your ears; but it does have to pay a maintenance cost to maintain the code that comes out of your finger tips. What you know as a person doesn’t increase maintenance cost, provided the code you write depends on the interface rather than the implementation.

Besides, this is rarely if ever a problem. I don’t know any programmers who have intentionally tried to access the private parts of a class. “My recommendation in such cases would be to change the programmer, not the code” [James Kanze; used with permission].

Can a method directly access the non-public members of another instance of its class?

Yes.

The name this is not special. Access is granted or denied based on the class of the reference/pointer/object, not based on the name of the reference/pointer/object. (See below for the fine print.)

The fact that C++ allows a class’s methods and friends to access the non-public parts of all its objects, not just the this object, seems at first to weaken encapsulation. However the opposite is true: this rule preserves encapsulation. Here’s why.

Without this rule, most non-public members would need a public get method, because many classes have at least one method or friend that takes an explicit argument (i.e., an argument not called this) of its own class.

Huh? (you ask). Let’s kill the mumbo jumbo and work out an example:

Consider the assignment operator Foo::operator=(const Foo& x). This assignment operator will probably change the data members in the left-hand argument, *this, based on the data members in the right-hand argument, x. Without the C++ rule being discussed here, the only way for that assignment operator to access the non-public members of x would be for class Foo to provide a public get method for every non-public datum. That would suck bigtime. (NB: “suck bigtime” is a precise, sophisticated, technical term; and I am writing this on April 1.)

The assignment operator isn’t the only one that would weaken encapsulation were it not for this rule. Here is a partial(!) list of others:

  • Copy constructor.
  • Comparison operators: ==, !=, <=, <, >=, >.
  • Binary arithmetic operators: x+y, x-y, x*y, x/y, x%y.
  • Binary bitwise operators: x^y, x&y, x|y.
  • Static methods that accept an instance of the class as a parameter.
  • Static methods that create/manipulate an instance of the class.
  • etc.

Conclusion: encapsulation would be shredded without this beneficial rule: most non-public members of most classes would end up having a public get method.
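A short sketch of the rule in action. Foo is the placeholder name used above; its data members id_ and weight_ and the helper is_default() are assumptions added for the example. Note that the class has no public get methods at all.

```cpp
#include <cassert>

class Foo {
public:
    explicit Foo(int id = 0, double weight = 0.0) : id_(id), weight_(weight) {}

    Foo& operator=(const Foo& x)
    {
        // x is a different object, yet its private members are
        // accessible because x is a Foo and this is a Foo member.
        id_ = x.id_;
        weight_ = x.weight_;
        return *this;
    }

    // A friend gets the same access to both of its arguments.
    friend bool operator==(const Foo& a, const Foo& b)
    {
        return a.id_ == b.id_ && a.weight_ == b.weight_;
    }

    // A static member function has no this at all, but may still
    // read the private members of the Foo it is handed.
    static bool is_default(const Foo& x)
    {
        return x.id_ == 0 && x.weight_ == 0.0;
    }

private:
    int id_;
    double weight_;
};
```

Without the rule, each of these three functions would have needed public accessors for id_ and weight_.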

The Fine Print: There is another rule that is related to the above: methods and friends of a derived class can access the protected base class members of any of its own objects (any objects of its class or any derived class of its class), but not others. Since that is hopelessly opaque, here’s an example: suppose classes D1 and D2 inherit directly from class B, and base class B has protected member x. The compiler will let D1’s members and friends directly access the x member of any object it knows to be at least a D1, such as via a D1* pointer, a D1& reference, a D1 object, etc. However the compiler will give a compile-time error if a D1 member or friend tries to directly access the x member of anything it does not know is at least a D1, such as via a B* pointer, a B& reference, a B object, a D2* pointer, a D2& reference, a D2 object, etc. By way of (imperfect!!) analogy, you are allowed to pick your own pockets, but you are not allowed to pick your father’s pockets nor your brother’s pockets.
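The fine print can be sketched with the same names, B, D1, D2, and x. The member function names are made up for the illustration, and the rejected cases are left commented out so the snippet still compiles.

```cpp
#include <cassert>

class B {
protected:
    int x = 0;
};

class D2 : public B { };

class D1 : public B {
public:
    // OK: the compiler knows other is at least a D1.
    static void set_via_d1(D1& other, int v) { other.x = v; }

    // OK: access through this, which is a D1.
    int get() const { return x; }

    // Each of the following would be rejected at compile time if
    // uncommented, because the object is not known to be a D1:
    // static void set_via_b(B& other, int v)   { other.x = v; }
    // static void set_via_d2(D2& other, int v) { other.x = v; }
};
```

In terms of the analogy: set_via_d1() is picking your own pocket, set_via_b() would be picking your father’s, and set_via_d2() your brother’s.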

Is encapsulation a security device?

No.

Encapsulation != security.

Encapsulation prevents mistakes, not espionage.

What’s the difference between the keywords struct and class?

The members and base classes of a struct are public by default, while in class, they default to private. Note: you should make your base classes explicitly public, private, or protected, rather than relying on the defaults.

struct and class are otherwise functionally equivalent.

Enough of that squeaky clean techno talk. Emotionally, most developers make a strong distinction between a class and a struct. A struct simply feels like an open pile of bits with very little in the way of encapsulation or functionality. A class feels like a living and responsible member of society with intelligent services, a strong encapsulation barrier, and a well defined interface. Since that’s the connotation most people already have, you should probably use the struct keyword if you have a class that has very few methods and has public data (such things do exist in well designed systems!), but otherwise you should probably use the class keyword.
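If you want to see the two defaults with your own eyes, here is a small sketch. The type names Base, S, and C are placeholders invented for the example. It uses std::is_convertible from the standard header type_traits to detect whether the base class is publicly accessible.

```cpp
#include <cassert>
#include <type_traits>

struct Base { };

// struct: base class and members are public by default.
struct S : Base {
    int n = 0;
};

// class: base class and members are private by default.
class C : Base {
    int n = 0;
public:
    int value() const { return n; }
};

// A derived-to-base pointer conversion from outside the class is only
// allowed when the base is accessible there, i.e. public.
static_assert(std::is_convertible<S*, Base*>::value,
              "struct inherits publicly by default");
static_assert(!std::is_convertible<C*, Base*>::value,
              "class inherits privately by default");

inline int read_public_member()
{
    S s;
    s.n = 5;             // fine: n is public in a struct
    // C c; c.n = 5;     // would not compile: n is private in a class
    return s.n;
}
```

Writing "struct S : public Base" and "class C : private Base" explicitly would give the same result and spare the reader from remembering the defaults.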

How do I define an in-class constant?

If you want a constant that you can use in a compile-time constant expression, say as an array bound, use a static constexpr member if your compiler supports that C++11 feature. Otherwise you have two other choices, a static const integral member or an enumerator:

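    #include <array>

    class X {
        static constexpr int c1 = 42;   // preferred (C++11 and later)
        static const int c2 = 7;        // in-class initializer: integral types only
        enum { c3 = 19 };               // enumerator used as a constant

        std::array<char,c1> v1;
        std::array<char,c2> v2;
        std::array<char,c3> v3;

        // ...
    };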

You have more flexibility if the constant isn’t needed for use in a compile time constant expression:

    class Z {
        static const char* p;   // initialize in definition
        const int i;            // initialize in constructor
    public:
        Z(int ii) :i(ii) { }
    };

    const char* Z::p = "hello, there";   // a string literal needs const char*

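You can bind a reference to a static const data member, or take its address, if (and only if) it has an out-of-class definition. (Since C++17, a static constexpr or inline static data member is its own definition and needs no out-of-class one.)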

    class AE {
        // ...
    public:
        static const int c6 = 7;
        static const int c7 = 31;
    };

    const int AE::c7;   // definition

    void byref(const int&);

    int f()
    {
        byref(AE::c6);              // error (usually at link time): c6 has no definition
        byref(AE::c7);              // ok
        const int* p1 = &AE::c6;    // error (usually at link time): c6 has no definition
        const int* p2 = &AE::c7;    // ok
        // ...
    }

Why do I have to put the data in my class declarations?

You don’t. If you don’t want data in an interface, don’t put it in the class that defines the interface. Put it in derived classes instead. See also “Why do my compiles take so long?”
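A minimal sketch of that arrangement, with names invented for the illustration (Shape, Rectangle, make_rectangle): the interface class has no data members at all, the representation lives in a derived class, and a factory function lets client code get by with the interface alone.

```cpp
#include <cassert>
#include <memory>

// Interface: no data members, only pure virtual functions.
class Shape {
public:
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

// Implementation: the representation lives here. In a real project this
// class would typically sit in a source file that clients never include.
class Rectangle : public Shape {
public:
    Rectangle(double w, double h) : w_(w), h_(h) {}
    double area() const override { return w_ * h_; }

private:
    double w_;
    double h_;
};

// Factory so that client code needs only the Shape declaration.
std::unique_ptr<Shape> make_rectangle(double w, double h)
{
    return std::unique_ptr<Shape>(new Rectangle(w, h));
}
```

The price is a heap allocation and a virtual call per use, which is exactly why small value-like types usually do keep their data in the class.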

Sometimes, you do want to have representation data in a class. Consider class complex:

    template<class Scalar> class complex {
    public:
        complex() : re(0), im(0) { }
        complex(Scalar r) : re(r), im(0) { }
        complex(Scalar r, Scalar i) : re(r), im(i) { }
        // ...

        complex& operator+=(const complex& a)
            { re+=a.re; im+=a.im; return *this; }
        // ...
    private:
        Scalar re, im;
    };

This type is designed to be used much as a built-in type, and the representation is needed in the declaration to make it possible to create genuinely local objects (i.e. objects that are allocated on the stack and not on a heap) and to ensure proper inlining of simple operations. Genuinely local objects and inlining are necessary to get the performance of complex close to what is provided in languages with a built-in complex type.

How are C++ objects laid out in memory?

Like C, C++ doesn’t define layouts, just semantic constraints that must be met. Therefore different implementations do things differently. One good explanation is in a book that is otherwise outdated and doesn’t describe any current C++ implementation: The Annotated C++ Reference Manual (usually called the ARM). It has diagrams of key layout examples. There is a very brief explanation in Chapter 2 of TC++PL3.

Basically, C++ constructs objects simply by concatenating subobjects. Thus

        struct A { int a,b; };

is represented by two ints next to each other, and

        struct B : A { int c; };

is represented by an A followed by an int; that is, by three ints next to each other.

Virtual functions are typically implemented by adding a pointer (the “vptr”) to each object of a class with virtual functions. This pointer points to the appropriate table of functions (the “vtbl”). Each class has its own vtbl shared by all objects of that class.
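You can poke at this with sizeof. The sketch below reuses A and B from above and adds two hypothetical types, Plain and Virtual, that differ only in having virtual functions. Exact sizes are implementation-specific, so the static_asserts state only lower bounds, and the vptr check is a run-time observation about a typical implementation rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>

struct A { int a, b; };
struct B : A { int c; };

// Two distinct ints need at least two ints' worth of storage, and B
// holds an A subobject plus one more int.
static_assert(sizeof(A) >= 2 * sizeof(int), "A holds two ints");
static_assert(sizeof(B) >= 3 * sizeof(int), "B holds three ints");

// Hypothetical pair of types differing only in virtual functions.
struct Plain {
    int n;
    void f() {}
};

struct Virtual {
    int n;
    virtual void f() {}
    virtual ~Virtual() = default;
};

// On typical implementations each Virtual object carries a vptr, so it
// is larger than a Plain object. The language does not promise this.
inline bool vptr_costs_space()
{
    return sizeof(Virtual) > sizeof(Plain);
}
```

Adding more virtual functions to Virtual normally does not grow its objects further, since the extra entries go into the shared vtbl rather than into each object.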

Why is the size of an empty class not zero?

To ensure that the addresses of two different objects will be different. For the same reason, new always returns pointers to distinct objects. Consider:

    #include <iostream>

    class Empty { };

    void f()
    {
        Empty a, b;
        if (&a == &b) std::cout << "impossible: report error to compiler supplier";

        Empty* p1 = new Empty;
        Empty* p2 = new Empty;
        if (p1 == p2) std::cout << "impossible: report error to compiler supplier";
        delete p1;
        delete p2;
    }

There is an interesting rule that says that an empty base class need not be represented by a separate byte:

    struct X : Empty {
        int a;
        // ...
    };

    void f(X* p)
    {
        void* p1 = p;
        void* p2 = &p->a;
        if (p1 == p2) std::cout << "nice: good optimizer";
    }

This optimization is safe and can be most useful. It allows a programmer to use empty classes to represent very simple concepts without overhead. Most current compilers provide this “empty base class optimization”.

Moreover, as of C++11 the “empty base class optimization” is no longer merely an optional optimization for standard-layout classes: it is a mandatory requirement on their layout. Go beat up on your compiler vendor if it does not implement it properly.
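The claims in this section can be written down as compile-time checks. This is a sketch: Empty and X are the types used above, and base_shares_address() is a helper invented for the example. The size equality is what mainstream compilers produce for a standard-layout class with one empty base, so treat that particular assertion as an expectation about your compiler rather than a quotation from the standard.

```cpp
#include <cassert>
#include <type_traits>

struct Empty { };

struct X : Empty {
    int a;
};

// A complete object of an empty class still occupies at least one byte.
static_assert(sizeof(Empty) >= 1, "distinct objects need distinct addresses");

// X is standard-layout, which is the case where the empty base may not
// take up space in front of the first member.
static_assert(std::is_standard_layout<X>::value, "X is standard-layout");
static_assert(sizeof(X) == sizeof(int), "empty base adds no storage here");

// The object and its first member live at the same address.
inline bool base_shares_address(const X& x)
{
    return static_cast<const void*>(&x) == static_cast<const void*>(&x.a);
}
```

If the sizeof(X) assertion fires on your compiler, that is the situation the paragraph above suggests taking up with the vendor.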