Classes and Objects
What is a class?
The fundamental building block of OO software.
A class defines a data type, much like a struct would be in C. In a computer science sense, a type consists of both a set of states and a set of operations which transition between those states. Thus int is a type because it has both a set of states and operations like i + j or i++, etc. In exactly the same way, a class provides a set of (usually public) operations, and a set of (usually non-public) data bits representing the abstract values that instances of the type can have.
You can imagine that int is a class that has member functions called operator++, etc. (int isn’t really a class, but the basic analogy is this: a class is a type, much like int is a type.)
Note: a C programmer can think of a class as a C struct whose members default to private. But if that’s all you think a class is, then you probably need to experience a personal paradigm shift.
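As a minimal sketch of "public operations plus non-public data bits", consider the class below. The name Counter, its member functions, and the helper demo_counter are all hypothetical and exist only for illustration.

```cpp
#include <cassert>

// Hypothetical example type: a counter that never drops below zero.
class Counter {
public:
    // The public operations: the only way to move between states.
    void increment() { ++count_; }
    void decrement() { if (count_ > 0) --count_; }
    int value() const { return count_; }

private:
    // The non-public data bits representing the abstract value.
    int count_ = 0;
};

// Illustrative helper that exercises the operations.
inline int demo_counter()
{
    Counter c;                 // c is an object (an instance) of type Counter
    c.increment();
    c.increment();
    c.decrement();
    c.decrement();
    c.decrement();             // ignored: the class protects its own state
    assert(c.value() == 0);
    c.increment();
    return c.value();
}
```

Code outside the class can change a Counter only through increment and decrement, so it cannot put the object into a negative state.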
What is an object?
A region of storage with associated semantics.
After the declaration int i; we say that “i is an object of type int.” In OO/C++, “object” usually means “an instance of a class.” Thus a class defines the behavior of possibly many objects (instances).
When is an interface “good”?
When it provides a simplified view of a chunk of software, and it is expressed in the vocabulary of a user (where a “chunk” is normally a class or a tight group of classes, and a “user” is another developer rather than the ultimate customer).
- The “simplified view” means unnecessary details are intentionally hidden. This reduces the user’s defect-rate.
- The “vocabulary of users” means users don’t need to learn a new set of words and concepts. This reduces the user’s learning curve.
What is encapsulation?
Preventing unauthorized access to some piece of information or functionality.
The key money-saving insight is to separate the volatile part of some chunk of software from the stable part. Encapsulation puts a firewall around the chunk, which prevents other chunks from accessing the volatile parts; other chunks can only access the stable parts. This prevents the other chunks from breaking if (when!) the volatile parts are changed. In the context of OO software, a “chunk” is normally a class or a tight group of classes.
The “volatile parts” are the implementation details. If the chunk is a single class, the volatile part is normally encapsulated using the private and/or protected keywords. If the chunk is a tight group of classes, encapsulation can be used to deny access to entire classes in that group.
Inheritance can also be used as a form of encapsulation.
The “stable parts” are the interfaces. A good interface provides a simplified view in the vocabulary of a user, and is designed from the outside-in (here a “user” means another developer, not the end-user who buys the completed application). If the chunk is a single class, the interface is simply the class’s public member functions and friend functions. If the chunk is a tight group of classes, the interface can include several of the classes in the chunk.
Designing a clean interface and separating that interface from its implementation merely allows users to use the interface. But encapsulating (putting “in a capsule”) the implementation forces users to use the interface.
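The split between stable and volatile parts can be sketched as follows. Person, full_name, and greet are hypothetical names chosen for illustration, and the choice to store the name as one string is just one possible implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical class: the public interface is the stable part,
// the way the name is stored is the volatile part.
class Person {
public:
    Person(const std::string& first, const std::string& last)
        : full_(first + " " + last) { }

    // Stable interface, expressed in the user's vocabulary.
    std::string full_name() const { return full_; }

private:
    // Volatile implementation detail. It could later become two
    // separate strings without breaking any calling code.
    std::string full_;
};

// A "user" of the class: it can only reach the stable part.
inline std::string greet(const Person& p)
{
    return "Hello, " + p.full_name();
}
```

Because greet cannot name full_ at all, a later change to the private representation leaves greet untouched.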
How does C++ help with the tradeoff of safety vs. usability?
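In C, encapsulation was accomplished by making things static in a compilation unit or module. This prevented another module from accessing the static stuff. (By the way, static data at file-scope was deprecated in C++98 and C++03 in favor of unnamed namespaces. C++11 removed that deprecation, but an unnamed namespace is still the usual C++ way to do this.)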
Unfortunately this approach doesn’t support multiple instances of the data, since there is no direct support for making multiple instances of a module’s static data. If multiple instances were needed in C, programmers typically used a struct. But unfortunately C structs don’t support encapsulation. This exacerbates the tradeoff between safety (information hiding) and usability (multiple instances).
In C++, you can have both multiple instances and encapsulation via a class. The public part of a class contains the class’s interface, which normally consists of the class’s public member functions and its friend functions. The private and/or protected parts of a class contain the class’s implementation, which is typically where the data lives.
The end result is like an “encapsulated struct.” This reduces the tradeoff between safety (information hiding) and usability (multiple instances).
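Here is a minimal sketch of an “encapsulated struct” with several independent instances. The class Stack, its fixed capacity of 4, and its member names are assumptions made up for this example.

```cpp
#include <cassert>

// Hypothetical bounded stack of ints.
class Stack {
public:
    // Returns false when the stack is full.
    bool push(int x)
    {
        if (size_ == capacity) return false;
        data_[size_++] = x;
        return true;
    }

    // Returns false when the stack is empty; otherwise stores the top in out.
    bool pop(int& out)
    {
        if (size_ == 0) return false;
        out = data_[--size_];
        return true;
    }

    int size() const { return size_; }

private:
    enum { capacity = 4 };     // illustrative limit
    int data_[capacity];       // hidden from other code, one copy per object
    int size_ = 0;
};
```

Each Stack object carries its own hidden data, which is what file-scope static data in C could not offer.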
How can I prevent other programmers from violating encapsulation by seeing the private parts of my class?
Not worth the effort — encapsulation is for code, not people.
It doesn’t violate encapsulation for a programmer to see the private and/or protected parts of your class, so long as they don’t write code that somehow depends on what they saw. In other words, encapsulation doesn’t prevent people from knowing about the inside of a class; it prevents the code they write from becoming dependent on the insides of the class. Your company doesn’t have to pay a “maintenance cost” to maintain the gray matter between your ears; but it does have to pay a maintenance cost to maintain the code that comes out of your finger tips. What you know as a person doesn’t increase maintenance cost, provided the code you write depends on the interface rather than the implementation.
Besides, this is rarely if ever a problem. I don’t know any programmers who have intentionally tried to access the private parts of a class. “My recommendation in such cases would be to change the programmer, not the code” [James Kanze; used with permission].
Can a method directly access the non-public members of another instance of its class?
Yes.
The name this is not special. Access is granted or denied based on the class of the reference/pointer/object, not based on the name of the reference/pointer/object. (See below for the fine print.)
The fact that C++ allows a class’s methods and friends to access the non-public parts of all its objects, not just the this object, seems at first to weaken encapsulation. However the opposite is true: this rule preserves encapsulation. Here’s why.
Without this rule, most non-public members would need a public get method, because many classes have at least one method or friend that takes an explicit argument (i.e., an argument not called this) of its own class.
Huh? (you ask). Let’s kill the mumbo jumbo and work out an example:
Consider assignment operator Foo::operator=(const Foo& x). This assignment operator will probably change the data members in the left-hand argument, *this, based on the data members in the right-hand argument, x. Without the C++ rule being discussed here, the only way for that assignment operator to access the non-public members of x would be for class Foo to provide a public get method for every non-public datum. That would suck bigtime.
(NB: “suck bigtime” is a precise, sophisticated, technical term; and I am writing this on April 1.)
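The assignment operator just described might look like the sketch below. The single data member value_ and the extra operator== are assumptions added for illustration; the text above only names Foo::operator=(const Foo& x).

```cpp
#include <cassert>

class Foo {
public:
    explicit Foo(int v) : value_(v) { }

    Foo& operator=(const Foo& x)
    {
        // x is a different object, yet its private member is accessible
        // here because x is also a Foo.
        value_ = x.value_;
        return *this;
    }

    // A comparison operator reads the private member of a second object too.
    bool operator==(const Foo& x) const { return value_ == x.value_; }

private:
    int value_;   // hypothetical datum; note there is no public get method
};
```

Neither operator needs a getter, so value_ stays completely hidden from code outside Foo.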
The assignment operator isn’t the only one that would weaken encapsulation were it not for this rule. Here is a partial(!) list of others:
- Copy constructor.
- Comparison operators: ==, !=, <=, <, >=, >.
- Binary arithmetic operators: x+y, x-y, x*y, x/y, x%y.
- Binary bitwise operators: x^y, x&y, x|y.
- Static methods that accept an instance of the class as a parameter.
- Static methods that create/manipulate an instance of the class.
- etc.
Conclusion: encapsulation would be shredded without this beneficial rule: most non-public members of most classes would end up having a public get method.
The Fine Print: There is another rule that is related to the above: methods and friends of a derived class can access the protected base class members of any of its own objects (any objects of its class or any derived class of its class), but not others. Since that is hopelessly opaque, here’s an example: suppose classes D1 and D2 inherit directly from class B, and base class B has protected member x. The compiler will let D1’s members and friends directly access the x member of any object it knows to be at least a D1, such as via a D1* pointer, a D1& reference, a D1 object, etc. However the compiler will give a compile-time error if a D1 member or friend tries to directly access the x member of anything it does not know is at least a D1, such as via a B* pointer, a B& reference, a B object, a D2* pointer, a D2& reference, a D2 object, etc. By way of (imperfect!!) analogy, you are allowed to pick your own pockets, but you are not allowed to pick your father’s pockets nor your brother’s pockets.
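The fine print can be sketched with the classes B, D1, and D2 from the paragraph above. The member functions set_own, copy, and get are hypothetical additions, and the rejected cases are left as comments so the example still compiles.

```cpp
#include <cassert>

class B {
protected:
    int x = 0;
};

class D2 : public B { };

class D1 : public B {
public:
    void set_own(int v) { x = v; }            // ok: *this is a D1

    // ok: both arguments are known to be at least a D1
    static void copy(D1& to, const D1& from) { to.x = from.x; }

    int get() const { return x; }

    // Either of these would be a compile-time error if uncommented:
    // static int peek(const B& b)  { return b.x; }   // b is only known to be a B
    // static int peek(const D2& d) { return d.x; }   // D2 is a sibling, not a D1
};
```

In terms of the analogy, copy is D1 picking its own pockets, while the two commented-out functions are attempts on the father’s and the brother’s.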
Is Encapsulation a Security device?
No.
Encapsulation != security.
Encapsulation prevents mistakes, not espionage.
What’s the difference between the keywords struct and class?
The members and base classes of a struct are public by default, while in class, they default to private. Note: you should make your base classes explicitly public, private, or protected, rather than relying on the defaults.
struct and class are otherwise functionally equivalent.
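One way to observe the two defaults is to ask the compiler whether a pointer to the derived type converts to a pointer to the base, which only works from outside when the base is public. The names Base, S, and C are hypothetical, and the example deliberately relies on the defaults that the note above advises against.

```cpp
#include <type_traits>

struct Base { };

struct S : Base { int n; };   // base is public,  n is public
class  C : Base { int n; };   // base is private, n is private

// Outside code can treat an S as a Base, because the base is public.
static_assert(std::is_convertible<S*, Base*>::value,
              "struct: base class is public by default");

// Outside code cannot treat a C as a Base, because the base is private.
static_assert(!std::is_convertible<C*, Base*>::value,
              "class: base class is private by default");
```

Writing struct S : public Base and class C : private Base would state the same thing explicitly.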
Enough of that squeaky clean techno talk. Emotionally, most developers make a strong distinction between a class and a struct. A struct simply feels like an open pile of bits with very little in the way of encapsulation or functionality. A class feels like a living and responsible member of society with intelligent services, a strong encapsulation barrier, and a well defined interface. Since that’s the connotation most people already have, you should probably use the struct keyword if you have a class that has very few methods and has public data (such things do exist in well designed systems!), but otherwise you should probably use the class keyword.
How do I define an in-class constant?
If you want a constant that you can use in a compile time constant expression, say as an array bound, use a static constexpr member if your compiler supports that C++11 feature; otherwise you have two other choices:
#include <array>

class X {
    static constexpr int c1 = 42;   // preferred (a constexpr data member must be static)
    static const int c2 = 7;
    enum { c3 = 19 };

    std::array<char,c1> v1;
    std::array<char,c2> v2;
    std::array<char,c3> v3;
    // ...
};
You have more flexibility if the constant isn’t needed for use in a compile time constant expression:
class Z {
    static const char* p;   // initialize in definition
    const int i;            // initialize in constructor
public:
    Z(int ii) :i(ii) { }
};

const char* Z::p = "hello, there";   // a string literal needs a pointer to const
You can bind a reference to a static data member, or take its address, if (and only if) it has an out-of-class definition:
class AE {
    // ...
public:
    static const int c6 = 7;
    static const int c7 = 31;
};

const int AE::c7;   // definition

void byref(const int&);

int f()
{
    byref(AE::c6);              // error: c6 is needed as an lvalue but has no definition (typically a link error)
    byref(AE::c7);              // ok
    const int* p1 = &AE::c6;    // error: same problem, c6 has no definition
    const int* p2 = &AE::c7;    // ok
    // ...
}
Why do I have to put the data in my class declarations?
You don’t. If you don’t want data in an interface, don’t put it in the class that defines the interface. Put it in derived classes instead. See “Why do my compiles take so long?”
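A minimal sketch of that arrangement: the interface class holds no data, and the representation lives in a derived class. Shape, Rectangle, and area_of are hypothetical names used only for illustration.

```cpp
#include <cassert>

// The interface: no data members at all.
class Shape {
public:
    virtual ~Shape() { }
    virtual double area() const = 0;
};

// The representation lives in the derived class.
class Rectangle : public Shape {
public:
    Rectangle(double w, double h) : w_(w), h_(h) { }
    double area() const override { return w_ * h_; }

private:
    double w_, h_;
};

// Code written against the interface never sees the data.
inline double area_of(const Shape& s)
{
    return s.area();
}
```

Users who include only the declaration of Shape do not need to recompile when the data members of Rectangle change.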
Sometimes, you do want to have representation data in a class. Consider class complex:
template<class Scalar> class complex {
public:
    complex() : re(0), im(0) { }
    complex(Scalar r) : re(r), im(0) { }
    complex(Scalar r, Scalar i) : re(r), im(i) { }
    // ...
    complex& operator+=(const complex& a)
        { re+=a.re; im+=a.im; return *this; }
    // ...
private:
    Scalar re, im;
};
This type is designed to be used much as a built-in type, and the representation is needed in the declaration to make it possible to create genuinely local objects (i.e. objects that are allocated on the stack and not on a heap) and to ensure proper inlining of simple operations. Genuinely local objects and inlining are necessary to get the performance of complex close to what is provided in languages with a built-in complex type.
How are C++ objects laid out in memory?
Like C, C++ doesn’t define layouts, just semantic constraints that must be met. Therefore different implementations do things differently. One good explanation is in a book that is otherwise outdated and doesn’t describe any current C++ implementation: The Annotated C++ Reference Manual (usually called the ARM). It has diagrams of key layout examples. There is a very brief explanation in Chapter 2 of TC++PL3.
Basically, C++ constructs objects simply by concatenating subobjects. Thus
struct A { int a,b; };
is represented by two ints next to each other, and
struct B : A { int c; };
is represented by an A followed by an int; that is, by three ints next to each other.
Virtual functions are typically implemented by adding a pointer (the “vptr”) to each object of a class with virtual functions. This pointer points to the appropriate table of functions (the “vtbl”). Each class has its own vtbl shared by all objects of that class.
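You can probe what your own implementation does with sizeof. This is only a sketch: the structs Plain and WithVirtual and the three helper functions are hypothetical, and since the language does not define layouts, the exact numbers printed will vary between compilers and platforms.

```cpp
#include <cstdio>

struct A { int a, b; };
struct B : A { int c; };

struct Plain       { int n; };
struct WithVirtual { virtual ~WithVirtual() { } int n; };

// True when B is at least an A followed by one more int.
constexpr bool derived_extends_base()
{
    return sizeof(B) >= sizeof(A) + sizeof(int);
}

// True when adding a virtual function made the object bigger,
// which is what the usual vptr implementation does.
constexpr bool virtual_costs_space()
{
    return sizeof(WithVirtual) > sizeof(Plain);
}

// Prints the sizes chosen by the current implementation.
inline void report_sizes()
{
    std::printf("A=%zu B=%zu Plain=%zu WithVirtual=%zu\n",
                sizeof(A), sizeof(B), sizeof(Plain), sizeof(WithVirtual));
}
```

On typical implementations both checks hold, and the extra space in WithVirtual is the vptr plus any alignment padding.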
Why is the size of an empty class not zero?
To ensure that the addresses of two different objects will be different. For the same reason, new always returns pointers to distinct objects. Consider:
#include <iostream>

class Empty { };

void f()
{
    Empty a, b;
    if (&a == &b) std::cout << "impossible: report error to compiler supplier";

    Empty* p1 = new Empty;
    Empty* p2 = new Empty;
    if (p1 == p2) std::cout << "impossible: report error to compiler supplier";

    delete p1;
    delete p2;
}
There is an interesting rule that says that an empty base class need not be represented by a separate byte:
struct X : Empty {
    int a;
    // ...
};

void f(X* p)
{
    void* p1 = p;
    void* p2 = &p->a;
    if (p1 == p2) std::cout << "nice: good optimizer";
}
This optimization is safe and can be most useful. It allows a programmer to use empty classes to represent very simple concepts without overhead. Some current compilers provide this “empty base class optimization”.
Moreover, for standard-layout classes such as the X above, the “empty base class optimization” is no longer an optional optimization but a mandatory requirement on class layout as of C++11. Go beat up on your compiler vendor if it does not implement it properly.
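The effect can also be seen through sizeof. In this sketch Empty and X are the classes from the examples above, while Y and the two helper functions are hypothetical additions for contrast.

```cpp
class Empty { };

struct X : Empty { int a; };     // Empty used as a base class
struct Y { Empty e; int a; };    // Empty used as an ordinary member

// An object of an empty class still occupies at least one byte.
static_assert(sizeof(Empty) >= 1, "distinct objects need distinct addresses");

// True when the empty base adds nothing to the size of X.
constexpr bool base_is_free()
{
    return sizeof(X) == sizeof(int);
}

// True when the empty member does take up room in Y.
constexpr bool member_is_not_free()
{
    return sizeof(Y) > sizeof(int);
}
```

The contrast suggests why simple "tag" or policy classes are often used as bases rather than as members: as a base they can cost nothing.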