CS 4448 - Fall 1998
Object-Oriented Programming and Design
Talk 12.3
By
Elena Teverovskaia

Encapsulation is a red herring ('encapsulation' no longer appropriate terminology)

By

Date, C.J.

Introduction: The term 'encapsulation', and the emphasis placed on what it describes (a type that has no user-visible components), are misleading and not very useful. The term 'scalar' captures much better the idea that encapsulation tries to express. Beyond this nomenclature problem, encapsulation has other difficulties. The concept directly conflicts with the need to perform ad hoc queries. Also, types are often described as encapsulated when they are not, and it would not be beneficial if they were. The fact that a data type is not encapsulated does not mean that any information is lost. For these reasons, the term 'encapsulation' should be avoided entirely.
How this article is organized: it starts from two published definitions of encapsulation.
"Encapsulation [means each type has] a set of [operations and] a representation ... that is allocated for each of its instances. This representation is used to store the state of the object. Only the methods implementing operations for the objects are allowed to access the representation, thereby making it possible to change the representation without disturbing the rest of the system. Only the methods would need to be recoded." [Fundamentals of Object-Oriented Databases]
"Encapsulation refers to the concept of including processing or behavior with the object instances defined by the class. Encapsulation allows code and data to be packaged together." [The Object Database Handbook]
Abstraction, information hiding, and encapsulation are the three concepts that appear during the analysis and design of a system.

Some writers seem to think the concept refers specifically to the physical bundling, or "packaging," of data representation definitions and operator definitions.

But it seems to me that to interpret the term in this way is to mix model and implementation considerations. The user shouldn't care, and shouldn't need to care, whether or not code and data are "packaged together"! Thus, it's my belief that, from the user's point of view at least (which is to say, from the point of view of the model), encapsulation simply means what I said before: namely, that the data in question has no user-visible components and can be operated upon only by means of the pertinent operators.

BUT WHAT ABOUT AD HOC QUERIES?

As you might already know, the objective of encapsulation conflicts somewhat with the need to be able to perform ad hoc queries. After all, encapsulation means data can be accessed only via predefined operators, while an ad hoc query means, more or less by definition, that access is required in ways that can't have been predefined. For example, suppose we have a data type POINT; suppose we also have a (predefined) operator to "get"--that is, read or retrieve--the X coordinate of any given point, but no analogous operator to get the corresponding Y coordinate. Then even the following simple queries, and many others like them, obviously can't be handled:
* Get the Y coordinate of point P.
* Get all points on the X axis.
* Get all points with Y coordinate less than five.
In fact, they can't even be stated.
Solution: define operators that expose some possible representation for instances of that type. For example, operators THE_X and THE_Y effectively expose a possible representation (namely, Cartesian coordinates X and Y) for points, thereby making it possible to perform ad hoc queries involving points. This does not mean that points are actually represented by Cartesian coordinates inside the system, so exposing a possible representation in this way does not violate data independence.
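The solution above can be illustrated with a minimal Python sketch. It is an illustration only, not code from the article: the class name Point, the methods the_x and the_y (standing in for THE_X and THE_Y), the hidden polar representation, and the sample points are all assumptions chosen for the example.

```python
import math


class Point:
    """A scalar type whose stored representation is hidden from users."""

    def __init__(self, x, y):
        # Assumed internal representation: polar coordinates.
        # Users never see _r or _theta directly.
        self._r = math.hypot(x, y)
        self._theta = math.atan2(y, x)

    def the_x(self):
        # Exposes the X component of a *possible* Cartesian representation.
        return self._r * math.cos(self._theta)

    def the_y(self):
        # Exposes the Y component of a *possible* Cartesian representation.
        return self._r * math.sin(self._theta)


points = [Point(3, 0), Point(1, 7), Point(2, 4)]

# Ad hoc query: get all points on the X axis.
# A tolerance is used because Y is recomputed from polar form.
on_x_axis = [p for p in points if math.isclose(p.the_y(), 0.0, abs_tol=1e-9)]

# Ad hoc query: get all points with Y coordinate less than five.
below_five = [p for p in points if p.the_y() < 5]
```

Neither query was anticipated when Point was defined, yet both can be stated using only the_x and the_y. The stored form could be switched from polar to Cartesian by recoding the three methods, and the queries would not need to change.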

WE DON'T ALWAYS WANT ENCAPSULATION

Some types are definitely not encapsulated: certain "generated" types, such as ARRAY, LIST, TUPLE, and RELATION. For example, a relation variable declared as VAR POINT...RELATION{X..., Y...} has user-visible components, namely the attributes X and Y, and those visible attributes are what make ad hoc queries possible.
Base types vs. composite types. C++ takes an intermediate approach.
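As a rough analogy for the relation case, the following Python sketch models a relation with visible attributes X and Y as a set of named tuples. The names PointTuple and point_rel and the sample rows are assumptions made for illustration; Python sets of tuples only approximate relations in the sense used here.

```python
from typing import NamedTuple


class PointTuple(NamedTuple):
    """A tuple type with user-visible attributes x and y."""
    x: float
    y: float


# A relation is modeled here as a set of tuples (no duplicates, no ordering).
point_rel = {
    PointTuple(3.0, 0.0),
    PointTuple(1.0, 7.0),
    PointTuple(2.0, 4.0),
}

# Because the attributes are visible, ad hoc queries can be stated directly.
on_x_axis = {t for t in point_rel if t.y == 0.0}   # restriction on Y
low_y = {t for t in point_rel if t.y < 5}          # restriction on Y
y_values = {t.y for t in point_rel}                # projection over Y
```

No operators had to be defined in advance for these queries: the attributes themselves are the access path, which is why encapsulating such a type would not be beneficial.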

SCALAR vs. NONSCALAR TYPES

Scalar is a generic way of referring to "simple" types (POINT, LENGTH, AREA, LINE, and so on, and possibly even types like INTEGER, if the system doesn't provide them as built-in types).
The reasons to choose the term scalar:
* It was already available. (It's been used with the meaning we had in mind for many years in the programming languages world.)
* It did seem to be the obviously correct term to contrast with ones such as tuple and relation (and array and list and all the rest).
And I now observe that our term scalar means exactly the same thing as encapsulated.


Copyright © University of Colorado. All rights reserved.
Revised: November 17, 1998