Next: 3.2 Code Inheritance and
Up: 3 Customization of Components
Previous: 3 Customization of Components
Recall, that specifying a class invariant means that this invariant has to be preserved by all public methods of a program. Assume a module M with classes C1 to Cm. M does not import any modules, i.e., M is a basic component. Verifying M means to prove the triple above for each public method of M.
Consider a module N which imports M. N contains the classes Cm+1 to Cn. I.e., the resulting program P is a composition of the modules M and N containing the classes C1 to Cn. What triples have to be shown to verify P? Again, we have to prove that every public method of P preserves each invariant of any Ci. For most practical applications, this is equivalent to showing that (1) CM::m preserves INVM, (2) CM::m preserves INVN, (3) CN::m preserves INVM, and (4) CN::m preserves INVN, where CM::m and CN::m denote a public method m of a class declared in module M or N, respectively and INVM and INVN denote the conjunction of all class invariants in module M or N.
Obligation 1 is fulfilled as M is verified. Obligation 4 causes no further problems as it can be solved locally in module N (maybe using some specifications of M). Obligation 2 means to show properties of an imported module. This is not desirable as the imported code may come from a library and, thus, be not accessible to the verifier. Obligation 3 results in a huge amount of proof obligations for complex components as it contains one triple for each class in the import path of the module. So, in large systems, hundreds or thousands of triples have to be proved. In the next two paragraphs, we will discuss possible solutions to the problems caused by obligations 2 and 3.
Sharing Properties describe how objects in memory are shared by different object structures. E.g., the last object of a singly linked list is shared by all other list objects of the list structure. Specification of sharing properties is crucial for verification because, in general, modification of one object changes the abstract value of all structures that share this object. Basic constructs to express sharing properties are reachability of objects or disjointness of object structures.
These constructs are quite coarse grained. [PH95] shows, that more elaborated techniques can be used to specify sharing properties of certain data representations on an abstract level. Thus, we have to study typical storage layouts to develop such techniques. In particular, these techniques should allow us to prove the absence of sharing of certain objects and thus to deduce the preservation of invariants.
Invariance Properties are used to describe side-effects of methods by specifying which parts of the object environment remain unchanged under method execution. They can be specified in two ways: One can enumerate all objects that might be modified (or created or deleted) under method execution in a so-called modifies-clause. I.e., all objects not mentioned have to be left unchanged. An other possibility is to directly specify which parts of the object environment stay untouched. This can be done by relating the pre- and poststate of the method. [PH97b] introduces two relations between object environments stating that (a) all objects reachable from a certain object remain unchanged or (b) all objects reachable from any object of a certain type remain unchanged.
The advantages of the direct technique are: (1) Sharing can be taken into account. In general, modification of one object affects the abstract values of all object structures referencing this object. This can hardly be expressed by modifies-clauses. (2) Relations between the pre- and poststate can be included into the usual pre-post-specification and do not require an extra modifies-clause. This makes the use of invariance properties in correctness proofs easier as they fit into the schema of Hoare-triples. (3) Object modification, creation, and deletion can be handled by the same technique. To exploit these advantages, more sophisticated techniques for relating object environments are needed. In particular, we want to specify side-effects in a very detailed and abstract way.
Specification Methodology might be the most important research topic in this area. To bring specification and verification techniques to practice, we need guidelines for programmers telling which properties of methods have to be specified to enable verification. Most work on specification methodology focuses on the specification of functional behavior. But as we showed in the last two paragraphs, verification of component-based software requires the specification of invariance and sharing properties as well. Programmers usually have detailed knowledge about the algorithms they implement. But they are not really aware of side-effects and the sharing properties of the data structures they use. Thus, we have to develop specification techniques that allow them to intuitively specify these properties.
Not every first-order formula is acceptable as an invariant. E.g., invariants should not make statements about all allocated objects. In most cases, this would lead to unimplementable specifications. Thus, the expressivity of invariants has to be restricted. As such restrictions are semantical properties, they are called semantical constraints. A simple semantical constraint is that the invariant of an object X may only state properties of objects reachable from X.
By putting stronger constraints on invariants, we can achieve that the invariants of new classes cannot be violated by imported methods. E.g., if an invariant only states properties of objects of classes that are declared in the current module (i.e., not imported), this invariant can only be violated by methods of these classes or their subclasses, i.e., not by imported methods.
The development of such semantical constraints will allow one to add new classes to a program and neither have to revisit imported code nor have to prove the preservation of invariants based on the specification of imported methods. Besides the technical aspects, we have to study the influence of such restrictions on programming methodology.
Although it is theoretically possible to prove that every new method preserves the invariant of each imported class, this is not desired because of two reasons: (1) In large systems the proof obligations become too big and, thus, unmanageable. (2) To support encapsulation of implementations, it is desirable to hide implementation-dependent parts of invariants from client modules. In the next two paragraphs, we show how behavioral subtyping can be used to reduce verification effort, and present the idea of semantic module concepts that allow one to drop the proof obligations about semantically private classes.
Enforcing all subtypes to be behavioral subtypes reduces the effort of verification dramatically as it is sufficient to show the preservation of invariants of classes which are leaves of the subtyping tree. In usual programming languages, determining whether a class is a leaf is not possible until the whole program context -- and, thus, the whole subtyping tree -- is known. This is never the case for components. Therefore, we require all superclasses to be abstract. I.e., concrete classes can never be superclasses. Thus, all concrete classes are leaves of the subtyping tree. We do not have to prove the invariant of abstract classes as there is no implementation for that class and, thus, objects of this class will never exist. To sum up, enforcing behavioral subtyping reduces verification effort to proving the preservation of the invariant of all concrete classes. As a consequence of this technique, subtyping and code inheritance have to be discerned conceptually. We discuss this aspect in section 3.2.
In the context of semantical program development, enforcing behavioral subtyping is no further restriction to the programmer: Point (1) of the definition above is required anyway because due to dynamic binding, methods of subtypes can be executed in contexts where methods of the supertype were expected. Thus, they have to behave identically in these contexts. Otherwise, the context could not be verified. Aspect (2) is already implied by the semantics of invariants (see section 2.2). One can formally prove that the semantics presented in section 2.2 is equivalent to the proof obligations sketched in the last paragraph (the proof itself is beyond the scope of this paper and, thus, not presented here).
Semantical privacy can be achieved by enforcing certain syntactical restrictions of the implementation, by putting semantical constraints on invariants. To illustrate this approach, we introduce a predicate P on types. P(T) holds, iff T is concrete and P holds for the type of each attribute of T. The invariant of a type for which P holds can not be affected by program extensions as subtyping is ruled out. This simple syntactical limitation is of course too restrictive as it forbids the use of subtyping. By combining this idea with a simple constraint on the invariant, the situation becomes better: The invariant of T may only state properties of objects which are reachable via attributes whose types fulfill P. This combination enables semantical privacy and a restricted use of subtyping.
Pushing the constraints on invariants further leads to the following notion: The invariant of a class C has only access to the invariants of reachable objects, not to their implementation. In combination with behavioral subtyping, this constraint assures the preservation of the invariant. The disadvantage of this approach is that it leads to an increasing number of types as showed by the following example: Assume, we have a class fraction with two attributes numerator and denominator of type int. The invariant states that the denominator is non-zero. Now we implement a class C which has an attribute a of class fraction. We want to specify that the fraction stored in a is non-negative. Because of the constraint sketched above, this can not be specified in the invariant of C as it is a property of fraction. Thus, we have to build a new class which essentially behaves like fraction but has a stronger invariant. This approach fully supports subtyping.
This discussion reveals two interesting research topics: (1) We need an appropriate mix between syntactical and semantical constraints. Therefore, we have to study implementations of software libraries to find the balance between the (partly contradictory) aims of supporting subtyping, providing semantical private types, keeping the amount of types low, and making the whole approach applicable in practice. (2) As shown in the example above, constraining the invariants may lead to lots of types with different specifications but very similar (or identical) implementations. Thus, we need techniques to share code between classes and derive the correctness of an importing class from the correctness of the exporting class. This is discussed in the next section.
Peter Mueller and Arnd Poetzsch-Heffter