Constraint Validation

By Dan Corwin, draft of May 14, 2003
 
This subsystem supports the WORDS scripting and command language.  It validates incremental run-time mutations of a topic map already validated by TMCL or a comparable mechanism, with special focus on the Strings in occurrences and the RolePlayer topics in associations.

In both cases, ResourceData embedded in related topics holds annotated pointers to symbolic constraints on these String and Topic values.   Similar mechanisms can declare datatypes, naming patterns, cardinality, and behavioral options that keep many TM elements immutable.

Three main elements comprise this design, which logically extends a TM engine:

  • a PSI for a well-defined array of constraint types, each with a latest-release status
  • an API for a Java package implementing supported constraint types in the array
  • valid TMs whose topics may hold packed lists pointing to related constraints
In practice, their development must be iterative.  But it is easier to introduce these designs in sequential steps, so that is what appears below:
 

Step 1.  Define at a PSI an array of planned Constraint Types

This might be based on the union of constraints found in several diverse specs:
  1. Those defined in OWL-lite, an emerging industry-standard baseline
  2. This systematic outline of suggested Use Cases for TMCL
  3. Time-tested standbys suggested by (e.g.) these Frame-Slot specs
Suggesting candidates is a start.  Next we must select some small subset to do first, and a larger one proposed for a first release.  This takes non-trivial tech writing, case-by-case analysis, debates, etc.  But fortunately, the design options seem fairly flexible here. 

This design thus envisions Constraints[], a published array, yet tries to stay insensitive to its details.  It imposes only this documentation rule, so indices into the array can link to predicates at steps 2-6: once a constraint type is posted, it cannot be deleted or moved.
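The one documentation rule above amounts to an append-only array. A minimal Java sketch of that idea follows; ConstraintRegistry and its method names are hypothetical, and the real Constraints[] would be published at the PSI rather than held in code:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory mirror of the published Constraints[] array. */
final class ConstraintRegistry {
    private final List<String> names = new ArrayList<>();

    /** Posts a new constraint type and returns its permanent index (##). */
    int post(String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("constraint type needs a name");
        }
        names.add(name);
        return names.size() - 1;
    }

    /** Looks up a posted type; an index never changes once assigned. */
    String nameOf(int index) {
        if (index < 0 || index >= names.size()) {
            throw new IndexOutOfBoundsException("no constraint type posted at index " + index);
        }
        return names.get(index);
    }

    int size() {
        return names.size();
    }

    // Deliberately no remove() or move(): posted entries are permanent,
    // so any ## stored in a topic map keeps pointing at the same type.
}
```

Because entries can only be appended, an index stored in a topic map stays meaningful across later releases of the array.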

At some point, the PSI array (at least its lower elements) must drive working code or the whole design is pointless.  Specific predicates to be built early are discussed below.
 

Step 2.  Define a matching array of Member-validation predicates

Many of the most interesting predicates can share a signature like this example, where ## is the index into Constraints[], and REQ is a (packed) String that further describes the ##-specific requirements to be satisfied by the remaining arguments:

  predicate_##(String REQ,                // encoded requirements 
              Association assoc,         // the one being tested 
              Topic roleSpec,            // the one to be (re)set
              Topic player)              // the role's player
     throws Exception

Each such predicate must decide (by using REQ) if the given player is acceptable as a value for roleSpec within the given assoc.  If so, it need do nothing.  If not, it must throw a Java exception that complains about the problem and includes specific details via ## and REQ.

Each kind of constraint (##) gets its own separate predicate, plus fine-tuning (REQ) that varies with topics.  Many distinct problems could cause exceptions: wrong cardinality, wrong domain, wrong range, etc.  Each constraint type gets unique logic, but one method signature.  This lets them be built faster (factory-like), then called through one API.
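As an illustration only, here is how one such predicate might look in Java. Topic and Association are bare stand-ins for whatever a TM engine provides, and ConstraintException, RangePredicate, and the index value 2 are all invented for this sketch:

```java
import java.util.Set;

/** Hypothetical stand-ins for TM engine types; a real engine supplies its own. */
final class Topic {
    final String id;
    final Set<String> typeIds;   // ids of the classes this topic belongs to
    Topic(String id, Set<String> typeIds) { this.id = id; this.typeIds = typeIds; }
}

final class Association {
    final String id;
    Association(String id) { this.id = id; }
}

/** Assumed exception type that carries ## and REQ back to the caller. */
class ConstraintException extends Exception {
    final int index;
    final String req;
    ConstraintException(int index, String req, String detail) {
        super("constraint " + index + " (REQ=" + req + ") failed: " + detail);
        this.index = index;
        this.req = req;
    }
}

/** The single shared signature for Member-validation predicates. */
interface MemberPredicate {
    void check(String req, Association assoc, Topic roleSpec, Topic player) throws Exception;
}

/** Illustrative range check: REQ names the class the player must belong to. */
final class RangePredicate implements MemberPredicate {
    static final int INDEX = 2;   // made-up position in Constraints[]

    @Override
    public void check(String req, Association assoc, Topic roleSpec, Topic player)
            throws ConstraintException {
        if (!player.typeIds.contains(req)) {
            throw new ConstraintException(INDEX, req,
                    player.id + " cannot play " + roleSpec.id + " in " + assoc.id);
        }
        // Acceptable player: do nothing.
    }
}
```

A cardinality or domain check would implement the same interface with different logic, which is what allows all of them to be called through one API.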
 

Step 3.  Define a matching array of String-validation predicates

Many elements of Constraints[] will involve Strings, not associations.  They can be handled by other predicates with this similar signature, focused on occurrences:

  predicate_##(String REQ,                // encoded requirements 
              Topic topic,               // that being modeled 
              Topic propType,            // occurrence class
              String resource)           // the data, URI, etc 
     throws Exception

Each such predicate must decide by using REQ if the given resource is acceptable as a value for propType within the given topic.  If so, it need do nothing.  If not, it must throw a Java exception that complains about the problem and gives specific details via ## and REQ.  Many of these predicates will involve validating XML schema types of resources.
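A datatype-style check could be sketched as below. The "min..max" encoding of REQ, the class names, and the index value 5 are assumptions made for the example, and Topic is a bare stand-in for an engine type:

```java
/** Hypothetical stand-in for a TM engine topic; a real engine supplies its own. */
final class Topic {
    final String id;
    Topic(String id) { this.id = id; }
}

/** The single shared signature for String-validation predicates. */
interface StringPredicate {
    void check(String req, Topic topic, Topic propType, String resource) throws Exception;
}

/** Illustrative check: the resource must be an integer within REQ, written "min..max". */
final class IntegerRangePredicate implements StringPredicate {
    static final int INDEX = 5;   // made-up position in Constraints[]

    @Override
    public void check(String req, Topic topic, Topic propType, String resource)
            throws Exception {
        String where = "constraint " + INDEX + " (REQ=" + req + ") on "
                + propType.id + " of " + topic.id + ": ";
        String[] bounds = req.split("\\.\\.", 2);
        if (bounds.length != 2) {
            throw new Exception(where + "REQ is not of the form min..max");
        }
        long min, max, value;
        try {
            min = Long.parseLong(bounds[0].trim());
            max = Long.parseLong(bounds[1].trim());
        } catch (NumberFormatException e) {
            throw new Exception(where + "REQ bounds are not integers", e);
        }
        try {
            value = Long.parseLong(resource.trim());
        } catch (NumberFormatException e) {
            throw new Exception(where + "'" + resource + "' is not an integer", e);
        }
        if (value < min || value > max) {
            throw new Exception(where + value + " is outside the allowed range");
        }
        // Acceptable resource: do nothing.
    }
}
```

The REQ encoding avoids commas on purpose, since commas separate entries in the packed constraint list.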
 

Step 4.  Define Extra Predicate Signatures for Multiple values

Many predicates will need extra signatures to address multiple players or resources.  This is not hard, but does involve several issues.  Ideally, it could merely require substitution of a Set in the final argument of both signatures above, plus an added internal iteration loop. 

If the cardinality for a given roleSpec or propType is multiple, we might always require the multiple-value signature to be called.  Or maybe not.  Such rules may take some debate.

A bigger issue for multiple values is the nature of their Collection.  Sometimes it helps to preserve order, so perhaps a List or Array version of the signature would also be needed.

Such details are not show stoppers, but they need to be resolved cleanly and consistently. Consensus may be hard to reach if some constraint - like a List argument - implies related changes to TM engine code, or to other basic resources of the TM community.
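One possible shape for the multiple-value case is sketched here, assuming a hypothetical MultiValue helper that wraps any single-value String predicate in an iteration loop. Taking a Collection is only one option: it accepts both Set and List callers, but an order-sensitive constraint would still want a List-specific signature.

```java
import java.util.Collection;

/** Hypothetical stand-in for a TM engine topic. */
final class Topic {
    final String id;
    Topic(String id) { this.id = id; }
}

/** Single-value String predicate, with the same argument order as in step 3. */
interface StringPredicate {
    void check(String req, Topic topic, Topic propType, String resource) throws Exception;
}

/** Assumed helper that lifts a single-value predicate to many values. */
final class MultiValue {
    private MultiValue() { }

    static void checkAll(StringPredicate single, String req, Topic topic,
                         Topic propType, Collection<String> resources) throws Exception {
        // A List is visited in its own order; a Set in whatever order it iterates.
        for (String resource : resources) {
            single.check(req, topic, propType, resource);   // first failure ends the loop
        }
    }
}
```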
 

Step 5.  Define Predicates to handle Scope, Typing, Names

WORDS handles scopes and typing in its interpreter code.  End users and scripts see them as immutable, so no constraints and predicates are required.  (This could be relaxed later, after TMCL better defines how they might best be handled in general.)

Meanwhile, TMCL can help out by validating a WORDS topic map as or before it loads into an engine, and by establishing a more general set of design patterns. To implement them later, we conceptually reserve special occurrences as storage space in each topic for related constraint references and their REQ data, plus similar character string codes.

BaseNames and variantNames are special cases, mutable by WORDS users. We could define new predicates to validate names, but an early shortcut may be to reuse the signature of step 3, citing propTypes for reserved "baseName" and "variantName" Topics that signal the need to retrieve a related constraint String from an atypical place in the TM.   
 

Step 6.  Assembling the above into a WORDS Constraint Language

Under this design, a simple spec can emerge for a constraint language:  define its core form as a comma-separated list of sub-Strings stored in occurrences of closely related topics:

Syntax:  For each Topic used as a roleSpec or a propType (etc.) that should be restricted by the constraint language, we need a list of these pairs:
  • a constraint type applying, via its ## - a locality-insensitive index 
  • an optional REQ String that fine tunes related logic in predicate ## 
Since it plus Constraints[] fully defines all restrictions on usage for that Topic, such a comma-separated list really IS the core constraint language.
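To make the syntax concrete, the sketch below parses such a list. The text above does not fix how ## and REQ are joined inside one entry, so this example assumes whitespace between them (for instance "3 person,7 0..1,12"); ConstraintRef and ConstraintList are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** One [## REQ] pair from a packed constraint list. */
final class ConstraintRef {
    final int index;    // ## into Constraints[]
    final String req;   // fine-tuning text, empty when omitted
    ConstraintRef(int index, String req) { this.index = index; this.req = req; }
}

/** Assumed parser for the comma-separated core form. */
final class ConstraintList {
    private ConstraintList() { }

    static List<ConstraintRef> parse(String packed) {
        List<ConstraintRef> refs = new ArrayList<>();
        if (packed == null || packed.trim().isEmpty()) {
            return refs;   // no stored constraint refs
        }
        for (String item : packed.split(",")) {
            // Split each entry into the index and the optional REQ remainder.
            String[] parts = item.trim().split("\\s+", 2);
            int index = Integer.parseInt(parts[0]);   // NumberFormatException on a bad ##
            String req = parts.length > 1 ? parts[1] : "";
            refs.add(new ConstraintRef(index, req));
        }
        return refs;
    }
}
```

Under this encoding a REQ may contain spaces but not commas.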

Semantics:  By calling the appropriate signature below for validate(..) at appropriate times, one can test all related constraints by triggering this internal logic: 
  • retrieve the special ResourceData, if any, for its first argument
  • if none, return.  No stored constraint refs means no checking
  • otherwise, separate it into [##  REQ] pairs by using the commas
  • call each indicated predicate in order, passing along its REQ
If none of the called predicates throws an exception, all constraints were met. Otherwise, the caller gets back an explanation of the (first) unmet constraint.
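The four steps above could be sketched for the occurrence case as follows. Everything here is illustrative: Topic is a stand-in whose constraintData field plays the part of the special ResourceData, the Validator class and its register method are invented, and each entry is assumed to hold ## and REQ separated by whitespace.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a TM engine topic. */
final class Topic {
    final String id;
    final String constraintData;   // packed constraint list, or null if none
    Topic(String id, String constraintData) {
        this.id = id;
        this.constraintData = constraintData;
    }
}

/** Shared signature for String-validation predicates. */
interface StringPredicate {
    void check(String req, Topic topic, Topic propType, String resource) throws Exception;
}

final class Validator {
    private final Map<Integer, StringPredicate> predicates = new HashMap<>();

    /** Binds working code to an index of Constraints[]. */
    void register(int index, StringPredicate predicate) {
        predicates.put(index, predicate);
    }

    void validate(Topic propType, Topic topic, String resource) throws Exception {
        String packed = propType.constraintData;        // retrieve the special data
        if (packed == null || packed.trim().isEmpty()) {
            return;                                     // no refs means no checking
        }
        for (String item : packed.split(",")) {         // separate into [## REQ] pairs
            String[] pair = item.trim().split("\\s+", 2);
            int index = Integer.parseInt(pair[0]);
            String req = pair.length > 1 ? pair[1] : "";
            StringPredicate predicate = predicates.get(index);
            if (predicate == null) {
                throw new Exception("no predicate built for constraint " + index);
            }
            predicate.check(req, topic, propType, resource);   // first failure propagates
        }
    }
}
```

How to treat an index that has no predicate yet is a design choice; this sketch reports it as an error rather than skipping it silently.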
 


Adding a WORDS Constraint Language can be Straightforward

Here are the minimum new public methods that a WORDS engine has to support.  With some iterative driving logic, such calls might be applied to an entire topic map, or to any specific parts, to validate all encountered Occurrences and Members:

  validate(Topic propType, Topic topic, String resource)
   throws Exception

  validate(Topic roleSpec, Association assoc, Topic player)
   throws Exception

 [add similar signatures for multiple values after step 4]

Naturally, these functions get internally more complex as the total size and complexity of Constraints[]  grows, due to extra hidden predicates.  But that effect should stay invisible to callers of the above, and be relatively easy to handle with incremental method additions.
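The iterative driving logic mentioned earlier might, for occurrences, look like the sketch below. Occurrence, OccurrenceValidator, and MapSweep are hypothetical names, and collecting every failure (rather than stopping at the first) is just one plausible policy for a whole-map pass.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins for TM engine types. */
final class Topic {
    final String id;
    Topic(String id) { this.id = id; }
}

final class Occurrence {
    final Topic topic;
    final Topic propType;
    final String resource;
    Occurrence(Topic topic, Topic propType, String resource) {
        this.topic = topic;
        this.propType = propType;
        this.resource = resource;
    }
}

/** Mirrors the first public validate(..) signature. */
interface OccurrenceValidator {
    void validate(Topic propType, Topic topic, String resource) throws Exception;
}

final class MapSweep {
    private MapSweep() { }

    /** Validates every occurrence and returns one message per failure. */
    static List<String> sweep(Iterable<Occurrence> occurrences, OccurrenceValidator validator) {
        List<String> problems = new ArrayList<>();
        for (Occurrence o : occurrences) {
            try {
                validator.validate(o.propType, o.topic, o.resource);
            } catch (Exception e) {
                problems.add(o.topic.id + "/" + o.propType.id + ": " + e.getMessage());
            }
        }
        return problems;
    }
}
```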

This Design and its Simplicity have Practical Benefits

The design concept is basically an organized bag of semi-independent constraint checkers.  Once we clarify this design pattern, constraint types might be defined and built in parallel in several  independent sections to speed up a first release.  In addition:
  1. Each constraint list is compactly stored inside a related Topic as normal ResourceData, and moves freely about with it, even in XTM files.
  2. Each such String seems widely portable among natural languages, as well as insensitive to the programming language used to build the TM engine.
  3. It takes no new TM engine functions to store or recall constraints (but higher level tools to help ontology authors compose them would clearly be nice).
  4. Each such constraint String can itself be validated by the first of the public signatures above, making all such future tools (by design) easier to write and test.
  5. A limited set of most-urgent, most-needed Constraints[] can be implemented and released quickly after a relatively modest initial R&D effort.
  6. The top level Constraints[] array can then grow indefinitely over time to handle new and expanded versions, with incremental code changes.