WORDS MODELER:    


Upper Ontology

This page informally introduces the class schema for all Contexts, based on the XTM 1.0 Standard from ISO for Topic Maps. Relevant links are under each section header.

Unlike many ontologies, a Context blends multiple Java object classes into its in-memory graph structures. This imparts a decidedly high-level, linguistic character to them and their HTML charts, which human minds may find easier to grasp and more familiar than a uniform space of RDF triples.

Yet they ARE nearly equivalent. At export time, a Context can be decomposed into equivalent low-level RDF, which meets the needs of reasoners (for which RDF was designed). Meanwhile, humans can browse higher-level web-page Topic charts, and review them as models cast into the larger TMAPI object types from which all Contexts are assembled.

Neither formalism is easy. But Topic Maps may be slightly easier to grasp and use, as they can be more directly related to the grammar structures of English.


Topics

Each Topic - like an English noun phrase - digitally represents one real or imagined subject. Internally, it depicts the real or imagined Aspects of such a subject, by collecting typed "slots" defining its various types of Properties, Association Roles, and Names.

IMPORTANT: a Topic always represents some thing. The other types of object below may help model subjects, but they properly represent only their features or associations - nothing which can easily stand alone as an isolated concept with a distinct and persistent identity.

Look closely, and each Topic is really just a binding point for data that identifies its subject, onto which one can then add arbitrary models for Aspects. Especially powerful are associations, which can bidirectionally link any Topic to a set of other Topics modeled in similar ways.
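
As a rough illustration of the "binding point" idea, here is a minimal Java sketch. The class name TopicSketch, its fields, and its methods are hypothetical and are not taken from TMAPI or from the Lexikos code; the subject identifier and ISBN value in the usage note are made-up examples.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch only; this is not the TMAPI Topic interface.
class TopicSketch {
    // Data that identifies the subject, e.g. a published subject URI.
    private final Set<String> subjectIdentifiers = new LinkedHashSet<>();
    // Aspects bound to that subject, each kept as a typed slot.
    private final List<String> names = new ArrayList<>();
    private final List<String[]> properties = new ArrayList<>();  // {type, value}
    private final List<String[]> rolesPlayed = new ArrayList<>(); // {association type, role type}

    TopicSketch(String subjectIdentifier) {
        subjectIdentifiers.add(subjectIdentifier);
    }

    TopicSketch addName(String name) {
        names.add(name);
        return this;
    }

    TopicSketch addProperty(String type, String value) {
        properties.add(new String[] {type, value});
        return this;
    }

    TopicSketch addRole(String associationType, String roleType) {
        rolesPlayed.add(new String[] {associationType, roleType});
        return this;
    }

    // True if this Topic is bound to the given subject identifier.
    boolean identifies(String subjectIdentifier) {
        return subjectIdentifiers.contains(subjectIdentifier);
    }

    // Total number of Aspects (names, properties, roles) attached so far.
    int aspectCount() {
        return names.size() + properties.size() + rolesPlayed.size();
    }
}
```

For example, new TopicSketch("http://example.org/subject/moby-dick").addName("Moby-Dick").addProperty("isbn", "978-0-00-000000-2").addRole("authorship", "work") binds one name, one property, and one role to a single subject. A real engine would hold Topic objects rather than plain Strings in these slots.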

The overall graph - a Context - holds one or more merged Topic Maps, each of which was separately built by using this modeling philosophy, or imported into it by using smart transliterators from RDF files or other supported exchange formats.


Occurrence

Character Strings offer hugely flexible ways to model the properties of things. Each Occurrence type is really one open set of such Strings, unique in how it works, what it means, and the limits and services that exist within its details.

One broad and often relevant set of String classes is the set of datatypes standardized for XSD. As a group, they populate many XML files, and comprise a good basic set of quantities.

They are far from exhaustive, of course. Consider the ISBN, currency models, zip codes, phone numbers, industrial product IDs, and the physical attributes for all material substances. All of these are potentially types of Occurrences.

Because of their huge diversity, such Strings cannot be centrally standardized. But each type used can be publicly and formally defined, and all compliant Contexts will do this. Typically, they add a URI into each Occurrence type, pointing to a web page documenting it as a dataType.

The URI string used to do this is itself a built-in primitive type of Occurrence, which is quite precisely specified by W3C. It gets internal XTM-engine support as a pointer to some external resource of characters or bits. Generally, XTM standards know this as one of several types of locator.

All of them indicate where data came from; where various specs are posted; and/or where references "occur" to the abstract subject of the related Topic.
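
The pairing of an Occurrence type with a dataType URI might be sketched in Java as follows. The class names OccurrenceTypeSketch and OccurrenceSketch are hypothetical, as is the example.org address used in the usage note; only the XML Schema anyURI identifier is a real W3C value. This is an illustration under those assumptions, not the XTM engine's actual representation.

```java
import java.net.URI;

// Hypothetical sketch only; not the TMAPI Occurrence interface.
class OccurrenceTypeSketch {
    // The W3C XML Schema identifier for the anyURI datatype.
    static final URI XSD_ANY_URI = URI.create("http://www.w3.org/2001/XMLSchema#anyURI");

    final String label;
    final URI dataType; // points at the page documenting this datatype

    OccurrenceTypeSketch(String label, String dataTypeUri) {
        this.label = label;
        this.dataType = URI.create(dataTypeUri);
    }

    // In this sketch, a locator is an Occurrence whose value is itself a URI.
    boolean isLocator() {
        return XSD_ANY_URI.equals(dataType);
    }
}

class OccurrenceSketch {
    final OccurrenceTypeSketch type;
    final String value;

    OccurrenceSketch(OccurrenceTypeSketch type, String value) {
        if (type.isLocator()) {
            // Throws IllegalArgumentException if the value is not a valid URI.
            URI.create(value);
        }
        this.type = type;
        this.value = value;
    }
}
```

A zip-code type could then be declared as new OccurrenceTypeSketch("zip code", "http://example.org/datatypes/zip-code"), and any reader of an exported Context could follow that URI to learn what the Strings of that type mean. Only the locator type is validated here; richer types would need their own checks.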

Historically, early Topic Maps heavily used such Strings to denote "Occurrences", and the terminology has stuck. Recent XTM specs distinguish locators and dataTypes, but collectively refer to them as Occurrences.


Role

A Role type, philosophically, is a Topic depicting some named class of niche in a prototype association. Surprisingly, such named niches may account for most of the nouns in a common English vocabulary list.

On a less philosophical level, each Role Type resembles a "slot" within frame theory, or the middle element of an RDF triple, or simply the key to some RAM-resident hash map holding associated Topics.

Effectively, if each Association type names and models a stereotyped situation, then each Role type names and models one stereotyped niche to be filled and played by another Topic (descriptively known as the "Role-Player").


Association

Under specs of XTM 1.0 and TMAPI 1.0, each association is logically a typed hashtable of Topics, accessed by using Role types as its keys.

Another Topic denotes the type of such an association. Its aspects may record facts applicable to all of its roles, such as constraints or default expectations on the kinds of Topic-Subject pair able to play each role.
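
The "typed hashtable" view of an association can be sketched in a few lines of Java. The class AssociationSketch and its method names are hypothetical, and plain Strings stand in for Topics to keep the example short. It also assumes one Role-Player per Role type, which is a simplification made for this sketch only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only; not the TMAPI Association interface.
class AssociationSketch {
    // Stands in for the Topic that denotes the association type.
    private final String type;
    // Role type (key) -> Role-Player (value). Strings stand in for Topics.
    private final Map<String, String> players = new LinkedHashMap<>();

    AssociationSketch(String type) {
        this.type = type;
    }

    // Fill one named niche of the association with a Role-Player.
    AssociationSketch bind(String roleType, String rolePlayer) {
        players.put(roleType, rolePlayer);
        return this;
    }

    // Look up the Role-Player by Role type; null if the role is unfilled.
    String playerOf(String roleType) {
        return players.get(roleType);
    }

    String type() {
        return type;
    }

    int roleCount() {
        return players.size();
    }
}
```

The sentence "Herman Melville wrote Moby-Dick" could then be recorded as new AssociationSketch("authorship").bind("author", "Herman Melville").bind("work", "Moby-Dick"), after which playerOf("author") retrieves the author. Because each Role-Player is reached by the name of its niche rather than by position, the order of binding does not matter.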

This association model works particularly well for recording the semantics of English sentences. For more about this specific use case, see Lexikos specs on CTM, an XTM-based interpretation of Conceptual Graphs.


Naming

A name String may be associated persistently with any topic-subject pair, not just in this web application, but in any natural-language vocabulary.

Such associations are generally many-to-many types, and each name String involved may have several variant forms, so this can get very confusing. The XTM 1.0 Specs try to partially formalize this, but many NLP effects like roots and stemming get ignored. Future context releases will add them.

Today, each topic in any context has some name set, and each name within it can in theory be used to find every topic it may signify. The scoping on each name form, meanwhile, helps to define how/when/where it is meant to be used.
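
A many-to-many name index with scoping might look like the Java sketch below. NameIndexSketch, its methods, and the scope labels are all hypothetical; Strings again stand in for Topics, and a scope is reduced to a single label, which is simpler than scoping in the XTM specs.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch only; real engines index names far more efficiently.
class NameIndexSketch {
    private static final class Entry {
        final String topic;
        final String name;
        final String scope;

        Entry(String topic, String name, String scope) {
            this.topic = topic;
            this.name = name;
            this.scope = scope;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Record that a name signifies a topic when used within a scope.
    void addName(String topic, String name, String scope) {
        entries.add(new Entry(topic, name, scope));
    }

    // Every topic the name may signify, regardless of scope.
    List<String> topicsNamed(String name) {
        Set<String> found = new LinkedHashSet<>();
        for (Entry e : entries) {
            if (e.name.equals(name)) {
                found.add(e.topic);
            }
        }
        return new ArrayList<>(found);
    }

    // Only the topics the name signifies within the given scope.
    List<String> topicsNamed(String name, String scope) {
        Set<String> found = new LinkedHashSet<>();
        for (Entry e : entries) {
            if (e.name.equals(name) && e.scope.equals(scope)) {
                found.add(e.topic);
            }
        }
        return new ArrayList<>(found);
    }
}
```

If the name "Mercury" is registered for a planet under an "astronomy" scope, an element under "chemistry", and a god under "mythology", the unscoped lookup returns all three topics, while the scoped lookup returns one.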

SPECIAL NOTES: Under XTM 1.1, new rules to denote name usage via "typed names" are coming. They may help reduce the high innate complexity of assigning to all subjects formal names customized to selected purposes and languages.

The inverse task of finding an intended subject-topic pair for any usage of a name is often difficult, even when it is surrounded by the context of a full paragraph. Indeed, this "lexical ambiguity" problem may well be the single biggest barrier to practical NLP systems.

The partial answer that Lexikos is testing involves limiting the search for the speaker-intended referent subject to a well-defined context, or at worst, some small active set of them. This may help in a suitably narrow domain of discourse, where normal human behavior is to avoid assigning any word or name to multiple subjects.
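
One way to picture this context-limited search is the Java sketch below. It is an illustration only and does not describe the method Lexikos is testing: the class ContextResolverSketch, its methods, and the rule of returning null whenever more than one candidate remains are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; Strings stand in for Contexts and Topics.
class ContextResolverSketch {
    // context -> (name -> topics that name signifies within that context)
    private final Map<String, Map<String, List<String>>> contexts = new LinkedHashMap<>();

    void define(String context, String name, String topic) {
        contexts.computeIfAbsent(context, c -> new LinkedHashMap<>())
                .computeIfAbsent(name, n -> new ArrayList<>())
                .add(topic);
    }

    // Search only the active contexts. Returns the single referent found,
    // or null when the name is unknown there or remains ambiguous.
    String resolve(String name, String... activeContexts) {
        List<String> candidates = new ArrayList<>();
        for (String context : activeContexts) {
            Map<String, List<String>> names = contexts.get(context);
            if (names == null) {
                continue;
            }
            List<String> topics = names.get(name);
            if (topics == null) {
                continue;
            }
            for (String topic : topics) {
                if (!candidates.contains(topic)) {
                    candidates.add(topic);
                }
            }
        }
        return candidates.size() == 1 ? candidates.get(0) : null;
    }
}
```

With "Mercury" defined in both an "astronomy" and a "chemistry" context, resolve("Mercury", "astronomy") finds one referent, while resolve("Mercury", "astronomy", "chemistry") reports ambiguity by returning null. The smaller the active set, the more often a name resolves cleanly.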



Confidential; all rights reserved © 2005 Lexikos Corporation
Portland, ME & Knoxville, TN Email: Dan@Lexikos.com