Upper Ontology
This page informally introduces the class schema for all Contexts, based on
the XTM 1.0 Standard from ISO for Topic Maps.
Relevant links are under each section header.
Unlike many ontologies, a Context blends multiple Java object classes into its
in-memory graph structures. This imparts a decidedly high-level, linguistic
character to them and their HTML charts, which human minds may find easier to
grasp and more familiar than a uniform space of RDF triples.
Yet they ARE nearly equivalent. At export time, a Context can be decomposed
into equivalent low-level RDF, which meets the needs of reasoners (for which it
was designed). Meanwhile, humans can browse higher-level web-page Topic charts,
and review them as models cast into larger TMAPI object types, from which all
contexts are assembled.
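To make the contrast with RDF concrete, here is a minimal sketch of how one multi-role association might flatten into triples. The mapping itself is an assumption for illustration (one node per association, one triple per role), not the exporter this page describes, and `SketchRdfExport` and its parameters are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one possible way to flatten an association into triples.
class SketchRdfExport {
    // The standard rdf:type predicate URI.
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    // Returns {subject, predicate, object} triples for a single association.
    static List<String[]> decompose(String nodeId, String associationType,
                                    Map<String, String> playerByRole) {
        List<String[]> triples = new ArrayList<>();
        // One triple records the type of the association node.
        triples.add(new String[] {nodeId, RDF_TYPE, associationType});
        // One triple per role links the node to the Topic playing that role.
        for (Map.Entry<String, String> entry : playerByRole.entrySet()) {
            triples.add(new String[] {nodeId, entry.getKey(), entry.getValue()});
        }
        return triples;
    }
}
```

Under this assumed mapping, a single three-role association becomes four triples, which hints at why the larger object types may be easier to read than the flat form.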
Neither formalism is easy. But Topic Maps may be slightly easier to grasp and
use, as they can be more directly related to the grammar structures of English.
Each Topic - like an English noun phrase - digitally represents one real
or imagined subject. Internally, it depicts the real or imagined Aspects
of such a subject, by collecting typed "slots" defining its various types
of Properties, Association Roles, and Names.
IMPORTANT: a Topic always represents some thing. The other types of
object below may help model subjects, but they properly represent only
their features or associations - nothing which can easily stand alone
as an isolated concept with a distinct and persistent identity.
Look closely, and each Topic is really just a binding point for data that
identifies its subject, onto which one can then add arbitrary models for
Aspects. Especially powerful are associations, which can bidirectionally
link any Topic to a set of other Topics modeled in similar ways.
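As a rough illustration of a Topic as a binding point, the sketch below pairs identifying data with three kinds of typed slot. It is not the TMAPI `Topic` interface; `SketchTopic`, its fields, and the example identifier are hypothetical, and plain strings stand in for what would really be other Topics.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a Topic; not the TMAPI Topic interface.
class SketchTopic {
    // Data that identifies the subject (for example, URIs).
    final Set<String> subjectIdentifiers = new LinkedHashSet<>();
    // Typed slots that collect the Aspects of the subject.
    final List<String> names = new ArrayList<>();
    final List<String[]> occurrences = new ArrayList<>(); // {occurrenceType, value}
    final List<String[]> rolesPlayed = new ArrayList<>(); // {associationType, roleType}

    SketchTopic(String subjectIdentifier) {
        subjectIdentifiers.add(subjectIdentifier);
    }

    SketchTopic name(String value) {
        names.add(value);
        return this;
    }

    SketchTopic occurrence(String occurrenceType, String value) {
        occurrences.add(new String[] {occurrenceType, value});
        return this;
    }

    SketchTopic plays(String associationType, String roleType) {
        rolesPlayed.add(new String[] {associationType, roleType});
        return this;
    }
}
```

A caller could then write `new SketchTopic("http://example.org/subject/moby-dick").name("Moby-Dick").plays("authorship", "work")` to bind an identity first and hang Aspects on it afterwards.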
The overall graph - a Context - holds one or more merged Topic Maps, each
of which was separately built by using this modeling philosophy, or
imported into it by using smart transliterators from RDF files or other
supported exchange formats.
Character Strings offer hugely flexible ways to model the properties of things.
Each Occurrence type is really one open set of such Strings, unique in how it
works, what it means, and the limits and services which exist within its details.
One broad and often relevant set of String classes is the one standardized for
XSD.
As a group, they populate many XML files, and comprise a good basic set of quantities.
They are far from exhaustive, of course. Consider the ISBN, currency models,
zip codes, phone numbers, industrial product IDs, and the physical attributes
for all material substances. All of these are potentially types of Occurrences.
Because of their huge diversity, such Strings cannot be centrally standardized. But each
type used can be publicly and formally defined, and all compliant Contexts will do this.
Typically, they add a URI into each Occurrence type, pointing to a web page documenting it
as a dataType.
The URI string used to do this is itself a built-in primitive type of Occurrence, which
is quite precisely specified by W3C. It gets internal XTM-engine support as a pointer to
some external resource of characters or bits. Generally, XTM standards know this
as one of several types of locator.
All of them indicate where data came from; where various specs are posted; and/or where
references "occur" to the abstract subject of the related Topic.
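The idea of an Occurrence type carrying a URI to its own documentation, with the URI itself treated as a locator type, can be sketched as follows. The class names and the `example.org` datatype URI are hypothetical; only `java.net.URI` and the XML Schema `anyURI` datatype identifier are real. This is an illustration under those assumptions, not the XTM engine's implementation.

```java
import java.net.URI;

// Hypothetical sketch; not part of XTM or TMAPI.
class SketchOccurrenceType {
    // Datatype identifier that XML Schema defines for URI-valued strings.
    static final URI XSD_ANY_URI =
            URI.create("http://www.w3.org/2001/XMLSchema#anyURI");

    final String label;
    final URI datatype; // points at the page documenting this String format

    SketchOccurrenceType(String label, String datatypeUri) {
        this.label = label;
        this.datatype = URI.create(datatypeUri);
    }

    // A locator type is one whose values are themselves URIs.
    boolean isLocator() {
        return XSD_ANY_URI.equals(datatype);
    }
}

class SketchOccurrence {
    final SketchOccurrenceType type;
    final String value;

    SketchOccurrence(SketchOccurrenceType type, String value) {
        this.type = type;
        this.value = value;
    }

    // Parses the value as a URI; only allowed for locator types.
    URI asLocator() {
        if (!type.isLocator()) {
            throw new IllegalStateException(type.label + " is not a locator type");
        }
        return URI.create(value);
    }
}
```

An ISBN type, for instance, would carry its own documentation URI as `datatype` and report `isLocator()` as false, while a homepage type built on `anyURI` would report true.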
Historically, early Topic Maps heavily used such Strings to denote "Occurrences", and
the terminology has stuck. Recent XTM specs distinguish locators and dataTypes,
but collectively refer to them as Occurrences.
A Role type, philosophically, is a Topic depicting some named class of
niche in a prototype association. Surprisingly, such named niches may
account for most of the nouns in a common English vocabulary list.
On a less philosophical level, each Role Type resembles a "slot" within frame
theory, or the middle element of an RDF triple, or simply the key to some
RAM-resident hash map holding associated Topics.
Effectively, if each Association type names/models a stereotyped situation,
then each Role type names and models one stereotype niche to be filled and
played by another Topic (descriptively known as the "Role-Player").
Under specs of XTM 1.0 and TMAPI 1.0, each association is logically a
typed hashtable of Topics, accessed by using Role types as its keys.
Another Topic denotes the type of such an association. Its aspects may record
facts applicable to all of its roles, such as constraints or default expectations
on the kinds of Topic-Subject pair able to play each role.
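The "typed hashtable" view of an association can be sketched in a few lines. This is a simplified illustration, not the TMAPI `Association` interface: `SketchAssociation` is a hypothetical name, and plain strings stand in for the Topics that would really serve as the association type, the Role types, and the Role-Players.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of an association as a typed table keyed by Role type.
class SketchAssociation {
    final String type; // stands in for the Topic that types the association
    private final Map<String, List<String>> playersByRole = new LinkedHashMap<>();

    SketchAssociation(String type) {
        this.type = type;
    }

    // Fills the niche named by roleType with a Role-Player.
    void addRole(String roleType, String player) {
        playersByRole.computeIfAbsent(roleType, k -> new ArrayList<>()).add(player);
    }

    // Looks up the Role-Players by using the Role type as the key.
    List<String> players(String roleType) {
        return playersByRole.getOrDefault(roleType, Collections.<String>emptyList());
    }

    Set<String> roleTypes() {
        return playersByRole.keySet();
    }
}
```

The sentence "Mary gave John a book" could then be recorded as a `giving` association with `giver`, `recipient`, and `gift` roles, each retrieved by its Role type.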
This association model works particularly well for recording the semantics of English
sentences. For more about this specific use case, see Lexikos specs on
CTM, an XTM-based
interpretation of Conceptual Graphs.
A name String may be associated persistently with any topic-subject pair, not
just in this web application, but in any natural-language vocabulary.
Such associations are generally many-to-many types, and each name String
involved may have several variant forms, so this can get very confusing.
The XTM 1.0 Specs try to partially formalize this, but many NLP effects like
roots and stemming get ignored. Future context releases will add them.
Today, each topic in any context has some name set, and each name within it
can in theory be used to find every topic it may signify. The scoping on each
name form, meanwhile, helps to define how/when/where it is meant to be used.
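The many-to-many link between names and topics, with scope narrowing each name's usage, might look like the sketch below. `SketchNameIndex` and its methods are hypothetical and far simpler than the XTM name model (no variant forms, a single scope string per name).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a name index; not the XTM or TMAPI name model.
class SketchNameIndex {
    // Each entry is {topicId, scope}; one name String may map to many topics.
    private final Map<String, List<String[]>> entriesByName = new HashMap<>();

    void addName(String topicId, String name, String scope) {
        entriesByName.computeIfAbsent(name, k -> new ArrayList<>())
                     .add(new String[] {topicId, scope});
    }

    // Every topic the name may signify, regardless of scope.
    Set<String> topicsFor(String name) {
        return topicsFor(name, null);
    }

    // Topics for which the name is valid in the given scope (null means any).
    Set<String> topicsFor(String name, String scope) {
        Set<String> result = new LinkedHashSet<>();
        List<String[]> entries = entriesByName.get(name);
        if (entries == null) {
            return result;
        }
        for (String[] entry : entries) {
            if (scope == null || scope.equals(entry[1])) {
                result.add(entry[0]);
            }
        }
        return result;
    }
}
```

With "Mercury" registered for a planet, an element, and a god, the unscoped lookup returns all three topics, while a lookup scoped to astronomy returns only the planet.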
SPECIAL NOTES: Under XTM 1.1, new rules to denote name usage via "typed names"
are coming. They may help reduce the high innate complexity of assigning to
all subjects formal names customized to selected purposes and languages.
The inverse task of finding an intended subject-topic pair for any usage of
a name is often difficult, even when it is surrounded by the context of a
full paragraph. Indeed, this "lexical ambiguity" problem may well be the
single biggest barrier to practical NLP systems.
The partial answer that Lexikos is testing involves limiting the search for
the speaker-intended referent subject to a well-defined context, or at worst,
some small active set of them. This may help in a suitably narrow domain of
discourse, where normal human behavior is to avoid assigning any word
or name to multiple subjects.
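The context-limiting idea can be sketched as a set intersection: take every topic a name may signify, keep only those present in the active context, and accept the result only when exactly one remains. This is an illustrative guess at the approach, not the Lexikos implementation; `SketchReferentResolver`, the lexicon map, and the topic identifiers are all hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of context-limited name resolution.
class SketchReferentResolver {
    // lexicon: name String -> every topic it may signify.
    // activeContext: the topics belonging to the current domain of discourse.
    static Optional<String> resolve(String name,
                                    Map<String, Set<String>> lexicon,
                                    Set<String> activeContext) {
        Set<String> known = lexicon.getOrDefault(name, Collections.<String>emptySet());
        Set<String> candidates = new LinkedHashSet<>(known);
        // Keep only the candidates that live in the active context.
        candidates.retainAll(activeContext);
        if (candidates.size() == 1) {
            return Optional.of(candidates.iterator().next());
        }
        // Zero or several candidates: the name stays unresolved.
        return Optional.empty();
    }
}
```

In a narrow domain the intersection usually leaves a single candidate; when it does not, the sketch reports the ambiguity rather than guessing.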