What's in CONTEXT?
MODELER is a framework designed to efficiently
transliterate English grammar forms into isomorphic semantic nets,
whose meaning is formalized by the ISO standard paradigm of
the Topic Map.
This conversion is usually done in bulk, with operator guidance, under
an interactive UI resembling that of a spelling corrector.
An active topic map called CONTEXT records the created semantic net using
topics and associations, driven by our NLP utilities for discourse
models and self-expanding semantic lexicons. Input paragraphs are in
everyday English, except for two non-trivial requirements, the first of which is eased in version 2.0:
- Grammar forms are parenthesized. (Our parser will later automate this.)
- Each word or idiom used must be defined by a lexicon in CONTEXT.
CONTEXT is configured by our fixed base of scripted CONCEPT topics, plus those in your personal extension lexicons. The combination defines a flexible ontology letting MODELER
map any legal WORDS expression into new Topic characteristics. Each expression is a character string - one of two kinds:
- TERM - any related spelling (root, inflection) for a CONCEPT. A
  hashmap of these spellings indexes all loaded lexicons, and it lets
  MODELER find the expected CONCEPTS for any spelling in its domain-specific vocabulary:
  - Entering an isolated TERM returns a list of all related CONCEPTS
  - But in a FORM one gets picked (if necessary by the operator)
- FORM - a grammatical list of TERMs and sub-FORMs in parentheses.
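The spelling index described for TERMs can be pictured with a small sketch. This is only an illustration under stated assumptions: the class name `TermIndex`, its methods, and the concept identifiers such as "run/verb" are hypothetical and are not MODELER's actual data structures.

```python
from collections import defaultdict


class TermIndex:
    """Hypothetical spelling -> CONCEPT index (an assumption, not MODELER's API)."""

    def __init__(self):
        # Each spelling maps to the list of concept ids that claim it.
        self._by_spelling = defaultdict(list)

    def add_concept(self, concept_id, spellings):
        """Register every spelling (root or inflection) of one concept."""
        for spelling in spellings:
            key = spelling.lower()
            if concept_id not in self._by_spelling[key]:
                self._by_spelling[key].append(concept_id)

    def lookup(self, spelling):
        """Return all concepts related to an isolated TERM, in load order."""
        return list(self._by_spelling.get(spelling.lower(), []))


index = TermIndex()
index.add_concept("run/verb", ["run", "runs", "ran", "running"])
index.add_concept("run/noun", ["run", "runs"])

print(index.lookup("ran"))   # ['run/verb']
print(index.lookup("run"))   # ['run/verb', 'run/noun']
```

An isolated TERM returns every candidate, as in the second lookup. Choosing one candidate inside a FORM would need grammatical context or the operator, which this sketch does not attempt.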
Operators and scripts both use FORMS to access CONTEXT. Every
English grammar structure is defined, possibly in a simplified
version. Each FORM type in WORDS simulates its equivalent in
English:
- Noun phrases (NPs) find or build a topic for each subject cited
- Modifying phrases adjust characteristics of these NP topics
- Sentences associate the NP topics using case frame templates
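To make the parenthesized notation concrete, here is a minimal sketch that reads a FORM into nested lists of TERMs and sub-FORMs. The exact WORDS syntax is not given in this text, so the sketch assumes whitespace-separated TERMs and plain round brackets; the function names `tokenize` and `parse_form` are hypothetical.

```python
def tokenize(text):
    """Split a FORM string into parentheses and TERM tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse_form(text):
    """Parse a parenthesized expression into nested Python lists.

    Each sub-FORM becomes a nested list; each TERM stays a string.
    Raises ValueError when the parentheses do not balance.
    """
    tokens = tokenize(text)
    pos = 0

    def parse_items(depth):
        nonlocal pos
        items = []
        while pos < len(tokens):
            token = tokens[pos]
            pos += 1
            if token == "(":
                items.append(parse_items(depth + 1))
            elif token == ")":
                if depth == 0:
                    raise ValueError("unbalanced ')'")
                return items
            else:
                items.append(token)
        if depth > 0:
            raise ValueError("missing ')'")
        return items

    return parse_items(0)


tree = parse_form("((the dog) chased (the cat))")
print(tree)  # [[['the', 'dog'], 'chased', ['the', 'cat']]]
```

In this example the two inner lists play the role of noun phrases and the outer list the sentence that associates them. Mapping the lists onto topics and case frames is left out because the text does not specify it.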
The topics in CONTEXT hold WORDS scripts that make MODELER flexible and
semi-intelligent. Used as needed, they help it find or build
the one contextual topic which best represents the referent
of its input expression under our simulated rules of discourse. Two kinds of topic may qualify:
- CONCEPT - models a TERM sense using part-of-speech codes from our PSI-backed Lexicon,
  plus semantic codes taken from the 1,000+ categories in Roget's Thesaurus.
  Combined, these symbols can uniquely identify any common English word sense:
  - All other characteristics must extend these unique semantics
  - So too must those of all IMAGES it dynamically instantiates
  - A CONCEPT author ensures this by adding scripts and other data
- IMAGE - models a FORM meaning by building a structure of Topics.
  The CONCEPT for its head word guides this process as a dynamic template -
  like a kind of scripted blueprint. Each paragraph
  of such IMAGES is then charted into XTM or RDF streams, meant for use downstream
  by applications, inference engines or intelligent agents:
  - In any format, a WORDS semantic net should seem sensible
  - If it does not, some CONCEPT is malfunctioning and needs repair
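The idea that a part-of-speech code plus a semantic category can single out one word sense can be sketched as a composite key. Everything named here is an assumption made for illustration: `SenseKey`, `ConceptRegistry`, the codes "N" and "V", and the category numbers are placeholders, not the real codes of the Lexicon or of Roget's Thesaurus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SenseKey:
    """Hypothetical key for one word sense."""
    pos: str       # part-of-speech code; the real code set is not given here
    category: int  # placeholder number, not an actual Roget category


class ConceptRegistry:
    """Hypothetical store that rejects two concepts with the same sense key."""

    def __init__(self):
        self._concepts = {}  # (term, SenseKey) -> short gloss

    def register(self, term, key, gloss):
        entry = (term.lower(), key)
        if entry in self._concepts:
            raise ValueError(f"sense already defined: {term} {key}")
        self._concepts[entry] = gloss

    def senses(self, term):
        """Return every sense key registered for a term."""
        wanted = term.lower()
        return [key for (name, key) in self._concepts if name == wanted]


registry = ConceptRegistry()
registry.register("bank", SenseKey("N", 1), "financial institution")
registry.register("bank", SenseKey("N", 2), "edge of a river")
registry.register("bank", SenseKey("V", 1), "to deposit money")
print(len(registry.senses("bank")))  # 3
```

Rejecting a duplicate key is one simple way to keep each sense unique. How MODELER itself enforces this is not described in the text.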
Aided by public groups working on case frames and conceptual graphs,
Lexikos will adjust MODELER's fixed base of CONCEPTS and help its users
add new ones, for private use or open-source publication.
MODELER's world knowledge, linguistic range and ease of use should
improve steadily as these refinements occur, but at all points the same
simple goal applies: the subject of each IMAGE topic, under the Topic Map
paradigm, should be the same as the intuitive referent, to an English reader,
of the grammar FORM that was interpreted to build it.
If this goal is met, then MODELER will understand the FORM as a human
would. That is the formal Q/A test we want it to meet, as often as possible.