NLP + XTM + FOL: How MODELER Unifies Them

The XTM and NLP communities are both small, insular, and good at what they do.  If they swapped ideas on knowledge representation (their one point of overlap), many goals of both camps would advance. But their terms, tools and paradigms differ, so few people yet realize how usefully these two ideas blend:
  • the logical form - (output format from NLP)  This is a formal-language equivalent to some natural language input sentence - one which "says the same thing" with less ambiguity, often in first order logic (FOL).

  • the association template - (input format for topic maps)  A constrained pattern of roles one can instantiate to formally model assertions.  Standards for templates are still lacking, but many examples exist.
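
To make the overlap concrete, here is a minimal Python sketch in which an association template is a named set of roles and a logical form is one instantiation of it. The example sentence, the role names (agent, recipient, object) and the class names are hypothetical illustrations; neither XTM nor WORDS prescribes them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssociationTemplate:
    """A constrained pattern of roles (hypothetical structure)."""
    assoc_type: str
    roles: tuple

    def instantiate(self, **players):
        """Bind one topic to each role; reject missing or extra roles."""
        if set(players) != set(self.roles):
            raise ValueError(f"expected roles {self.roles}, got {tuple(players)}")
        return Assertion(self, tuple(players[r] for r in self.roles))


@dataclass(frozen=True)
class Assertion:
    """One instantiated template, i.e. one formally modeled assertion."""
    template: AssociationTemplate
    players: tuple

    def as_fol(self):
        """Render the assertion as a flat FOL-style predicate."""
        return f"{self.template.assoc_type}({', '.join(self.players)})"


# "Mary gives John a book" (illustrative sentence, not from any standard)
give = AssociationTemplate("give", ("agent", "recipient", "object"))
fact = give.instantiate(agent="mary", recipient="john", object="book1")
print(fact.as_fol())  # give(mary, john, book1)
```

The same object serves both camps: the template constrains what a topic map will accept, and the rendered predicate is what an NLP front end would call the logical form.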

WORDS scripts fill both roles with one notation - nested ASCII expressions built from English grammar forms.  When interpreted, they expand into topic structures in CONTEXT - any WORDS-scriptable TM engine.

MODELER represents in CONTEXT all referents of all phrases and clauses in any given English paragraph, with minimal loss and distortion.  It does not mine English text so much as reformat it, seeking to expose:
  1. each referent (subject) mentioned or implied by the top-level inputs
  2. each (scoped) assertion made or asked in inputs about that topic
  3. the inferred speech acts of the paragraph's author in creating it
CONTEXT gets expanded by such data. Since it was acquired linguistically, the related process simulates human language understanding.
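
The three kinds of data listed above might be held together as follows. This is only a sketch under our own assumptions: the Chart class, its field names and the sample question are hypothetical, and are not MODELER's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Chart:
    """Hypothetical container for the analysis of one paragraph."""
    referents: dict = field(default_factory=dict)    # topic id -> introducing phrase
    assertions: list = field(default_factory=list)   # (scope, predicate, argument ids)
    speech_acts: list = field(default_factory=list)  # (act type, assertion index)

    def add_referent(self, topic_id, phrase):
        # Keep the first phrase that introduced a topic.
        self.referents.setdefault(topic_id, phrase)

    def add_assertion(self, scope, predicate, *args):
        # Every argument must already be a known referent.
        missing = [a for a in args if a not in self.referents]
        if missing:
            raise KeyError(f"unknown referents: {missing}")
        self.assertions.append((scope, predicate, args))
        return len(self.assertions) - 1

    def add_speech_act(self, act, assertion_index):
        self.speech_acts.append((act, assertion_index))


# "Is the door open?" (illustrative input)
chart = Chart()
chart.add_referent("door1", "the door")
chart.add_referent("open-state", "open")
idx = chart.add_assertion("current-situation", "has-state", "door1", "open-state")
chart.add_speech_act("question", idx)
print(chart.speech_acts)  # [('question', 0)]
```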

In MODELER this process ends when it returns to its client a chart of the new data, written in XTM, RDF, CGIF or any exchange notation which CONTEXT and/or MODELER can export.
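
As an illustration of what one exported assertion could look like, the sketch below writes a single association using XTM 1.0 element names (association, instanceOf, member, roleSpec, topicRef). We assume XTM 1.0 syntax here, and the function name and topic ids are hypothetical; it does not reproduce MODELER's real export code.

```python
import xml.etree.ElementTree as ET

XTM = "http://www.topicmaps.org/xtm/1.0/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("xtm", XTM)
ET.register_namespace("xlink", XLINK)


def _topic_ref(parent, topic_id):
    """Append a topicRef pointing at a topic in the same document."""
    ref = ET.SubElement(parent, f"{{{XTM}}}topicRef")
    ref.set(f"{{{XLINK}}}href", f"#{topic_id}")
    return ref


def association_to_xtm(assoc_type, members):
    """Serialize one association; members maps role id -> player id."""
    assoc = ET.Element(f"{{{XTM}}}association")
    _topic_ref(ET.SubElement(assoc, f"{{{XTM}}}instanceOf"), assoc_type)
    for role, player in members.items():
        member = ET.SubElement(assoc, f"{{{XTM}}}member")
        _topic_ref(ET.SubElement(member, f"{{{XTM}}}roleSpec"), role)
        _topic_ref(member, player)
    return ET.tostring(assoc, encoding="unicode")


xml_text = association_to_xtm("give", {"agent": "mary", "recipient": "john"})
print(xml_text)
```

A client receiving such a chart can parse it with any XML library and walk the members to find which topic plays which role.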

Client software then assumes responsibility for the chart, and must react to its content as its own rules, design goals and operational logic dictate.

MODELER (or a TM engine) may offer utilities to simplify such programming, but clients must decide whether and how to use them.  More is said on this "reacting" process later, after a short detour to explore the benefits of charts.

The XML Chart Exposes a Paragraph Analysis

Philosophically, our main goal for MODELER's understanding process is to help client code accurately convert any English paragraph from a string of ASCII letters into a chart that exposes what it says.

Charts involve case frames and speech acts, and resemble Conceptual Graphs, a sugar-coated FOL notation. But they start out as flat ASCII paragraph strings. To become a chart, they must pass through two intermediate conversions: first into WORDS scripts (to clarify syntax), then into constrained topics (to expose contextual meaning).

When MODELER interprets WORDS scripts, it does not use their syntax alone.  It also queries CONTEXT itself (and the scripted lexicons inside it) to weed out several types of ambiguity that logic cannot accept, including:
  • Lexical - which concept is the intended sense of each input term?
  • Referential - what topic is meant by a pronoun or definite phrase?
  • Elision - what topic(s) NOT in the input should be assumed present?
  • Behavioral - what reaction(s) did a paragraph author hope to elicit?
NLP can do much of this disambiguation with syntax and lexicons even before a WORDS script is composed.  But some must be deferred to run time, where heuristic rules on recently cited topics and queries to the client's ontologies provide the needed answers.
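
One such run-time heuristic, resolving a pronoun against recently cited topics, can be sketched as below. The DiscourseHistory class, the type labels and the "most recent topic of a compatible type wins" rule are our assumptions for illustration; real rules would be richer and would also consult the client's ontologies.

```python
from collections import deque


class DiscourseHistory:
    """Hypothetical record of recently cited topics, newest first."""

    def __init__(self, max_topics=10):
        self._max = max_topics
        self._recent = deque(maxlen=max_topics)

    def cite(self, topic_id, topic_type):
        # Re-citing a topic moves it to the front instead of duplicating it.
        kept = (item for item in self._recent if item[0] != topic_id)
        self._recent = deque(kept, maxlen=self._max)
        self._recent.appendleft((topic_id, topic_type))

    def resolve(self, required_type):
        """Return the most recently cited topic of the required type, or None."""
        for topic_id, topic_type in self._recent:
            if topic_type == required_type:
                return topic_id
        return None


# Hypothetical mapping from pronouns to the topic type they require.
PRONOUN_TYPES = {"he": "person", "she": "person", "it": "thing"}

history = DiscourseHistory()
history.cite("mary", "person")
history.cite("book1", "thing")
history.cite("john", "person")

print(history.resolve(PRONOUN_TYPES["it"]))  # book1
print(history.resolve(PRONOUN_TYPES["he"]))  # john
```

Note that this toy rule cannot tell "he" from "she"; that is exactly the kind of gap a query to a lexicon or ontology would have to fill.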

It is in these semantic areas that Topic Map models seem especially useful, as they are expressive enough to model nearly anything, robust enough to support the large-scale ontologies needed for language, and flexible enough to store embedded scripts that add concept-specific processing power to ontologies.

CONTEXT:  Modeling a Domain of Discourse

Using XTM at ontology levels lets English word semantics be modeled not in FOL assertions, but within the media of XML files, TM schema, and scripts, which most programmers find far simpler.

And the rest of CONTEXT gets simplified too, as scripted topics can now model discourse history; the speaker's dialect, biases, and known goals; expectations and goals of his audience; recent guesses on intended word meanings; etc. Such linguistic effects would be very hard to model in FOL even if you knew it fluently.

No TM engine comes with such linguistic-I/O modules pre-installed, but all engines can merge in portable XTM files to define them.  Like any custom ontology, they will extend the basic TM paradigm, and should generally get loaded into CONTEXT at boot time, along with lexikons modeling vocabulary.

Our WORDS ontology provides default models for all such linguistic infrastructure. You can ignore it, exploit it, or add and use customized extensions. Your main goal, however, is to add companion ontologies that model your domain of interest, so MODELER has something to discuss besides its own linguistic infrastructure.

Domain modeling should be no burden, as doing it more easily is probably what led you to MODELER in the first place. The main constraint we add is that IF you want to refer to any subject in WORDS scripts, its type must have regular English spellings and a part of speech in a WORDS lexikon. MODELER includes nice tools to help, which make this extremely easy.

MODELER's scripted topic functionality also lets such lexikon-based concepts call WORDS scripts or modeled applications at appropriate times.  This lets expected behavior, not merely expected topics, be pre-defined in lexikons at boot time.
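
The idea of pre-defining behavior alongside vocabulary can be sketched in a few lines. Everything named here (the Lexicon class, the on_mention hook, the sample entries) is hypothetical; a plain Python callable stands in for what would be a WORDS script or a modeled application.

```python
class Lexicon:
    """Hypothetical lexikon: spelling -> part of speech plus optional behavior."""

    def __init__(self):
        self._entries = {}

    def define(self, spelling, part_of_speech, on_mention=None):
        # on_mention stands in for a script attached to the concept at boot time.
        self._entries[spelling.lower()] = (part_of_speech, on_mention)

    def mention(self, spelling, context):
        """Look up a term; run its attached behavior, if any, against context."""
        part_of_speech, hook = self._entries[spelling.lower()]
        if hook is not None:
            hook(context)
        return part_of_speech


def log_alarm(context):
    # Example behavior: queue an action for the client to carry out.
    context.setdefault("pending_actions", []).append("notify-operator")


lexicon = Lexicon()
lexicon.define("alarm", "noun", on_mention=log_alarm)
lexicon.define("door", "noun")

context = {}
lexicon.mention("door", context)   # plain topic, no behavior attached
lexicon.mention("Alarm", context)  # triggers the attached behavior
print(context)  # {'pending_actions': ['notify-operator']}
```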

Most common UI and IT programming tasks can be fully handled using such techniques, which take minimal R&D resources because WORDS scripts are easy to write.  If a higher level simulation of intelligence is needed, MODELER can handle it too, but the R&D needed on scripted lexikons for infrastructure will increase, as support for various AI paradigms will then be needed.

CONTEXT:  What Else Might be Required?

Many visions exist that try to answer that question, but these examples are especially worth reading.  Although neither simple nor brief, they do illustrate the types of linguistic infrastructure models that may soon help establish the state-of-the-art within advanced WORDS lexikons:

Intelligent Agents, per John Sowa, must handle messages in several languages from sources both internal and external.  Messages then compete for limited resources and attention in a generalized problem solver.  This very readable paper unifies ideas from AI research that have passed the test of time, and it explains well how IA software might react to (versus produce) charts of sentences.  It also shows how the IA's reacting machinery gets conceptually re-used by its understanding subsystems, which have to run first.

Situational Awareness, a PDF by M. R. Endsley, outlines the contextual mental states of humans, especially those involved with teams, crises or complex vehicles, all with an eye toward how their awareness of the changing state of situations can be augmented by IA assistants.  Its quantitative assessments of the causes for errors are of special interest.  So are the diagrams showing decision making as a distinct back-end process, which fails unless accurate situational awareness is first established by independent understanding processes.

We live in an age where visions of intelligent machines are no longer academic, and where AI software once aimed at supercomputers can now be cobbled together on any PC from open source Java tools and topic map lexicons that control them. To get involved in building either, please write me.

Dan Corwin, CTO


Lexikos Corporation
Boston & Knoxville
Email: Dan@Lexikos.com