Semantics Summary of 2/9/03

[Editorial note: WORDS is a work in progress. I am still struggling with vocabulary and longer range plans for the project. Suggestions, corrections and active help on each thought from others, especially those more fluent in nuances of XTM, would be enormously welcome.]

A. Three major types of association seem to be implied by WORDS syntax. Effectively, they show what can be expressed in the WORDS language. Here are the three initial subtypes:

  1. desc: built from the descriptorSet within an NP
  2. prep: built from the prepPhrase within any PP
  3. thot: built from a sentence (simple only in WORDS)
Below, I try to show that by formally and systematically defining subtypes of these three, we can make WORDS more closely resemble English. In fact, I think it can be made to approach real English to nearly any degree desired, at costs which seem not unreasonable.

B. All three types need IDs, autogenerated for convenience. I plan to use a unique, type-specific code that somehow incorporates a data source and a time stamp. Each association ID thus encodes (probably indirectly) what form was input, by whom, and when. Given D and E, this seems a useful start on building helpful discourse models.
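A minimal sketch of how such an ID might be generated, assuming a simple "code-source-timestamp-sequence" layout. The type codes, the layout, and the function names are hypothetical; the notes above only say that the ID should somehow incorporate a type, a data source, and a time stamp.

```python
from datetime import datetime, timezone
from itertools import count

# Hypothetical one-letter codes for the three association types in A.
TYPE_CODES = {"desc": "D", "prep": "P", "thot": "T"}

# Tie-breaker so two IDs created in the same second still differ.
_sequence = count(1)


def make_association_id(assoc_type, source, when=None):
    """Build an ID like 'T-dan-20030209T120000Z-1' (layout is an assumption)."""
    if assoc_type not in TYPE_CODES:
        raise ValueError(f"unknown association type: {assoc_type!r}")
    # Normalize to UTC; a naive datetime is interpreted as local time.
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = when.strftime("%Y%m%dT%H%M%SZ")
    return f"{TYPE_CODES[assoc_type]}-{source}-{stamp}-{next(_sequence)}"


def parse_association_id(assoc_id):
    """Recover (type, source, timestamp string) from an ID built above."""
    code, rest = assoc_id.split("-", 1)
    # Split from the right so a source containing hyphens survives intact.
    source, stamp, _seq = rest.rsplit("-", 2)
    assoc_type = {v: k for k, v in TYPE_CODES.items()}[code]
    return assoc_type, source, stamp
```

Here the ID encodes the form, the source, and the time directly. An indirect encoding, as the text suggests, could instead use an opaque key that points at a record holding the same three fields.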

C. All three association types must be reified to get more characteristics. This is not normally done in WORDS, however, unless some grammatical form per se becomes a true semantic focus within an ongoing dialog. The special syntax and mechanics of doing this are to be defined later.

D. Rather, each such association becomes a (typed) occurrence of another XTM topic which symbolizes its intended referent - some thing formally modeled under the Inglish ontology. The occurrence of a grammatical form is meant to refer linguistically to that topic. MODELER has only two choices:

  1. Build this other topic by transforming the given association
  2. Find this other topic by exploring models of the recent dialog
The essence of WORDS semantics is to do both usefully. This is actually the core of what MODELER seeks to accomplish as a WORDS listener.
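The find-or-build choice can be sketched as below. This is only an illustration under strong assumptions: the `Topic` and `Dialog` classes are hypothetical stand-ins, and matching a referent by exact name is a placeholder for whatever exploration of dialog models MODELER would really perform.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Stand-in for an XTM topic; 'occurrences' holds association IDs."""
    name: str
    occurrences: list = field(default_factory=list)


class Dialog:
    """Keeps the topics mentioned so far, most recent last."""

    def __init__(self):
        self.recent = []

    def find(self, name):
        # Choice 2: explore the recent dialog, newest topic first.
        for topic in reversed(self.recent):
            if topic.name == name:
                return topic
        return None

    def build(self, name):
        # Choice 1: build a new topic from the given form.
        topic = Topic(name)
        self.recent.append(topic)
        return topic

    def resolve(self, name, assoc_id):
        """Find the referent if possible, otherwise build it, then record
        the association as an occurrence of that topic."""
        topic = self.find(name) or self.build(name)
        topic.occurrences.append(assoc_id)
        return topic
```

A second mention of the same name resolves to the existing topic and simply adds another occurrence to it, which is the sense in which a topic accumulates references over a dialog.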

E. Any topic which includes such occurrences can be informally called an "idea", which is the most general term within English for a mental model. General mental models are exactly what descriptive languages such as WORDS, Inglish, or English should be able to create in the minds of their listeners. MODELER is therefore a simulated listener who thinks about simulated ideas. As is typical of all listeners, simulated or real, this occurs quite automatically whenever linguistic references are made via NP, PP, or Sentence forms. MODELER can do it, but cannot linguistically explain how the process works internally.

F. WORDS is far simpler than English, especially within its sentences, which have no tense, multitoken verbals, idioms, conjunctions, or relative clauses. It is rooted in a subset of real English, so it feels fairly natural, like a computerized "baby talk". WORDS exercises all the machinery needed to map grammar forms interactively into Topic Maps, so it is a good place to start. But once it works, we will want to expand MODELER to handle more complex sentences, by adding all the features above. That expansion is really what the Inglish Ontology is all about.


G. Inglish gets more realistic (and makes MODELER more complicated, and powerful) by defining the missing features of sentences and thots. This work focuses formally on case frame and tense models. Neither can be fully illustrated in BNF-like notation, so what follows is a highly truncated introduction only, but it should convey the basic notions of what is involved in Inglish.

H. The first key steps are to expand the (over)simplistic "copverb" mechanism of WORDS to include variations that signal tense, and to conceptually allow the full set of English verbal forms and roots - or at least the specific subset of them that MODELER is expected to handle in its dialogs. In the quasi-syntactic model below, it is obvious that a lot of work is needed, and most of it relates to defining additional verbs:

Inglish predicate forms include infinitives, gerunds, particles, tense:

  • copulative :=: NP having descriptor*
  • having :=: has | have | had | will have | would have been having | ....
  • verbal :=: to see | buy | bought | bought into | will be buying into |....
I. Adding tense is no picnic, but it can be handled for ALL verbs at once by making it a separate role of each thot, with any of a growing set of enumerated values. A similar role in each prep can refine its modeled relationship, and in each desc can refine if and when the cited descriptors applied. All such refinements deal with timing of the cited situation relative to the speaker and to other contextual events, and/or its modality. Such issues are separate from the general expansion of Inglish verb roots, so they can and will be initially handled independently.
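As a rough sketch of tense as a separate role, the fragment below keeps the verb root and the tense in different fields of a thot. The `Tense` values, the `Thot` class, and the small surface-form table are all illustrative assumptions, not a list defined by Inglish.

```python
from dataclasses import dataclass
from enum import Enum


class Tense(Enum):
    # Illustrative values only; the text expects this set to keep growing.
    UNSPECIFIED = "unspecified"
    PAST = "past"
    PRESENT = "present"
    FUTURE = "future"


@dataclass
class Thot:
    subject: str
    verb_root: str
    tense: Tense = Tense.UNSPECIFIED  # the separate tense role


# Hypothetical table mapping a few surface verbals to (root, tense).
SURFACE_FORMS = {
    "has": ("have", Tense.PRESENT),
    "have": ("have", Tense.PRESENT),
    "had": ("have", Tense.PAST),
    "will have": ("have", Tense.FUTURE),
    "bought": ("buy", Tense.PAST),
}


def make_thot(subject, verbal):
    """Split a surface verbal into its root and its tense role.
    Unknown verbals are kept as-is with an unspecified tense."""
    root, tense = SURFACE_FORMS.get(verbal, (verbal, Tense.UNSPECIFIED))
    return Thot(subject, root, tense)
```

Because the tense lives in its own field, new verb roots can be added to the table without touching the tense values, and new tense or modality values can be added without touching the roots.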

J. Sentences in real English convey a "complete thought". Even within a tenseless syntax, Inglish gives WORDS a much richer hierarchy of thot sub-types, able to convey a vast set of typed events, processes, states, changes, and relationships, each applying in a certain kind of situation which involves certain stereotypical types of complements, like the "buyer" and "seller" within the "to purchase" verb. It is at this level that Inglish (and English) differ from WORDS. They include far more detail on such pre-defined thot types. A listener can exploit that detail to fill in the "slots" of referring dialog occurrences with various complements meeting the known semantic constraints.
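Slot filling against a case frame might look roughly like this. The frame contents, the "goods" role, the type labels, and the first-fit strategy are all assumptions made for illustration; only the "buyer" and "seller" roles of "to purchase" come from the text.

```python
# Hypothetical case frame: role name -> required semantic type.
CASE_FRAMES = {
    "purchase": {"buyer": "person", "seller": "person", "goods": "thing"},
}

# Hypothetical semantic types for referents already known to the listener.
KNOWN_TYPES = {"Dan": "person", "Ann": "person", "car": "thing"}


def fill_slots(verb_root, candidates):
    """Give each candidate the first still-open role whose type
    constraint it satisfies; candidates that fit nowhere are skipped."""
    frame = CASE_FRAMES[verb_root]
    filled = {}
    for cand in candidates:
        for role, required in frame.items():
            if role not in filled and KNOWN_TYPES.get(cand) == required:
                filled[role] = cand
                break
    return filled
```

With these tables, `fill_slots("purchase", ["Dan", "car", "Ann"])` assigns Dan to buyer, car to goods, and Ann to seller. Type constraints alone cannot tell buyer from seller, so order decides here; that ambiguity is one reason explicit case markers are useful.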

Inglish sentences grow more normal via verbals, case roles, etc.:

  • sentence :=: copulative | NP verbal complement*
  • complement :=: casemarker? NP | PP | relpro? verbal complement*
  • casemarker :=: in | on | by | with | over ....
  • relpro :=: that | which | who | where | when | ....
K. At the syntax level, to help MODELER, Inglish defines conventional English casemarkers for the corresponding NP forms that match expected case frame roles. This takes quite a bit of detailed modeling work, but Lexikos has some tricks that reduce it, and plans an open source project for tapping Internet-based resources, including volunteers, to help. Properly applied, the combination should allow WORDS to "grow up" into Inglish at a reasonable rate.
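To show how casemarkers could help a listener separate complements, here is a deliberately naive sketch that cuts the tokens following a verbal wherever a casemarker appears. The function name is hypothetical, the marker set contains only the five words listed in the casemarker rule (which is open-ended), and relative clauses and the PP alternative are ignored.

```python
# The five markers named in the casemarker rule; the real list is longer.
CASEMARKERS = {"in", "on", "by", "with", "over"}


def split_complements(tokens):
    """Group tokens into (casemarker or None, [NP words]) pairs.
    A new complement starts at every casemarker."""
    complements = []
    marker, words = None, []
    for tok in tokens:
        if tok in CASEMARKERS:
            # Close the complement in progress, if there is one.
            if marker is not None or words:
                complements.append((marker, words))
            marker, words = tok, []
        else:
            words.append(tok)
    if marker is not None or words:
        complements.append((marker, words))
    return complements
```

Each resulting pair could then be matched against the roles of the verb's case frame, with the marker narrowing down which role the NP fills. Two adjacent unmarked NPs would be merged by this sketch, so a real parser would also need NP boundaries.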

L. It seems worth the effort, as with the addition of these thot types, Inglish users (real or simulated) should be able to concisely and accurately model complex situations in MODELER at the rate of one per simple sentence, just as real people do when using real English on one another. That isn't exactly "programming in English", but it is "formal modeling in English", and it should let XTM files be generated with relative ease that model a very wide range of interesting things.



Lexikos Corporation
Boston & Knoxville
Email: Dan@Lexikos.com