Semantics Summary of 2/9/03
[Editorial note: WORDS is a work in progress. I am still struggling with
vocabulary and longer range plans for the project. Suggestions, corrections
and active help on each thought from others, especially those more fluent
in nuances of XTM, would be enormously welcome.]
A. Three major types of association seem to be implied by
WORDS syntax. Effectively, they show what can be expressed in the WORDS
language. Here are the three initial subtypes:
- desc: built from the descriptorSet within an NP
- prep: built from the prepPhrase within any PP
- thot: built from a sentence (simple only in WORDS)
Below, I try to show that by formally defining sub-types of them, in a systematic
process, we can make WORDS more closely resemble English. In fact, I think it can
be made to approach real English to nearly any degree desired, under costs
which seem not unreasonable.
B. All three types need IDs, autogenerated for convenience. I plan to
use a unique type-specific code that somehow incorporates a data source
and a time stamp. Each association ID thus encodes (probably indirectly)
what form was input, by whom, and when. Given D and E, this seems a useful
start on building helpful discourse models.
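As a rough illustration of B, here is one way such an ID could be assembled. This is only a sketch under my own assumptions: the one-letter type codes, the ID layout, and the function name make_association_id are hypothetical, and the note above does not fix any of them.

```python
import itertools
from datetime import datetime, timezone

# Hypothetical one-letter codes for the three association types in A.
TYPE_CODES = {"desc": "D", "prep": "P", "thot": "T"}

# Process-wide sequence number so two IDs made in the same second still differ.
_sequence = itertools.count(1)


def make_association_id(assoc_type, source, when=None):
    """Build an ID of the assumed form <type code>-<source>-<UTC stamp>-<n>."""
    if assoc_type not in TYPE_CODES:
        raise ValueError(f"unknown association type: {assoc_type!r}")
    # Normalize to UTC so the stamp means the same thing for every source.
    moment = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = moment.strftime("%Y%m%dT%H%M%SZ")
    return f"{TYPE_CODES[assoc_type]}-{source}-{stamp}-{next(_sequence)}"


if __name__ == "__main__":
    print(make_association_id("thot", "alice"))
```

Here the ID carries the form, source, and time directly in its text. An indirect encoding (an opaque key into a lookup table) would satisfy B equally well.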
C. All three association types must be reified to get more characteristics.
This is not normally done in WORDS, however, unless some grammatical form
per se becomes a true semantic focus within an ongoing dialog.
The special syntax and mechanics of doing this are to be defined later.
D. Rather, each such association becomes a (typed) occurrence of another
XTM topic which symbolizes its intended referent - some thing formally
modeled under the Inglish ontology. The occurrence of a grammatical form is
meant to linguistically refer to it. MODELER has only two choices:
- Build this other topic by transforming the given association
- Find this other topic by exploring models of the recent dialog
The essence of WORDS semantics is to do both usefully. This is actually the core of
what MODELER seeks to accomplish as a WORDS listener.
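The two choices in D amount to a find-or-build step. The sketch below is a deliberately crude stand-in, not MODELER's actual method: the Topic and DialogModel classes are hypothetical, and matching a referent purely by its "kind" is an assumption made only to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    topic_id: str
    kind: str  # assumed: a type name drawn from the Inglish ontology
    occurrences: list = field(default_factory=list)  # IDs of referring associations


class DialogModel:
    """Hypothetical record of the topics mentioned so far, oldest first."""

    def __init__(self):
        self.recent = []

    def find(self, kind):
        # Choice 2: explore the recent dialog, most recent mention first.
        for topic in reversed(self.recent):
            if topic.kind == kind:
                return topic
        return None

    def resolve(self, association_id, kind):
        topic = self.find(kind)
        if topic is None:
            # Choice 1: build a new topic from the given association.
            topic = Topic(f"topic-{len(self.recent) + 1}", kind)
            self.recent.append(topic)
        # Either way, the association becomes an occurrence of the topic.
        topic.occurrences.append(association_id)
        return topic
```

A real listener would need far richer matching than this (descriptors, prepositional context, recency weighting), but the control flow of "find, else build, then attach the occurrence" is the point of the sketch.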
E. Any topic which includes such occurrences can be informally called an "idea",
which is the most general term within English for a mental model. General mental
models are exactly what descriptive languages such as WORDS, Inglish, or
English should be able to create in the minds of their listeners. MODELER is
therefore a simulated listener who thinks about simulated ideas. As is typical
of all listeners, simulated or real, this occurs quite automatically whenever
linguistic references are made via NP, PP, or Sentence forms. MODELER can do
it, but cannot linguistically explain how the process internally works.
F. WORDS is far simpler than English, especially within its sentences, which have no
tense, multitoken verbals, idioms, conjunctions, or relative clauses. It is rooted
in a subset of real English, so it feels fairly natural, like a computerized "baby
talk". WORDS exercises all the machinery needed to map grammar forms interactively
into Topic Maps, so it is a good place to start. But once it works, we will want to
expand MODELER to handle more complex sentences, by adding all the features above.
That expansion is really what the Inglish Ontology is all about.
G. Inglish gets more realistic (and makes MODELER more complicated, and powerful) by
defining the missing features of sentences and thots. This work focuses formally
on case frame and tense models. Neither can be illustrated using BNF-like notation,
so what follows is a highly truncated introduction only, but it should convey the basic
notions of what is involved in Inglish.
H. The first key steps are to expand the (over)simplistic "copverb" mechanism of WORDS to
include variations that signal tense, and to conceptually allow the full set of
English verbal forms and roots - or at least the specific subset of them that MODELER is
expected to handle in its dialogs. In the quasi-syntactic model below, it is obvious
that a lot of work is needed, and most of it relates to defining additional verbs:
Inglish predicate forms include infinitives, gerunds, particles, tense:
- copulative :=: NP having descriptor*
- having :=: has | have | had | will have | would have been having | ....
- verbal :=: to see | buy | bought | bought into | will be buying into |....
I. Adding tense is no picnic, but it can be handled for ALL verbs at once by making it
a separate role of each thot, with any of a growing set of enumerated values. A similar
role in each prep can refine its modeled relationship, and in each desc can refine if and
when the cited descriptors applied. All such refinements deal with timing of the cited
situation relative to the speaker and to other contextual events, and/or its modality.
Such issues are separate from the general expansion of Inglish verb roots, so they
can and will be initially handled independently.
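The idea in I, that tense is a separate role holding one of an enumerated set of values, can be sketched as follows. The Tense values, the Thot fields, and the retense helper are all my own placeholders. The real enumeration is described above only as "growing", so this list is not meant to be complete.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Tense(Enum):
    # Placeholder values only; the actual set is left open in the text.
    UNSPECIFIED = "unspecified"  # tenseless, as in plain WORDS
    PAST = "past"
    PRESENT = "present"
    FUTURE = "future"


@dataclass(frozen=True)
class Thot:
    subject: str
    verb_root: str  # the verb root is modeled independently of tense
    tense: Tense = Tense.UNSPECIFIED


def retense(thot, tense):
    """Return a copy of the thot with only its tense role changed."""
    return replace(thot, tense=tense)
```

Because tense lives in its own role, adding a new verb root needs no tense-specific work, and adding a new tense value applies to every verb at once.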
J. Sentences in real English convey a "complete thought". Even within a tenseless
syntax, Inglish gives WORDS a much richer hierarchy of thot sub-types, able to convey
a vast set of typed events, processes, states, changes, and relationships, each
applying in a certain kind of situation which involves certain stereotypical types
of complements, like the "buyer" and "seller" within the "to purchase" verb. It is
at this level that Inglish (and English) differ from WORDS. They include far more
detail on such pre-defined thot types. A listener can exploit it to fill in the
"slots" of referring dialog occurrences with various complements meeting the known
semantic constraints.
Inglish sentences grow more normal via verbals, case roles, etc.:
- sentence :=: copulative | np verbal complement*
- complement :=: casemarker? NP | PP | relpro? verbal complement*
- casemarker :=: in | on | by | with | over ....
- relpro :=: that | which | who | where | when | ....
K. At the syntax level, to help MODELER, Inglish defines conventional English casemarkers
for the corresponding NP forms that match expected case frame roles. This takes quite a bit
of detailed modeling work, but Lexikos has some tricks that reduce it, and plans an open
source project for tapping Internet based resources, including volunteers, to help. Properly
applied, the combination should allow WORDS to "grow up" into Inglish at a reasonable rate.
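To make the casemarker idea in K concrete, here is a small sketch of slot filling for the "to purchase" example from J. Everything specific in it is an assumption for illustration: the frame layout, the "goods" and "payment" roles, and the use of "from" as the seller's casemarker (which is not in the casemarker list above, though that list is open-ended).

```python
# Hypothetical case frame for "to purchase". Keys are casemarkers; None
# stands for an unmarked NP complement (the direct object).
PURCHASE_MARKERS = {
    None: "goods",
    "from": "seller",
    "with": "payment",
}


def fill_slots(subject_role, marker_roles, subject, complements):
    """Map a subject and (casemarker, NP) complements onto case frame roles."""
    slots = {subject_role: subject}
    for marker, noun_phrase in complements:
        if marker not in marker_roles:
            raise ValueError(f"casemarker {marker!r} has no role in this frame")
        role = marker_roles[marker]
        if role in slots:
            raise ValueError(f"role {role!r} is already filled")
        slots[role] = noun_phrase
    return slots


if __name__ == "__main__":
    # "Mary purchased a car from Bob"
    print(fill_slots("buyer", PURCHASE_MARKERS, "Mary",
                     [(None, "a car"), ("from", "Bob")]))
```

Rejecting unknown casemarkers and doubly filled roles is one simple way to enforce the "known semantic constraints" of a frame. A fuller model would also check the type of each complement.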
L. It seems worth the effort, as with the addition of these thot types, Inglish users (real or
simulated) should be able to concisely and accurately model complex situations in MODELER at
the rate of one per simple sentence, just as real people do when using real English with one another.
That isn't exactly "programming in English", but it is "formal modeling in English", and it should
let XTM files be generated with relative ease that model a very wide range of interesting things.