Theories Behind WORDS
Its Basic Operation
Core Design Premises
Useful Modeling Paradigms
Open Source Business Model
Basic Operations
Making WORDS lexicons executable is not hard. In fact, once we decided
to model bulk words and their meanings as Java objects delegating to Topics, it became hard to keep them from
seeming executable. And at that point, adding topic-embedded scripts seemed obvious.
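The delegation idea can be sketched in Java, the language MODELER's objects are written in. This is a minimal illustration under stated assumptions, not MODELER's actual classes: `Topic`, `Word` and all their methods are hypothetical names, and a topic-embedded WORDS script is stood in for by a plain `java.util.function.Function`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical Topic: a topic-map node holding base names, typed occurrences,
// and an optional embedded script (modeled here as a Java Function).
class Topic {
    private final String id;
    private final List<String> names = new ArrayList<>();
    private final Map<String, String> occurrences = new HashMap<>();
    private Function<String, String> script; // stand-in for a WORDS script

    Topic(String id) { this.id = id; }

    String id() { return id; }
    void addName(String name) { names.add(name); }
    List<String> names() { return names; }
    void putOccurrence(String type, String value) { occurrences.put(type, value); }
    String occurrence(String type) { return occurrences.get(type); }
    void setScript(Function<String, String> script) { this.script = script; }

    // "Running" a topic means running its embedded script, if it has one.
    String run(String arg) {
        if (script == null) {
            throw new IllegalStateException("topic " + id + " has no script");
        }
        return script.apply(arg);
    }
}

// Hypothetical Word: a lexical entry that keeps no meaning of its own and
// delegates every semantic query to its Topic, which is what makes it "executable".
class Word {
    private final String spelling;
    private final Topic meaning;

    Word(String spelling, Topic meaning) {
        this.spelling = spelling;
        this.meaning = meaning;
    }

    String spelling() { return spelling; }
    String gloss() { return meaning.occurrence("gloss"); }
    String invoke(String arg) { return meaning.run(arg); }
}
```

Because `Word` only delegates, editing or replacing the topic's script changes what the word does without touching the lexicon entry itself.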
The (CONTEXT) module into which WORDS lexicons load is really just a JVM with servlet support and half a dozen well-built web apps for managing bulk lexical data, ontologies, scripts, etc. Chief among these are:
- A Topic Map storage engine able to import metadata, then query, update or export merged results through advanced APIs
- Display code for loaded TMs. Ontopia's Omnigator is one popular free example, but you can add others of any design you like
- Our interpreter, MODELER, whose discourse model and flexible scripts (embedded, batched or typed) can readily drive all of the above, PLUS
- Any other Java code or web apps you wish, provided the available service calls are documented for scripts to invoke
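One way to picture that last requirement, that service calls be documented before scripts may invoke them, is a registry that refuses undocumented entries. The sketch below assumes a hypothetical `ServiceRegistry` with string-keyed arguments; it is not the CONTEXT module's real interface.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical registry: each web app or Java component publishes named calls,
// and only calls registered together with documentation can be invoked by scripts.
class ServiceRegistry {
    // One registered call: its documentation plus the code that handles it.
    private static final class Service {
        final String doc;
        final Function<Map<String, String>, String> handler;

        Service(String doc, Function<Map<String, String>, String> handler) {
            this.doc = doc;
            this.handler = handler;
        }
    }

    // TreeMap keeps the published names sorted for listing.
    private final Map<String, Service> services = new TreeMap<>();

    void register(String name, String doc, Function<Map<String, String>, String> handler) {
        if (doc == null || doc.isBlank()) {
            throw new IllegalArgumentException("service " + name + " must be documented");
        }
        services.put(name, new Service(doc, handler));
    }

    // What a script author would consult before writing a call.
    String describe(String name) {
        Service s = services.get(name);
        return s == null ? null : name + ": " + s.doc;
    }

    Iterable<String> names() { return services.keySet(); }

    // Dispatch a script's call by name with keyword-style arguments.
    String invoke(String name, Map<String, String> args) {
        Service s = services.get(name);
        if (s == null) {
            throw new IllegalArgumentException("unknown service: " + name);
        }
        return s.handler.apply(args);
    }
}
```

For example, after registering a hypothetical `text.upper` service whose handler is `a -> a.get("text").toUpperCase()`, a script call `invoke("text.upper", Map.of("text", "cat"))` returns `"CAT"`, while an attempt to register a call with a blank description is rejected.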
For error-free English transcription, a human operator is generally required to
provide inputs, fix misspellings, and guide processing, but in batch mode, text
streams and heuristics may be substituted.
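Substituting heuristics for the operator amounts to putting the correction step behind an interface. In this sketch the `Corrector` interface and `HeuristicCorrector` class are hypothetical, and nearest match by Levenshtein edit distance is simply one easy heuristic chosen for illustration, not necessarily what MODELER uses. An interactive implementation of the same interface would ask the human operator instead.

```java
import java.util.List;

// Hypothetical hook for the decision a human operator normally makes:
// what a possibly misspelled input token was meant to be.
interface Corrector {
    String correct(String token);
}

// Batch-mode stand-in: choose the closest lexicon entry by edit distance,
// but only within maxDistance; otherwise pass the token through unchanged.
class HeuristicCorrector implements Corrector {
    private final List<String> lexicon;
    private final int maxDistance;

    HeuristicCorrector(List<String> lexicon, int maxDistance) {
        this.lexicon = lexicon;
        this.maxDistance = maxDistance;
    }

    @Override
    public String correct(String token) {
        String best = token;
        int bestDist = maxDistance + 1;
        for (String entry : lexicon) {
            int d = levenshtein(token, entry);
            if (d < bestDist) { // strict: on ties the earlier lexicon entry wins
                best = entry;
                bestDist = d;
            }
        }
        return best;
    }

    // Dynamic-programming edit distance (insert, delete, substitute),
    // keeping only two rows of the table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

With a lexicon of `receive`, `weird` and `separate` and a maximum distance of 2, `correct("recieve")` returns `receive`, while a token with no close match, such as `xyzzy`, passes through unchanged so it can be reviewed later.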
Core Design Premises
There are several, and they are easiest to explain concisely through a three-way analogy:
  LISP functions           WORDS scripts            Conceptual Graphs
------------------- :=: ------------------- :=: -------------------------
   Frame Theory           Topic Charting           Predicate Calculus
Each of the three columns shows some kind of knowledge representation (KR) system (a simulated human mind) beneath a simple ASCII notation (a simulated human language) that people or programs can use externally to express, manipulate, or query concepts within that system.
Our design sits in the middle, striving to functionally resemble those on both sides.
In particular, its WORDS syntax leans toward that of CGs, while its internal models
lean toward frames.
Follow the link on Topic Charting, however, and you'll see that these internal models in fact mimic both sides in turn, because WORDS scripts are processed in several stages: first like LISP, then like Topic Maps (a relative newcomer to KR), and finally with rule-based logic.
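The staged processing can be made concrete with a toy pipeline. This is a sketch under loud assumptions: the input is a flat s-expression such as `(isa cat mammal)` rather than real WORDS syntax, the Topic Map stage is reduced to typed binary associations, and the rule stage forward-chains a single transitivity rule over a hypothetical `isa` relation. It uses a Java 16+ `record`.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy three-stage pipeline loosely mirroring the stages described above.
// The input notation is a stand-in, not actual WORDS syntax.
class StagedPipeline {

    // Topic-map-like data: a typed binary association between two topics.
    record Association(String type, String from, String to) {}

    // Stage 1 (LISP-like): read a flat s-expression into its atoms.
    static List<String> read(String sexpr) {
        String s = sexpr.trim();
        if (!s.startsWith("(") || !s.endsWith(")")) {
            throw new IllegalArgumentException("not an s-expression: " + sexpr);
        }
        List<String> atoms = new ArrayList<>();
        for (String tok : s.substring(1, s.length() - 1).trim().split("\\s+")) {
            if (!tok.isEmpty()) atoms.add(tok);
        }
        return atoms;
    }

    // Stage 2 (Topic-Map-like): turn (relation a b) into an association.
    static Association toAssociation(List<String> atoms) {
        if (atoms.size() != 3) {
            throw new IllegalArgumentException("expected (relation a b): " + atoms);
        }
        return new Association(atoms.get(0), atoms.get(1), atoms.get(2));
    }

    // Stage 3 (rule-based): forward-chain isa(x,y) & isa(y,z) => isa(x,z)
    // until a full pass derives nothing new.
    static Set<Association> closeIsa(Set<Association> facts) {
        Set<Association> all = new LinkedHashSet<>(facts);
        boolean changed = true;
        while (changed) {
            changed = false;
            List<Association> snapshot = new ArrayList<>(all);
            for (Association a : snapshot) {
                for (Association b : snapshot) {
                    if (a.type().equals("isa") && b.type().equals("isa") && a.to().equals(b.from())) {
                        changed |= all.add(new Association("isa", a.from(), b.to()));
                    }
                }
            }
        }
        return all;
    }

    // Run all three stages over a batch of inputs.
    static Set<Association> run(List<String> inputs) {
        Set<Association> facts = new LinkedHashSet<>();
        for (String in : inputs) {
            facts.add(toAssociation(read(in)));
        }
        return closeIsa(facts);
    }
}
```

Fed `(isa cat mammal)` and `(isa mammal animal)`, the rule stage adds the derived `isa(cat, animal)` association. Because the rule only ever adds associations drawn from a finite set of topics, the loop is guaranteed to stop.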
Useful Modeling Paradigms
Behind all the Java objects lie the paradigms we want to use and integrate in
modeling what English words and grammar forms mean. Picking the best ones is hard, as many
people have done good work on machine intelligence. In the end, we selected those ideas that can best simplify our bulk scripting
of semantic models and constraints for English concepts and speech acts, and thus optimize our own ROI
and that of all other WORDS authors:
- Limiting input style helps. Controlled English lets people more easily write assertions in Common Logic.
Similarly, MODELER began life by mapping WORDS inputs that resemble Conceptual Graphs
into specific expressions in Common Lisp. These expressions can drive first-order logic (FOL)
engines, but also the other processing paradigms below.
- Public lexicons help. Ontologies are taking off.
SUMO, the upper ontology written in the IEEE SUO working group's SUO-KIF language, is
emerging as the standard abstract upper taxonomy, while Case Roles and Speech Acts
remain the best way to (de)serialize English clauses.
Meanwhile, Roget's Thesaurus can be viewed as
an intuitive bulk catalogue of all the characteristics that English might encode.
- Emerging metadata standards help. The best storage medium for the
concepts or topics of a discourse is ISO's standard Topic Map (ISO/IEC 13250):
a portable declarative graph, similar to RDF.
Such graphs, especially if constrained by OWL, can model the fixed aspects of things nicely.
- Computed aspects help. Multi-facet properties have
modernized Library Science, and O-O methods have modernized Frames.
Aspects are a slot-like hybrid model for parts or properties likely to change, able to compute, constrain or assume their values.
Adding WORDS scripts to Aspect classes makes it easy to implement the needed methods.
- Standard interfaces help. Web Services based on HTTP are (almost) all
we need to model a discourse for each client. Their Command I/O pattern enables both
undo services and efficient file-based data persistence. Using Java Servlets adds the rest, allowing MODELER to stay fast, free, secure, and easy to zip up, download and reinstall.
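The computed-aspects item above describes a slot-like model that can compute, constrain or assume its value. A minimal Java sketch of that idea follows; the `Aspect` class, its method names and its resolution order (explicit value, then computed value, then assumed default) are assumptions made for illustration, and in MODELER the computing methods would come from WORDS scripts rather than Java lambdas.

```java
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical Aspect: a frame-slot-like holder for a part or property that may change.
// Its value can be set explicitly, computed on demand, checked by a constraint,
// or assumed as a default when nothing else is known.
class Aspect<T> {
    private final String name;
    private T explicitValue;                     // value asserted directly, if any
    private Supplier<T> computer;                // how to compute the value, if known
    private Predicate<T> constraint = v -> true; // accepts everything unless narrowed
    private T assumed;                           // default used when nothing else applies

    Aspect(String name) { this.name = name; }

    Aspect<T> computedBy(Supplier<T> computer) { this.computer = computer; return this; }
    Aspect<T> constrainedBy(Predicate<T> constraint) { this.constraint = constraint; return this; }
    Aspect<T> assuming(T value) { this.assumed = value; return this; }

    // Explicit assertions must satisfy the constraint.
    void set(T value) {
        if (value == null || !constraint.test(value)) {
            throw new IllegalArgumentException(name + " rejects " + value);
        }
        explicitValue = value;
    }

    // Resolution order: explicit value, then a valid computed value, then the assumed default.
    Optional<T> get() {
        if (explicitValue != null) return Optional.of(explicitValue);
        if (computer != null) {
            T computed = computer.get();
            if (computed != null && constraint.test(computed)) return Optional.of(computed);
        }
        return Optional.ofNullable(assumed);
    }
}
```

For instance, an `Aspect<Integer>` built with `.assuming(4).constrainedBy(n -> n >= 0)` reports 4 until a script sets a different count, and rejects a negative one; an aspect built with `computedBy(...)` recomputes its value each time it is read, so it tracks whatever it depends on.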
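The Command I/O pattern mentioned in the last item can be sketched as a session that changes only through text commands, journals each one to a file, and keeps an inverse for undo. Everything here is hypothetical: the `SET`/`DEL` command vocabulary, the `CommandSession` class and the one-command-per-line journal format are illustrative choices, and the HTTP and servlet layers are left out. It needs Java 14 or later (arrow-form `switch`).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-client discourse state that changes only through text commands.
// Each command is appended to a journal file (persistence by replay), and its
// inverse is pushed on a stack (undo).
class CommandSession {
    private final Map<String, String> state = new HashMap<>();
    private final Deque<String> undoStack = new ArrayDeque<>();
    private final Path journal;

    CommandSession(Path journal) throws IOException {
        this.journal = journal;
        if (Files.exists(journal)) {
            // Rebuild state by replaying the journal; undo history is not restored.
            for (String line : Files.readAllLines(journal, StandardCharsets.UTF_8)) {
                if (!line.isBlank()) apply(line);
            }
        }
    }

    String get(String key) { return state.get(key); }

    // Run a command such as "SET color red" or "DEL color", then journal it.
    void execute(String command) throws IOException {
        undoStack.push(apply(command));
        append(command);
    }

    // Undo the latest command by running its inverse, which is journaled as well
    // so that a later replay reproduces the undone state.
    boolean undo() throws IOException {
        if (undoStack.isEmpty()) return false;
        String inverse = undoStack.pop();
        apply(inverse);
        append(inverse);
        return true;
    }

    // Apply one command to the in-memory state and return the command that reverses it.
    private String apply(String command) {
        String[] parts = command.split(" ", 3);
        if (parts.length < 2) throw new IllegalArgumentException("malformed command: " + command);
        String key = parts[1];
        String old = state.get(key);
        String inverse = (old == null) ? "DEL " + key : "SET " + key + " " + old;
        switch (parts[0]) {
            case "SET" -> {
                if (parts.length < 3) throw new IllegalArgumentException("SET needs a value: " + command);
                state.put(key, parts[2]);
            }
            case "DEL" -> state.remove(key);
            default -> throw new IllegalArgumentException("unknown command: " + command);
        }
        return inverse;
    }

    private void append(String command) throws IOException {
        Files.writeString(journal, command + System.lineSeparator(), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```

Because an undo is journaled as an ordinary command, replaying the file from the start always reproduces the session's current state, which is why a plain append-only file can be enough for persistence in this design.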
Open Source Business Model
Most of the above is open source software. We'll integrate these pieces, then
add our own contributions. For small projects, once it has all been beta tested, MODELER
will become an open source Java toolkit.
For larger projects, once we reach critical mass, we can become self-sustaining by
offering bulk scripted lexicons of core English vocabulary, plus the special tools, training and
consulting most larger organizations will need to extend them for particular domains.
We can also help with deploying the combination internally or to their clients, who may
then extend it further still.
But to get MODELER properly documented and released, Lexikos needs larger R&D partners
and sponsors, with seed funding and specific goals, who are willing to help us get this toolkit off
the ground and to test its early design elements in pilot projects. These small missions
will let us shake out any design flaws, provide benchmarks on its utility, and establish
a history of results that will let us attract both new staff and new customers.
To join us at any of these levels, please email us. We would not only truly welcome your
involvement, but for WORDS to succeed we require it.