Theories Behind WORDS

Basic Operations
Core Design Premises
Useful Modeling Paradigms
Open Source Business Model

Basic Operations

Making WORDS lexicons executable is not hard.  In fact, once we decided to model bulk words and their meanings as Java objects delegating to Topics, it became hard to keep them from seeming executable. And at that point, adding topic-embedded scripts seemed obvious.
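To make the delegation idea concrete, here is a minimal Python sketch. Python is used only for brevity; MODELER itself is Java. The classes `Topic` and `Word`, their fields, and the use of a Python callable in place of an embedded WORDS script are all hypothetical illustrations, not MODELER's actual API. A `Word` keeps only its spelling and forwards every other attribute lookup to its `Topic`. Calling the word runs the topic's script, which is what makes it feel executable.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Topic:
    """Hypothetical topic: a subject identifier, names, occurrences,
    and an optional embedded script (a Python callable stands in here)."""
    subject_id: str
    names: list[str] = field(default_factory=list)
    occurrences: dict[str, str] = field(default_factory=dict)
    script: Optional[Callable[..., Any]] = None


class Word:
    """A lexical entry that owns only its spelling and delegates
    everything else to the Topic carrying its meaning."""

    def __init__(self, spelling: str, topic: Topic) -> None:
        self.spelling = spelling
        self._topic = topic

    def __getattr__(self, name: str) -> Any:
        # Only called when normal lookup fails, so 'spelling' and '_topic'
        # resolve directly and all other attributes come from the topic.
        return getattr(self._topic, name)

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # "Executing" a word runs its topic's embedded script, if any.
        if self._topic.script is None:
            raise TypeError(f"{self.spelling!r} has no embedded script")
        return self._topic.script(*args, **kwargs)


double = Word("double", Topic("lex:double", names=["double", "twice"],
                              script=lambda x: 2 * x))
print(double.names)  # ['double', 'twice']
print(double(21))    # 42
```

Because delegation happens at lookup time, enriching a topic (new names, occurrences, or a new script) immediately changes the behavior of every word that points at it.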

The (CONTEXT) module into which they load is really just a JVM with servlet support and half a dozen well-built web apps for managing bulk lexical data, ontologies, scripts, etc. Chief among these are:

  • A Topic Map storage engine able to import metadata, then query, update or export the merged results through advanced APIs

  • Display code for loaded Topic Maps. Ontopia's Omnigator is one popular free example, but you can add others of any design you like

  • Our interpreter, MODELER, whose discourse model and flexible scripts (embedded, batched or typed) can readily drive all of the above, PLUS

  • Any other Java code or web apps you wish, provided the available service calls are documented for scripts to invoke
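The merging behind the storage engine's "merged results" can be sketched in a few lines. This toy Python store assumes the basic Topic Maps rule that topics sharing a subject identifier represent the same subject and must be merged. Real engines also merge on subject locators and item identifiers, and model names with types and scopes; this sketch attempts none of that. `StoredTopic`, `TopicStore` and its methods are hypothetical and are not the API of Ontopia or any other engine.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StoredTopic:
    subject_ids: set[str]
    names: set[str] = field(default_factory=set)
    occurrences: dict[str, set[str]] = field(default_factory=dict)


class TopicStore:
    """Toy in-memory store that merges topics sharing a subject identifier
    on import, then supports simple query and export."""

    def __init__(self) -> None:
        # Invariant: no two stored topics share a subject identifier.
        self._topics: list[StoredTopic] = []

    def import_topic(self, subject_ids, names=(), occurrences=None) -> StoredTopic:
        incoming = StoredTopic(set(subject_ids), set(names),
                               {k: set(v) for k, v in (occurrences or {}).items()})
        kept = []
        for t in self._topics:
            if t.subject_ids & incoming.subject_ids:
                # Same subject: fold identifiers, names and occurrences together.
                incoming.subject_ids |= t.subject_ids
                incoming.names |= t.names
                for kind, values in t.occurrences.items():
                    incoming.occurrences.setdefault(kind, set()).update(values)
            else:
                kept.append(t)
        # One import can bridge several stored topics; all fold into one.
        kept.append(incoming)
        self._topics = kept
        return incoming

    def find(self, name: str) -> list[StoredTopic]:
        return [t for t in self._topics if name in t.names]

    def export(self) -> list[dict]:
        return [{"ids": sorted(t.subject_ids),
                 "names": sorted(t.names),
                 "occurrences": {k: sorted(v) for k, v in t.occurrences.items()}}
                for t in self._topics]


store = TopicStore()
store.import_topic({"http://psi.example.org/dog"}, names={"dog"})
store.import_topic({"http://psi.example.org/dog", "urn:example:canine"},
                   names={"hound"},
                   occurrences={"gloss": {"a domesticated canid"}})
store.import_topic({"http://psi.example.org/cat"}, names={"cat"})
print(len(store.export()))  # 2
```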

For error-free English transcription, a human operator is generally required to provide inputs, fix misspellings, and guide processing, but in batch mode, text streams and heuristics can stand in for the operator.

Core Design Premises

There are several, most easily explained by a three-way analogy:

  LISP functions           WORDS scripts            Conceptual Graphs
  ------------------  :=:  ------------------  :=:  ------------------
  Frame Theory             Topic Charting           Predicate Calculus

Each of the three columns shows some kind of knowledge representation (KR) system (a simulated human mind) beneath a simple ASCII notation (a simulated human language) that real people or digital programs can use externally to express, manipulate, or query concepts within that system.

Our design sits in the middle, striving to functionally resemble those on both sides. In particular, its WORDS syntax leans toward that of CGs, while its internal models lean toward frames.

Follow the link on Topic Charting, however, and you'll see that these internal models in fact mimic both sides in turn, because WORDS scripts are processed in several stages: first like LISP, then like Topic Maps (a relative newcomer to KR), and finally with rule-based logic.
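A toy version of such a staged pipeline is sketched below in Python, under heavily simplified assumptions. Stage 1 reads LISP-style s-expressions. Stage 2 turns each flat `(relation arg ...)` list into an association typed by its head, loosely in the spirit of Topic Maps. Stage 3 forward-chains simple rules until nothing new is derived. The function names and the `chase`/`flee` rule are hypothetical examples, not the actual stages MODELER runs.

```python
from __future__ import annotations

import re
from typing import Callable, Iterable, Iterator


def parse_sexpr(text: str):
    """Stage 1 (LISP-like): '(chase dog cat)' -> ['chase', 'dog', 'cat']."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)

    def read(pos: int):
        if tokens[pos] == "(":
            items, pos = [], pos + 1
            while tokens[pos] != ")":
                item, pos = read(pos)
                items.append(item)
            return items, pos + 1
        return tokens[pos], pos + 1

    expr, end = read(0)
    if end != len(tokens):
        raise ValueError("trailing tokens after expression")
    return expr


def to_association(expr: list) -> tuple[str, tuple[str, ...]]:
    """Stage 2 (topic-map-like): the head types the association and the
    remaining atoms become its ordered players."""
    head, *players = expr
    if not all(isinstance(p, str) for p in players):
        raise ValueError("this sketch handles only flat expressions")
    return head, tuple(players)


Rule = Callable[[tuple], Iterable[tuple]]


def forward_chain(facts: Iterable[tuple], rules: list[Rule]) -> set[tuple]:
    """Stage 3 (rule-based): apply every rule to every fact, including
    derived ones, until nothing new appears."""
    known = set(facts)
    frontier = list(known)
    while frontier:
        fact = frontier.pop()
        for rule in rules:
            for new in rule(fact):
                if new not in known:
                    known.add(new)
                    frontier.append(new)
    return known


def chase_implies_flee(fact: tuple) -> Iterator[tuple]:
    # Hypothetical rule: if X chases Y, then Y flees X.
    kind, players = fact
    if kind == "chase" and len(players) == 2:
        yield ("flee", (players[1], players[0]))


facts = [to_association(parse_sexpr(s))
         for s in ("(chase dog cat)", "(chase cat mouse)")]
for fact in sorted(forward_chain(facts, [chase_implies_flee])):
    print(fact)
```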

Useful Modeling Paradigms

Behind all the Java objects lie the paradigms we want to use and integrate in modeling what English words and grammar forms mean.  Picking the best ones is hard, as many people have done good work on machine intelligence.  In the end, we selected those ideas that can best simplify our bulk scripting of semantic models and constraints for English concepts and speech acts, and thus optimize our own ROI and that of all other WORDS authors:

  1. Limiting input style helps. Controlled English lets people write assertions in Common Logic more easily. Similarly, MODELER began life by mapping WORDS inputs that resemble Conceptual Graphs into specific Common Lisp expressions. These expressions can drive first-order logic (FOL) engines, but also the other processing paradigms below.

  2. Public lexicons help. Ontologies are taking off. IEEE's SUO-KIF is emerging as the standard abstract upper taxonomy, while Case Roles and Speech Acts remain the best way to (de)serialize English clauses. Meanwhile, Roget's Thesaurus can be viewed as an intuitive bulk catalogue of all the characteristics English might encode.

  3. Emerging metadata standards help. The best storage medium for the concepts or topics of a discourse is the ISO-standard Topic Map, a portable declarative graph (similar to RDF). Such graphs, especially if constrained by OWL, can model the fixed aspects of things nicely.

  4. Computed aspects help. Multi-facet properties have modernized Library Science, and O-O methods have modernized Frames. Aspects are a slot-like hybrid model for parts or properties likely to change, able to compute, constrain or assume their values. Adding WORDS scripts to Aspect classes makes it easy to implement the needed methods.

  5. Standard interfaces help. Web Services based on HTTP are (almost) all we need to model a discourse for each client. Their Command I/O pattern enables both undo services and efficient file-based data persistence. Using Java Servlets adds the rest, allowing MODELER to stay fast, free, secure, and easy to zip up, download and reinstall.
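Item 1's mapping from CG-like input into Lisp can be illustrated with a small translator for the linear form of Conceptual Graphs, where `[Concept]` or `[Concept: referent]` boxes are linked through `(Relation)` circles by `->` arrows. This Python sketch handles only simple left-to-right chains (no coreference, nesting, or inverse arrows). Its output convention, `(Relation left right)` with multiple links wrapped in `(and ...)`, is an assumption for illustration, not MODELER's actual mapping.

```python
import re

# One token: a concept box [Type] or [Type: referent], a relation (Name), or ->.
_TOKEN = re.compile(r"\[([^\]]+)\]|\(([^)]+)\)|->")


def _tokenize(linear: str) -> list:
    tokens, pos = [], 0
    while pos < len(linear):
        if linear[pos].isspace():
            pos += 1
            continue
        m = _TOKEN.match(linear, pos)
        if m is None:
            raise ValueError(f"unexpected input at {pos}: {linear[pos:]!r}")
        tokens.append(m)
        pos = m.end()
    return tokens


def _concept(body: str) -> str:
    # '[Cat: Yojo]' names an individual, so use the referent;
    # '[Cat]' names only a type, so use the type label.
    ctype, _, referent = body.partition(":")
    return referent.strip() or ctype.strip()


def cg_to_lisp(linear: str) -> str:
    """Translate '[Cat: Yojo]->(On)->[Mat]' into '(On Yojo Mat)'."""
    toks = _tokenize(linear)
    kinds = ["C" if m.group(1) is not None else "R" if m.group(2) is not None else "A"
             for m in toks]
    rest = kinds[1:]
    if (kinds[:1] != ["C"] or not rest or len(rest) % 4
            or any(rest[i:i + 4] != ["A", "R", "A", "C"]
                   for i in range(0, len(rest), 4))):
        raise ValueError("expected a chain like [C]->(R)->[C]->(R)->[C]")
    # Link k uses concept toks[4k], relation toks[4k+2], concept toks[4k+4].
    clauses = [
        f"({toks[j + 2].group(2).strip()} "
        f"{_concept(toks[j].group(1))} {_concept(toks[j + 4].group(1))})"
        for j in range(0, len(toks) - 1, 4)
    ]
    return clauses[0] if len(clauses) == 1 else "(and " + " ".join(clauses) + ")"


print(cg_to_lisp("[Cat: Yojo]->(On)->[Mat]"))                  # (On Yojo Mat)
print(cg_to_lisp("[Cat] -> (On) -> [Mat] -> (In) -> [Room]"))  # (and (On Cat Mat) (In Mat Room))
```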
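The three behaviors item 4 gives Aspects (compute, constrain, assume) resemble classic frame facets, and they can be sketched with a Python descriptor. The `Aspect` class and the `Box` example are hypothetical. The lookup order is also an assumption of this sketch: an explicitly asserted value wins, otherwise a compute method runs, otherwise the default is assumed. Every assignment is checked against the constraint.

```python
from __future__ import annotations

from typing import Any, Callable, Optional


class Aspect:
    """Hypothetical slot with three frame-style facets: a default to
    assume, a constraint to enforce, and an optional compute method."""

    def __init__(self, default: Any = None,
                 constraint: Optional[Callable[[Any], bool]] = None,
                 compute: Optional[Callable[[Any], Any]] = None) -> None:
        self.default = default
        self.constraint = constraint
        self.compute = compute

    def __set_name__(self, owner: type, name: str) -> None:
        self.name = name
        self.storage = "_" + name

    def __get__(self, obj: Any, objtype: Optional[type] = None) -> Any:
        if obj is None:
            return self                        # accessed on the class itself
        if self.storage in obj.__dict__:
            return obj.__dict__[self.storage]  # explicitly asserted value
        if self.compute is not None:
            return self.compute(obj)           # computed on demand
        return self.default                    # assumed value

    def __set__(self, obj: Any, value: Any) -> None:
        if self.constraint is not None and not self.constraint(value):
            raise ValueError(f"{value!r} violates the constraint on {self.name}")
        obj.__dict__[self.storage] = value


def positive(v: Any) -> bool:
    return v > 0


class Box:
    width = Aspect(default=1, constraint=positive)
    height = Aspect(default=1, constraint=positive)
    area = Aspect(compute=lambda box: box.width * box.height)


box = Box()
print(box.area)  # 1: both sides assumed
box.width, box.height = 3, 4
print(box.area)  # 12: computed from asserted values
```

Assigning `box.area` directly would override the computed value, since an asserted value is checked first in this sketch.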
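Item 5's point, that a command-style I/O pattern yields both undo and file-based persistence, can be shown with a small Python sketch. The assumptions here are all hypothetical and are not MODELER's protocol. Each client request is a JSON command (`set`, `del`, or `undo`) against a per-client key/value discourse state. Every accepted command is appended to a JSON-lines journal. Undo works from inverses recorded before each change. Because `undo` is itself journaled, replaying the file rebuilds both the state and the undo stack exactly.

```python
from __future__ import annotations

import json
import os
import tempfile


class Discourse:
    """Hypothetical per-client state changed only through commands.
    Accepted commands are appended to a JSON-lines journal; replaying the
    journal rebuilds both the state and the undo stack."""

    OPS = ("set", "del", "undo")

    def __init__(self, journal_path: str) -> None:
        self.journal_path = journal_path
        self.facts: dict[str, str] = {}
        self._undo: list[dict] = []
        if os.path.exists(journal_path):
            with open(journal_path, encoding="utf-8") as f:
                for line in f:
                    self._apply(json.loads(line), log=False)

    def execute(self, cmd: dict) -> None:
        self._apply(cmd, log=True)

    def undo(self) -> None:
        self._apply({"op": "undo"}, log=True)

    def _apply(self, cmd: dict, log: bool) -> None:
        op = cmd.get("op")
        if op not in self.OPS:
            raise ValueError(f"unknown op {op!r}")
        if op == "undo":
            if not self._undo:
                raise IndexError("nothing to undo")
            self._mutate(self._undo.pop())
        else:
            # Record how to reverse this command before applying it.
            key = cmd["key"]
            inverse = ({"op": "set", "key": key, "value": self.facts[key]}
                       if key in self.facts else {"op": "del", "key": key})
            self._mutate(cmd)
            self._undo.append(inverse)
        if log:
            with open(self.journal_path, "a", encoding="utf-8") as f:
                f.write(json.dumps(cmd) + "\n")

    def _mutate(self, cmd: dict) -> None:
        if cmd["op"] == "set":
            self.facts[cmd["key"]] = cmd["value"]
        else:  # "del"
            self.facts.pop(cmd["key"], None)


with tempfile.TemporaryDirectory() as tmp:
    journal = os.path.join(tmp, "client.jsonl")
    live = Discourse(journal)
    live.execute({"op": "set", "key": "topic", "value": "dogs"})
    live.execute({"op": "set", "key": "topic", "value": "cats"})
    live.undo()
    restored = Discourse(journal)  # rebuilt only by replaying the journal
    print(live.facts, restored.facts)  # {'topic': 'dogs'} {'topic': 'dogs'}
```

Appending one line per command keeps writes cheap. A long-running service would likely also want periodic snapshots so that replay time stays bounded.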


Open Source Business Model

Most of the above is open source software. We'll integrate these pieces, then add our own contributions. For small projects, once everything has been beta tested, MODELER will become an open source Java toolkit.

For larger projects, once we reach critical mass, we can become self-sustaining by offering bulk scripted lexicons of core English vocabulary, plus the special tools, training and consulting most larger organizations will need to extend them for particular domains. We can also help with deploying the combination internally or to their clients, who may then extend it further still.

But to get MODELER properly documented and released, Lexikos needs larger R&D partners and sponsors with seed funding and specific goals, willing to help us get this toolkit off the ground and to test its early design elements in pilot projects. These small missions will let us shake out any design flaws, provide benchmarks on its utility, and establish a history of results that will let us attract both new staff and new customers.

To join us at any of these levels, please email us. We would not only truly welcome your involvement; for WORDS to succeed, we require it.

Lexikos Corporation
Boston & Knoxville
Email: Dan@Lexikos.com