AVAILABLE KERNEL SOFTWARE PRODUCTS
In the course of its NLP consulting activities, Lexikos Corporation has
developed a suite of advanced kernel software components which jointly can
analyze English input text to produce formal models of its words, phrase
structure, and meaning.
We can license these components to application developers in integrated
systems, or to other research groups as pre-tested kernel software modules that
they can integrate with their own software.
Customization and installation services are available on request.
The Lexikon
To analyze a paragraph, one must first understand all its words. Getting accurate lexical data on individual
words is usually the most tedious and costly part of any technical undertaking
involving English text, but one which is absolutely vital.
We can supply high quality bulk data on isolated English words, names,
phrases, and idioms, cleanly packaged to meet the on-line needs of our clients'
software systems. Our delivery software
provides a detailed model of any token in context, optionally learning as it
goes. (See the separate Technical Summary
of the Lexikon.)
The Parser
Words combine into phrases and phrases into sentences. In both cases, the details are governed by
the rules of English grammar. Our parser
can interpret formal rules of a (government-binding) grammar to deduce and
model the deep and surface structures, including movement effects, for most
input sentences and phrases.
This parsing system is syntax-oriented and semi-deterministic: it is based on Marcus's now-classic design,
but it does backtrack to handle lexical and structural ambiguity when
deterministic methods fail. For maximum
accuracy in its output, one can configure it to seek operator help when it gets
confused. The result is a fast but very
powerful parser, able to unpack the structure of real-life English text.
The Case-Frame Thesaurus
To pull meaning from text, one must know what each word denotes. Case-frame technology is a powerful way to
translate the structure of text, as deduced by a parser, into semantic models
of the topics and situations which it describes.
Our Case-Frame Thesaurus uses methods and principles of knowledge
engineering to represent "naive semantic" descriptions of the
situations denoted by English word senses.
These descriptions model the expected "case role" complements of each
situation, and those expectations supply defaults as well as added constraints on
the parsing of each sentence. When
descriptions of the actual complements found in a parse tree replace the expected ones, the
result is a "logical form" model of overall sentence semantics.
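The replacement step described above can be illustrated with a minimal Python sketch. It is not Lexikos code: the CaseFrame class, the role names, and the "<person>" and "<thing>" placeholder fillers are all assumptions made for illustration. Expected fillers act as defaults, actual complements from a parse override them, and a complement in a role the frame does not license is rejected, which is one simple way a frame can constrain parsing.

```python
from dataclasses import dataclass, field


@dataclass
class CaseFrame:
    """Hypothetical frame for one word sense: role name -> expected filler."""
    sense: str
    expected: dict = field(default_factory=dict)

    def instantiate(self, complements):
        """Replace expected fillers with the actual complements from a parse.

        Roles the sentence does not mention keep their default fillers.
        A complement in a role the frame does not license is rejected.
        """
        unknown = set(complements) - set(self.expected)
        if unknown:
            raise ValueError(
                f"roles not licensed by frame {self.sense!r}: {sorted(unknown)}"
            )
        roles = {**self.expected, **complements}
        return {"predicate": self.sense, "roles": roles}


# Illustrative frame for one sense of "give" (role names are assumptions).
GIVE = CaseFrame(
    "give",
    {"agent": "<person>", "theme": "<thing>", "recipient": "<person>"},
)

# Complements as a parser might report them for "Mary gave a book."
logical_form = GIVE.instantiate({"agent": "Mary", "theme": "book"})
print(logical_form)
```

In this toy version the unmentioned recipient role keeps its default filler, so the resulting structure still records that someone received the book.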
Semantics is a difficult area of NLP, often deemed "too big"
to be well handled except in restricted domains. Lexikos has made giant strides in easing this
limit. By directly exploiting the fixed
word-categories in a large, specific edition of Roget's Thesaurus, we
can potentially handle semantics for vocabularies as general as that in the
published book (an on-line copy of which is available as an option).
The Text Transcriber
The Lexikon, Parser, and Case-Frame Thesaurus can be integrated into a
single package, along with discourse-modeling code to handle anaphora and I/O
utilities to read a specific type of input document and produce a custom output
file for it. The result is one instance
of a generic tool that we call a "Text Transcriber."
Each Transcriber works to rewrite English input text into a formal
output language. The output language
should make clear to some specific application package the structure and/or topics
of each textual input, so they can be processed further. Though each application will be different,
the basic goal of any Text Transcriber is thus roughly to turn a document into
a usable, application-specific data base.
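The generic shape of such a tool can be sketched in Python as a reader, a chain of analysis stages, and a writer. Everything in the sketch is a hypothetical stand-in: make_transcriber and the toy stage functions are invented for illustration and do no real linguistic analysis. The point is only that the reader and writer vary per application while the stages between them are reused.

```python
def make_transcriber(read, stages, write):
    """Build one Transcriber instance from a reader, analysis stages, and a writer."""
    def transcribe(document):
        data = read(document)
        for stage in stages:
            data = stage(data)
        return write(data)
    return transcribe


# Toy stand-ins for the real components (assumptions, not Lexikos APIs).
def read_plain(document):
    """Split a plain-text document into sentences on full stops."""
    return [s.strip() for s in document.split(".") if s.strip()]


def lookup_words(sentences):
    """Stand-in for lexical lookup: lower-case and split each sentence."""
    return [s.lower().split() for s in sentences]


def parse(tokenized):
    """Stand-in for parsing: wrap each token list in a record."""
    return [{"tokens": tokens} for tokens in tokenized]


def write_records(parses):
    """Stand-in for the custom output format: one small record per sentence."""
    return [{"id": i, "length": len(p["tokens"])} for i, p in enumerate(parses)]


transcriber = make_transcriber(read_plain, [lookup_words, parse], write_records)
records = transcriber("Mary gave John a book. He read it.")
print(records)
```

A different application would swap in its own reader and writer and leave the stage list largely unchanged.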
Lextend
To properly exploit any system like our Parser or Transcriber,
it is often useful or necessary to create a domain-specific lexicon. Such a lexicon extends the core vocabulary in
our Lexikon and Case-Frame Thesaurus and specializes their general word models
by expanding on specific word senses that are common and expected in a given
semantic domain. Lextend semi-automates
the process of creating and expanding such domain-specific lexicons within a
client's NLP system. Development staff can use it in
"training runs" during development, and/or it can be
added directly to the run-time user interface for clerical-level system
operators.
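One way to picture a domain lexicon that extends and specializes a core vocabulary is as a layer placed over it. The Python sketch below is an assumption-laden illustration, not Lextend itself: the sample entries, add_domain_sense, and lookup are hypothetical. Domain entries shadow core entries without altering them, and an unknown word triggers a callback to an operator, which mimics the semi-automated workflow.

```python
from collections import ChainMap

# Toy core vocabulary: word -> list of senses (contents are illustrative).
core = {
    "bank": ["financial institution", "river edge"],
    "run": ["move quickly"],
}
domain = {}                        # domain-specific layer, initially empty
lexicon = ChainMap(domain, core)   # lookups consult the domain layer first


def add_domain_sense(word, sense):
    """Put a domain-specific sense first while keeping the core senses."""
    senses = [sense] + [s for s in lexicon.get(word, []) if s != sense]
    domain[word] = senses


def lookup(word, ask_operator):
    """Return the senses of a word, asking an operator about unknown words."""
    if word not in lexicon:
        domain[word] = ask_operator(word)
    return lexicon[word]


# A "training run" in a hypothetical medical domain.
add_domain_sense("bank", "blood bank")
stent_senses = lookup("stent", lambda word: ["arterial implant"])
print(lexicon["bank"])
print(stent_senses)
```

Because all additions go into the domain layer, the core vocabulary stays untouched and can be shared across clients.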