AVAILABLE KERNEL SOFTWARE PRODUCTS

 

In the course of its NLP consulting activities, Lexikos Corporation has developed a suite of advanced kernel software components.  Together, they analyze English input text to produce formal models of its words, phrase structure, and meaning.

 

We can license these components to application developers in integrated systems, or to other research groups as pre-tested kernel software modules that they can integrate with their own software.  Customization and installation services are available on request.

 

 

                                           The Lexikon

 

To analyze a paragraph, one must first understand all its words.  Getting accurate lexical data on individual words is usually the most tedious and costly part of any technical undertaking involving English text, but it is also absolutely vital.

 

We can supply high-quality bulk data on isolated English words, names, phrases, and idioms, cleanly packaged to meet the on-line needs of our clients' software systems.  Our delivery software provides a detailed model of any token in context, optionally learning as it goes.  (See the separate Technical Summary of the Lexikon.)

 

 

                                            The Parser

 

Words combine into phrases and phrases into sentences.  In both cases, the details are governed by the rules of English grammar.  Our parser can interpret the formal rules of a government-binding grammar to deduce and model the deep and surface structures, including movement effects, for most input sentences and phrases.

 

This parsing system is syntax-oriented and semi-deterministic:  it is based on Marcus's now-classic design, but it does backtrack to handle lexical and structural ambiguity when deterministic methods fail.  For maximum accuracy in its output, one can configure it to seek operator help when it gets confused.  The result is a fast but very powerful parser, able to unpack the structure of real-life English text.
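
The "deterministic first, backtrack on failure" idea can be illustrated with a toy sketch.  It is not the Lexikos parser or Marcus's design:  the lexicon, the category names, the accepted patterns, and the function names below are all hypothetical, and a flat list of category sequences stands in for a real grammar.

```python
# Toy lexicon: each word lists its candidate categories, preferred one first.
# (Hypothetical data, for illustration only.)
LEXICON = {
    "the": ["DET"],
    "old": ["ADJ", "NOUN"],
    "man": ["NOUN", "VERB"],
    "boats": ["NOUN"],
}

# Stand-in for a grammar: the only category sequences we accept.
PATTERNS = {
    ("DET", "ADJ", "NOUN"),
    ("DET", "NOUN", "VERB", "DET", "NOUN"),
}


def accepts(categories):
    """Return True if the category sequence matches an accepted pattern."""
    return tuple(categories) in PATTERNS


def parse(tokens, lexicon, accepts):
    """Return (categories, backtracked).

    First try the deterministic reading (preferred category of every word).
    Only if that fails, search the alternatives depth-first with backtracking.
    """
    first_choice = [lexicon[t][0] for t in tokens]
    if accepts(first_choice):
        return first_choice, False

    def search(i, chosen):
        if i == len(tokens):
            return list(chosen) if accepts(chosen) else None
        for category in lexicon[tokens[i]]:
            chosen.append(category)
            result = search(i + 1, chosen)
            if result is not None:
                return result
            chosen.pop()  # undo this choice and try the next candidate
        return None

    return search(0, []), True


easy = parse("the old man".split(), LEXICON, accepts)
hard = parse("the old man the boats".split(), LEXICON, accepts)
```

Here "the old man" is resolved by the deterministic pass alone, while the garden-path sentence "the old man the boats" forces the fallback search to re-read "old" as a noun and "man" as a verb.  A hook that asks an operator for help could replace the fallback when it returns no reading.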

 

 


                                The Case-Frame Thesaurus

 

To pull meaning from text, one must know what each word denotes.  Case-frame technology is a powerful way to translate the structure of text, as deduced by a parser, into semantic models of the topics and situations which it describes.

 

Our Case-Frame Thesaurus uses methods and principles of knowledge engineering to represent "naive semantic" descriptions of the situations denoted by English word senses.  These descriptions model the expected "case role" complements of each situation, and the expectations offer defaults as well as added constraints on the parsing of each sentence.  When the expected complements are replaced by descriptions of the actual complements found in a parse tree, the result is a "logical form" model of overall sentence semantics.
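
A minimal sketch of this filling step follows, under stated assumptions:  the class names, role names, semantic types, and the dictionary layout of the parser's complements are all hypothetical and are not taken from the Case-Frame Thesaurus itself.

```python
from dataclasses import dataclass


@dataclass
class Role:
    constraint: str  # semantic type a filler must have (hypothetical labels)
    default: str     # filler assumed when the sentence leaves the role unstated


@dataclass
class CaseFrame:
    predicate: str
    roles: dict      # role name -> Role


def to_logical_form(frame, complements):
    """Replace each expected role with the actual complement, if one was parsed.

    `complements` maps role names to {"head": ..., "type": ...} descriptions,
    an assumed stand-in for what a parse tree would supply.
    """
    form = {"predicate": frame.predicate}
    for name, role in frame.roles.items():
        found = complements.get(name)
        if found is None:
            form[name] = role.default          # expectation acts as a default
        elif found["type"] != role.constraint:
            # expectation acts as a constraint on the reading
            raise ValueError(
                f"{found['head']!r} cannot fill role {name!r} of {frame.predicate!r}"
            )
        else:
            form[name] = found["head"]
    return form


EAT = CaseFrame(
    "eat",
    {"agent": Role("animate", "someone"), "patient": Role("food", "something")},
)

# "The dog ate." supplies an agent but no patient.
form = to_logical_form(EAT, {"agent": {"head": "dog", "type": "animate"}})
```

In this sketch a violated constraint simply rejects the reading; a fuller system might instead use the failure to prefer a different word sense or parse.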

 

Semantics is a difficult area of NLP, often deemed "too big" to be well handled except in restricted domains.  Lexikos has made giant strides in easing this limit.  By directly exploiting the fixed word-categories in a large, specific edition of Roget's Thesaurus, we can potentially handle semantics for vocabularies as general as that in the published book (an on-line copy of which is available as an option).

 

 

                                    The Text Transcriber

 

The Lexikon, Parser, and Case-Frame Thesaurus can be integrated into a single package, along with discourse-modeling code to handle anaphora and I/O utilities to read a specific type of input document and produce a custom output file for it.  The result is one instance of a generic tool that we call a "Text Transcriber."

 

Each Transcriber rewrites English input text into a formal output language.  The output language should make clear to some specific application package the structure and/or topics of each textual input, so they can be processed further.  Though each application will be different, the basic goal of any Text Transcriber is, roughly, to turn a document into a usable, application-specific database.

 

 

                                              Lextend

 

To properly exploit any system like our Parser or Transcriber, it is often useful or necessary to create a domain-specific lexicon.  Such a lexicon extends the core vocabulary in our Lexikon and Case-Frame Thesaurus and specializes their general word models by expanding on specific word senses that are common and expected in a given semantic domain.  Lextend semi-automates the process of creating and expanding such domain-specific lexicons within a client's NLP system.  Development staff can use it in "training runs" during development, and/or it can be added directly to the run-time user interface for clerical-level system operators.
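
One way to picture a domain lexicon layered over a core vocabulary is sketched below.  It is an illustration only, not a description of how Lextend is built:  the class, its methods, the sample entries, and the operator callback are all hypothetical.  It relies on Python's standard `collections.ChainMap`, in which lookups consult the first mapping before the later ones.

```python
from collections import ChainMap

# Hypothetical core vocabulary: word -> list of general senses.
CORE = {
    "bank": ["financial institution", "edge of a river"],
    "note": ["short written message", "musical tone"],
}


class DomainLexicon:
    """Domain-specific entries layered over a read-only core vocabulary."""

    def __init__(self, core):
        self.domain = {}
        # Domain entries shadow core entries for the same word.
        self.entries = ChainMap(self.domain, core)

    def specialize(self, word, senses):
        """Record the senses expected in this domain for a word."""
        self.domain[word] = list(senses)

    def lookup(self, word, ask_operator=None):
        """Return the senses of a word, asking the operator about unknown words."""
        if word not in self.entries:
            if ask_operator is None:
                raise KeyError(word)
            # The semi-automated step: a person supplies the missing entry once.
            self.domain[word] = list(ask_operator(word))
        return self.entries[word]


lexicon = DomainLexicon(CORE)
lexicon.specialize("bank", ["financial institution"])
escrow = lexicon.lookup("escrow", ask_operator=lambda w: ["funds held by a third party"])
```

The callback stands in for either a developer during a training run or an operator at run time; once an entry is supplied, later lookups of the same word no longer need it.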