THE LEXIKON

 

An extensive lexicon is essential to advanced text-analysis software.  The lexicon acts as a database of the linguistic features and meanings of input words.  It must describe them accurately, because in most analysis systems almost every processing decision is based directly on the lexicon's data.

 

Until now, developers who needed this kind of software "expertise" had to obtain it from human linguists, who assembled the data painstakingly and at great cost.  Now Lexikos has packaged this unique type of knowledge into a portable, modular software tool that is easily incorporated into new applications.

 

Our "Lexikon" is a unique commercial software product: a standalone virtual database package that models tens of thousands of common English words.  Given a line or paragraph of text, the Lexikon looks up each word, then computes and returns a "map" of all the roots, syntactic features and semantic markers that could apply to that word in that specific context.  Application code uses the combined data returned for all the words to perform its intended task.
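
As an illustration only, the lookup step can be pictured as a function that takes a line of text and returns a list of candidate readings for each word.  The following Python sketch assumes a hypothetical `Reading` record and a tiny hand-made lexicon.  The field names, feature labels and semantic markers are invented for the example; they are not the Lexikon's actual data format or interface.  The sketch shows only the context-free lookup and does not model the context-dependent narrowing the Lexikon performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One possible interpretation of a word (hypothetical structure)."""
    root: str
    category: str              # syntactic category, e.g. "noun"
    features: frozenset[str]   # syntactic features
    markers: frozenset[str]    # semantic class markers


# A toy lexicon keyed by lower-case surface form (illustrative entries only).
LEXICON: dict[str, list[Reading]] = {
    "the": [Reading("the", "determiner", frozenset({"definite"}), frozenset())],
    "dog": [Reading("dog", "noun", frozenset({"count", "singular"}), frozenset({"ANIMAL"}))],
    "saw": [
        Reading("saw", "noun", frozenset({"count", "singular"}), frozenset({"TOOL"})),
        Reading("see", "verb", frozenset({"past", "transitive"}), frozenset({"PERCEPTION"})),
    ],
}

PUNCTUATION = ".,;:!?\"'()"


def tokenize(text: str) -> list[str]:
    """Split a line of text into word tokens, trimming surrounding punctuation."""
    tokens = (tok.strip(PUNCTUATION) for tok in text.split())
    return [tok for tok in tokens if tok]


def lookup(text: str) -> list[tuple[str, list[Reading]]]:
    """Pair each word, in order, with every reading the lexicon offers.

    Unknown words get an empty list so the caller can decide what to do.
    """
    return [(tok, LEXICON.get(tok.lower(), [])) for tok in tokenize(text)]


for word, readings in lookup("The dog saw it."):
    print(word, [(r.root, r.category) for r in readings])
```

Here "saw" comes back with two readings: the noun, and the past tense of "see".  Choosing between them is left to contextual rules or, where needed, to an operator.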

 

 

                                                      THE BENEFITS TO ITS USERS

 

The Lexikon lets advanced software development proceed at maximum efficiency.  By providing accurate lexical data in sufficient volume, it can quickly advance any software project involving the content, structure, or meaning of English text.  Developers can focus on their own goals instead of on assembling a lexicon.

 

The Lexikon lets developers confront the true issues of real-world text analysis.  Having good lexical data early in a project can prevent the misperceptions and wasted man-years that may result when work is built on a toy lexicon.

 

The Lexikon's data is of very high quality.  The system of features and markers that the Lexikon uses to describe words and meanings was designed by Lexikos linguists specifically for use by other software.  This avoids many problems that come from reusing data from dictionaries written for human readers, or (even worse) from using lexical data entered by programmers without linguistic training.

 

The Lexikon costs far less than building a lexicon internally.  Lexikos is an industrial supplier of tools for natural language software development, so we can afford the specialized people, tools, and testing that lexicon creation demands.  Lexikos clients can exploit this natural division of labor to reduce their own overhead in these areas and save considerable time and money.

 

The Lexikon can be put to work quickly.  We can supply it ready to run on a large-RAM 80x86 PC, a PC-resident 80386 co-processor board, a Mac II, an engineering workstation, or any other host computer that supports Common Lisp.

 

Overall, if you use our Lexikon as the "front end" to your parser, expert system, text indexer, or other English-analysis application, the total benefits will be considerable:  your application will be deployed sooner, with better results, and at much lower net R&D cost.


                                              TECHNICAL SUMMARY OF THE LEXIKON

 

In operation, the Lexikon automatically turns ASCII characters (keyboard input or the lines of a text file) into a detailed lexical model of the English phrase or paragraph they represent.  The output depicts each word separately, modeled in its context.  These word models are easy to use, linguistically correct, and, because context is used to eliminate readings that cannot apply, surprisingly unambiguous.  Each representation of a word includes:

 

*The syntactic features for the word, including all its expected complement patterns, at a level of detail sufficient for sophisticated syntactic analysis.

 

*The semantic class markers for the word, which describe its possible denotations and provide for application-specific data inheritance.

 

*The morphological analysis of inflected, derived and irregular forms.  The Lexikon automatically finds each root and adjusts its data as required.

 

*Links to the multi-token words, idioms, and names of which the input word may be a part, using methods that support on-line parsing of these constructs.

 

*Reduced lexical ambiguity, achieved by partial parsing rules that work invisibly, exploiting constraints from the capitalization and context of each word.
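
Two of the mechanisms listed above can be sketched in a few lines of Python: finding the root of an inflected form, and discarding readings that clash with capitalization.  This is a deliberately simplified illustration.  The suffix rules, the irregular-form table, the root list and the `name` category are all hypothetical, and the document does not describe the Lexikon's actual morphological or partial-parsing rules.

```python
# Hypothetical irregular forms: surface form -> (root, inflectional feature).
IRREGULAR = {"saw": ("see", "past"), "went": ("go", "past"), "mice": ("mouse", "plural")}

# Hypothetical regular suffix rules: (suffix, replacement, feature), tried in order.
SUFFIX_RULES = [
    ("ies", "y", "plural-or-3sg"),
    ("es", "", "plural-or-3sg"),
    ("s", "", "plural-or-3sg"),
    ("ied", "y", "past"),
    ("ed", "", "past"),
    ("ed", "e", "past"),
    ("ing", "", "progressive"),
    ("ing", "e", "progressive"),
]

# Hypothetical set of known roots.
ROOTS = {"see", "saw", "go", "mouse", "carry", "walk", "box", "bake", "dog", "bill"}


def find_roots(word: str) -> list[tuple[str, str]]:
    """Return every (root, feature) analysis whose root is a known entry."""
    w = word.lower()
    analyses = []
    if w in ROOTS:
        analyses.append((w, "base"))
    if w in IRREGULAR:
        analyses.append(IRREGULAR[w])
    for suffix, replacement, feature in SUFFIX_RULES:
        if w.endswith(suffix) and len(w) > len(suffix):
            candidate = w[: -len(suffix)] + replacement
            if candidate in ROOTS and (candidate, feature) not in analyses:
                analyses.append((candidate, feature))
    return analyses


def filter_by_capitalization(token: str, sentence_initial: bool,
                             readings: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Drop (root, category) readings that clash with the token's capitalization.

    The category "name" marks a proper-name reading.
    """
    if token.islower():
        # A lower-case token cannot be a proper name.
        return [r for r in readings if r[1] != "name"]
    if token[:1].isupper() and not sentence_initial:
        # Capitalized mid-sentence: prefer name readings when any exist.
        names = [r for r in readings if r[1] == "name"]
        return names or readings
    # Sentence-initial capitalization carries no information.
    return readings


print(find_roots("carried"))  # [('carry', 'past')]
print(find_roots("saw"))      # [('saw', 'base'), ('see', 'past')]
```

The key design choice is to treat sentence-initial capitalization as uninformative.  Only a capital letter in mid-sentence is taken as evidence for a proper-name reading.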

 

The Lexikon provides excellent coverage of common English vocabulary.  Even so, no vocabulary covers all possible inputs, so the Lexikon also includes an interactive module called the "Scanner," which supplements its fully automatic logic with operator interaction.

 

The Scanner is a control and debugging shell that lets users flexibly query the Lexikon's word models, either in isolation or in the context of a paragraph.  It also provides a set of controls over the processing strategies and output formats of the combined system, some of which may prompt for on-line operator input:

 

*Logic for on-line word-learning can be enabled to help a clerical operator expand the Lexikon's vocabulary as needed during each job, so every word and name is properly modeled in the output.  As a bonus, it also helps the operator find and fix misspellings or other flaws in the input text.

 

*Other controls cause the Scanner to consult the operator via menus to interactively resolve any lexical ambiguity that remains in the Lexikon's output.  These extra human inputs can simplify the logic of a follow-on parser, or in some applications even substitute for parsing code.

 

*The Scanner's interactive logic may itself be augmented by a full on-line copy of Roget's Thesaurus.  This option can make production-scale use of the complete Lexikon system easier still, by expanding its vocabulary and making operator inputs to the Scanner simpler and less frequent.
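
The on-line word-learning step described above can be sketched as a loop that stops at each unknown word.  At each stop it offers close matches from the known vocabulary as possible misspellings, and otherwise records the operator's new entry.  This Python sketch uses the standard library's `difflib.get_close_matches` for the spelling suggestions.  The `ask_operator` and `define_word` callbacks, their return conventions, and the entry format are hypothetical stand-ins for the Scanner's menus, which the document does not specify.  The thesaurus option is not modeled.

```python
import difflib
from typing import Callable, Optional

# Hypothetical operator hooks standing in for the Scanner's menus.
# ask_operator(word, suggestions) returns a corrected spelling, or None to
# mean "this is a genuinely new word; learn it".
AskOperator = Callable[[str, list[str]], Optional[str]]
# define_word(word) returns the operator's lexical entry for a new word.
DefineWord = Callable[[str], dict]


def scan(tokens: list[str], lexicon: dict[str, dict],
         ask_operator: AskOperator, define_word: DefineWord) -> list[str]:
    """Return the tokens with corrections applied, learning unknown words."""
    output = []
    for token in tokens:
        key = token.lower()
        if key not in lexicon:
            # Close matches from the known vocabulary flag likely misspellings.
            suggestions = difflib.get_close_matches(key, list(lexicon), n=3, cutoff=0.75)
            correction = ask_operator(token, suggestions)
            if correction is not None:
                token, key = correction, correction.lower()
            if key not in lexicon:
                # Still unknown: store the operator's entry so later
                # occurrences in the same job are handled automatically.
                lexicon[key] = define_word(token)
        output.append(token)
    return output


# Usage with a simulated operator (illustrative data only).
lexicon = {
    "the": {"category": "determiner"},
    "parser": {"category": "noun"},
    "failed": {"category": "verb"},
}


def operator(word: str, suggestions: list[str]) -> Optional[str]:
    # Accept the best suggestion if one exists; otherwise treat as new.
    return suggestions[0] if suggestions else None


def define(word: str) -> dict:
    return {"category": "noun", "learned": True}


result = scan(["The", "parsr", "failed", "Lexikos"], lexicon, operator, define)
print(result)  # ['The', 'parser', 'failed', 'Lexikos']
```

The misspelled "parsr" is corrected from a suggestion.  "Lexikos" has no close match, so it is learned as a new entry.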

 

When richly detailed data from the Lexikon is combined in the Scanner with on-line guidance from an operator, the net output stream can be very complete and precise, with no gaps, no lexical ambiguity, and a human-like sophistication in the contextual interpretation of words.

 

Such accuracy is unique among today's text-analysis software.  We think it will greatly aid current development work in natural language processing and spur new growth in practical text-processing applications.