A Validating Generator for BioPAX Triples


Posting triples (alone or in bulk) to a validating web conversion utility may be the optimum way to turn existing data sets into any ontology-based exchange format.

The reason: this method minimizes both the learning curve and the effort required of data providers. They need no knowledge of OWL or RDF syntax; they merely script SQL queries that collect triples, after a best-fit match of all their individuals to a target ontology's classes and properties.
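
The provider-side SQL step might look like the sketch below. It uses an in-memory SQLite database, and the table names, column names and the choice of `PARTICIPANTS` as the property are all hypothetical stand-ins for a real provider schema and a real BioPAX mapping. The point it shows is that the whole best-fit match can live inside one SELECT that emits (subject class, subject ID, property, object class, object ID) rows.

```python
import sqlite3

# Hypothetical provider schema: reactions and the molecules taking part in
# them. Table, column and class names are illustrative assumptions only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reaction (rxn_id TEXT PRIMARY KEY, kind TEXT);
CREATE TABLE reaction_member (rxn_id TEXT, mol_id TEXT, mol_kind TEXT);
INSERT INTO reaction VALUES ('R1', 'biochemicalReaction');
INSERT INTO reaction_member VALUES ('R1', 'M7', 'smallMolecule');
INSERT INTO reaction_member VALUES ('R1', 'P3', 'protein');
""")

# The best-fit mapping lives entirely in the SELECT: each row becomes
# (subject class, subject ID, property, object class, object ID).
TRIPLE_QUERY = """
SELECT r.kind, r.rxn_id, 'PARTICIPANTS', m.mol_kind, m.mol_id
FROM reaction AS r
JOIN reaction_member AS m ON m.rxn_id = r.rxn_id
ORDER BY m.mol_id
"""

def collect_triples(connection):
    """Run the mapping query and return each row as a 5-tuple."""
    return [tuple(row) for row in connection.execute(TRIPLE_QUERY)]

for triple in collect_triples(conn):
    print(triple)
```

Keeping the mapping in SQL means a provider never touches OWL or RDF; only the query changes when the mapping is refined.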

[NOTE: This MOCKUP illustrates the idea for an open collection of triples that are very common in BioPAX. It emits Turtle translations of input triples but does not yet save or validate them.]
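
Emitting a Turtle translation for one submitted triple could be as simple as the following sketch. The BioPAX Level 2 namespace URI, the provider namespace `http://example.org/provider#`, and the property name `LEFT` are assumptions for illustration; the function also assumes IDs are plain alphanumeric tokens that need no escaping as Turtle local names.

```python
# Assumed namespaces: the BioPAX Level 2 OWL namespace and a hypothetical
# provider namespace. Replace both with the values actually in use.
BP_NS = "http://www.biopax.org/release/biopax-level2.owl#"
PROVIDER_NS = "http://example.org/provider#"

def to_turtle(subj_class, subj_id, prop, obj_class, obj_id):
    """Render one form submission (two typed individuals plus one object
    property linking them) as a small Turtle document.
    Assumes IDs are simple tokens usable directly as Turtle local names."""
    return "\n".join([
        f"@prefix bp: <{BP_NS}> .",
        f"@prefix p: <{PROVIDER_NS}> .",
        "",
        f"p:{subj_id} a bp:{subj_class} ;",
        f"    bp:{prop} p:{obj_id} .",
        f"p:{obj_id} a bp:{obj_class} .",
    ])

print(to_turtle("biochemicalReaction", "R1", "LEFT", "smallMolecule", "M7"))
```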


Providers: Please enter a triple below to be converted to BioPAX and logged. Select the BioPAX sub-class for each field, then fill in the text boxes with IDs from your own namespace.

Interaction:

PARTICIPANT:    

physEntity:

Validation errors (if any) will be returned below, formatted for easy screen scraping; simply resubmit to correct them. All recently logged valid triples can be downloaded with a separate web request, cast by default into a BioPAX 2.x exchange file.
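
One way to make errors easy to screen-scrape is one tab-separated line per problem, as in this sketch. The form field names, the `ERROR` tag and the ID pattern are hypothetical choices, not a finalized interface; a script only needs `line.split("\t")` to recover the field and message.

```python
import re

# Hypothetical names for the five form inputs (three pull-downs, two IDs).
REQUIRED_FIELDS = ("subj_class", "subj_id", "prop", "obj_class", "obj_id")
# Assumption: provider IDs start with a letter and contain only letters,
# digits, '_' or '-', so they can double as Turtle local names.
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")

def validate_submission(form):
    """Return one 'ERROR<TAB>field<TAB>message' line per problem found;
    an empty list means the submission passed these checks."""
    errors = []
    for field in REQUIRED_FIELDS:
        value = form.get(field, "").strip()
        if not value:
            errors.append(f"ERROR\t{field}\tmissing value")
        elif field.endswith("_id") and not ID_PATTERN.fullmatch(value):
            errors.append(f"ERROR\t{field}\tillegal ID {value!r}")
    return errors

bad = {"subj_class": "biochemicalReaction", "subj_id": "R 1",
       "prop": "LEFT", "obj_class": "smallMolecule"}
for line in validate_submission(bad):
    print(line)
```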




Discussion: Driving this JSP manually can help with user training, debugging, and QA, but its main use case is accepting triples posted by upload scripts (in Perl, etc.) that process bulk provider data sets. Valid triples are saved to disk as CSV files (which can also be posted, or read on demand from a provider's site).
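
A bulk upload script could turn each CSV row into one GET request, roughly as sketched here. The endpoint URL and parameter names are hypothetical placeholders for whatever the finalized form defines. The sketch only builds the URLs; a real script would then fetch each one (for example with `urllib.request.urlopen`) and scrape the returned error lines.

```python
import csv
import io
from urllib.parse import urlencode

# Hypothetical endpoint and GET parameter names; the real ones would come
# from the finalized conversion form.
ENDPOINT = "http://example.org/triples/convert.jsp"
FIELDS = ["subj_class", "subj_id", "prop", "obj_class", "obj_id"]

def request_urls(csv_text):
    """Build one GET request URL per non-empty CSV row of provider data."""
    urls = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:  # skip blank lines
            continue
        if len(row) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {row!r}")
        urls.append(f"{ENDPOINT}?{urlencode(dict(zip(FIELDS, row)))}")
    return urls

sample = "biochemicalReaction,R1,LEFT,smallMolecule,M7\n"
for url in request_urls(sample):
    print(url)
```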

Once this conversion form and the remaining details are finalized, each provider can write all of its scripts in its own favorite language. (The GET request arguments and a few sample CSV files constitute the interface specs.) With SAIDs we can simplify the whole mapping process.

Saved CSV files can be exported into SEVERAL legal exchange formats on consumer demand, using shared converters. BioPAX.org must assign domain and range constraints to each PARTICIPANT type by flagging the legal combinations of SAIDs in the namespace file that populates the three pull-down lists above.
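
If the namespace file simply listed each legal (domain, property, range) combination on its own line, the same file could both fill the three pull-down lists and drive the domain/range check, as in this sketch. The file format, class names and property names shown are assumptions for illustration only; the real file would be keyed by SAIDs.

```python
# Hypothetical namespace-file format: one legal (domain, property, range)
# combination per line, whitespace-separated; '#' starts a comment.
SAMPLE_NAMESPACE_FILE = """\
# domain              property     range
biochemicalReaction   LEFT         smallMolecule
biochemicalReaction   RIGHT        smallMolecule
catalysis             CONTROLLER   protein
"""

def load_legal_combos(text):
    """Parse flagged combinations into a set of 3-tuples."""
    combos = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            domain, prop, rng = line.split()  # exactly three columns expected
            combos.add((domain, prop, rng))
    return combos

def pulldown_options(combos):
    """Derive the three sorted pull-down lists: domains, properties, ranges."""
    return tuple(sorted({combo[i] for combo in combos}) for i in range(3))

def check_combo(combos, domain, prop, rng):
    """Return None if the combination is flagged legal, else an error line."""
    if (domain, prop, rng) in combos:
        return None
    return f"ERROR\tprop\t{prop} not allowed from {domain} to {rng}"

combos = load_legal_combos(SAMPLE_NAMESPACE_FILE)
print(pulldown_options(combos))
print(check_combo(combos, "catalysis", "LEFT", "protein"))
```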

This same information, plus English comments, must be published anyway as human-readable specs for the interactions covered by the BioPAX ontology. This utility merely reuses that document in ways that are more proactively helpful to data providers.

A similar web form and validator appear to be needed for physicalEntity sub-classes, where various dataProperty triples would be checked and converted. In practice, three or four web forms in total can handle every required BioPAX triple.
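
For those dataProperty triples, the check is mostly about literal values rather than class combinations. The sketch below validates one value against an expected XSD datatype and emits a typed Turtle literal. The property table is hypothetical (the names are assumed BioPAX Level 2 spellings and should be verified against the ontology), and the output line assumes the `bp:`, `p:` and `xsd:` prefixes are declared elsewhere in the file.

```python
import math

# Hypothetical table: data properties accepted on physicalEntity sub-classes,
# each mapped to the XSD datatype its literal must carry. Property names are
# assumed BioPAX Level 2 spellings; verify them against the ontology.
DATA_PROPERTIES = {
    "NAME": "string",
    "SHORT-NAME": "string",
    "MOLECULAR-WEIGHT": "double",
}

def turtle_string(text):
    """Escape a value for a double-quoted Turtle string literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))

def convert_data_triple(entity_id, prop, raw_value):
    """Check one data-property triple; return (turtle_line, error_line)."""
    datatype = DATA_PROPERTIES.get(prop)
    if datatype is None:
        return None, f"ERROR\tprop\tunknown data property {prop}"
    if datatype == "double":
        try:
            number = float(raw_value)
        except ValueError:
            return None, f"ERROR\tvalue\t{raw_value!r} is not a number"
        if not math.isfinite(number):
            return None, f"ERROR\tvalue\t{raw_value!r} is not finite"
        literal = f'"{number!r}"^^xsd:double'
    else:
        literal = f'"{turtle_string(raw_value)}"^^xsd:string'
    # Assumes bp:, p: and xsd: prefixes are declared in the file header.
    return f"p:{entity_id} bp:{prop} {literal} .", None

print(convert_data_triple("M7", "MOLECULAR-WEIGHT", "180.16")[0])
```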




This upper ontology registry can define words, phrases and sentences so that software "understands" what they denote. In June 2011, it was granted U.S. Patent 7,962,328.

SAIDs can add speed and accuracy to many semantic applications, ours or yours. For more about using them, write to Dan Corwin.