CPONT Ontology Documentation

Welcome to the CPONT documentation!

Getting started with the Critical Path Ontology (CPONT)

CPONT is an application ontology that provides a "semantic layer" for the Critical Path Institute (C-Path) data. In particular, it provides a logical structure for the controlled vocabulary terms used to refer to concepts like diseases, phenotypes, assays and more. Before we dive into the details, let us clarify some key concepts:

A term from a controlled vocabulary is denoted by an identifier and always means the same thing.
In particular, the same thing may be referred to by a variety of labels, or names. In the ontology world, a term usually has a single preferred label and a number of synonyms, including abbreviations, acronyms, layperson synonyms and more.
An important implication of the above is that a term's label may change while its identifier and meaning stay the same. A typical example is term usage in an international context: in some places users refer to the term by an English label, in others by a French one.
Terms are linked to other terms using logical relationships, which may have explicit "semantics". For example, the "part of" relationship can be declared transitive: if Athens is part of Greece and Greece is part of Europe, then Athens is part of Europe.
An ontology comprises a set of terms (classes, relationships) and statements about them that specify their semantics. In particular, the ontology specifies how the various classes (diseases, assays) relate to each other. For a more detailed introduction refer to the OBOOK introduction to ontologies.
For a definition of knowledge graph and how to best distinguish it from "ontology" see here.
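
To make the identifier/label distinction and the transitivity of "part of" concrete, here is a minimal Python sketch. It is not how CPONT is implemented: the EX: identifiers, the dictionaries and the function names are all hypothetical and exist only for illustration.

```python
# Hypothetical toy vocabulary: the EX: identifiers are invented for this sketch.
# Each identifier keeps its meaning; only the label differs per language.
LABELS = {
    "EX:1": {"en": "Athens", "fr": "Athènes"},
    "EX:2": {"en": "Greece", "fr": "Grèce"},
    "EX:3": {"en": "Europe", "fr": "Europe"},
}

# Directly asserted "part of" statements (term -> terms it is part of).
PART_OF = {
    "EX:1": {"EX:2"},
    "EX:2": {"EX:3"},
}


def ancestors(term, relation):
    """Return every term reachable from `term` by following `relation` transitively."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in relation.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_part_of(part, whole):
    """True if `part` is (directly or indirectly) part of `whole`."""
    return whole in ancestors(part, PART_OF)


# "Athens part of Europe" was never asserted; it follows from transitivity.
print(is_part_of("EX:1", "EX:3"))
# The answer does not depend on which label a user happens to see.
print(LABELS["EX:1"]["en"], "/", LABELS["EX:1"]["fr"])
```

In practice an OWL reasoner or a graph query engine would compute such inferences from the ontology's own axioms; the hand-written traversal above only illustrates the idea.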

The basic premise of using CPONT is that the C-Path data is annotated using controlled vocabulary terms (such as OMOP), which are then linked up to ontological standard concepts (such as SNOMED, Mondo or OBI). Having all three "ingredients" (the data, the mappings and the ontology) allows us to then access the data through powerful semantic queries.

The value proposition of CPONT is that it enables the semantic analysis of C-Path data. "Semantic analysis" is hard to define exhaustively, but it is probably best understood as two distinct capabilities. Semantic querying is the ability to extract data according to the semantics of the concepts the data refers to. Semantic data analysis is the ability to use advanced graph-based analysis techniques, such as Graph Neural Networks, Knowledge Graph Embeddings and semantic similarity, to learn interesting patterns in the data and to match and compare similar medical scenarios. In the next section, we will go through a simple example that illustrates the value proposition with regard to semantic querying.

Semantic querying example: gonococcal conjunctivitis

Here is a (hopefully) simple example. Let's say we have aggregated data from various sources about infectious diseases. Our goal is to obtain all of the known condition occurrences, across all data sources, for which a "bacterial infectious disease" was recorded that affects some "part of the eye". Suppose our raw SDTM data set contains a patient who was observed on some occasion with "gonococcal conjunctivitis". Without context, the name "gonococcal conjunctivitis" does not tell you whether it is a bacterial infectious disease, nor whether it affects some part of the eye. However, if we take into account the "semantic layer" provided by CPONT, we can get this information by "exploiting the semantics".

It works as follows:

The term "gonococcal conjunctivitis" in our SDTM data is mapped to OMOP:4335889. This mapping has been computed by a service provider hired by C-Path.
OMOP:4335889 is mapped to SNOMED:231858009 via the official OMOP mappings, and to MONDO:0015455 through an external OMOP2OBO mapping provider. The latter has also been confirmed by C-Path's Ontology, Semantics and Metadata team as part of some exploratory manual mapping processes. Currently, CPONT is only concerned with the OBO terms in the upper layer (for a discussion on OMOP vs OBO see here).
Because MONDO:0015455 is a subclass of (subClassOf) MONDO:0005113 ("bacterial infectious disease"), searching for "MONDO:0005113 and all its children" will return MONDO:0015455. Here we "exploited the semantics of the subClassOf relation" to get, transitively, all children of "bacterial infectious disease". The same query can be quite cumbersome in a relational database!
From our ontology, we know that MONDO:0015455 has inflammation site UBERON:0001811 ("conjunctiva"). (As an aside, it is pretty impressive that UBERON is managed by an entirely different team of anatomy specialists and MONDO by disease specialists, and such inter-ontology links are nevertheless pretty consistent.) UBERON:0001811 ("conjunctiva") is part of UBERON:0005908 ("conjunctival sac"), which in turn is part of UBERON:0000019 ("camera-type eye"). Due to the explicit semantics declared on the part of relation (transitivity), we can now infer that "conjunctiva" is indeed a part of the eye, so our disease, gonococcal conjunctivitis, does indeed affect a part of the eye.
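
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual CPONT tooling: the identifiers and relations are the ones named in the walkthrough, while the dictionary layout, the sample records (patients "P1" and "P2") and the function names are hypothetical. For brevity the sketch treats each stated relationship as a single direct edge and only looks at the inflammation site asserted on the disease term itself.

```python
# Mappings taken from the walkthrough; the data structures are hypothetical.
SOURCE_TO_OMOP = {"gonococcal conjunctivitis": "OMOP:4335889"}
OMOP_TO_MONDO = {"OMOP:4335889": "MONDO:0015455"}

# Ontology statements named in the walkthrough, modelled as direct edges.
SUBCLASS_OF = {"MONDO:0015455": {"MONDO:0005113"}}
INFLAMMATION_SITE = {"MONDO:0015455": {"UBERON:0001811"}}
PART_OF = {
    "UBERON:0001811": {"UBERON:0005908"},
    "UBERON:0005908": {"UBERON:0000019"},
}


def closure(term, relation):
    """Return `term` plus everything reachable from it via `relation`."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in relation.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def matches(source_label, disease_class, anatomy_class):
    """True if the source term maps to a disease under `disease_class`
    whose inflammation site is `anatomy_class` or a part of it."""
    omop = SOURCE_TO_OMOP.get(source_label)
    mondo = OMOP_TO_MONDO.get(omop)
    if mondo is None:
        return False  # unmapped terms cannot be queried semantically
    if disease_class not in closure(mondo, SUBCLASS_OF):
        return False
    sites = INFLAMMATION_SITE.get(mondo, ())
    return any(anatomy_class in closure(site, PART_OF) for site in sites)


# Hypothetical condition occurrences; the second term has no mapping.
records = [
    {"patient": "P1", "condition": "gonococcal conjunctivitis"},
    {"patient": "P2", "condition": "unmapped condition"},
]

# "Bacterial infectious disease" affecting some part of the "camera-type eye".
hits = [
    r["patient"]
    for r in records
    if matches(r["condition"], "MONDO:0005113", "UBERON:0000019")
]
print(hits)
```

A production setup would more likely express the same question as a graph query over the ontology and the mappings, letting a reasoner or the query engine handle the transitive steps.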

The Goal of CPONT

The goal of CPONT is to:

Enable powerful semantic analysis (querying & data analysis) of the data sets contributed to C-Path.
In particular, serve as a "semantic layer" for the C-Path Knowledge Graph, enabling semantic querying/aggregation.
Cover all concepts and their relationships that are relevant to C-Path data.