Wednesday, July 08, 2015

SHACL4P: Shapes Constraint Language (SHACL) Plugin for Protégé Ontology Editor

Context & Motivation

The Semantic Web is Tim Berners-Lee's vision of making information on the World Wide Web understandable for both humans and machines. It can also be seen as an evolution of the web that helps users by letting machines process, deliver, understand, and interpret large amounts of data in a meaningful way.

Ontologies, as one of the main pillars of Semantic Web Technologies (SWT), are an explicit specification of a conceptualization to enable knowledge sharing and reuse. An ontology can also be seen as a way of organizing and categorizing information with the goal of facilitating its access, use, and distribution among communities. The World Wide Web Consortium (W3C), the international organization that develops Web standards, has defined several important recommendations related to the Semantic Web, such as the Resource Description Framework (RDF), RDF Schema (RDFS), the Web Ontology Language (OWL), and the SPARQL Query Language (SPARQL).

There are two important characteristics of current ontology languages in the Semantic Web: Open World Assumption (OWA) and Non Unique Name Assumption (Non-UNA).
  • OWA means that one cannot infer that a statement is false from its absence (it may hold in the real world but has not been stated yet). This is the opposite of the Closed World Assumption (CWA), where information that is not stated explicitly is assumed to be false.
  • Non-UNA implies that an individual can have more than one identifier. In other words, two or more identifiers can point to the same entity. This differs from other knowledge representations (e.g., relational databases), which typically operate under the Unique Name Assumption (UNA).
Figure 1 shows the effect of these two characteristics (OWA and Non-UNA) within ontologies. Depending on the use case, some applications need to validate their data against constraint definitions instead of inferring new information from the axioms.
Figure 1: Constraint Checking Example using Current RDF Vocabularies
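
To make the difference concrete, here is a minimal Python sketch of how the same "exactly one value" requirement could be read under the two sets of assumptions. It is an illustration only, not how an OWL reasoner or SHACL4P is implemented; the function names and the `ex:` identifiers are made up for this example.

```python
from itertools import combinations


def check_max_one(values, unique_names):
    """Interpret 'at most one value' for the property values of one subject.

    With unique names (UNA), two different identifiers are two different
    things, so a second value is a violation. Without UNA, the identifiers
    may denote the same entity, so the sketch records an inferred equality
    instead of reporting an error.
    """
    distinct = sorted(set(values))
    if len(distinct) <= 1:
        return {"violations": [], "inferred_same_as": []}
    if unique_names:
        message = f"expected at most 1 value, found {len(distinct)}"
        return {"violations": [message], "inferred_same_as": []}
    return {"violations": [], "inferred_same_as": list(combinations(distinct, 2))}


def check_min_one(values, closed_world):
    """Interpret 'at least one value' for the property values of one subject.

    Under CWA a missing value is a violation. Under OWA the value may simply
    not have been stated yet, so nothing can be concluded.
    """
    if values or not closed_world:
        return []
    return ["expected at least 1 value, found 0"]


# Hypothetical data: two identifiers asserted as the mother of one person.
mothers = ["ex:Mary", "ex:Maria"]
print(check_max_one(mothers, unique_names=True))   # reports a violation
print(check_max_one(mothers, unique_names=False))  # infers the two are the same
print(check_min_one([], closed_world=True))        # reports a violation
print(check_min_one([], closed_world=False))       # reports nothing
```

The constraint-checking reading (UNA plus CWA) is what validation use cases typically expect, while the inference reading (Non-UNA plus OWA) is what the current ontology languages provide.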

The Challenges

Semantic Web Technologies have been widely used within data-centric applications in various areas, e.g., in Bioinformatics, Systems Engineering, and Cultural Heritage. SWT provide an excellent platform to capture the axiomatic (structural) definition of data (e.g., SKOS for hierarchies and vocabularies; RDF Schema and the Web Ontology Language for class, property, and relationship definitions).

Within several of these data-centric applications, there is a desire to go beyond axiomatic definitions. One example is the definition of constraints that must be satisfied by the instance data. However, due to the Open World Assumption and the absence of the Unique Name Assumption in SWT, it is difficult to define structural constraints on an RDF graph with the current set of RDF vocabularies (e.g., RDFS, SKOS, and OWL).

Recently, the Shapes Constraint Language (SHACL) was proposed to address this challenge. Other approaches aim to address a similar challenge, most notably Pellet-ICV, RDFUnit, and SPIN. None of them has been accepted as a W3C recommendation, a status that is especially important for practitioners, since they want to rely on stable specifications for their applications.

Shapes Constraint Language (SHACL) is an RDF vocabulary for describing RDF structural constraints. SHACL allows validation of RDF graph data according to its SHACL constraint definitions. At the moment, SHACL is a work in progress of the W3C RDF Data Shapes WG.

Currently, the integration of such structural constraint definitions and validations in ontology editors is only available in specific commercial products (e.g., Pellet-ICV within Stardog and SPIN within TopBraid Composer).

SHACL4P: The SHACL Protégé Plugin

To address the challenge of defining and validating structural constraints over RDF graphs in an open source ontology editor like Protégé, we introduce SHACL4P, a SHACL plugin for Protégé. The source code of SHACL4P is available on GitHub as two separate Java projects: the Protégé plugin and the SHACL engine.

Our approach

The basic idea behind our approach is to combine ontologies with their respective SHACL constraints and to check for constraint violations using the SHACL validation engine (see Figure 2).
Figure 2: Basic concepts of the validation process
We designed the SHACL engine in a modular fashion to increase reusability, maintainability, and readability. Our implementation is built around three main components that provide the required functionality: (1) the SHACL user interface, (2) the data transformation component, and (3) the SHACL validation engine.
  1. The SHACL user interface inside Protégé is the user's point of interaction. It provides a way to add SHACL constraints and other vocabulary to existing ontologies. Furthermore, it visualizes the results of the validation process in a user-friendly format.
  2. Data transformation is the intermediate step between the input gathered from the user interface and the actual validation. The component deals with two sets of data: the ontology from the Protégé editor that is to be validated, and the SHACL constraints to validate against. Both have to be brought into a uniform format before the actual validation, also because Protégé and the SHACL engine use different APIs (Protégé uses the OWL API, while the SHACL engine uses Apache Jena). In the inverse direction, the component transforms the results of the validation engine, which are described in RDF, into plain old Java objects (POJOs).
  3. The SHACL validation engine provides a framework for checking constraints on the data under test. The result of this process is an RDF representation that either describes the constraint violations or reports the data as valid.
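
The flow through the last two components can be sketched in a few lines of Python. This is a simplified stand-in under stated assumptions, not the plugin's Java code: it uses plain tuples instead of the OWL API or Apache Jena, a dictionary instead of a real SHACL shape, and made-up `ex:` terms for the report vocabulary. All function and field names are hypothetical.

```python
from dataclasses import dataclass

RDF_TYPE = "rdf:type"
VIOLATION = "ex:Violation"  # hypothetical report class, not a SHACL term


@dataclass(frozen=True)
class Violation:
    """Plain result object, playing the role of the POJO in the text."""
    focus_node: str
    path: str
    message: str


def to_triples(statements):
    """Transformation step: bring the input into one uniform format,
    here a set of (subject, predicate, object) tuples."""
    return {tuple(statement) for statement in statements}


def validate(data, shape):
    """Validation step: closed-world count check for one property.
    The report is itself expressed as triples."""
    report = set()
    focus_nodes = sorted(
        s for (s, p, o) in data if p == RDF_TYPE and o == shape["target_class"]
    )
    for index, node in enumerate(focus_nodes):
        values = {o for (s, p, o) in data if s == node and p == shape["path"]}
        count = len(values)
        if count < shape["min_count"] or count > shape["max_count"]:
            result = f"_:r{index}"
            expected = f"{shape['min_count']}..{shape['max_count']}"
            report.add((result, RDF_TYPE, VIOLATION))
            report.add((result, "ex:focusNode", node))
            report.add((result, "ex:path", shape["path"]))
            report.add((result, "ex:message",
                        f"found {count} value(s), expected {expected}"))
    return report


def to_objects(report):
    """Inverse transformation: turn report triples into plain objects."""
    results = sorted(s for (s, p, o) in report if p == RDF_TYPE and o == VIOLATION)
    violations = []
    for result in results:
        fields = {p: o for (s, p, o) in report if s == result}
        violations.append(
            Violation(fields["ex:focusNode"], fields["ex:path"], fields["ex:message"])
        )
    return violations


# Hypothetical ontology data and constraint: every person has exactly one mother.
ontology = [
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:alice", "ex:hasMother", "ex:carol"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
]
shape = {"target_class": "ex:Person", "path": "ex:hasMother",
         "min_count": 1, "max_count": 1}

violations = to_objects(validate(to_triples(ontology), shape))
for violation in violations:
    print(violation)
```

Keeping the report in the same triple format as the data mirrors the description above, where validation results are RDF and only the final step converts them into objects for display.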
Graphically, SHACL4P is implemented as a workspace tab in Protégé consisting of six views, as displayed in the following screenshot (Figure 3):
Figure 3: Screenshot of the SHACL Protégé Plugin
  1. The class hierarchy view (upper left corner) displays all classes of the ontology.
  2. The instances view (lower left corner) displays the instances of classes.
  3. The SHACL text editor (middle upper view) is used to define SHACL constraints and to start the validation by pressing the Execute button in the lower left corner of the view.
  4. The logging view (middle lower view) informs the user about validation errors, failed syntax checks and other similar events.
  5. The Turtle rendering view (upper right corner) represents the current ontology in Turtle syntax for easier detection of the needed vocabulary.
  6. The prefix view (lower right corner) lists all defined prefixes of the ontology. This view can be helpful when defining the prefixes for the SHACL definition.

Conclusion & Future Work

We developed a plugin on top of Protégé that gives users the means to define, implement, and execute SHACL constraints against their Semantic Web data within an open source ontology editor.

As for future work, we plan to implement the full (final) version of SHACL within our plugin. Furthermore, we want to conduct a user study to evaluate the usability of the prototype.
