Context & Motivation
The Semantic Web is Tim Berners-Lee's vision of making information on the World Wide Web
understandable to both humans and machines. It can
also be seen as an evolution of the Web that aims to help users by
letting machines process, deliver, understand, and interpret
large amounts of data in a meaningful way for various purposes.
Ontologies are one of the main pillars of Semantic Web Technologies (SWT). An ontology is an explicit specification of a conceptualization that enables
knowledge sharing and reuse. An ontology can also be seen as a
way of organizing and categorizing information with the goal of
facilitating its access, use, and distribution among communities. The World Wide
Web Consortium (W3C), the international organization that develops
Web standards, has defined several important recommendations
related to the Semantic Web, such as the Resource Description Framework
(RDF), RDF Schema (RDFS), the Web Ontology Language (OWL), and the SPARQL
Query Language (SPARQL).
Current ontology languages in the Semantic Web have two important
characteristics: the Open World Assumption (OWA) and the Non-Unique Name
Assumption (Non-UNA).
- OWA means that one cannot infer that a statement is false merely because it is absent (it may hold in the real world but has not been stated explicitly yet). This is the opposite of the Closed World Assumption (CWA), where information that is not stated explicitly is treated as false.
- Non-UNA implies that an individual can have more than one identifier. In other words, two or more identifiers can refer to the same entity. This differs from other knowledge representations (e.g., relational databases), which typically operate under the Unique Name Assumption (UNA).
Figure 1 shows
the effect of these two characteristics (OWA and Non-UNA)
within ontologies. Depending on the use case, some applications
need to validate their data against
explicit constraint definitions instead of inferring new information
from the ontology axioms.
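The difference between inferring and validating can be sketched in a few lines of Python. This is a toy illustration only: it is neither an OWL reasoner nor the example from Figure 1, and the `ex:` identifiers and both helper functions are hypothetical names chosen for this sketch. It treats one property as "at most one value per subject" and shows the two possible readings of data that contains two values.

```python
from itertools import combinations

# Hypothetical data: one person with two recorded birth mothers.
TRIPLES = {
    ("ex:alice", "ex:hasBirthMother", "ex:mary"),
    ("ex:alice", "ex:hasBirthMother", "ex:maria"),
}


def values_by_subject(triples, prop):
    """Group the objects of `prop` by subject."""
    grouped = {}
    for s, p, o in triples:
        if p == prop:
            grouped.setdefault(s, set()).add(o)
    return grouped


def infer_same_as(triples, prop):
    """OWA + Non-UNA reading: two values for an 'at most one' property
    are not an error; the two names are inferred to denote one individual."""
    inferred = set()
    for values in values_by_subject(triples, prop).values():
        for a, b in combinations(sorted(values), 2):
            inferred.add((a, "owl:sameAs", b))
    return inferred


def check_max_one(triples, prop):
    """CWA + UNA reading: different names are different things,
    so more than one value is reported as a constraint violation."""
    return [
        f"{s} has {len(values)} values for {prop}, expected at most 1"
        for s, values in sorted(values_by_subject(triples, prop).items())
        if len(values) > 1
    ]
```

Calling `infer_same_as(TRIPLES, "ex:hasBirthMother")` yields a single `owl:sameAs` statement between the two mother identifiers, while `check_max_one` on the same data yields one violation message for `ex:alice`. The second behaviour is what constraint validation is after.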
The Challenges
Semantic
Web Technologies have been widely used in data-centric
applications in various areas, e.g., Bioinformatics, System
Engineering, and Cultural Heritage. SWT provide an excellent
platform for capturing the axiomatic (structural) definition of data
(e.g., SKOS for hierarchies and vocabularies; Resource Description
Framework Schema and Web Ontology Language for class,
property, and relationship definitions).
In several of these data-centric applications, there is a desire to go beyond axiomatic definitions. One example is the definition of constraints that must be satisfied by the instance data. However, due to the Open World Assumption and the absence of the Unique Name Assumption in SWT, it is difficult to define structural constraints on an RDF graph with the current set of RDF vocabularies (e.g., RDFS, SKOS, and OWL).
Recently, the Shapes Constraint Language (SHACL) was proposed to address this challenge. Other approaches aim to address a similar challenge, most notably Pellet-ICV, RDFUnit, and SPIN. None of them has been accepted as a W3C recommendation, a status that is particularly important for practitioners, who want to rely on stable specifications for their applications.
The Shapes Constraint Language (SHACL) is an RDF vocabulary for describing
structural constraints on RDF data. SHACL allows RDF graph data to be validated
against SHACL constraint definitions. At the moment, SHACL
is a work in progress of the W3C RDF Data Shapes WG.
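To make the idea of a structural constraint concrete, the following minimal Python sketch checks a cardinality constraint over a small set of triples. It is an illustration under stated assumptions, not SHACL itself: the shape is a plain dictionary whose keys only loosely echo SHACL terminology, the `ex:` data is hypothetical, and the `validate` function is invented for this example.

```python
RDF_TYPE = "rdf:type"

# Hypothetical shape: every ex:Person must have exactly one ex:name.
PERSON_SHAPE = {
    "targetClass": "ex:Person",
    "path": "ex:name",
    "minCount": 1,
    "maxCount": 1,
}

# Hypothetical instance data; ex:bob has no ex:name.
DATA = [
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
]


def validate(triples, shape):
    """Return (focus node, path, reason) for every instance of the
    target class whose value count is outside [minCount, maxCount]."""
    focus_nodes = sorted(
        s for s, p, o in triples
        if p == RDF_TYPE and o == shape["targetClass"]
    )
    violations = []
    for node in focus_nodes:
        count = sum(
            1 for s, p, _ in triples
            if s == node and p == shape["path"]
        )
        if count < shape["minCount"]:
            violations.append((node, shape["path"], "too few values"))
        elif count > shape["maxCount"]:
            violations.append((node, shape["path"], "too many values"))
    return violations
```

Here `validate(DATA, PERSON_SHAPE)` reports `ex:bob` because the data contains no `ex:name` for it. Under the Open World Assumption the same data would simply be considered incomplete rather than invalid.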
Currently, the integration of such structural constraint definitions and validations into ontology editors is only available in specific commercial products (e.g., Pellet-ICV within Stardog and SPIN within TopBraid Composer).
SHACL4P: The SHACL Protégé Plugin
To address the challenge of supporting the definition and validation of structural constraints over RDF graphs
in an open source ontology editor like Protégé, we introduce
SHACL4P, a SHACL plugin for Protégé. The source code of SHACL4P
is available on GitHub as two separate Java projects: the Protégé plugin and the SHACL engine.
Our approach
The
basic idea behind our approach is to bring together ontologies and their respective
SHACL constraints
in order to check for constraint violations with the SHACL
validation engine (see Figure 2).
We
designed the SHACL engine in a modular fashion to increase
reusability, maintainability, and readability. Our implementation is
built around three main components that provide the required
functionality: (1) the SHACL user interface, (2) the data
transformation component, and (3) the SHACL validation engine.
- The SHACL user interface inside Protégé is the user's point of engagement. It provides a way to add SHACL constraints and other vocabulary to existing ontologies. Furthermore, it visualizes the results of the validation process in a user-friendly format.
- Data transformation is the intermediate step between the input gathered from the user interface and the actual validation. We are dealing with two different sets of data: the ontology from the Protégé ontology editor, which is to be validated, and the SHACL constraints to validate it against. Both must be brought into a uniform format before validation. The transformation is also needed because Protégé and the SHACL engine use different APIs (Protégé uses the OWL API, while the SHACL engine uses Apache Jena). In the inverse direction, the component transforms the results of the validation engine, which are described in RDF, into POJOs.
- The SHACL validation engine provides a framework for constraint checking on a testing domain. The result of this process is an RDF representation that either describes the constraint violations or declares the domain valid.
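The inverse direction of the data transformation step, from an RDF-encoded validation result to plain objects that a user interface can display, can be sketched as follows. This is not the plugin's code, which is written in Java on top of Apache Jena. It is a minimal Python analogue in which tuples stand in for RDF statements and a dataclass stands in for a POJO. The predicate names follow the validation report vocabulary of the published SHACL specification and may differ from the draft the plugin targeted; the class, function, and sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ValidationResult:
    """Plain object holding one constraint violation (POJO analogue)."""
    focus_node: str = ""
    path: str = ""
    message: str = ""


# Assumed mapping from report predicates to object fields.
FIELD_FOR_PREDICATE = {
    "sh:focusNode": "focus_node",
    "sh:resultPath": "path",
    "sh:resultMessage": "message",
}


def to_plain_objects(report_triples):
    """Collect the triples of each result node into one ValidationResult."""
    results = {}
    for s, p, o in report_triples:
        field = FIELD_FOR_PREDICATE.get(p)
        if field is None:
            continue  # ignore statements the UI does not need
        result = results.setdefault(s, ValidationResult())
        setattr(result, field, o)
    return [results[key] for key in sorted(results)]


# Hypothetical report with a single violation.
REPORT = [
    ("_:r1", "rdf:type", "sh:ValidationResult"),
    ("_:r1", "sh:focusNode", "ex:bob"),
    ("_:r1", "sh:resultPath", "ex:name"),
    ("_:r1", "sh:resultMessage", "Missing name"),
]
```

`to_plain_objects(REPORT)` returns one `ValidationResult` for `ex:bob`. Keeping this conversion in its own component means the user interface never has to deal with the RDF API directly.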
Graphically,
SHACL4P is implemented as a workspace tab in Protégé consisting of
six different views, as displayed in the following screenshot (Figure 3):
- The class hierarchy view (upper left corner) displays all classes of the ontology.
- The instances view (lower left corner) displays the instances of classes.
- The SHACL text editor (middle upper view) is used to define SHACL constraints and to start the validation by pressing the Execute button in the lower left corner of the view.
- The logging view (middle lower view) informs the user about validation errors, failed syntax checks, and other similar events.
- The Turtle rendering view (upper right corner) shows the current ontology in Turtle syntax, which makes it easier to find the vocabulary needed for the constraints.
- The prefix view (lower right corner) lists all defined prefixes of the ontology. This view can be helpful when defining the prefixes for the SHACL definition.
Conclusion & Future Work
We developed a plugin
on top of Protégé that provides users with the means to define,
implement, and execute SHACL constraints against their Semantic Web
data within an open source ontology editor.