Updating Linked Data practices for FAIR Digital Object principles
Stian Soiland-Reyes, Leyla Jael Castro, Daniel Garijo, Marc Portier, Carole Goble, Paul Groth
DOI: https://doi.org/10.3897/rio.8.e94501
2022-10-13
Research Ideas and Outcomes
Abstract:
Background
The FAIR principles (Wilkinson et al. 2016) are fundamental for data discovery, sharing, consumption and reuse; however, their broad interpretation and the many ways to implement them can lead to inconsistencies and incompatibility (Jacobsen et al. 2020).
The European Open Science Cloud (EOSC) has been instrumental in maturing and encouraging FAIR practices across a wide range of research areas. Linked Data in the form of RDF (Resource Description Framework) is the common way to implement machine-readability in FAIR; however, the principles do not prescribe RDF or any particular technology (Mons et al. 2017).
FAIR Digital Object
FAIR Digital Object (FDO) (Schultes and Wittenburg 2019) has been proposed to improve researchers' access to digital objects through formalising their metadata, types and identifiers and exposing their computational operations, making them actionable FAIR objects rather than passive data sources. FDO is a set of principles (Bonino et al. 2019), implementable in multiple ways. Current realisations mostly use the Digital Object Interface Protocol (DOIPv2) (DONA Foundation 2018), with CORDRA as the main implementation. We can consider DOIPv2 as a simplified combination of object-oriented (CORBA, SOAP) and document-based (HTTP, FTP) approaches.
More recently, the FDO Forum has prepared detailed recommendations, currently open for comments, including a DOIP endorsement and updated FDO requirements. These point out Linked Data as another possible technology stack, which is the focus of this work.
Linked Data
Linked Data standards (LD), based on the Web architecture, are commonplace in sciences like bioinformatics, chemistry and medical informatics, in particular to publish Open Data as machine-readable resources.
LD has become ubiquitous on the general Web: the schema.org vocabulary is used by over 10 million sites for indexing by search engines, and 43% of all websites use JSON-LD.
Although LD practices align to FAIR (Hasnain and Rebholz-Schuhmann 2018), they do not fully encompass active aspects of FDOs. The HTTP protocol is used heavily for applications (e.g. mobile apps and cloud services), with REST APIs of customised JSON structures. Approaches that merge the LD and REST worlds include Linked Data Platform (LDP), Hydra and Web Payments.
Meeting FDO principles using Linked Data standards
Considering the potential of FDOs when combined with the mature technology stack of LD, here we briefly discuss how the FDO principles in Bonino et al. (2019) can be achieved using existing standards. The general principles (G1–G9) apply well: the standards are open, with HTTP having been stable for 30 years; JSON-LD is widely used; FAIR practitioners mainly use RDF; and a clear abstraction separates the RDF model from its stable bindings, which are available in multiple serialisations. However, when considering the specific principles (FDOF1–FDOF12) we find that additional constraints and best practices need to be established, as arbitrary LD resources cannot be assumed to follow FDO principles. This parallels how existing uses of DOIP are not FDO-compliant without additional constraints.
Namely, persistent identifiers (PIDs) (McMurry et al. 2017) (FDOF1) are common in the LD world (e.g. using http://purl.org/ or https://w3id.org/); however, they don't always have a declared type (FDOF2), and the PID may not even appear in the metadata. URL-based PIDs are resolvable (FDOF3), typically over HTTP using redirections and content-negotiation. One great advantage of RDF is that all attributes are defined semantic artefacts with PIDs (FDOF4), and attributes can be reused across vocabularies.
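The gap noted above between FDOF1 and FDOF2 can be illustrated with a minimal sketch that inspects a JSON-LD metadata document for its own PID and a declared type. Everything in it is an assumption for illustration: the function name check_pid_and_type, the identifier https://w3id.org/example/dataset/1 and the example document are hypothetical, and the check only looks at the top-level node without expanding the JSON-LD context, which a real consumer would need to do.

```python
import json


def check_pid_and_type(metadata: dict, pid: str) -> dict:
    """Report whether a JSON-LD node uses `pid` as its @id and declares an @type.

    Simplification (assumption): only the top-level node is inspected and the
    @context is not expanded, so compact and full type IRIs are not reconciled.
    """
    declared = metadata.get("@type", [])
    # JSON-LD allows @type to be a single string or a list of strings.
    types = [declared] if isinstance(declared, str) else list(declared)
    return {
        "pid_in_metadata": metadata.get("@id") == pid,
        "declared_types": types,
        "has_declared_type": len(types) > 0,
    }


# Hypothetical PID and metadata document, for illustration only.
EXAMPLE_PID = "https://w3id.org/example/dataset/1"
doc = json.loads("""{
  "@context": "https://schema.org/",
  "@id": "https://w3id.org/example/dataset/1",
  "@type": "Dataset",
  "name": "Example dataset"
}""")

report = check_pid_and_type(doc, EXAMPLE_PID)
# A document that resolves but neither mentions the PID nor declares a type.
untyped = check_pid_and_type({"name": "No identifier or type"}, EXAMPLE_PID)
```

Here report flags both the PID and a declared type as present, while untyped flags both as missing, which is the situation the text describes for arbitrary LD resources.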
While CRUD operations (FDOF6) are supported by native HTTP operations (GET/PUT/POST/DELETE) as in LDP, there is little consistency on how to define operation interfaces in LD (FDOF5). Existing REST approaches like OpenAPI and URI templates are mature and good candidates, and should be related to defined types to support machine-actionable composition (FDOF7). The HTTP status code 410 Gone is used in tombstone pages for removed resources (FDOF12), although 404 Not Found is more frequent.
Metadata is resolved to HTTP documents with their own URIs, but these frequently don't have their own PID (FDOF8). RDF-Star and nanopublications (Kuhn et al. 2021) give ways to identify and trace provenance of individual assertions. Different metadata levels (FDOF9) are frequently developed for LD vocabularies across different communities (FDOF10), such as FHIR for health data, Bioschemas for bioinformatics and more than 1000 more specific bioontologies. Increased declaration and navigation of profiles is therefore essential for machine-actionability and consistent consumption across FAIR endpoints. Several standards exist for rich collections (FDOF11), e.g. OAI-ORE, DCAT, RO-Crate and LDP. These are used and extended heterogeneously across the Web, but consistent machine-actionable FDOs will need specific choices of core -Abstract Truncated-
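The HTTP behaviour described above for CRUD operations (FDOF6) and tombstones (FDOF12) can be sketched as a small lookup and classifier. This is an illustrative assumption rather than anything prescribed by the FDO principles or LDP: the names CRUD_TO_HTTP and classify_resolution are hypothetical, and the create/update assignment to POST/PUT is only one common convention.

```python
from http import HTTPStatus

# One conventional CRUD-to-HTTP mapping over the four methods named in the
# text (assumption: LDP-style servers may also accept PUT for creation).
CRUD_TO_HTTP = {
    "create": "POST",
    "read": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


def classify_resolution(status: int) -> str:
    """Classify the HTTP status returned when resolving a URL-based PID."""
    if status == HTTPStatus.GONE:
        # 410 Gone: the resource existed and was deliberately removed.
        return "tombstone"
    if status == HTTPStatus.NOT_FOUND:
        # 404 Not Found: cannot tell a removed resource from one that never existed.
        return "unknown"
    if 300 <= status < 400:
        # Redirections are the usual way PID services forward to the current location.
        return "redirect"
    if 200 <= status < 300:
        return "resolved"
    return "error"


outcomes = {code: classify_resolution(code) for code in (200, 303, 404, 410, 500)}
```

The distinction between "tombstone" and "unknown" is the point of the sketch: a client can only act on FDOF12 when the server answers 410 rather than the more frequent 404.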