
Integrate and query local datasets and distant RDF data with AskOmics using Semantic Web technologies

Contributors

Questions

Objectives

Published: Jul 17, 2020
Last Updated: Jul 27, 2021

How to explore data


Requirements

Studying biological mechanisms requires:

.image-00[ Local data tables and remote data on genes and proteins are combined during data integration. The result is a graph in which a differential expression points to a gene, which points to a protein. The data is then queried and results are produced.]


What is the Semantic Web?


Semantic Web

A set of W3C recommendations for integrating data and domain knowledge, and for querying and reasoning over them.


RDF


RDF: Set of triples

.image-01[ a small graph, subject points to object with an arrow labelled predicate.]

nextprot:P01137 :hasTaxon taxon:9606 .
nextprot:P01137 :hasSequence "MPPSGLRLLL" .
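
As a rough illustration of the triple model, the two statements above can be held in plain Python as (subject, predicate, object) tuples. This is a minimal sketch using only the standard library. The `triples` variable and the `objects_of` helper are hypothetical names, and a real application would rely on an RDF library instead.

```python
# Each RDF triple is modelled as a (subject, predicate, object) tuple.
# Prefixed names are kept as plain strings; the literal keeps its quotes
# so it can be told apart from a resource.
triples = {
    ("nextprot:P01137", ":hasTaxon", "taxon:9606"),
    ("nextprot:P01137", ":hasSequence", '"MPPSGLRLLL"'),
}


def objects_of(subject, predicate):
    """Return every object linked to `subject` by `predicate` (hypothetical helper)."""
    return {o for s, p, o in triples if s == subject and p == predicate}


print(objects_of("nextprot:P01137", ":hasTaxon"))  # {'taxon:9606'}
```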

RDF: triples form a labeled directed graph

# Description
nextprot:P01137 rdf:type nextprot:Protein .
taxon:9606 rdf:type nextprot:Organism .
# Data
nextprot:P01137 :hasTaxon taxon:9606 .
nextprot:P01137 :hasSequence "MPPSGLRLLL" .

.image-02[ A graphic with two regions, data description and data. In the data is a circle labelled nextprot:P01137 which points to a sequence via a hasSequence arrow. The nextprot points to a taxon:9606 with a hasTaxon arrow. The taxon points to a nextprot:Organism in the data description region. The nextprot protein points to nextprot:Protein in the data description region via an rdf:type arrow.]
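
To make the "labeled directed graph" reading concrete, the sketch below turns the four triples above into an adjacency list: subjects and objects become nodes, and each predicate becomes the label of a directed edge. It uses only the Python standard library, and the `graph` structure is an illustrative assumption, not how a triplestore stores data.

```python
from collections import defaultdict

triples = [
    # Description
    ("nextprot:P01137", "rdf:type", "nextprot:Protein"),
    ("taxon:9606", "rdf:type", "nextprot:Organism"),
    # Data
    ("nextprot:P01137", ":hasTaxon", "taxon:9606"),
    ("nextprot:P01137", ":hasSequence", '"MPPSGLRLLL"'),
]

# Subjects and objects are nodes; each predicate labels a directed edge.
graph = defaultdict(list)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))

# Print every edge as: source --label--> target
for node, edges in graph.items():
    for label, target in edges:
        print(f"{node} --{label}--> {target}")
```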


SPARQL

SELECT ?gene
WHERE {
    ?gene rdf:type :Gene .
    ?gene :hasTaxon taxon:9606 .
}
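
The query above can be read as two triple patterns that share the variable `?gene`. The sketch below mimics that matching in plain Python to show why a shared variable behaves like a join. The gene identifiers (`ex:gene1`, `ex:gene2`) and the `select_genes` function are made up for illustration; a real SPARQL engine does far more than this.

```python
# Hypothetical data: the ex: gene identifiers are invented for illustration.
triples = {
    ("ex:gene1", "rdf:type", ":Gene"),
    ("ex:gene1", ":hasTaxon", "taxon:9606"),
    ("ex:gene2", "rdf:type", ":Gene"),
    ("ex:gene2", ":hasTaxon", "taxon:10090"),
    ("nextprot:P01137", ":hasTaxon", "taxon:9606"),
}


def select_genes(data):
    """Mimic the query: ?gene must satisfy both triple patterns."""
    typed = {s for s, p, o in data if p == "rdf:type" and o == ":Gene"}
    in_taxon = {s for s, p, o in data if p == ":hasTaxon" and o == "taxon:9606"}
    # A variable shared by two patterns acts as a join: intersect the bindings.
    return sorted(typed & in_taxon)


print(select_genes(triples))  # ['ex:gene1']
```

Note that `nextprot:P01137` is excluded even though it has the right taxon, because it does not match the `rdf:type :Gene` pattern.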

SPARQL: entity matching allows federated queries

.pull-left[

]

.pull-right[ .image-03[ Dataset 1 and 2 are shown as two silos, each with different small graphs. Each has a red node. Those nodes are connected via a dashed line. A picture of a cloud points at the two datasets, and their individual graphs collapsed into one larger graph. A query is sent to this cloud which comes out as a result table.] ]
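
The idea behind entity matching can be sketched as follows: when two datasets use the same identifier for an entity, their graphs connect through that node and a single query can traverse both. The example is a local simplification in plain Python (it merges the triples in memory, whereas a federated SPARQL query sends sub-queries to remote endpoints). All identifiers, predicates and the `proteins_for_expression` function are hypothetical.

```python
# Two independent datasets; every identifier and predicate is hypothetical.
dataset_1 = {
    ("ex:de1", ":concernsGene", "ex:geneA"),
    ("ex:de1", ":logFC", '"2.5"'),
}
dataset_2 = {
    ("ex:geneA", ":encodes", "ex:proteinA"),
    ("ex:geneB", ":encodes", "ex:proteinB"),
}


def proteins_for_expression(de, *datasets):
    """Follow de -> gene in one dataset, then gene -> protein in another."""
    # The shared identifier (ex:geneA) is what links the two graphs.
    merged = set().union(*datasets)
    genes = {o for s, p, o in merged if s == de and p == ":concernsGene"}
    return sorted(o for s, p, o in merged if s in genes and p == ":encodes")


print(proteins_for_expression("ex:de1", dataset_1, dataset_2))  # ['ex:proteinA']
```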


What is AskOmics?


AskOmics

Web software for data integration and querying using Semantic Web technologies. The main functionalities are:

AskOmics can be used as standalone software or with Galaxy.


Data integration with AskOmics


Local data integration


AskOmics generates the graph of data and the abstraction

AskOmics uses the file structure (e.g. the header of TSV files) to generate the graph that describes the data: the abstraction.

.image-04[ Two tables point to an RDF abstraction, shown as a small graph of DE, Gene, and their attributes, and to RDF data, which has the same graph shape as the abstraction but with real identifiers.]

The rest of each file is converted into RDF triples that represent the data.
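
The split between abstraction and data can be sketched in a few lines of Python. This is not the AskOmics implementation: it is a simplified illustration that assumes the first column header names the entity and the other headers name its attributes. The TSV content, the vocabulary (`owl:Class`, `rdfs:domain`) and all variable names are assumptions chosen for the example.

```python
import csv
import io

# Hypothetical TSV content; column names and values are invented.
tsv = "Gene\ttaxon\tsymbol\ngene1\t9606\tTGFB1\ngene2\t10090\tTgfb1\n"

rows = list(csv.reader(io.StringIO(tsv), delimiter="\t"))
header, records = rows[0], rows[1:]
entity, attributes = header[0], header[1:]

# Abstraction: triples describing the entity type and its attributes,
# derived from the header only.
abstraction = [(f":{entity}", "rdf:type", "owl:Class")]
abstraction += [(f":{attr}", "rdfs:domain", f":{entity}") for attr in attributes]

# Data: one typed node per row, plus one triple per attribute value.
data = []
for record in records:
    node = f":{record[0]}"
    data.append((node, "rdf:type", f":{entity}"))
    data += [(node, f":{a}", f'"{v}"') for a, v in zip(attributes, record[1:])]

print(len(abstraction), len(data))  # 3 6
```

The header alone yields the abstraction, while each row yields a typed node and its attribute values, which mirrors the two graphs described above.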


Distant RDF data integration

pip3 install abstractor
abstractor -s https://sparql.nextprot.org/sparql -o nextprot_abstraction.ttl -f turtle

Query multiple data sources with AskOmics


Query composition

.image-06[ A picture of an RDF graph with many nodes. On the right is a query interface of some sort.]


Key Points

Thank you!

This material is the result of a collaborative work. Thanks to the Galaxy Training Network and all the contributors! Galaxy Training Network Tutorial Content is licensed under Creative Commons Attribution 4.0 International License.