Data2RDF

In a nutshell

Reduced to its basic functionality, the package can be summarized in the following bullet points:

With Data2RDF we want to…

  • express information from a data source (file or Python dict) in OWL/RDF, where this information is available as:

    • metadata (key-value pairs)

    • dataframes (tabular data)

  • parse the metadata and dataframe of this data source and make them available to third-party software for further storage and processing.

  • express the SI units of quantities through the QUDT ontology.
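To make the parsing step more concrete, here is a minimal sketch using only Python's standard library. It assumes a hypothetical CSV layout (key-value metadata rows, one blank line, then a table with a header row) and a hypothetical function name `split_metadata_and_table`; it is not the parser that ships with Data2RDF, only an illustration of separating metadata from a dataframe.

```python
import csv
import io


def split_metadata_and_table(text: str) -> tuple[dict[str, str], dict[str, list[str]]]:
    """Split CSV text into key-value metadata and column-wise tabular data.

    Assumed (hypothetical) layout: "key,value" rows, one empty line,
    then a header row followed by data rows.
    """
    rows = list(csv.reader(io.StringIO(text)))
    # csv.reader yields an empty list for a blank line; it separates the two blocks.
    sep = rows.index([])
    metadata = {key: value for key, value in rows[:sep]}
    header, *records = rows[sep + 1:]
    # Transpose the records into one list of values per column.
    columns = {name: [record[i] for record in records] for i, name in enumerate(header)}
    return metadata, columns


sample = "specimen,S1\nmaterial,steel\n\ntime,force\n0.0,1.5\n0.1,2.0\n"
metadata, columns = split_metadata_and_table(sample)
print(metadata)  # {'specimen': 'S1', 'material': 'steel'}
print(columns)   # {'time': ['0.0', '0.1'], 'force': ['1.5', '2.0']}
```

Keeping the table column-wise makes it easy to hand individual columns to other tools, in line with the goal of making the parsed content available to third-party software.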

For the OWL/RDF generation, we:

  • express the content of the data file or Python dict in a dedicated subgraph (here called the data graph) using established ontologies such as PROV-O or CSVW. The data graph is created on the fly while the data source is parsed.

  • add further information about the dataset on top of the data graph through additional triples, using an ontology of your choice (here called the method graph).
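The split into a data graph and a method graph can be illustrated with plain sets of N-Triples terms. This is a sketch under stated assumptions: the `https://example.org/` namespace, the `producedBy` property and the `TensileTest` class are hypothetical placeholders for the ontology of your choice, and only a few PROV-O and CSVW terms are used, without the full CSVW table-schema structure. It is not meant to reflect how Data2RDF builds its graphs internally.

```python
# Real vocabularies: RDF, PROV-O and CSVW. EX is a hypothetical placeholder namespace.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PROV = "http://www.w3.org/ns/prov#"
CSVW = "http://www.w3.org/ns/csvw#"
EX = "https://example.org/"

Triple = tuple[str, str, str]


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as in N-Triples."""
    return f"<{value}>"


def data_graph(dataset: str, columns: list[str]) -> set[Triple]:
    """Data graph: describes the parsed source with established vocabularies."""
    subject = iri(EX + dataset)
    triples = {(subject, iri(RDF_TYPE), iri(PROV + "Entity"))}
    for name in columns:
        column = iri(f"{EX}{dataset}/{name}")
        triples.add((column, iri(RDF_TYPE), iri(CSVW + "Column")))
        triples.add((column, iri(CSVW + "name"), f'"{name}"'))
    return triples


def method_graph(dataset: str) -> set[Triple]:
    """Method graph: extra triples from an ontology of your choice (hypothetical terms)."""
    procedure = iri(f"{EX}{dataset}/procedure")
    return {
        (iri(EX + dataset), iri(EX + "producedBy"), procedure),
        (procedure, iri(RDF_TYPE), iri(EX + "TensileTest")),
    }


# Both graphs are sets of triples, so combining them is a set union.
combined = data_graph("test1", ["time", "force"]) | method_graph("test1")
for s, p, o in sorted(combined):
    print(s, p, o, ".")
```

The method graph only adds statements about a node that already exists in the data graph (the dataset itself), so the two layers stay separable.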

For the RDF generation, we need:

  • a data source in one of the following formats:

    • csv/tsv

    • json

    • xlsx/xls

    • Python dict

  • a curated ontology or vocabulary with OWL/RDFS classes describing the concepts in the data source.

  • a 1:1 mapping of value locations (metadata and/or dataframe) to ontology concepts for the creation of the data graph (described above).

  • optionally, a mapping of the SI units of the individually mapped concepts, taken either from a given location in the file or from an explicitly stated IRI (e.g. a QUDT unit IRI).

  • optionally, an OWL/RDF file with additional triples for the method graph (described above).
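The mapping requirements above (a 1:1 mapping of value locations plus optional unit mappings) can be sketched as follows. The mapping format, the `https://example.org/` class IRIs and the `UNIT_SYMBOLS` lookup table are hypothetical and are not Data2RDF's actual mapping schema; the QUDT IRIs (`qudt:value`, `qudt:unit`, `unit:MilliM`, `unit:N`, `unit:KiloN`) are real QUDT terms. For brevity, IRIs and literal values are both kept as plain strings.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QUDT = "http://qudt.org/schema/qudt/"
UNIT = "http://qudt.org/vocab/unit/"
EX = "https://example.org/"  # hypothetical namespace for classes and individuals

# Hypothetical 1:1 mapping: source key -> (class IRI, unit specification).
# The unit is either a fixed QUDT unit IRI or ("key", <source key holding the unit symbol>).
MAPPING = {
    "Width": (EX + "SpecimenWidth", UNIT + "MilliM"),
    "Force": (EX + "MaximumForce", ("key", "Force unit")),
}

# Hypothetical lookup from unit symbols found in the source to QUDT unit IRIs.
UNIT_SYMBOLS = {"N": UNIT + "N", "kN": UNIT + "KiloN"}


def map_metadata(metadata: dict[str, str]) -> list[tuple[str, str, str]]:
    """Create (subject, predicate, object) triples for every mapped key."""
    triples = []
    for key, (cls, unit_spec) in MAPPING.items():
        if key not in metadata:
            continue  # mapped keys that are absent from the source are skipped
        node = EX + key.replace(" ", "_")
        if isinstance(unit_spec, tuple):
            # Unit symbol read from another location in the source
            unit = UNIT_SYMBOLS[metadata[unit_spec[1]]]
        else:
            # Unit stated directly as an IRI
            unit = unit_spec
        triples += [
            (node, RDF_TYPE, cls),
            (node, QUDT + "value", metadata[key]),  # literal kept as a plain string
            (node, QUDT + "unit", unit),
        ]
    return triples


source = {"Width": "10.0", "Force": "2.5", "Force unit": "kN"}
for triple in map_metadata(source):
    print(triple)
```

The unit for `Width` is stated directly as an IRI, while the unit for `Force` is read from another key in the source, matching the two options listed above.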

Limitations (read before using the pipeline!)

  • The pipeline can only convert data that can be parsed by the provided parsers. The parsers currently support xlsx/xls, csv/tsv, json and Python dict objects.

  • The pipeline assumes that its resulting OWL/RDF expresses exactly one dataset. If a file contains multiple datasets, you need to either split up the file or run the pipeline multiple times over this file, with a separate mapping for each dataset.
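If a file holds several datasets, one way to prepare it is to split it into one chunk per dataset and then run the pipeline on each chunk with its own mapping. The `---` separator line and the `split_datasets` helper below are hypothetical; adapt the splitting rule to however your file actually delimits its datasets.

```python
def split_datasets(text: str) -> list[str]:
    """Split text into one chunk per dataset.

    Hypothetical convention: datasets are separated by a line containing only '---'.
    """
    chunks, current = [], []
    for line in text.splitlines():
        if line.strip() == "---":
            # Separator reached: close the current dataset and start a new one
            chunks.append("\n".join(current))
            current = []
        else:
            current.append(line)
    chunks.append("\n".join(current))
    # Drop empty chunks, e.g. from a leading or trailing separator
    return [chunk for chunk in chunks if chunk.strip()]


text = "time,force\n0,1\n---\ntime,force\n0,3\n"
for i, chunk in enumerate(split_datasets(text)):
    print(i, chunk.splitlines())
```

Each chunk can then be saved to its own file and converted separately, so that every resulting graph describes exactly one dataset.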

Installation

From source

  • git clone git@github.com:MI-FraunhoferIWM/data2rdf.git

  • cd data2rdf

  • pip install .

From PyPI

pip install data2rdf

Improvements

If anything in these docs is unclear, please provide feedback using the GitHub issue system.