API Reference#

Pipeline#

Abox template#

abox template module#

class data2rdf.abox_template_generation.ABoxScaffoldPipeline(xml_path, mod_xml_path=None, ttl_path=None)[source]#

Bases: object

add_individual_labels()[source]#
change_namespace(ttl_path_ns, unique_uri='http://test.org#')[source]#
create_output_next_to_file()[source]#
run_chowlk()[source]#
run_pipeline()[source]#
set_output_paths(output_folder)[source]#
xml_conversion()[source]#
data2rdf.abox_template_generation.add_abox_individuals_to_ontology(ontology_ttl_input, ontology_ttl_output, placeholder_namespace='http://abox-namespace-placeholder.org/')[source]#

Most ontologies contain only classes. To provide individuals that can be used in the ontopanel workflow, this function generates one individual for each class and adds the label of the class as the label of that individual.
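The idea can be illustrated without any RDF library. The following is only a minimal sketch, not the data2rdf implementation: triples are modelled as plain tuples, the function name add_individuals_sketch is hypothetical, and the naming scheme for the individuals (local class name appended to the placeholder namespace) is an assumption.

```python
RDF_TYPE = "rdf:type"
RDFS_LABEL = "rdfs:label"
OWL_CLASS = "owl:Class"
OWL_NAMED_INDIVIDUAL = "owl:NamedIndividual"


def add_individuals_sketch(
    triples, placeholder_namespace="http://abox-namespace-placeholder.org/"
):
    """Return the input triples plus one labelled individual per class."""
    triples = list(triples)
    result = set(triples)
    classes = {s for s, p, o in triples if p == RDF_TYPE and o == OWL_CLASS}
    for cls in sorted(classes):
        # Assumed naming scheme: local class name under the placeholder namespace.
        local_name = cls.rsplit("#", 1)[-1].rsplit("/", 1)[-1]
        individual = placeholder_namespace + local_name
        result.add((individual, RDF_TYPE, cls))
        result.add((individual, RDF_TYPE, OWL_NAMED_INDIVIDUAL))
        # Copy every label of the class onto the new individual.
        for s, p, o in triples:
            if s == cls and p == RDFS_LABEL:
                result.add((individual, RDFS_LABEL, o))
    return result
```

The placeholder namespace keeps the generated individuals easy to find, so that they can be moved to a project-specific namespace later (compare convert_abox_namespace).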

data2rdf.abox_template_generation.add_individual_labels(abox_template_graph_file_input, abox_template_graph_file_output)[source]#
data2rdf.abox_template_generation.convert_abox_namespace(abox_template_graph_file_input, abox_template_graph_file_output, unique_uri='http://test.org#', abox_method_tag='http://abox-namespace-placeholder.org/')[source]#
data2rdf.abox_template_generation.merge_graphs(graph_01, graph_02, merged_graph)[source]#

Merges both graphs into merged_graph

data2rdf.abox_template_generation.run_chowlk(inputfile, outputfile)[source]#

Parser code#

csv_parser module#

excel_parser module#

class data2rdf.excel_parser.ExcelParser(f_path, location_mapping_f_path, server_f_path=None, data_storage_path=None, data_storage_group_name='df', namespace='http://www.test.de')[source]#

Bases: DataParser

Generates the excel input sheet that can be used as input for abox skeleton files.

f_path#

The file path of the csv file used as input for the parser. This path is also stored as the dcat:downloadURL attribute of the dcat:Dataset individual created by the rdf_generation class.

Type:

str

location_mapping_f_path#

Path to the excel file that holds the locations of the meta data and column data cells that should be extracted.

Type:

str

server_f_path#

By default the file path for the csv file (f_path) gets used as dcat:downloadURL attribute for the created dcat:Dataset individual. On a server the actual download url of the file should be used (e.g. on the DSMS https://127.0.0.1/api/knowledge/data-files/764f6e51-a244-42f9-a754-c3e2861f63e4/raw_data/excel_file.xlsx).

Type:

str

data_storage_path#

Optional different storage location for the hdf5 file holding the data. Default is the same location as the input file.

Type:

str

data_storage_group_name#

Name of the group in the hdf5 to store the data. Using the data_storage_path and the data_storage_group_name multiple datasets can be stored in the same hdf5 file.

Type:

str

namespace#

The namespace that will be used by the rdf_generation class to construct the abox individuals.

Type:

str

generate_column_df()[source]#

Extracts meta data from the excel worksheet using the location mapping information from the meta_mapping_df

generate_data_storage()[source]#
generate_excel_spreadsheet(output_path)[source]#
load_file()[source]#
load_mapping_file()[source]#
parse_meta_data()[source]#

Extracts meta data from the excel worksheet using the location mapping information from the meta_mapping_df

parse_table()[source]#
parser_data()[source]#
split_meta_df()[source]#

Split the meta data into basic description and quantities

RDF Generator code#

rdf_generation module#

class data2rdf.rdf_generation.RDFGenerator(f_path, only_use_base_iri, data_download_iri)[source]#

Bases: object

Transforms the generic excel sheet to RDF

generate_column_json()[source]#
generate_file_json()[source]#

Generates the basic json-ld data model schema

generate_meta_json()[source]#
to_json_ld(f_path)[source]#
to_ttl(f_path)[source]#

Mapper code#

mapper module#

class data2rdf.mapper.Mapper(data_graph_path, method_graph_path, mapping_path)[source]#

Bases: object

create_mapping_template(worksheet='sameas')[source]#
export_mapping_as_ttl(output_file)[source]#
export_merged_mapping_table(output_file)[source]#
map_data_and_abox(worksheet='sameas')[source]#
update_mapping_template(worksheet='sameas')[source]#
data2rdf.mapper.assign_namespace(entity, namespace_mapping_dict)[source]#

Generates a new entity with the extended namespace. If no namespace is defined, the same entity is returned.

Parameters:
  • entity (str) – prefix:relation

  • namespace_mapping_dict (dict) – dict with prefix as key and namespace as value

Returns:

new_entity: namespace#relation

Return type:

str
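A minimal sketch of this prefix expansion is shown below. It is not the library code: the function name assign_namespace_sketch is hypothetical, and it assumes that each namespace in the dict already ends with its separator (such as #), so that plain concatenation yields namespace#relation.

```python
def assign_namespace_sketch(entity, namespace_mapping_dict):
    """Expand 'prefix:relation' to 'namespace' + 'relation'.

    Entities without a known prefix are returned unchanged.
    """
    prefix, separator, relation = entity.partition(":")
    if not separator or prefix not in namespace_mapping_dict:
        # No prefix at all, or a prefix that is not in the mapping.
        return entity
    return namespace_mapping_dict[prefix] + relation


# Example mapping (assumed for illustration).
namespaces = {"rdfs": "http://www.w3.org/2000/01/rdf-schema#"}
expanded = assign_namespace_sketch("rdfs:label", namespaces)
```

Because the lookup is keyed on the text before the first colon, a full IRI such as http://example.org/x passes through untouched unless "http" is itself registered as a prefix.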

data2rdf.mapper.convert_mapping2graph(merged_mapping, mapping_output_file)[source]#

Converts the data frame with mappings to a graph and exports the serialization.

Parameters:
  • merged_mapping (pd.DataFrame) – DataFrame with mappings. Inner join of all data graph, method graph and mapping.

  • mapping_output_file (str) – Path to the ttl file of the graph

data2rdf.mapper.create_mapping_template(data_graph_file, method_graph_file, mapping_output, worksheet='sameas')[source]#
data2rdf.mapper.map_data2method(data_graph_file, method_graph_file, mapping_file, worksheet)[source]#

Generates a DataFrame in which the individuals of the data graph are mapped to the individuals of the method graph, based on the mapping defined in the mapping file. The mapping of the data individuals is based on their label or their csvw title relation (rdfs:label|csvw:title). The mapping of the method graph is based on the class relation (rdf:type).

Although the terms data graph and method graph are used, this function can match any two graphs with each other, provided the mapping is given in the mapping_file.

Note: 1) An even more generic approach would allow defining the relations used for the mapping. 2) Since data_graph_file and method_graph_file are only used to fetch the graphs, the mapping could also be adjusted to work on other SPARQL endpoints (e.g. for files that are already in a triplestore).

Parameters:
  • data_graph_file (str) – Path to the data graph (ttl)

  • method_graph_file (str) – Path to the method graph (ttl)

  • mapping_file (str) – Path to the mapping file (xlsx)

  • worksheet (str) – the worksheet in the mapping file that should be used for the mapping

Returns:

merged_mapping: DataFrame with mappings. Inner join of all data graph, method graph and mapping.

Return type:

pd.DataFrame
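The inner join described above can be sketched with plain Python lists instead of graphs and DataFrames. This is only an illustration under stated assumptions: the function name map_data_to_method_sketch, the tuple layouts and the dictionary keys are hypothetical and are not taken from data2rdf.

```python
def map_data_to_method_sketch(data_labels, method_types, mapping_rows):
    """Inner join of data individuals, mapping rows and method individuals.

    data_labels:  list of (data_individual, label_or_title)
    method_types: list of (method_individual, method_class)
    mapping_rows: list of (label_or_title, method_class)
    """
    merged = []
    for label, method_class in mapping_rows:
        for data_individual, data_label in data_labels:
            if data_label != label:
                continue  # data side does not match this mapping row
            for method_individual, cls in method_types:
                if cls == method_class:
                    merged.append(
                        {
                            "data_individual": data_individual,
                            "label": label,
                            "method_class": method_class,
                            "method_individual": method_individual,
                        }
                    )
    return merged
```

As with any inner join, a mapping row contributes to the result only if both a data individual with that label and a method individual of that class exist; rows that match on one side only are dropped.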

data2rdf.mapper.mapping_file2df(mapping_file, worksheet)[source]#

Generates a dataframe from the mapping file. Assigns the prefixes to the relations

Parameters:

mapping_file (str) – Path to the mapping file (xlsx)

Returns:

df: DataFrame with assigned prefixes

Return type:

pd.DataFrame

data2rdf.mapper.merge_same_as_individuals(graph, convert_data_label_to_alt_label=True)[source]#

The mapper creates an OWL.sameAs relation between the data individuals and the method individuals. In some cases it is better to merge the two individuals into one, which makes the graph simpler to navigate. The merging is done by copying all relations from and to the data individuals onto the representative method individual.

Explanation: In the mat-o-lab pipeline the data and method individuals are kept separate; however, this makes the generated graph rather difficult to navigate.

Example:

Input:

    fileid:column-0 a ns3:DataInstance ;
        rdfs:label "Force" ;
        ns11:EMMO_67fc0a36_8dcb_4ffa_9a43_31074efa3296 fileid:unit-0,
            fileid:unitliteral-0 ;
        owl:sameAs sdi:TimeSeries_X_Individual .

    sdi:TimeSeries_X_Individual a owl:NamedIndividual ;
        rdfs:label "TimeSeries_X_Individual" .

Output:

    sdi:TimeSeries_X_Individual a ns3:DataInstance,
            owl:NamedIndividual ;
        rdfs:label "TimeSeries_X_Individual" ;
        ns11:EMMO_67fc0a36_8dcb_4ffa_9a43_31074efa3296 fileid:unit-0,
            fileid:unitliteral-0 ;
        ns6:altLabel "Force" .

Parameters:
  • graph (rdflib.Graph) – The graph to convert

  • convert_data_label_to_alt_label (bool) – To avoid two RDFS.label for the method individual it is convenient to change the data label to SKOS.altLabel
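The merge step can be sketched on plain tuples as follows. This is a simplified illustration rather than the rdflib-based implementation: the function name merge_same_as_sketch is hypothetical, triples are (subject, predicate, object) strings, and literals are assumed never to coincide with an individual's identifier.

```python
OWL_SAME_AS = "owl:sameAs"
RDFS_LABEL = "rdfs:label"
SKOS_ALT_LABEL = "skos:altLabel"


def merge_same_as_sketch(triples, convert_data_label_to_alt_label=True):
    """Fold each data individual into the method individual it is sameAs."""
    triples = set(triples)
    # Map every data individual to its representative method individual.
    replacement = {s: o for s, p, o in triples if p == OWL_SAME_AS}
    merged = set()
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            continue  # the link is redundant once both sides are merged
        if s in replacement and p == RDFS_LABEL and convert_data_label_to_alt_label:
            # Avoid a second rdfs:label on the method individual.
            p = SKOS_ALT_LABEL
        # Rewrite both outgoing (subject) and incoming (object) relations.
        merged.add((replacement.get(s, s), p, replacement.get(o, o)))
    return merged
```

Applied to the input of the example above, the sketch yields a single sdi:TimeSeries_X_Individual carrying both types, its own rdfs:label and the former data label as alternative label.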

data2rdf.mapper.report_merge_result(merged_mapping_df)[source]#

Report the number of successfully mapped data individuals.

Emmo utils#

data2rdf.emmo_lib.emmo_utils.simple_unit_lookup(parsed_unit)[source]#

Very simple function that assigns EMMO unit classes for the prefix and the unit of a parsed unit. It assumes that the prefix and unit match the EMMO annotation, which follows SI standards. Could be improved using unit matching libraries like pint.

Parameters:

parsed_unit (str) – Parsed unit from the data (e.g. mm)

Returns:

prefix_class (str): The EMMO class of the prefix (or none)

unit_class (str): The EMMO class of the unit (or none)
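A lookup of this kind could be sketched as below. The tables, the class labels and the function name simple_unit_lookup_sketch are purely illustrative assumptions and not the EMMO identifiers or the logic used by data2rdf; the sketch also returns Python None where no class is found.

```python
# Hypothetical lookup tables; real EMMO class identifiers are not shown here.
PREFIX_CLASSES = {"k": "Kilo", "m": "Milli", "M": "Mega"}
UNIT_CLASSES = {"m": "Metre", "s": "Second", "N": "Newton", "Pa": "Pascal"}


def simple_unit_lookup_sketch(parsed_unit):
    """Return (prefix_class, unit_class) for a unit string such as 'mm'."""
    # A bare unit wins over a prefix reading, so 'm' is metre, not milli.
    if parsed_unit in UNIT_CLASSES:
        return None, UNIT_CLASSES[parsed_unit]
    # Otherwise try a one-character SI prefix followed by a known unit.
    prefix, unit = parsed_unit[:1], parsed_unit[1:]
    if prefix in PREFIX_CLASSES and unit in UNIT_CLASSES:
        return PREFIX_CLASSES[prefix], UNIT_CLASSES[unit]
    return None, None
```

Checking the bare unit first resolves the ambiguity between the unit m and the prefix m. Anything beyond simple SI symbols (compound units, exponents) is out of scope for such a table lookup, which is where a library like pint would help.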