API Reference

Pipeline

Data2RDF ABox pipeline

class data2rdf.pipelines.main.Data2RDF(*, mode: ~data2rdf.modes.PipelineMode = PipelineMode.ABOX, raw_data: str | bytes | ~typing.Dict[str, ~typing.Any] | ~typing.List[~typing.Dict[str, ~typing.Any]], mapping: str | ~typing.List[~typing.Any], parser: ~data2rdf.parsers.Parser, parser_args: ~typing.Dict[str, ~typing.Any] = {}, config: ~typing.Dict[str, ~typing.Any] | ~data2rdf.config.Config = <factory>, additional_triples: str | ~rdflib.graph.Graph | None = None)[source]#

Bases: BaseModel

Data2RDF pipeline.

Parameters:
  • mode (PipelineMode): Pipeline mode, either ABOX (default) or TBOX.

  • raw_data (Union[str, bytes, Dict[str, Any], List[Dict[str, Any]]]): For a CSV file: a str with the file path or the file content itself. For a JSON file: a dict (or list of dicts) with the parsed content, or a str with the file content or file path. For an Excel file: bytes with the file content, or a str with the file path.

  • mapping (Union[str, List[Any]]): File path to the mapping file to be parsed, or the mapping itself.

  • parser (Parser): Parser to be used, depending on the type of the raw data file.

  • parser_args (Dict[str, Any]): Parser-specific arguments. These are passed to the parser as keyword arguments.

  • config (Union[Dict[str, Any], Config]): Configuration object. Defaults to a new instance of Config.

  • additional_triples (Optional[Union[str, Graph]]): File path to, or rdflib Graph object with, extra triples to be added to the resulting pipeline graph.
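To make the accepted raw_data forms concrete, here is a minimal, stdlib-only sketch that classifies a value according to the rules above. The helper describe_raw_data and its return labels are hypothetical and not part of data2rdf; in particular, using os.path.isfile to tell a file path from file content is an assumption, not necessarily how the parsers decide.

```python
import os
from typing import Any, Dict, List, Union

# Mirrors the documented type of Data2RDF.raw_data
RawData = Union[str, bytes, Dict[str, Any], List[Dict[str, Any]]]


def describe_raw_data(raw_data: RawData) -> str:
    """Hypothetical helper: classify a raw_data value (not part of data2rdf)."""
    if isinstance(raw_data, bytes):
        # e.g. the binary content of an Excel workbook
        return "bytes"
    if isinstance(raw_data, (dict, list)):
        # already-parsed JSON content
        return "json"
    if isinstance(raw_data, str):
        # Assumption: a string naming an existing file is treated as a path,
        # anything else as the file content itself
        return "path" if os.path.isfile(raw_data) else "content"
    raise TypeError(f"unsupported raw_data type: {type(raw_data).__name__}")


print(describe_raw_data(b"PK\x03\x04"))        # bytes
print(describe_raw_data({"temperature": 21}))  # json
print(describe_raw_data("time;force\n0;1.5"))  # content
```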

additional_triples: str | Graph | None#
config: Dict[str, Any] | Config#
property dataframe: Dict[str, Any]#

Return dataframe

property dataframe_metadata: List[BasicConceptMapping]#

Return list object with dataframe metadata

property general_metadata: List[BasicConceptMapping]#

Return list object with general metadata

property graph: Graph#

Returns a graph object based on the pipeline’s JSON-LD data.

The graph object is created with the identifier specified through the pipeline. It is then populated with the JSON-LD data from the pipeline, and if additional triples are provided, they are validated and added to the graph.

Returns:

A graph object containing the pipeline’s data.

Return type:

Graph

property json_ld: Dict[str, Any]#

Returns a dictionary of JSON-LD for the graph based on the pipeline mode.

If the pipeline mode is ABOX and the suppress_file_description config is False, it returns a dictionary containing the context, id, type, and distribution information of the dataset, including the file description. If suppress_file_description is True, it returns only the JSON-LD of the ABox parser.

If the pipeline mode is TBOX, it returns the JSON-LD of the TBox parser.

Parameters:

None

Returns:

A dictionary of JSON-LD for the graph.

Return type:

Dict[str, Any]

mapping: str | List[Any]#
mode: PipelineMode#
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'use_enum_values': True}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

parser: Parser#
parser_args: Dict[str, Any]#
property plain_metadata: Dict[str, Any]#

Metadata as flat JSON, without units and IRIs. Useful, e.g., for the custom properties of the DSMS.

raw_data: str | bytes | Dict[str, Any] | List[Dict[str, Any]]#
classmethod run_pipeline(self: Data2RDF) Data2RDF[source]#

Run pipeline.

property time_series: Dict[str, Any]#
property time_series_metadata: List[BasicConceptMapping]#
to_dict(schema: Callable | None = None) List[Dict[str, Any]][source]#

Return list of general metadata as DSMS custom properties

classmethod validate_config(value: Dict[str, Any] | Config) Config[source]#

Validate configuration

Mappings and graph models

Base

Basic data2rdf models

class data2rdf.models.base.BaseConfigModel(*, config: ~data2rdf.config.Config = <factory>)[source]#

Bases: BaseModel

Basic model for holding the data2rdf config

config: Config#
model_config: ClassVar[ConfigDict] = {'exclude': {'config'}}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

classmethod validate_config(value: Dict[str, Any] | Config) Config[source]#

Validate configuration

class data2rdf.models.base.BasicConceptMapping(*, config: ~data2rdf.config.Config = <factory>, key: str | None = None)[source]#

Bases: BaseConfigModel

Basic mapping for a concept in a file

key: str | None#
model_config: ClassVar[ConfigDict] = {'exclude': {'config'}}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class data2rdf.models.base.BasicGraphModel(*, config: ~data2rdf.config.Config = <factory>, key: str | None = None)[source]#

Bases: BasicConceptMapping

Basic model for merging data with mappings to become a graph

property graph: Graph#

Return graph object based on json-ld

abstract property json_ld: Dict[str, Any]#

Return dict for json-ld of graph

model_config: ClassVar[ConfigDict] = {'exclude': {'config'}}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class data2rdf.models.base.BasicSuffixModel(*, config: ~data2rdf.config.Config = <factory>, iri: str | ~pydantic.networks.AnyUrl | ~typing.List[str | ~pydantic.networks.AnyUrl], suffix: str | None = None)[source]#

Bases: BaseConfigModel

Pydantic BaseModel for suffix and type of a class instance

iri: str | AnyUrl | List[str | AnyUrl]#
model_config: ClassVar[ConfigDict] = {'exclude': {'config'}}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

suffix: str | None#
classmethod validate_iri(value: AnyUrl | List[AnyUrl]) AnyUrl[source]#

Ensure that the IRI does not contain blank spaces.
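The whitespace rule can be illustrated with a small stdlib sketch. The function check_iri is hypothetical; the reference above does not say whether data2rdf raises an error or, for example, strips the spaces, so this sketch simply raises ValueError for any IRI (single or in a list) that contains whitespace.

```python
from typing import List, Union


def check_iri(value: Union[str, List[str]]) -> Union[str, List[str]]:
    """Hypothetical validator: reject IRIs that contain whitespace."""
    iris = value if isinstance(value, list) else [value]
    for iri in iris:
        if any(ch.isspace() for ch in str(iri)):
            raise ValueError(f"IRI must not contain blank spaces: {iri!r}")
    # Valid input is returned unchanged, as a field validator would
    return value


print(check_iri("https://www.example.org/Specimen"))
try:
    check_iri(["https://www.example.org/A", "https://www.example.org/test piece"])
except ValueError as err:
    print(err)
```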

classmethod validate_suffix(self: BasicSuffixModel) BasicSuffixModel[source]#

Return the suffix for the individual.

class data2rdf.models.base.RelationType(value)[source]#

Bases: str, Enum

Relation type for TBox modelling.

ANNOTATION_PROPERTY = 'annotation_property'#
DATA_PROPERTY = 'data_property'#
OBJECT_PROPERTY = 'object_property'#
PROPERTY = 'property'#
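Because RelationType derives from both str and Enum, its members behave like plain strings. The sketch below re-creates the enum locally with the documented values (it does not import data2rdf) to show lookup by value and string comparison, which is what lets a relation type be written as a plain string such as "data_property" in a mapping.

```python
from enum import Enum


# Local re-creation of the documented members; not imported from data2rdf
class RelationType(str, Enum):
    ANNOTATION_PROPERTY = "annotation_property"
    DATA_PROPERTY = "data_property"
    OBJECT_PROPERTY = "object_property"
    PROPERTY = "property"


# Look up a member from a plain string, e.g. a value read from a mapping file
relation_type = RelationType("data_property")
print(relation_type is RelationType.DATA_PROPERTY)  # True

# Because the enum also subclasses str, members compare equal to their values
print(RelationType.OBJECT_PROPERTY == "object_property")  # True
print(relation_type.value)  # data_property
```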

Graph

Models for graph construction from semantic concepts

class data2rdf.models.graph.ClassTypeGraph(*, config: ~data2rdf.config.Config = <factory>, key: str | None = None, suffix: str, rdfs_type: str = 'owl:Class', annotation_properties: ~typing.List[~data2rdf.models.graph.ValueRelationMapping] | None = None, object_properties: ~typing.List[~data2rdf.models.graph.ValueRelationMapping] | None = None, data_properties: ~typing.List[~data2rdf.models.graph.ValueRelationMapping] | None = None, rdfs_properties: ~typing.List[~data2rdf.models.graph.ValueRelationMapping] | None = None)[source]#

Bases: BasicGraphModel

Graph of a potential concept or class in the TBox.

annotation_properties: List[ValueRelationMapping] | None#
data_properties: List[ValueRelationMapping] | None#
property json_ld: Dict[str, Any]#

Return dict for json-ld of graph

model_config: ClassVar[ConfigDict] = {'exclude': {'config'}}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

object_properties: List[ValueRelationMapping] | None#
rdfs_properties: List[ValueRelationMapping] | None#
rdfs_type: str#
suffix: str#
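As a rough illustration of what a class graph could look like as JSON-LD, here is a hedged sketch. The Relation dataclass is a hypothetical stand-in for ValueRelationMapping, and rendering annotation/data properties as {"@value": ...} literals and object properties as {"@id": ...} references is an assumption based on general JSON-LD conventions, not a description of ClassTypeGraph.json_ld. Only the default rdfs_type "owl:Class" comes from the reference above.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Union


@dataclass
class Relation:
    """Hypothetical stand-in for ValueRelationMapping."""
    relation: str
    value: Union[str, int, float, bool]
    datatype: Optional[str] = None


def class_json_ld(
    suffix: str,
    namespace: str,
    rdfs_type: str = "owl:Class",
    annotation_properties: Optional[List[Relation]] = None,
    object_properties: Optional[List[Relation]] = None,
    data_properties: Optional[List[Relation]] = None,
) -> Dict[str, Any]:
    node: Dict[str, Any] = {"@id": namespace + suffix, "@type": rdfs_type}
    # Annotation and data properties become literals, optionally typed
    for rel in (annotation_properties or []) + (data_properties or []):
        literal: Dict[str, Any] = {"@value": rel.value}
        if rel.datatype:
            literal["@type"] = rel.datatype
        node[rel.relation] = literal
    # Object properties point to other resources by IRI
    for rel in object_properties or []:
        node[rel.relation] = {"@id": rel.value}
    return node


node = class_json_ld(
    "TensileTest",
    "https://www.example.org/",
    annotation_properties=[Relation("rdfs:label", "Tensile test")],
    object_properties=[Relation("https://www.example.org/hasPart", "https://www.example.org/Specimen")],
)
print(node["@type"])  # owl:Class
```

This simple dictionary keeps only the last value when the same relation occurs twice; a fuller implementation would collect repeated relations into a list.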
class data2rdf.models.graph.MeasurementUnit(*, config: ~data2rdf.config.Config = <factory>, iri: str | ~pydantic.networks.AnyUrl, label: str | None = None, symbol: str | None = None, namespace: str | None = None)[source]#

Bases: BaseConfigModel

iri: str | AnyUrl#
label: str | None#
model_config: ClassVar[ConfigDict] = {'exclude': {'config'}}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

namespace: str | None#
symbol: str | None#
classmethod validate_measurement_unit(self) MeasurementUnit[source]#
class data2rdf.models.graph.PropertyGraph(*, config: ~data2rdf.config.Config = <factory>, iri: str | ~pydantic.networks.AnyUrl | ~typing.List[str | ~pydantic.networks.AnyUrl], suffix: str | None = None, key: str | None = None, value: str | int | float | bool | ~pydantic.networks.AnyUrl | ~data2rdf.models.graph.PropertyGraph | ~data2rdf.models.graph.QuantityGraph | None = None, annotation: str | ~pydantic.networks.AnyUrl | None = None, value_relation: str | ~pydantic.networks.AnyUrl | None = 'rdfs:label', value_relation_type: ~data2rdf.models.base.RelationType | None = None, value_datatype: str | None = None)[source]#

Bases: BasicGraphModel, BasicSuffixModel

Mapping for an individual with an arbitrary property, e.g. the name of a tester or a testing facility. The value does not need to be a discrete value; it can also be a reference to a column in a table or dataframe.

annotation: str | AnyUrl | None#
property json_ld: Dict[str, Any]#

Return dict of json-ld for graph

model_config: ClassVar[ConfigDict] = {'exclude': {'config'}}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

property types_json: Dict[str, Any]#

Dict of json-ld for class types of the individual

classmethod validate_annotation(value: AnyUrl) AnyUrl[source]#

Ensure that the IRI does not contain blank spaces.

classmethod validate_property_graph(self: PropertyGraph) PropertyGraph[source]#

Validate property graph in order to generate annotations

classmethod validate_value(self: PropertyGraph) PropertyGraph[source]#

Validate value of a property graph.

In case the value is a property graph or a quantity graph, make sure that the config is set correctly.

value: str | int | float | bool | AnyUrl | PropertyGraph | QuantityGraph | None#
value_datatype: str | None#
property value_json: Dict[str, str] | None#
value_relation: str | AnyUrl | None#
value_relation_type: RelationType | None#
class data2rdf.models.graph.QuantityGraph(*, config: ~data2rdf.config.Config = <factory>, iri: str | ~pydantic.networks.AnyUrl | ~typing.List[str | ~pydantic.networks.AnyUrl], suffix: str | None = None, key: str | None = None, unit: str | ~pydantic.networks.AnyUrl | None = None, value: int | float | str | None = None, unit_relation: str | ~pydantic.networks.AnyUrl | None = 'qudt:hasUnit', value_relation: str | ~pydantic.networks.AnyUrl | None = 'qudt:value', measurement_unit: ~data2rdf.models.graph.MeasurementUnit | None = None)[source]#

Bases: BasicGraphModel, BasicSuffixModel

Quantity with or without a discrete value and a unit, e.g. a quantity with a single value and unit, or a quantity describing a column of a dataframe or table with a unit.

property json_ld: Dict[str, Any]#

Return dict of json-ld for graph

measurement_unit: MeasurementUnit | None#
model_config: ClassVar[ConfigDict] = {'exclude': {'config'}}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

unit: str | AnyUrl | None#
property unit_json: Dict[str, Any]#

Return json with unit definition

unit_relation: str | AnyUrl | None#
classmethod validate_quantity_graph(self) QuantityGraph[source]#
classmethod validate_unit(value: str | AnyUrl, info: ValidationInfo) AnyUrl | None[source]#
classmethod validate_value(value: int | float | str) int | float[source]#
value: int | float | str | None#
property value_json: Dict[str, Any]#

Return json with value definition

value_relation: str | AnyUrl | None#
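The following stdlib sketch shows how a quantity node might combine a value, a unit, and the documented default relations qudt:hasUnit and qudt:value. Both coerce_value (a stand-in for the int | float | str to int | float conversion that validate_value's signature suggests) and quantity_node are hypothetical, and the node layout is an assumption rather than the actual output of QuantityGraph.json_ld. The "fileid:" prefix reuses the default Config.prefix_name.

```python
from typing import Any, Dict, Optional, Union


def coerce_value(value: Union[int, float, str]) -> Union[int, float]:
    """Hypothetical stand-in for QuantityGraph.validate_value: numeric strings become numbers."""
    if isinstance(value, (int, float)):
        return value
    text = value.strip()
    try:
        return int(text)
    except ValueError:
        return float(text)  # still raises ValueError for non-numeric text


def quantity_node(
    node_id: str,
    class_iri: str,
    value: Optional[Union[int, float, str]] = None,
    unit: Optional[str] = None,
    unit_relation: str = "qudt:hasUnit",  # documented default
    value_relation: str = "qudt:value",   # documented default
) -> Dict[str, Any]:
    """Hypothetical JSON-LD node for a quantity; the layout is an assumption."""
    node: Dict[str, Any] = {"@id": node_id, "@type": class_iri}
    if unit is not None:
        node[unit_relation] = {"@id": unit}
    # A quantity describing a dataframe column may have no discrete value
    if value is not None:
        node[value_relation] = {"@value": coerce_value(value)}
    return node


print(quantity_node("fileid:Width", "https://www.example.org/Width", " 12.5 ", "http://qudt.org/vocab/unit/MilliM"))
```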
class data2rdf.models.graph.ValueRelationMapping(*, value: str | int | float | bool | AnyUrl, relation: str | AnyUrl, datatype: str | None = None)[source]#

Bases: BaseModel

Mapping between an object/data/annotation property and a value resolved from a location in the data file.

datatype: str | None#
model_config: ClassVar[ConfigDict] = {}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

relation: str | AnyUrl#
value: str | int | float | bool | AnyUrl#

Mapping

Mapping models for data2rdf

class data2rdf.models.mapping.ABoxBaseMapping(*, config: ~data2rdf.config.Config = <factory>, iri: str | ~pydantic.networks.AnyUrl | ~typing.List[str | ~pydantic.networks.AnyUrl], suffix: str | None = None, key: str | None = None, unit: str | ~pydantic.networks.AnyUrl | None = None, annotation: str | ~pydantic.networks.AnyUrl | None = None, custom_relations: ~typing.List[~data2rdf.models.mapping.CustomRelation] | None = None, source: str | None = None, value_location: str | None = None, unit_location: str | None = None, value_relation: str | ~pydantic.networks.AnyUrl | None = None, value_relation_type: ~data2rdf.models.base.RelationType | None = None, value_datatype: str | None = None, unit_relation: str | ~pydantic.networks.AnyUrl | None = None, suffix_from_location: bool = False)[source]#

Bases: BasicConceptMapping, BasicSuffixModel

Base class for mapping during ABox modelling.

annotation: str | AnyUrl | None#
custom_relations: List[CustomRelation] | None#
model_config: ClassVar[ConfigDict] = {'exclude': {'config'}}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

source: str | None#
suffix_from_location: bool#
unit: str | AnyUrl | None#
unit_location: str | None#
unit_relation: str | AnyUrl | None#
classmethod validate_annotation(value: str | AnyUrl | None) AnyUrl | None[source]#
classmethod validate_model(self: ABoxBaseMapping) ABoxBaseMapping[source]#

Validate model

value_datatype: str | None#
value_location: str | None#
value_relation: str | AnyUrl | None#
value_relation_type: RelationType | None#
class data2rdf.models.mapping.ABoxExcelMapping(*, config: ~data2rdf.config.Config = <factory>, iri: str | ~pydantic.networks.AnyUrl | ~typing.List[str | ~pydantic.networks.AnyUrl], suffix: str | None = None, key: str | None = None, unit: str | ~pydantic.networks.AnyUrl | None = None, annotation: str | ~pydantic.networks.AnyUrl | None = None, custom_relations: ~typing.List[~data2rdf.models.mapping.CustomRelation] | None = None, source: str | None = None, value_location: str | None = None, unit_location: str | None = None, value_relation: str | ~pydantic.networks.AnyUrl | None = None, value_relation_type: ~data2rdf.models.base.RelationType | None = None, value_datatype: str | None = None, unit_relation: str | ~pydantic.networks.AnyUrl | None = None, suffix_from_location: bool = False, dataframe_start: str | None = None, worksheet: str | None = None)[source]#

Bases: ABoxBaseMapping

A special model for mapping from Excel files to semantic concepts in the ABox.

dataframe_start: str | None#
model_config: ClassVar[ConfigDict] = {'exclude': {'config'}}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

worksheet: str | None#
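The reference does not specify the format of dataframe_start. Assuming it is an A1-style cell address such as "B12" (an assumption), the following hypothetical helper converts such an address into zero-based (row, column) indices, which is the usual first step before slicing a worksheet.

```python
import re
from typing import Tuple

_CELL = re.compile(r"([A-Za-z]+)([0-9]+)")


def parse_cell(ref: str) -> Tuple[int, int]:
    """Hypothetical helper: 'B12' -> zero-based (row, column) = (11, 1)."""
    match = _CELL.fullmatch(ref.strip())
    if match is None or int(match.group(2)) < 1:
        raise ValueError(f"not an A1-style cell reference: {ref!r}")
    letters, digits = match.groups()
    column = 0
    for ch in letters.upper():
        # Column letters form a bijective base-26 number: A=1, ..., Z=26, AA=27
        column = column * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, column - 1


print(parse_cell("A1"))   # (0, 0)
print(parse_cell("B12"))  # (11, 1)
print(parse_cell("AA3"))  # (2, 26)
```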
class data2rdf.models.mapping.CustomRelation(*, relation: str | AnyUrl, object_location: str | None, object_data_type: str | CustomRelationPropertySubgraph | CustomRelationQuantitySubgraph | None = None, relation_type: RelationType | None = None)[source]#

Bases: BaseModel

Custom relation model

model_config: ClassVar[ConfigDict] = {}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

object_data_type: str | CustomRelationPropertySubgraph | CustomRelationQuantitySubgraph | None#
object_location: str | None#
relation: str | AnyUrl#
relation_type: RelationType | None#
class data2rdf.models.mapping.CustomRelationPropertySubgraph(*, config: ~data2rdf.config.Config = <factory>, iri: str | ~pydantic.networks.AnyUrl | ~typing.List[str | ~pydantic.networks.AnyUrl], suffix: str | None = None, concatenate: bool | None = False, value_relation: str | None = 'rdfs:label')[source]#

Bases: PropertySubgraphBaseModel

model_config: ClassVar[ConfigDict] = {'exclude': {'config'}}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

value_relation: str | None#
class data2rdf.models.mapping.CustomRelationQuantitySubgraph(*, config: ~data2rdf.config.Config = <factory>, iri: str | ~pydantic.networks.AnyUrl | ~typing.List[str | ~pydantic.networks.AnyUrl], suffix: str | None = None, concatenate: bool | None = False, unit_relation: str | ~pydantic.networks.AnyUrl | None = 'qudt:hasUnit', value_relation: str | ~pydantic.networks.AnyUrl | None = 'qudt:value', unit: str | ~pydantic.networks.AnyUrl | None = None)[source]#

Bases: PropertySubgraphBaseModel

model_config: ClassVar[ConfigDict] = {'exclude': {'config'}}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

unit: str | AnyUrl | None#
unit_relation: str | AnyUrl | None#
value_relation: str | AnyUrl | None#
class data2rdf.models.mapping.PropertySubgraphBaseModel(*, config: ~data2rdf.config.Config = <factory>, iri: str | ~pydantic.networks.AnyUrl | ~typing.List[str | ~pydantic.networks.AnyUrl], suffix: str | None = None, concatenate: bool | None = False)[source]#

Bases: BasicSuffixModel

concatenate: bool | None#
model_config: ClassVar[ConfigDict] = {'exclude': {'config'}}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class data2rdf.models.mapping.TBoxBaseMapping(*, config: ~data2rdf.config.Config = <factory>, key: str, relation: str | ~pydantic.networks.AnyUrl, relation_type: ~data2rdf.models.base.RelationType, datatype: str | None = None)[source]#

Bases: BasicConceptMapping

Mapping between an object/data/annotation property and a value under a location in the data file. This

datatype: str | None#
key: str#
model_config: ClassVar[ConfigDict] = {'exclude': {'config'}}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

relation: str | AnyUrl#
relation_type: RelationType#

Configuration

class data2rdf.config.Config(_case_sensitive: bool | None = None, _nested_model_default_partial_update: bool | None = None, _env_prefix: str | None = None, _env_file: DotenvType | None = PosixPath('.'), _env_file_encoding: str | None = None, _env_ignore_empty: bool | None = None, _env_nested_delimiter: str | None = None, _env_nested_max_split: int | None = None, _env_parse_none_str: str | None = None, _env_parse_enums: bool | None = None, _cli_prog_name: str | None = None, _cli_parse_args: bool | list[str] | tuple[str, ...] | None = None, _cli_settings_source: CliSettingsSource[Any] | None = None, _cli_parse_none_str: str | None = None, _cli_hide_none_type: bool | None = None, _cli_avoid_json: bool | None = None, _cli_enforce_required: bool | None = None, _cli_use_class_docs_for_groups: bool | None = None, _cli_exit_on_error: bool | None = None, _cli_prefix: str | None = None, _cli_flag_prefix_char: str | None = None, _cli_implicit_flags: bool | None = None, _cli_ignore_unknown_args: bool | None = None, _cli_kebab_case: bool | Literal['all', 'no_enums'] | None = None, _cli_shortcuts: Mapping[str, str | list[str]] | None = None, _secrets_dir: PathType | None = None, *, qudt_units: str | AnyUrl = 'http://qudt.org/2.1/vocab/unit', qudt_quantity_kinds: str | AnyUrl = 'http://qudt.org/vocab/quantitykind/', language: str = 'en', base_iri: str | AnyUrl = 'https://www.example.org', prefix_name: str = 'fileid', separator: str = '/', encoding: str = 'utf-8', data_download_uri: str | AnyUrl = 'https://www.example.org/download', graph_identifier: str | AnyUrl | None = None, namespace_placeholder: str | AnyUrl = 'http://abox-namespace-placeholder.org/', remove_from_unit: List[str] = ['[', ']', '"', ' '], mapping_csv_separator: str = ';', remove_from_datafile: List[str] = ['"', '\r', '\n'], suppress_file_description: bool = False, exclude_ontology_title: bool = False)[source]#

Bases: BaseSettings

Data2RDF configuration

base_iri: str | AnyUrl#
data_download_uri: str | AnyUrl#
encoding: str#
exclude_ontology_title: bool#
graph_identifier: str | AnyUrl | None#
language: str#
mapping_csv_separator: str#
model_config: ClassVar[SettingsConfigDict] = {'arbitrary_types_allowed': True, 'case_sensitive': False, 'cli_avoid_json': False, 'cli_enforce_required': False, 'cli_exit_on_error': True, 'cli_flag_prefix_char': '-', 'cli_hide_none_type': False, 'cli_ignore_unknown_args': False, 'cli_implicit_flags': False, 'cli_kebab_case': False, 'cli_parse_args': None, 'cli_parse_none_str': None, 'cli_prefix': '', 'cli_prog_name': None, 'cli_shortcuts': None, 'cli_use_class_docs_for_groups': False, 'enable_decoding': True, 'env_file': None, 'env_file_encoding': None, 'env_ignore_empty': False, 'env_nested_delimiter': None, 'env_nested_max_split': None, 'env_parse_enums': None, 'env_parse_none_str': None, 'env_prefix': '', 'extra': 'ignore', 'json_file': None, 'json_file_encoding': None, 'nested_model_default_partial_update': False, 'protected_namespaces': ('model_validate', 'model_dump', 'settings_customise_sources'), 'secrets_dir': None, 'toml_file': None, 'validate_default': True, 'yaml_config_section': None, 'yaml_file': None, 'yaml_file_encoding': None}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

namespace_placeholder: str | AnyUrl#
prefix_name: str#
qudt_quantity_kinds: str | AnyUrl#
qudt_units: str | AnyUrl#
remove_from_datafile: List[str]#
remove_from_unit: List[str]#
separator: str#
suppress_file_description: bool#
classmethod validate_config(self: Config) Config[source]#
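The defaults of remove_from_unit and remove_from_datafile are lists of characters to strip. The sketch below shows the plain string-replacement effect of those defaults; strip_tokens is a hypothetical helper, and where exactly data2rdf applies these lists is not stated here.

```python
from typing import Iterable

# Documented defaults of Config.remove_from_unit and Config.remove_from_datafile
REMOVE_FROM_UNIT = ["[", "]", '"', " "]
REMOVE_FROM_DATAFILE = ['"', "\r", "\n"]


def strip_tokens(text: str, tokens: Iterable[str]) -> str:
    """Remove every occurrence of each token from text (hypothetical helper)."""
    for token in tokens:
        text = text.replace(token, "")
    return text


print(strip_tokens('"[N / mm]"', REMOVE_FROM_UNIT))      # N/mm
print(strip_tokens('"12.5"\r\n', REMOVE_FROM_DATAFILE))  # 12.5
```

Note that the space is part of the default remove_from_unit list, so spaces inside a unit string are removed as well. Separately, because Config derives from pydantic-settings' BaseSettings with an empty env_prefix and case_sensitive set to False, fields such as base_iri should also be settable through environment variables (e.g. BASE_IRI).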

Parsers

Base parser module

Data2RDF base model for parsers

class data2rdf.parsers.base.ABoxBaseParser(*, raw_data: str | bytes | ~typing.Dict[str, ~typing.Any] | ~typing.List[~typing.Dict[str, ~typing.Any]], mapping: str | ~typing.List[~typing.Any], dropna: bool = False, config: ~data2rdf.config.Config = <factory>)[source]#

Bases: AnyBoxBaseParser

Basic Parser for ABox mode

property dataframe: pd.DataFrame#

Return the time series found in the data as a pd.DataFrame.

property dataframe_metadata: List[BasicConceptMapping]#

Return list object with dataframe metadata

property general_metadata: List[BasicConceptMapping]#

Return list object with general metadata

model_config: ClassVar[ConfigDict] = {}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) None#

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property plain_metadata: List[Dict[str, Any]]#
to_dict(schema: Callable | None = None) Dict[str, Any] | List[Dict[str, Any]][source]#

Return the general metadata as a dictionary or, if a schema is given, in the form produced by the schema.

Each metadata entry is represented as a dictionary with the keys ‘label’ and ‘value’. If the metadata has a measurement unit associated with it, the dictionary also contains the key ‘measurement_unit’ with the value of the measurement unit.

If the schema parameter is provided, it is used to transform the metadata. The schema should be a callable which takes the list of metadata dictionaries and returns the transformed metadata.

If no schema is provided, the function returns a dictionary where the keys are the labels of the metadata and the values are the corresponding metadata dictionaries.

Parameters:

schema – A callable which takes a list of dictionaries and returns the transformed metadata.

Returns:

A dictionary or list of dictionaries with the metadata.
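The two output modes of to_dict can be restated as a small stdlib sketch. The function metadata_to_dict, the sample entries (including the "mm" unit value), and the key/value schema are hypothetical illustrations of the documented behaviour, not data2rdf output.

```python
from typing import Any, Callable, Dict, List, Optional

Metadata = List[Dict[str, Any]]


def metadata_to_dict(entries: Metadata, schema: Optional[Callable[[Metadata], Any]] = None) -> Any:
    """Hypothetical re-statement of the documented to_dict behaviour."""
    if schema is not None:
        # A schema receives the full list and decides the output shape
        return schema(entries)
    # Default: key each entry by its label
    return {entry["label"]: entry for entry in entries}


entries = [
    {"label": "Tester", "value": "Jane Doe"},
    {"label": "Width", "value": 12.5, "measurement_unit": "mm"},
]
by_label = metadata_to_dict(entries)
print(by_label["Width"]["measurement_unit"])  # mm

# A hypothetical schema producing flat key/value pairs
flat = metadata_to_dict(entries, schema=lambda items: [
    {"key": item["label"], "value": item["value"]} for item in items
])
print(flat[0])  # {'key': 'Tester', 'value': 'Jane Doe'}
```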

class data2rdf.parsers.base.AnyBoxBaseParser(*, raw_data: str | bytes | ~typing.Dict[str, ~typing.Any] | ~typing.List[~typing.Dict[str, ~typing.Any]], mapping: str | ~typing.List[~typing.Any], dropna: bool = False, config: ~data2rdf.config.Config = <factory>)[source]#

Bases: BaseParser

Basic parser for ABox or TBox mode, producing an RDF graph.

property graph: Graph#

Return RDF Graph from the parsed data.

abstract property json_ld: Dict[str, Any]#

Return dict for json-ld for the graph

abstract property mapping_model: BaseParser#

Pydantic model for validating the mapping. Must be a subclass of ABoxBaseMapping or TBoxBaseMapping.

model_config: ClassVar[ConfigDict] = {}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

classmethod run_parser(self: BaseParser) BaseParser[source]#

Runs the parser for the given data file and mapping.

This class method receives self, an instance of the BaseParser class. It loads the data file using the _load_data_file method and the mapping file using the load_mapping_file function, then runs the parser using the _run_parser method and returns the parsed BaseParser instance.

Parameters:

self (BaseParser) – The instance of the BaseParser class.

Returns:

The parsed BaseParser instance.

Return type:

BaseParser

class data2rdf.parsers.base.BaseFileParser(*, raw_data: str | bytes | ~typing.Dict[str, ~typing.Any] | ~typing.List[~typing.Dict[str, ~typing.Any]], mapping: str | ~typing.List[~typing.Any], dropna: bool = False, config: ~data2rdf.config.Config = <factory>, mode: ~data2rdf.modes.PipelineMode = PipelineMode.ABOX, parser_args: ~typing.Dict[str, ~typing.Any] = {})[source]#

Bases: BaseParser

Base model for data files which can be run in ABox or TBox mode. The respective ABoxBaseParser and TBoxBaseParser must be set as properties of this model. The child classes of BaseFileParser are used directly by the main Data2RDF class.

property abox: ABoxBaseParser#

Return instance of the abox_parser after model validation

property dataframe: Dict[str, Any]#

Return dataframe

property dataframe_metadata: List[BasicConceptMapping]#

Return dataframe metadata

classmethod execute_parser(self: BaseFileParser) BaseFileParser[source]#

Validates the parser model and executes the parser based on the specified mode.

Parameters:

self – An instance of the BaseFileParser class.

Returns:

An instance of the BaseFileParser class with the parser executed.
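The mode-based dispatch described for execute_parser can be sketched with the standard library. FileParserSketch and the mode strings "abox"/"tbox" are hypothetical; only the idea of selecting the ABox or TBox parser by mode and forwarding parser_args as keyword arguments comes from this reference.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class FileParserSketch:
    """Hypothetical stand-in for the mode-based dispatch of BaseFileParser."""

    mode: str  # assumed mode values: "abox" or "tbox"
    abox_parser: Callable[..., Any]
    tbox_parser: Callable[..., Any]
    parser_args: Dict[str, Any] = field(default_factory=dict)

    def execute(self, raw_data: Any, mapping: Any) -> Any:
        parsers = {"abox": self.abox_parser, "tbox": self.tbox_parser}
        if self.mode not in parsers:
            raise ValueError(f"unknown mode: {self.mode!r}")
        # parser_args are forwarded as keyword arguments to the selected parser
        return parsers[self.mode](raw_data=raw_data, mapping=mapping, **self.parser_args)


sketch = FileParserSketch(
    mode="abox",
    abox_parser=lambda **kwargs: ("abox", kwargs),
    tbox_parser=lambda **kwargs: ("tbox", kwargs),
    parser_args={"metadata_length": 4},
)
selected, kwargs = sketch.execute("time;force\n0;1.5", "mapping.json")
print(selected, kwargs["metadata_length"])  # abox 4
```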

property general_metadata: List[BasicConceptMapping]#

Return list object with general metadata

property graph: Graph#

Return rdflib Graph

property json_ld: Dict[str, Any]#

Return JSON-LD representation of graph

abstract property media_type: str | AnyUrl#

IANA media type of the resources to be parsed.

mode: PipelineMode#
model_config: ClassVar[ConfigDict] = {}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) None#

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

parser_args: Dict[str, Any]#
property plain_metadata: Dict[str, Any]#

Metadata as flat JSON, without units and IRIs. Useful, e.g., for the custom properties of the DSMS.

property tbox: TBoxBaseParser#

Return instance of the tbox_parser after model validation

to_dict(schema: Callable | None = None) List[Dict[str, Any]][source]#

Return list of general metadata as DSMS custom properties

class data2rdf.parsers.base.BaseParser(*, raw_data: str | bytes | ~typing.Dict[str, ~typing.Any] | ~typing.List[~typing.Dict[str, ~typing.Any]], mapping: str | ~typing.List[~typing.Any], dropna: bool = False, config: ~data2rdf.config.Config = <factory>)[source]#

Bases: BaseModel

Basic Parser for any data file and mode

config: Config#
dropna: bool#
mapping: str | List[Any]#
model_config: ClassVar[ConfigDict] = {}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

raw_data: str | bytes | Dict[str, Any] | List[Dict[str, Any]]#
classmethod validate_config(value: Dict[str, Any] | Config) Config[source]#

Validate configuration

class data2rdf.parsers.base.TBoxBaseParser(*, raw_data: str | bytes | ~typing.Dict[str, ~typing.Any] | ~typing.List[~typing.Dict[str, ~typing.Any]], mapping: str | ~typing.List[~typing.Any], dropna: bool = False, config: ~data2rdf.config.Config = <factory>, suffix_location: str, rdfs_type_location: str | None = None, version_info: str | None = None, ontology_iri: str | ~pydantic.networks.AnyUrl | None = None, ontology_title: str | None = None, authors: ~typing.List[str] | None = None, fillna: ~typing.Any | None = '')[source]#

Bases: AnyBoxBaseParser

Basic Parser for TBox mode

authors: List[str] | None#
property classes: List[BasicConceptMapping]#

Return list object with class models

fillna: Any | None#
model_config: ClassVar[ConfigDict] = {}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) None#

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

ontology_iri: str | AnyUrl | None#
ontology_title: str | None#
rdfs_type_location: str | None#
suffix_location: str#
version_info: str | None#

CSV parser module

CSV Parser for data2rdf

class data2rdf.parsers.csv.CSVABoxParser(*, raw_data: str | bytes | ~typing.Dict[str, ~typing.Any] | ~typing.List[~typing.Dict[str, ~typing.Any]], mapping: str | ~typing.List[~data2rdf.models.mapping.ABoxBaseMapping], dropna: bool = False, config: ~data2rdf.config.Config = <factory>, metadata_sep: str | None = None, metadata_length: int, dataframe_sep: str | None = None, dataframe_header_length: int = 2, fillna: ~typing.Any | None = '')[source]#

Bases: ABoxBaseParser

CSV file parser in ABox mode

dataframe_header_length: int#
dataframe_sep: str | None#
fillna: Any | None#
property json_ld: Dict[str, Any]#

Returns a JSON-LD representation of the CSV data in ABox mode.

This method generates a JSON-LD object that describes the CSV data, including its metadata, dataframe data, and relationships between them.

The returned JSON-LD object is in the format of a csvw:TableGroup, which contains one or more csvw:Table objects. Each csvw:Table object represents a table in the CSV data, and contains information about its columns, rows, and relationships to other tables.

The JSON-LD object also includes context information, such as namespace prefixes and base URLs, to help with serialization and deserialization.

Returns:

A JSON-LD object representing the CSV data in ABox mode.

Return type:

Dict[str, Any]

mapping: str | List[ABoxBaseMapping]#
property mapping_model: ABoxBaseMapping#

ABox Mapping Model for CSV Parser

metadata_length: int#
metadata_sep: str | None#
model_config: ClassVar[ConfigDict] = {}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) None#

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.
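To picture how metadata_length, metadata_sep, dataframe_sep and dataframe_header_length partition a CSV file, here is a hypothetical stdlib sketch. It assumes that the first metadata_length lines hold key/value metadata rows and that, with two header rows, the second header row holds the column units; these layout details and the comma defaults for the separators are assumptions (the real class defaults both separators to None).

```python
import csv
from typing import Dict, List, Tuple


def split_csv(
    text: str,
    metadata_length: int,
    metadata_sep: str = ",",
    dataframe_sep: str = ",",
    dataframe_header_length: int = 2,
) -> Tuple[Dict[str, str], Dict[str, str], Dict[str, List[str]]]:
    """Hypothetical sketch: split a CSV text into metadata, column units and columns."""
    lines = text.splitlines()
    # The first metadata_length lines are key/value metadata rows
    metadata: Dict[str, str] = {}
    for row in csv.reader(lines[:metadata_length], delimiter=metadata_sep):
        if row:
            metadata[row[0]] = row[1] if len(row) > 1 else ""
    # The remaining lines form the dataframe: header rows, then data rows
    rows = list(csv.reader(lines[metadata_length:], delimiter=dataframe_sep))
    names = rows[0]
    # Assumption: with two header rows, the second one holds the units
    units = rows[1] if dataframe_header_length > 1 else [""] * len(names)
    columns: Dict[str, List[str]] = {name: [] for name in names}
    for row in rows[dataframe_header_length:]:
        for name, cell in zip(names, row):
            columns[name].append(cell)
    return metadata, dict(zip(names, units)), columns


sample = "Tester;Jane Doe\nSpeed;10;mm/min\ntime,force\ns,N\n0,0.0\n1,2.5\n"
meta, units, cols = split_csv(sample, metadata_length=2, metadata_sep=";")
print(meta)   # {'Tester': 'Jane Doe', 'Speed': '10'}
print(units)  # {'time': 's', 'force': 'N'}
print(cols)   # {'time': ['0', '1'], 'force': ['0.0', '2.5']}
```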

class data2rdf.parsers.csv.CSVParser(*, raw_data: str | bytes | ~typing.Dict[str, ~typing.Any] | ~typing.List[~typing.Dict[str, ~typing.Any]], mapping: str | ~typing.List[~typing.Any], dropna: bool = False, config: ~data2rdf.config.Config = <factory>, mode: ~data2rdf.modes.PipelineMode = PipelineMode.ABOX, parser_args: ~typing.Dict[str, ~typing.Any] = {})[source]#

Bases: BaseFileParser

Parser for CSV/TSV files

property media_type: str#

IANA media type of the resource to be parsed.

model_config: ClassVar[ConfigDict] = {}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) None#

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

class data2rdf.parsers.csv.CSVTBoxParser(*, raw_data: str | bytes | Dict[str, Any] | List[Dict[str, Any]], mapping: str | List[TBoxBaseMapping], dropna: bool = False, config: Config = <factory>, suffix_location: str, rdfs_type_location: str | None = None, version_info: str | None = None, ontology_iri: str | AnyUrl | None = None, ontology_title: str | None = None, authors: List[str] | None = None, fillna: Any | None = '', column_sep: str | None = ',', header_length: int = 1)

Bases: TBoxBaseParser

CSV file parser in tbox mode

column_sep: str | None
fillna: Any | None
header_length: int
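CSVTBoxParser reads its table using column_sep (default ','), header_length (default 1) and fillna (default ''). The sketch below makes two assumptions that this page does not confirm: header_length counts the header rows, with the first row naming the columns, and fillna replaces empty cells. read_tbox_table is a hypothetical helper built on the standard csv module.

```python
import csv
import io
from typing import Any, Dict, List


def read_tbox_table(
    text: str, column_sep: str = ",", header_length: int = 1, fillna: Any = ""
) -> List[Dict[str, Any]]:
    """Hypothetical reader for a TBox table (assumed semantics, see lead-in)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=column_sep))
    names = rows[0]  # assumption: the first header row holds the column names
    records: List[Dict[str, Any]] = []
    for row in rows[header_length:]:  # skip all header rows
        padded = row + [""] * (len(names) - len(row))  # pad short rows
        # Replace empty cells with the fill value.
        records.append({n: (v if v != "" else fillna) for n, v in zip(names, padded)})
    return records


table = (
    "suffix;label;comment\n"
    "IRI suffix;rdfs:label;rdfs:comment\n"  # second header row, skipped
    "TensileTest;tensile test;\n"
    "Specimen;;a test piece\n"
)
records = read_tbox_table(table, column_sep=";", header_length=2, fillna=None)
print(records)
```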
property json_ld: Dict[str, Any]

Build the JSON-LD representation when the pipeline is in TBox mode.
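The TBox parsers also take suffix_location, rdfs_type_location, ontology_iri, ontology_title, version_info and authors. As an illustration only, the sketch below turns table records into a small OWL ontology in JSON-LD, using these mappings:

- The value in the suffix column is appended to ontology_iri to form each entity IRI.
- An optional type column supplies @type, which defaults to owl:Class.
- The ontology header uses dcterms:title, owl:versionInfo and dcterms:creator.

These mappings and the helper tbox_json_ld are assumptions, not data2rdf's documented output. The sketch also accepts only a plain string for ontology_iri.

```python
import json
from typing import Any, Dict, List, Optional


def tbox_json_ld(
    records: List[Dict[str, Any]],
    ontology_iri: str,
    suffix_location: str,
    rdfs_type_location: Optional[str] = None,
    ontology_title: Optional[str] = None,
    version_info: Optional[str] = None,
    authors: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical sketch: an owl:Ontology header plus one node per record."""
    base = ontology_iri if ontology_iri.endswith(("/", "#")) else ontology_iri + "/"
    header: Dict[str, Any] = {"@id": ontology_iri, "@type": "owl:Ontology"}
    if ontology_title:
        header["dcterms:title"] = ontology_title
    if version_info:
        header["owl:versionInfo"] = version_info
    if authors:
        header["dcterms:creator"] = authors
    nodes = []
    for rec in records:
        # Assumed: the type column names the entity type; fall back to owl:Class.
        node_type = rec.get(rdfs_type_location) if rdfs_type_location else None
        nodes.append({"@id": base + str(rec[suffix_location]), "@type": node_type or "owl:Class"})
    return {
        "@context": {
            "owl": "http://www.w3.org/2002/07/owl#",
            "dcterms": "http://purl.org/dc/terms/",
        },
        "@graph": [header, *nodes],
    }


doc = tbox_json_ld(
    [{"suffix": "TensileTest", "type": None}, {"suffix": "hasSpecimen", "type": "owl:ObjectProperty"}],
    ontology_iri="https://example.org/onto",
    suffix_location="suffix",
    rdfs_type_location="type",
    ontology_title="Example ontology",
    version_info="0.1.0",
)
print(json.dumps(doc, indent=2))
```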

mapping: str | List[TBoxBaseMapping]
property mapping_model: TBoxBaseMapping

TBox Mapping Model for CSV Parser

model_config: ClassVar[ConfigDict] = {}

Configuration for the model. It should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

Excel parser module

Data2rdf Excel parser

class data2rdf.parsers.excel.ExcelABoxParser(*, raw_data: str | bytes | Dict[str, Any] | List[Dict[str, Any]], mapping: str | List[ABoxExcelMapping], dropna: bool = False, config: Config = <factory>, unit_from_macro: bool = False, unit_macro_location: int = -1)

Bases: ABoxBaseParser

Parses an Excel data file in ABox mode.

property json_ld: Dict[str, Any]

Returns the JSON-LD representation of the data in ABox mode.

The JSON-LD is constructed based on the metadata and dataframe data. If the file description is not suppressed, it includes the metadata and dataframe data tables. Otherwise, it returns a list of JSON-LD representations of the individual models.

Returns:

A dictionary representing the JSON-LD data.
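The branching described above can be sketched with plain dictionaries. The flag name suppress_file_description, the placeholder key members and the two-table layout are assumptions made for illustration. This page does not say where data2rdf reads the setting from.

```python
from typing import Any, Dict, List, Union

JsonLd = Union[Dict[str, Any], List[Dict[str, Any]]]


def build_json_ld(
    metadata: List[Dict[str, Any]],
    dataframe: List[Dict[str, Any]],
    suppress_file_description: bool = False,
) -> JsonLd:
    """Hypothetical: wrap model nodes in a file description, or return them bare."""
    if suppress_file_description:
        # No file description: return the JSON-LD of each individual model.
        return metadata + dataframe
    # Otherwise describe the file as two tables. "members" is a placeholder key.
    return {
        "@context": {"csvw": "http://www.w3.org/ns/csvw#"},
        "@type": "csvw:TableGroup",
        "csvw:table": [
            {"@id": "metadata", "@type": "csvw:Table", "members": metadata},
            {"@id": "dataframe", "@type": "csvw:Table", "members": dataframe},
        ],
    }


meta = [{"@id": "Operator", "value": "Jane Doe"}]
frame = [{"@id": "Force", "unit": "N"}]
print(type(build_json_ld(meta, frame)).__name__)  # dict
print(len(build_json_ld(meta, frame, suppress_file_description=True)))  # 2
```

The sketch follows the prose and returns a list in the suppressed case, even though the property above is annotated Dict[str, Any].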

mapping: str | List[ABoxExcelMapping]
property mapping_model: ABoxExcelMapping

Mapping Model

model_config: ClassVar[ConfigDict] = {}

Configuration for the model. It should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

unit_from_macro: bool
unit_macro_location: int

class data2rdf.parsers.excel.ExcelParser(*, raw_data: str | bytes | Dict[str, Any] | List[Dict[str, Any]], mapping: str | List[Any], dropna: bool = False, config: Config = <factory>, mode: PipelineMode = PipelineMode.ABOX, parser_args: Dict[str, Any] = {})

Bases: BaseFileParser

Parser for Excel files

property media_type: str

IANA Media type definition of the resource to be parsed.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model. It should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

class data2rdf.parsers.excel.ExcelTBoxParser(*, raw_data: str | bytes | Dict[str, Any] | List[Dict[str, Any]], mapping: str | List[TBoxBaseMapping], dropna: bool = False, config: Config = <factory>, suffix_location: str, rdfs_type_location: str | None = None, version_info: str | None = None, ontology_iri: str | AnyUrl | None = None, ontology_title: str | None = None, authors: List[str] | None = None, fillna: Any | None = '', sheet: str, header_length: int = 1)

Bases: TBoxBaseParser

Parses an Excel data file in TBox mode.

header_length: int
property json_ld: Dict[str, Any]

Build the JSON-LD representation when the pipeline is in TBox mode.

mapping: str | List[TBoxBaseMapping]
property mapping_model: TBoxBaseMapping

TBox Mapping Model

model_config: ClassVar[ConfigDict] = {}

Configuration for the model. It should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

sheet: str

JSON parser module

Data2rdf JSON parser

class data2rdf.parsers.json.JsonABoxParser(*, raw_data: str | bytes | Dict[str, Any] | List[Dict[str, Any]], mapping: str | List[ABoxBaseMapping], dropna: bool = False, config: Config = <factory>, expand_array: bool = False)

Bases: ABoxBaseParser

Parser for JSON in ABox mode

expand_array: bool
property json_ld: Dict[str, Any]

Returns the JSON-LD representation of the parser’s data.

This method generates the JSON-LD representation of the parser’s data, including the context, id, type, and members. The members are generated based on the general metadata and dataframe metadata.

The method returns a dictionary containing the JSON-LD representation.

Returns:

A dictionary containing the JSON-LD representation.

Return type:

Dict[str, Any]
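The description above names four parts: context, id, type and members. The members are drawn from the general metadata and the dataframe metadata. The sketch below assembles such a document from a JSON input. Several details are hypothetical: the mapping format (concept IRI to dotted path), the rule that scalars become general metadata and arrays become dataframe series, the key name members, and all IRIs. data2rdf's actual mapping model is not documented on this page.

```python
from typing import Any, Dict, List, Tuple


def resolve(doc: Any, path: str) -> Any:
    """Follow a dotted path such as 'test.results.force' through nested dicts."""
    for key in path.split("."):
        doc = doc[key]
    return doc


def split_members(
    doc: Dict[str, Any], mapping: Dict[str, str]
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Hypothetical: scalars become general metadata, lists become dataframe series."""
    general: List[Dict[str, Any]] = []
    dataframe: List[Dict[str, Any]] = []
    for concept_iri, path in mapping.items():
        value = resolve(doc, path)
        entry = {"@type": concept_iri, "value": value}
        (dataframe if isinstance(value, list) else general).append(entry)
    return general, dataframe


raw = {"test": {"operator": "Jane Doe", "results": {"force": [1.5, 2.5, 3.0]}}}
mapping = {
    "https://example.org/onto/Operator": "test.operator",
    "https://example.org/onto/Force": "test.results.force",
}
general, dataframe = split_members(raw, mapping)
json_ld = {
    "@context": {"@base": "https://example.org/data/"},
    "@id": "dataset",
    "@type": "https://example.org/onto/Dataset",
    "members": general + dataframe,  # placeholder key name
}
print(json_ld)
```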

mapping: str | List[ABoxBaseMapping]
property mapping_model: ABoxBaseMapping

ABox mapping model

model_config: ClassVar[ConfigDict] = {}

Configuration for the model. It should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

class data2rdf.parsers.json.JsonParser(*, raw_data: str | bytes | Dict[str, Any] | List[Dict[str, Any]], mapping: str | List[Any], dropna: bool = False, config: Config = <factory>, mode: PipelineMode = PipelineMode.ABOX, parser_args: Dict[str, Any] = {})

Bases: BaseFileParser

Parses a JSON data file
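According to the signature, JsonParser's raw_data can be a dict, a list of dicts, or a string holding either a file path or the JSON text itself; bytes are also allowed. The sketch below normalizes those forms using the standard library. The function name, the path-first lookup and the treatment of bytes as UTF-8 JSON are assumptions, not data2rdf's documented behavior.

```python
import json
import os
from typing import Any, Dict, List, Union

RawJson = Union[str, bytes, Dict[str, Any], List[Dict[str, Any]]]
ParsedJson = Union[Dict[str, Any], List[Dict[str, Any]]]


def load_json_raw_data(raw_data: RawJson) -> ParsedJson:
    """Hypothetical normalizer for the raw_data forms listed in the signature."""
    if isinstance(raw_data, (dict, list)):
        return raw_data  # already parsed content
    if isinstance(raw_data, bytes):
        return json.loads(raw_data)  # json.loads accepts UTF-8 encoded bytes
    if os.path.isfile(raw_data):
        # A string naming an existing file is treated as a path.
        with open(raw_data, encoding="utf-8") as fh:
            return json.load(fh)
    return json.loads(raw_data)  # otherwise treat the string as JSON text


print(load_json_raw_data('{"specimen": "S1"}'))  # {'specimen': 'S1'}
```

Because the filesystem is checked first, a string that happens to name an existing file is read as a path. A parser that must be unambiguous would take the path and the content through separate arguments.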

property media_type: str

IANA Media type definition of the resource to be parsed.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model. It should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

class data2rdf.parsers.json.JsonTBoxParser(*, raw_data: str | bytes | Dict[str, Any] | List[Dict[str, Any]], mapping: str | List[TBoxBaseMapping], dropna: bool = False, config: Config = <factory>, suffix_location: str, rdfs_type_location: str | None = None, version_info: str | None = None, ontology_iri: str | AnyUrl | None = None, ontology_title: str | None = None, authors: List[str] | None = None, fillna: Any | None = '')

Bases: TBoxBaseParser

Parser for JSON in TBox mode

property json_ld: Dict[str, Any]

Return JSON-LD in TBox mode

mapping: str | List[TBoxBaseMapping]
property mapping_model: TBoxBaseMapping

TBox mapping model

model_config: ClassVar[ConfigDict] = {}

Configuration for the model. It should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.