API Reference
Pipeline
Data2RDF ABox pipeline
- class data2rdf.pipelines.main.Data2RDF(*, mode: ~data2rdf.modes.PipelineMode = PipelineMode.ABOX, raw_data: str | bytes | ~typing.Dict[str, ~typing.Any] | ~typing.List[~typing.Dict[str, ~typing.Any]], mapping: str | ~typing.List[~typing.Any], parser: ~data2rdf.parsers.Parser, parser_args: ~typing.Dict[str, ~typing.Any] = {}, config: ~typing.Dict[str, ~typing.Any] | ~data2rdf.config.Config = <factory>, additional_triples: str | ~rdflib.graph.Graph | None = None)[source]#
Bases: BaseModel
Data2rdf pipeline.
Parameters:
- raw_data (Union[str, bytes, Dict[str, Any]]): For a CSV file: str with the file path or the content of the file itself. For a JSON file: dict with the content of the file, or str with the file content or file path. For an Excel file: bytes with the content, or str with the file path.
- mapping (Union[str, Dict[str, Any]]): File path to the mapping file to be parsed or a dictionary with the mapping.
- parser (Parser): Parser to be used depending on the type of raw data file.
- parser_args (Dict[str, Any]): A dictionary with specific arguments for the parser. These are passed to the parser as keyword arguments.
- config (Union[Dict[str, Any], Config]): Configuration object. Defaults to a new instance of Config.
- additional_triples (Optional[Union[str, Graph]]): File path or rdflib object for a Graph with extra triples for the resulting pipeline graph.
- additional_triples: str | Graph | None#
- property dataframe: Dict[str, Any]#
Return dataframe
- property dataframe_metadata: List[BasicConceptMapping]#
Return list object with dataframe metadata
- property general_metadata: List[BasicConceptMapping]#
Return list object with general metadata
- property graph: Graph#
Returns a graph object based on the pipeline’s JSON-LD data.
The graph object is created with the identifier specified through the pipeline. It is then populated with the JSON-LD data from the pipeline, and if additional triples are provided, they are validated and added to the graph.
- Returns:
A graph object containing the pipeline’s data.
- Return type:
Graph
- property json_ld: Dict[str, Any]#
Returns a dictionary of JSON-LD for the graph based on the pipeline mode.
If the pipeline mode is ABOX, it returns a dictionary containing the context, id, type, and distribution information of the dataset. If the suppress_file_description config is False, it also includes the file description. Otherwise, it returns the JSON-LD of the ABox parser.
If the pipeline mode is TBOX, it returns the JSON-LD of the TBox parser.
- Returns:
A dictionary of JSON-LD for the graph.
- Return type:
Dict[str, Any]
- mapping: str | List[Any]#
- mode: PipelineMode#
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'use_enum_values': True}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- parser: Parser#
- parser_args: Dict[str, Any]#
- property plain_metadata: Dict[str, Any]#
Metadata as flat JSON, without units and IRIs. Useful, for example, for the custom properties of the DSMS.
- raw_data: str | bytes | Dict[str, Any] | List[Dict[str, Any]]#
- property time_series: Dict[str, Any]#
- property time_series_metadata: List[BasicConceptMapping]#
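The constructor parameters listed for Data2RDF can be prepared as plain Python objects before the pipeline is created. The following is a minimal sketch under stated assumptions: the CSV content, the mapping keys and the example.org IRIs are hypothetical, and the parser argument names are taken from the CSVABoxParser signature in this reference. It only assembles and prints the inputs; it does not import data2rdf.

```python
import json

# Hypothetical CSV content: two metadata rows, then a two-row table header
# (column names and units) followed by the data rows.
raw_data = "\n".join(
    [
        "specimen;S-001",
        "operator;Jane Doe",
        "time;force",
        "s;N",
        "0.0;1.5",
        "0.1;2.5",
    ]
)

# Mapping entries as a list of dicts. The field names (key, iri) are listed
# for ABoxBaseMapping; the IRIs are placeholders, not real ontology terms.
mapping = [
    {"key": "specimen", "iri": "https://www.example.org/Specimen"},
    {"key": "force", "iri": "https://www.example.org/Force"},
]

# Parser-specific arguments, named after the CSVABoxParser fields.
parser_args = {
    "metadata_sep": ";",
    "metadata_length": 2,
    "dataframe_sep": ";",
    "dataframe_header_length": 2,
}

# Keyword arguments for the pipeline, except for the parser itself.
pipeline_kwargs = {
    "raw_data": raw_data,
    "mapping": mapping,
    "parser_args": parser_args,
    "config": {"base_iri": "https://www.example.org", "separator": "/"},
}

print(json.dumps(pipeline_kwargs, indent=2))
```

With the package installed, these keyword arguments would be passed together with a `parser` value, roughly as `Data2RDF(parser=..., **pipeline_kwargs)`. The parser member to use is not listed in this section, so it is left open here.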
Mappings and graph models
Base
Basic data2rdf models
- class data2rdf.models.base.BaseConfigModel(*, config: ~data2rdf.config.Config = <factory>)[source]#
Bases: BaseModel
Basic model for holding the data2rdf config
- model_config: ClassVar[ConfigDict] = {'exclude': {'config'}}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class data2rdf.models.base.BasicConceptMapping(*, config: ~data2rdf.config.Config = <factory>, key: str | None = None)[source]#
Bases: BaseConfigModel
Basic mapping for a concept in a file
- key: str | None#
- model_config: ClassVar[ConfigDict] = {'exclude': {'config'}}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class data2rdf.models.base.BasicGraphModel(*, config: ~data2rdf.config.Config = <factory>, key: str | None = None)[source]#
Bases: BasicConceptMapping
Basic model for merging data with mappings to become a graph
- property graph: Graph#
Return graph object based on json-ld
- abstract property json_ld: Dict[str, Any]#
Return dict for json-ld of graph
- model_config: ClassVar[ConfigDict] = {'exclude': {'config'}}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class data2rdf.models.base.BasicSuffixModel(*, config: ~data2rdf.config.Config = <factory>, iri: str | ~pydantic.networks.AnyUrl | ~typing.List[str | ~pydantic.networks.AnyUrl], suffix: str | None = None)[source]#
Bases: BaseConfigModel
Pydantic BaseModel for suffix and type of a class instance
- iri: str | AnyUrl | List[str | AnyUrl]#
- model_config: ClassVar[ConfigDict] = {'exclude': {'config'}}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- suffix: str | None#
- classmethod validate_iri(value: AnyUrl | List[AnyUrl]) AnyUrl[source]#
Make sure that there are no blank spaces in the IRI
- classmethod validate_suffix(self: BasicSuffixModel) BasicSuffixModel[source]#
Return suffix for individual
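The IRI validator above is described as ensuring that an IRI contains no blank spaces. A stdlib-only sketch of such a check is shown below. The function name `check_iri_has_no_spaces` is hypothetical and this is not the library's implementation; it only illustrates the rule for a single IRI or a list of IRIs.

```python
from typing import List, Union


def check_iri_has_no_spaces(value: Union[str, List[str]]) -> Union[str, List[str]]:
    """Return the value unchanged, or raise if any IRI contains whitespace."""
    candidates = value if isinstance(value, list) else [value]
    for iri in candidates:
        # str() keeps the check usable for URL-like objects as well.
        if any(ch.isspace() for ch in str(iri)):
            raise ValueError(f"IRI must not contain blank spaces: {iri!r}")
    return value


# A well-formed placeholder IRI passes through unchanged.
print(check_iri_has_no_spaces("https://www.example.org/Force"))

# An IRI with a blank space is rejected.
try:
    check_iri_has_no_spaces("https://www.example.org/Max Force")
except ValueError as error:
    print(error)
```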
Graph
Models for graph construction from semantic concepts
- class data2rdf.models.graph.ClassTypeGraph(*, config: ~data2rdf.config.Config = <factory>, key: str | None = None, suffix: str, rdfs_type: str = 'owl:Class', annotation_properties: ~typing.List[~data2rdf.models.graph.ValueRelationMapping] | None = None, object_properties: ~typing.List[~data2rdf.models.graph.ValueRelationMapping] | None = None, data_properties: ~typing.List[~data2rdf.models.graph.ValueRelationMapping] | None = None, rdfs_properties: ~typing.List[~data2rdf.models.graph.ValueRelationMapping] | None = None)[source]#
Bases: BasicGraphModel
Graph of a potential concept or class in the T Box.
- annotation_properties: List[ValueRelationMapping] | None#
- data_properties: List[ValueRelationMapping] | None#
- property json_ld: Dict[str, Any]#
Return dict for json-ld of graph
- model_config: ClassVar[ConfigDict] = {'exclude': {'config'}}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- object_properties: List[ValueRelationMapping] | None#
- rdfs_properties: List[ValueRelationMapping] | None#
- rdfs_type: str#
- suffix: str#
- class data2rdf.models.graph.MeasurementUnit(*, config: ~data2rdf.config.Config = <factory>, iri: str | ~pydantic.networks.AnyUrl, label: str | None = None, symbol: str | None = None, namespace: str | None = None)[source]#
Bases: BaseConfigModel
- iri: str | AnyUrl#
- label: str | None#
- model_config: ClassVar[ConfigDict] = {'exclude': {'config'}}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- namespace: str | None#
- symbol: str | None#
- classmethod validate_measurement_unit(self) MeasurementUnit[source]#
- class data2rdf.models.graph.PropertyGraph(*, config: ~data2rdf.config.Config = <factory>, iri: str | ~pydantic.networks.AnyUrl | ~typing.List[str | ~pydantic.networks.AnyUrl], suffix: str | None = None, key: str | None = None, value: str | int | float | bool | ~pydantic.networks.AnyUrl | ~data2rdf.models.graph.PropertyGraph | ~data2rdf.models.graph.QuantityGraph | None = None, annotation: str | ~pydantic.networks.AnyUrl | None = None, value_relation: str | ~pydantic.networks.AnyUrl | None = 'rdfs:label', value_relation_type: ~data2rdf.models.base.RelationType | None = None, value_datatype: str | None = None)[source]#
Bases: BasicGraphModel, BasicSuffixModel
Mapping for an individual with an arbitrary property, e.g. the name of a tester or a testing facility. The value need not be a discrete value; it can also be a reference to a column in a table or dataframe.
- annotation: str | AnyUrl | None#
- property json_ld: Dict[str, Any]#
Return dict of json-ld for graph
- model_config: ClassVar[ConfigDict] = {'exclude': {'config'}}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- property types_json: Dict[str, Any]#
Dict of json-ld for class types of the individual
- classmethod validate_annotation(value: AnyUrl) AnyUrl[source]#
Make sure that there are no blank spaces in the IRI
- classmethod validate_property_graph(self: PropertyGraph) PropertyGraph[source]#
Validate property graph in order to generate annotations
- classmethod validate_value(self: PropertyGraph) PropertyGraph[source]#
Validate value of a property graph.
In case the value is a property graph or a quantity graph, make sure that the config is set correctly.
- value: str | int | float | bool | AnyUrl | PropertyGraph | QuantityGraph | None#
- value_datatype: str | None#
- property value_json: Dict[str, str] | None#
- value_relation: str | AnyUrl | None#
- value_relation_type: RelationType | None#
- class data2rdf.models.graph.QuantityGraph(*, config: ~data2rdf.config.Config = <factory>, iri: str | ~pydantic.networks.AnyUrl | ~typing.List[str | ~pydantic.networks.AnyUrl], suffix: str | None = None, key: str | None = None, unit: str | ~pydantic.networks.AnyUrl | None = None, value: int | float | str | None = None, unit_relation: str | ~pydantic.networks.AnyUrl | None = 'qudt:hasUnit', value_relation: str | ~pydantic.networks.AnyUrl | None = 'qudt:value', measurement_unit: ~data2rdf.models.graph.MeasurementUnit | None = None)[source]#
Bases: BasicGraphModel, BasicSuffixModel
Quantity with or without a discrete value and a unit, e.g. a quantity with a single value and unit, or a quantity describing a column of a dataframe or table with a unit.
- property json_ld: Dict[str, Any]#
Return dict of json-ld for graph
- measurement_unit: MeasurementUnit | None#
- model_config: ClassVar[ConfigDict] = {'exclude': {'config'}}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- unit: str | AnyUrl | None#
- property unit_json: Dict[str, Any]#
Return json with unit definition
- unit_relation: str | AnyUrl | None#
- classmethod validate_quantity_graph(self) QuantityGraph[source]#
- value: int | float | str | None#
- property value_json: Dict[str, Any]#
Return json with value definition
- value_relation: str | AnyUrl | None#
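To make the roles of `unit_relation` and `value_relation` concrete, the sketch below builds a JSON-LD-like node for a quantity using the default relation names from the QuantityGraph signature (`qudt:hasUnit`, `qudt:value`). The helper `quantity_node` is hypothetical, and the way the `@id` is composed from base IRI, separator and suffix is an assumption for illustration; the library's actual JSON-LD output may be structured differently.

```python
import json
from typing import Any, Dict, Optional, Union


def quantity_node(
    iri: str,
    suffix: str,
    value: Optional[Union[int, float, str]] = None,
    unit: Optional[str] = None,
    base_iri: str = "https://www.example.org",
    separator: str = "/",
    unit_relation: str = "qudt:hasUnit",
    value_relation: str = "qudt:value",
) -> Dict[str, Any]:
    """Build an illustrative JSON-LD node for a quantity (not the library output)."""
    # Assumption: the individual's identifier is base IRI + separator + suffix.
    node: Dict[str, Any] = {"@id": f"{base_iri}{separator}{suffix}", "@type": iri}
    if unit is not None:
        node[unit_relation] = {"@id": unit}
    if value is not None:
        # A quantity describing a whole column would carry no discrete value.
        node[value_relation] = value
    return node


force = quantity_node(
    iri="https://www.example.org/Force",
    suffix="Force",
    value=2.5,
    unit="http://qudt.org/vocab/unit/N",
)
print(json.dumps(force, indent=2))
```

Leaving out `value` mirrors the case described above where the quantity refers to a column of a table rather than a single number.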
- class data2rdf.models.graph.ValueRelationMapping(*, value: str | int | float | bool | AnyUrl, relation: str | AnyUrl, datatype: str | None = None)[source]#
Bases: BaseModel
Mapping between an object/data/annotation property and a value resolved from a location in the data file
- datatype: str | None#
- model_config: ClassVar[ConfigDict] = {}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- relation: str | AnyUrl#
- value: str | int | float | bool | AnyUrl#
Mapping
Mapping models for data2rdf
- class data2rdf.models.mapping.ABoxBaseMapping(*, config: ~data2rdf.config.Config = <factory>, iri: str | ~pydantic.networks.AnyUrl | ~typing.List[str | ~pydantic.networks.AnyUrl], suffix: str | None = None, key: str | None = None, unit: str | ~pydantic.networks.AnyUrl | None = None, annotation: str | ~pydantic.networks.AnyUrl | None = None, custom_relations: ~typing.List[~data2rdf.models.mapping.CustomRelation] | None = None, source: str | None = None, value_location: str | None = None, unit_location: str | None = None, value_relation: str | ~pydantic.networks.AnyUrl | None = None, value_relation_type: ~data2rdf.models.base.RelationType | None = None, value_datatype: str | None = None, unit_relation: str | ~pydantic.networks.AnyUrl | None = None, suffix_from_location: bool = False)[source]#
Bases: BasicConceptMapping, BasicSuffixModel
Base class for mapping during A Box modelling
- annotation: str | AnyUrl | None#
- custom_relations: List[CustomRelation] | None#
- model_config: ClassVar[ConfigDict] = {'exclude': {'config'}}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- source: str | None#
- suffix_from_location: bool#
- unit: str | AnyUrl | None#
- unit_location: str | None#
- unit_relation: str | AnyUrl | None#
- classmethod validate_model(self: ABoxBaseMapping) ABoxBaseMapping[source]#
Validate model
- value_datatype: str | None#
- value_location: str | None#
- value_relation: str | AnyUrl | None#
- value_relation_type: RelationType | None#
- class data2rdf.models.mapping.ABoxExcelMapping(*, config: ~data2rdf.config.Config = <factory>, iri: str | ~pydantic.networks.AnyUrl | ~typing.List[str | ~pydantic.networks.AnyUrl], suffix: str | None = None, key: str | None = None, unit: str | ~pydantic.networks.AnyUrl | None = None, annotation: str | ~pydantic.networks.AnyUrl | None = None, custom_relations: ~typing.List[~data2rdf.models.mapping.CustomRelation] | None = None, source: str | None = None, value_location: str | None = None, unit_location: str | None = None, value_relation: str | ~pydantic.networks.AnyUrl | None = None, value_relation_type: ~data2rdf.models.base.RelationType | None = None, value_datatype: str | None = None, unit_relation: str | ~pydantic.networks.AnyUrl | None = None, suffix_from_location: bool = False, dataframe_start: str | None = None, worksheet: str | None = None)[source]#
Bases: ABoxBaseMapping
A special model for mapping from Excel files to semantic concepts in the ABox
- dataframe_start: str | None#
- model_config: ClassVar[ConfigDict] = {'exclude': {'config'}}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- worksheet: str | None#
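Mapping entries for Excel files can be written as plain dictionaries that use the field names listed for ABoxExcelMapping. The sketch below is illustrative only: the worksheet names, the cell references such as `B3`, and the example.org IRIs are assumptions, and the helper `entries_missing_iri` is hypothetical. It checks the one field that the signature marks as required (`iri`) before the list would be handed to a pipeline.

```python
import json
from typing import Any, Dict, List

# Hypothetical mapping: one metadata cell and one table column.
excel_mapping: List[Dict[str, Any]] = [
    {
        "key": "Operator",
        "iri": "https://www.example.org/Operator",
        "worksheet": "Protocol",
        "value_location": "B3",
    },
    {
        "key": "Force",
        "iri": "https://www.example.org/Force",
        "worksheet": "Data",
        "dataframe_start": "A2",
        "unit_location": "A1",
    },
]


def entries_missing_iri(entries: List[Dict[str, Any]]) -> List[Any]:
    """Return the keys of all entries that lack a non-empty 'iri'."""
    return [entry.get("key") for entry in entries if not entry.get("iri")]


print(entries_missing_iri(excel_mapping))
print(json.dumps(excel_mapping, indent=2))
```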
- class data2rdf.models.mapping.CustomRelation(*, relation: str | AnyUrl, object_location: str | None, object_data_type: str | CustomRelationPropertySubgraph | CustomRelationQuantitySubgraph | None = None, relation_type: RelationType | None = None)[source]#
Bases: BaseModel
Custom relation model
- model_config: ClassVar[ConfigDict] = {}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- object_data_type: str | CustomRelationPropertySubgraph | CustomRelationQuantitySubgraph | None#
- object_location: str | None#
- relation: str | AnyUrl#
- relation_type: RelationType | None#
- class data2rdf.models.mapping.CustomRelationPropertySubgraph(*, config: ~data2rdf.config.Config = <factory>, iri: str | ~pydantic.networks.AnyUrl | ~typing.List[str | ~pydantic.networks.AnyUrl], suffix: str | None = None, concatenate: bool | None = False, value_relation: str | None = 'rdfs:label')[source]#
Bases: PropertySubgraphBaseModel
- model_config: ClassVar[ConfigDict] = {'exclude': {'config'}}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- value_relation: str | None#
- class data2rdf.models.mapping.CustomRelationQuantitySubgraph(*, config: ~data2rdf.config.Config = <factory>, iri: str | ~pydantic.networks.AnyUrl | ~typing.List[str | ~pydantic.networks.AnyUrl], suffix: str | None = None, concatenate: bool | None = False, unit_relation: str | ~pydantic.networks.AnyUrl | None = 'qudt:hasUnit', value_relation: str | ~pydantic.networks.AnyUrl | None = 'qudt:value', unit: str | ~pydantic.networks.AnyUrl | None = None)[source]#
Bases: PropertySubgraphBaseModel
- model_config: ClassVar[ConfigDict] = {'exclude': {'config'}}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- unit: str | AnyUrl | None#
- unit_relation: str | AnyUrl | None#
- value_relation: str | AnyUrl | None#
- class data2rdf.models.mapping.PropertySubgraphBaseModel(*, config: ~data2rdf.config.Config = <factory>, iri: str | ~pydantic.networks.AnyUrl | ~typing.List[str | ~pydantic.networks.AnyUrl], suffix: str | None = None, concatenate: bool | None = False)[source]#
Bases: BasicSuffixModel
- concatenate: bool | None#
- model_config: ClassVar[ConfigDict] = {'exclude': {'config'}}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class data2rdf.models.mapping.TBoxBaseMapping(*, config: ~data2rdf.config.Config = <factory>, key: str, relation: str | ~pydantic.networks.AnyUrl, relation_type: ~data2rdf.models.base.RelationType, datatype: str | None = None)[source]#
Bases: BasicConceptMapping
Mapping between an object/data/annotation property and a value under a location in the data file.
- datatype: str | None#
- key: str#
- model_config: ClassVar[ConfigDict] = {'exclude': {'config'}}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- relation: str | AnyUrl#
- relation_type: RelationType#
Configuration
- class data2rdf.config.Config(_case_sensitive: bool | None = None, _nested_model_default_partial_update: bool | None = None, _env_prefix: str | None = None, _env_file: DotenvType | None = PosixPath('.'), _env_file_encoding: str | None = None, _env_ignore_empty: bool | None = None, _env_nested_delimiter: str | None = None, _env_nested_max_split: int | None = None, _env_parse_none_str: str | None = None, _env_parse_enums: bool | None = None, _cli_prog_name: str | None = None, _cli_parse_args: bool | list[str] | tuple[str, ...] | None = None, _cli_settings_source: CliSettingsSource[Any] | None = None, _cli_parse_none_str: str | None = None, _cli_hide_none_type: bool | None = None, _cli_avoid_json: bool | None = None, _cli_enforce_required: bool | None = None, _cli_use_class_docs_for_groups: bool | None = None, _cli_exit_on_error: bool | None = None, _cli_prefix: str | None = None, _cli_flag_prefix_char: str | None = None, _cli_implicit_flags: bool | None = None, _cli_ignore_unknown_args: bool | None = None, _cli_kebab_case: bool | Literal['all', 'no_enums'] | None = None, _cli_shortcuts: Mapping[str, str | list[str]] | None = None, _secrets_dir: PathType | None = None, *, qudt_units: str | AnyUrl = 'http://qudt.org/2.1/vocab/unit', qudt_quantity_kinds: str | AnyUrl = 'http://qudt.org/vocab/quantitykind/', language: str = 'en', base_iri: str | AnyUrl = 'https://www.example.org', prefix_name: str = 'fileid', separator: str = '/', encoding: str = 'utf-8', data_download_uri: str | AnyUrl = 'https://www.example.org/download', graph_identifier: str | AnyUrl | None = None, namespace_placeholder: str | AnyUrl = 'http://abox-namespace-placeholder.org/', remove_from_unit: List[str] = ['[', ']', '"', ' '], mapping_csv_separator: str = ';', remove_from_datafile: List[str] = ['"', '\r', '\n'], suppress_file_description: bool = False, exclude_ontology_title: bool = False)[source]#
Bases: BaseSettings
Data2RDF configuration
- base_iri: str | AnyUrl#
- data_download_uri: str | AnyUrl#
- encoding: str#
- exclude_ontology_title: bool#
- graph_identifier: str | AnyUrl | None#
- language: str#
- mapping_csv_separator: str#
- model_config: ClassVar[SettingsConfigDict] = {'arbitrary_types_allowed': True, 'case_sensitive': False, 'cli_avoid_json': False, 'cli_enforce_required': False, 'cli_exit_on_error': True, 'cli_flag_prefix_char': '-', 'cli_hide_none_type': False, 'cli_ignore_unknown_args': False, 'cli_implicit_flags': False, 'cli_kebab_case': False, 'cli_parse_args': None, 'cli_parse_none_str': None, 'cli_prefix': '', 'cli_prog_name': None, 'cli_shortcuts': None, 'cli_use_class_docs_for_groups': False, 'enable_decoding': True, 'env_file': None, 'env_file_encoding': None, 'env_ignore_empty': False, 'env_nested_delimiter': None, 'env_nested_max_split': None, 'env_parse_enums': None, 'env_parse_none_str': None, 'env_prefix': '', 'extra': 'ignore', 'json_file': None, 'json_file_encoding': None, 'nested_model_default_partial_update': False, 'protected_namespaces': ('model_validate', 'model_dump', 'settings_customise_sources'), 'secrets_dir': None, 'toml_file': None, 'validate_default': True, 'yaml_config_section': None, 'yaml_file': None, 'yaml_file_encoding': None}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- namespace_placeholder: str | AnyUrl#
- prefix_name: str#
- qudt_quantity_kinds: str | AnyUrl#
- qudt_units: str | AnyUrl#
- remove_from_datafile: List[str]#
- remove_from_unit: List[str]#
- separator: str#
- suppress_file_description: bool#
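The `remove_from_unit` field defaults to the list `['[', ']', '"', ' ']`. Judging by its name only, it lists substrings that are stripped from unit strings read from a data file; that reading is an assumption, and the sketch below is not the library's code. The helper `clean_unit` is hypothetical and shows what such a clean-up could look like with the default values.

```python
from typing import Sequence

# Default value of remove_from_unit as shown in the Config signature.
REMOVE_FROM_UNIT = ("[", "]", '"', " ")


def clean_unit(raw_unit: str, remove: Sequence[str] = REMOVE_FROM_UNIT) -> str:
    """Strip every listed substring from a raw unit string (assumed behaviour)."""
    cleaned = raw_unit
    for token in remove:
        cleaned = cleaned.replace(token, "")
    return cleaned


print(clean_unit("[mm]"))         # mm
print(clean_unit(' "[N/mm2]" '))  # N/mm2
```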
Parsers
Base parser module
Data2RDF base model for parsers
- class data2rdf.parsers.base.ABoxBaseParser(*, raw_data: str | bytes | ~typing.Dict[str, ~typing.Any] | ~typing.List[~typing.Dict[str, ~typing.Any]], mapping: str | ~typing.List[~typing.Any], dropna: bool = False, config: ~data2rdf.config.Config = <factory>)[source]#
Bases: AnyBoxBaseParser
Basic Parser for ABox mode
- property dataframe: pd.DataFrame#
Return time series found in the data as pd.DataFrame
- property dataframe_metadata: List[BasicConceptMapping]#
Return list object with dataframe metadata
- property general_metadata: List[BasicConceptMapping]#
Return list object with general metadata
- model_config: ClassVar[ConfigDict] = {}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(context: Any, /) None#
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- property plain_metadata: List[Dict[str, Any]]#
- to_dict(schema: Callable | None = None) Dict[str, Any] | List[Dict[str, Any]][source]#
Return general metadata as a list of dictionaries.
The list contains dictionaries, where the key is the label of the metadata, and the value is a dictionary with the keys ‘label’ and ‘value’. If the metadata has a measurement unit associated with it, the dictionary will also contain the key ‘measurement_unit’ with the value of the measurement unit.
If the schema parameter is provided, it will be used to transform the metadata list. The schema should be a callable which takes the list of metadata dictionaries and returns the transformed metadata.
If no schema is provided, the function will return a dictionary where the keys are the labels of the metadata, and the values are the dictionaries from the list.
- Parameters:
schema – A callable which takes a list of dictionaries and returns the transformed metadata.
- Returns:
A dictionary or list of dictionaries with the metadata.
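The behaviour described for `to_dict` can be sketched without the library: with no schema, the metadata entries are keyed by their label; with a schema, the callable receives the list of entries and decides the output shape. The function `to_dict_sketch` and the sample entries are hypothetical, and the unit is shown as a plain string for brevity.

```python
from typing import Any, Callable, Dict, List, Optional

Entry = Dict[str, Any]


def to_dict_sketch(
    metadata: List[Entry],
    schema: Optional[Callable[[List[Entry]], Any]] = None,
) -> Any:
    """Mimic the documented behaviour of to_dict on a list of metadata entries."""
    if schema is not None:
        # The schema callable transforms the whole list of entries.
        return schema(metadata)
    # Default: a dictionary keyed by the label of each entry.
    return {entry["label"]: entry for entry in metadata}


metadata = [
    {"label": "Specimen", "value": "S-001"},
    {"label": "Force", "value": 2.5, "measurement_unit": "N"},
]

by_label = to_dict_sketch(metadata)
flat = to_dict_sketch(
    metadata,
    schema=lambda entries: {entry["label"]: entry["value"] for entry in entries},
)

print(by_label["Force"])
print(flat)
```

Passing a schema is a convenient way to reshape the metadata for a downstream consumer without post-processing the default dictionary.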
- class data2rdf.parsers.base.AnyBoxBaseParser(*, raw_data: str | bytes | ~typing.Dict[str, ~typing.Any] | ~typing.List[~typing.Dict[str, ~typing.Any]], mapping: str | ~typing.List[~typing.Any], dropna: bool = False, config: ~data2rdf.config.Config = <factory>)[source]#
Bases: BaseParser
Basic parser for A Box or T Box producing an RDF graph
- property graph: Graph#
Return RDF Graph from the parsed data.
- abstract property json_ld: Dict[str, Any]#
Return dict for json-ld for the graph
- abstract property mapping_model: BaseParser#
Pydantic model for validating mapping. Must be a subclass of ABoxBaseMapping or TBoxBaseMapping.
- model_config: ClassVar[ConfigDict] = {}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- classmethod run_parser(self: BaseParser) BaseParser[source]#
Runs the parser for the given data file and mapping.
This function is a class method that takes in a self parameter, which is an instance of the BaseParser class. It loads the data file using the _load_data_file method and loads the mapping file using the load_mapping_file function. It then runs the parser using the _run_parser method and returns the parsed BaseParser instance.
- Parameters:
self (BaseParser) – The instance of the BaseParser class.
- Returns:
The parsed BaseParser instance.
- Return type:
BaseParser
- class data2rdf.parsers.base.BaseFileParser(*, raw_data: str | bytes | ~typing.Dict[str, ~typing.Any] | ~typing.List[~typing.Dict[str, ~typing.Any]], mapping: str | ~typing.List[~typing.Any], dropna: bool = False, config: ~data2rdf.config.Config = <factory>, mode: ~data2rdf.modes.PipelineMode = PipelineMode.ABOX, parser_args: ~typing.Dict[str, ~typing.Any] = {})[source]#
Bases: BaseParser
Base model for data files which can be run in abox or tbox mode. The respective ABoxBaseParser and TBoxBaseParser must be set as properties for this model. The child classes of this BaseFileParser will be used directly by the main Data2RDF class later.
- property abox: ABoxBaseParser#
Return instance of the abox_parser after model validation
- property dataframe: Dict[str, Any]#
Return dataframe
- property dataframe_metadata: List[BasicConceptMapping]#
Return dataframe metadata
- classmethod execute_parser(self: BaseFileParser) BaseFileParser[source]#
Validates the parser model and executes the parser based on the specified mode.
- Parameters:
self – An instance of the BaseFileParser class.
- Returns:
An instance of the BaseFileParser class with the parser executed.
- property general_metadata: List[BasicConceptMapping]#
Return list object with general metadata
- property graph: Graph#
Return RDFlib Graph
- property json_ld: Dict[str, Any]#
Return JSON LD representation of graph
- abstract property media_type: str | AnyUrl#
IANA Media type definition of the resources to be parsed.
- mode: PipelineMode#
- model_config: ClassVar[ConfigDict] = {}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(context: Any, /) None#
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- parser_args: Dict[str, Any]#
- property plain_metadata: Dict[str, Any]#
Metadata as flat JSON, without units and IRIs. Useful, for example, for the custom properties of the DSMS.
- property tbox: TBoxBaseParser#
Return instance of the tbox_parser after model validation
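BaseFileParser is described as a model that holds an ABox and a TBox parser and executes one of them depending on the mode. A stdlib-only sketch of that dispatch pattern follows. The names `Mode` and `FileParserSketch`, the enum values, and the callables standing in for the two parsers are all hypothetical; the sketch shows the pattern, not the library's implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict


class Mode(Enum):
    """Stand-in for the pipeline mode; the string values are assumptions."""

    ABOX = "abox"
    TBOX = "tbox"


@dataclass
class FileParserSketch:
    """Holds one callable per mode and dispatches on the selected mode."""

    mode: Mode
    run_abox: Callable[[], Dict[str, Any]]
    run_tbox: Callable[[], Dict[str, Any]]

    @property
    def json_ld(self) -> Dict[str, Any]:
        if self.mode is Mode.ABOX:
            return self.run_abox()
        if self.mode is Mode.TBOX:
            return self.run_tbox()
        raise ValueError(f"Unsupported mode: {self.mode!r}")


parser = FileParserSketch(
    mode=Mode.TBOX,
    run_abox=lambda: {"kind": "abox"},
    run_tbox=lambda: {"kind": "tbox"},
)
print(parser.json_ld)
```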
- class data2rdf.parsers.base.BaseParser(*, raw_data: str | bytes | ~typing.Dict[str, ~typing.Any] | ~typing.List[~typing.Dict[str, ~typing.Any]], mapping: str | ~typing.List[~typing.Any], dropna: bool = False, config: ~data2rdf.config.Config = <factory>)[source]#
Bases: BaseModel
Basic Parser for any data file and mode
- dropna: bool#
- mapping: str | List[Any]#
- model_config: ClassVar[ConfigDict] = {}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- raw_data: str | bytes | Dict[str, Any] | List[Dict[str, Any]]#
- class data2rdf.parsers.base.TBoxBaseParser(*, raw_data: str | bytes | ~typing.Dict[str, ~typing.Any] | ~typing.List[~typing.Dict[str, ~typing.Any]], mapping: str | ~typing.List[~typing.Any], dropna: bool = False, config: ~data2rdf.config.Config = <factory>, suffix_location: str, rdfs_type_location: str | None = None, version_info: str | None = None, ontology_iri: str | ~pydantic.networks.AnyUrl | None = None, ontology_title: str | None = None, authors: ~typing.List[str] | None = None, fillna: ~typing.Any | None = '')[source]#
Bases: AnyBoxBaseParser
Basic Parser for TBox mode
- authors: List[str] | None#
- property classes: List[BasicConceptMapping]#
Return list object with class models
- fillna: Any | None#
- model_config: ClassVar[ConfigDict] = {}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(context: Any, /) None#
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- ontology_iri: str | AnyUrl | None#
- ontology_title: str | None#
- rdfs_type_location: str | None#
- suffix_location: str#
- version_info: str | None#
CSV parser module
CSV Parser for data2rdf
- class data2rdf.parsers.csv.CSVABoxParser(*, raw_data: str | bytes | ~typing.Dict[str, ~typing.Any] | ~typing.List[~typing.Dict[str, ~typing.Any]], mapping: str | ~typing.List[~data2rdf.models.mapping.ABoxBaseMapping], dropna: bool = False, config: ~data2rdf.config.Config = <factory>, metadata_sep: str | None = None, metadata_length: int, dataframe_sep: str | None = None, dataframe_header_length: int = 2, fillna: ~typing.Any | None = '')[source]#
Bases: ABoxBaseParser
CSV file parser in abox mode
- dataframe_header_length: int#
- dataframe_sep: str | None#
- fillna: Any | None#
- property json_ld: Dict[str, Any]#
Returns a JSON-LD representation of the CSV data in ABox mode.
This method generates a JSON-LD object that describes the CSV data, including its metadata, dataframe data, and relationships between them.
The returned JSON-LD object is in the format of a csvw:TableGroup, which contains one or more csvw:Table objects. Each csvw:Table object represents a table in the CSV data, and contains information about its columns, rows, and relationships to other tables.
The JSON-LD object also includes context information, such as namespace prefixes and base URLs, to help with serialization and deserialization.
- Returns:
A JSON-LD object representing the CSV data in ABox mode.
- Return type:
Dict[str, Any]
- mapping: str | List[ABoxBaseMapping]#
- property mapping_model: ABoxBaseMapping#
ABox Mapping Model for CSV Parser
- metadata_length: int#
- metadata_sep: str | None#
- model_config: ClassVar[ConfigDict] = {}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(context: Any, /) None#
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
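The CSVABoxParser fields `metadata_length`, `metadata_sep`, `dataframe_sep` and `dataframe_header_length` suggest a file with a metadata block at the top followed by a table. The sketch below splits such a file with the standard `csv` module. It assumes that `metadata_length` counts the leading metadata rows and that `dataframe_header_length` counts the header rows of the table (for example names and units); both readings are inferred from the field names, and the function `split_csv` is hypothetical.

```python
import csv
from typing import Dict, List, Tuple


def split_csv(
    content: str,
    metadata_length: int,
    dataframe_header_length: int = 2,
    metadata_sep: str = ";",
    dataframe_sep: str = ";",
) -> Tuple[Dict[str, str], List[List[str]], Dict[str, List[str]]]:
    """Split CSV text into metadata, table header rows and table columns."""
    lines = content.splitlines()
    meta_lines = lines[:metadata_length]
    table_lines = lines[metadata_length:]

    # Metadata rows: first cell is the key, second cell (if any) the value.
    metadata: Dict[str, str] = {}
    for row in csv.reader(meta_lines, delimiter=metadata_sep):
        if row:
            metadata[row[0]] = row[1] if len(row) > 1 else ""

    table_rows = list(csv.reader(table_lines, delimiter=dataframe_sep))
    header = table_rows[:dataframe_header_length]
    data = table_rows[dataframe_header_length:]

    # Column names come from the first header row.
    columns: Dict[str, List[str]] = {}
    if header:
        for index, name in enumerate(header[0]):
            columns[name] = [row[index] if index < len(row) else "" for row in data]
    return metadata, header, columns


content = "specimen;S-001\noperator;Jane Doe\ntime;force\ns;N\n0.0;1.5\n0.1;2.5"
metadata, header, columns = split_csv(content, metadata_length=2)
print(metadata)
print(header)
print(columns)
```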
- class data2rdf.parsers.csv.CSVParser(*, raw_data: str | bytes | ~typing.Dict[str, ~typing.Any] | ~typing.List[~typing.Dict[str, ~typing.Any]], mapping: str | ~typing.List[~typing.Any], dropna: bool = False, config: ~data2rdf.config.Config = <factory>, mode: ~data2rdf.modes.PipelineMode = PipelineMode.ABOX, parser_args: ~typing.Dict[str, ~typing.Any] = {})[source]#
Bases: BaseFileParser
Parser for CSV/TSV files
- property media_type: str#
IANA Media type definition of the resource to be parsed.
- model_config: ClassVar[ConfigDict] = {}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(context: Any, /) None#
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- class data2rdf.parsers.csv.CSVTBoxParser(*, raw_data: str | bytes | ~typing.Dict[str, ~typing.Any] | ~typing.List[~typing.Dict[str, ~typing.Any]], mapping: str | ~typing.List[~data2rdf.models.mapping.TBoxBaseMapping], dropna: bool = False, config: ~data2rdf.config.Config = <factory>, suffix_location: str, rdfs_type_location: str | None = None, version_info: str | None = None, ontology_iri: str | ~pydantic.networks.AnyUrl | None = None, ontology_title: str | None = None, authors: ~typing.List[str] | None = None, fillna: ~typing.Any | None = '', column_sep: str | None = ',', header_length: int = 1)[source]#
Bases: TBoxBaseParser
CSV file parser in tbox mode
- column_sep: str | None#
- fillna: Any | None#
- header_length: int#
- property json_ld: Dict[str, Any]#
Make the JSON-LD if the pipeline is in tbox mode
- mapping: str | List[TBoxBaseMapping]#
- property mapping_model: TBoxBaseMapping#
TBox Mapping Model for CSV Parser
- model_config: ClassVar[ConfigDict] = {}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(context: Any, /) None#
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
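The `column_sep`, `header_length` and `fillna` fields of `CSVTBoxParser` are listed above without a description. The following is a minimal sketch of one plausible reading of them, written with the standard library only: `column_sep` as the delimiter, `header_length` as the number of leading header rows, and `fillna` as the replacement for empty cells. The function `read_rows` and the sample data are hypothetical, and the behaviour shown is an assumption rather than the library's implementation.

```python
import csv
import io
from typing import Any, Dict, List


def read_rows(
    text: str,
    column_sep: str = ",",
    header_length: int = 1,
    fillna: Any = "",
) -> List[Dict[str, Any]]:
    """Parse delimited text into row dictionaries (illustrative only)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=column_sep))
    if header_length < 1 or len(rows) < header_length:
        raise ValueError("header_length must be between 1 and the number of rows")
    # Assumption: the last header row carries the column names.
    header = rows[header_length - 1]
    records = []
    for row in rows[header_length:]:
        # Pad short rows so that every column is present.
        padded = row + [""] * (len(header) - len(row))
        records.append(
            {
                name: (value if value != "" else fillna)
                for name, value in zip(header, padded)
            }
        )
    return records


# Hypothetical semicolon-separated input with one empty cell.
sample = "suffix;description\nTestTime;\nStandardForce;Force measured\n"
records = read_rows(sample, column_sep=";", fillna="n/a")
```

Here the empty `description` cell of the first data row is replaced by the `fillna` value, while the second row is kept unchanged.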
Excel parser module#
Data2rdf excel parser
- class data2rdf.parsers.excel.ExcelABoxParser(*, raw_data: str | bytes | ~typing.Dict[str, ~typing.Any] | ~typing.List[~typing.Dict[str, ~typing.Any]], mapping: str | ~typing.List[~data2rdf.models.mapping.ABoxExcelMapping], dropna: bool = False, config: ~data2rdf.config.Config = <factory>, unit_from_macro: bool = False, unit_macro_location: int = -1)[source]#
Bases: ABoxBaseParser
Parses a data file of type Excel in ABox mode
- property json_ld: Dict[str, Any]#
Returns the JSON-LD representation of the data in ABox mode.
The JSON-LD is constructed based on the metadata and dataframe data. If the file description is not suppressed, it includes the metadata and dataframe data tables. Otherwise, it returns a list of JSON-LD representations of the individual models.
- Returns:
A dictionary representing the JSON-LD data.
- mapping: str | List[ABoxExcelMapping]#
- property mapping_model: ABoxExcelMapping#
Mapping Model
- model_config: ClassVar[ConfigDict] = {}#
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_post_init(context: Any, /) → None#
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- unit_from_macro: bool#
- unit_macro_location: int#
- class data2rdf.parsers.excel.ExcelParser(*, raw_data: str | bytes | ~typing.Dict[str, ~typing.Any] | ~typing.List[~typing.Dict[str, ~typing.Any]], mapping: str | ~typing.List[~typing.Any], dropna: bool = False, config: ~data2rdf.config.Config = <factory>, mode: ~data2rdf.modes.PipelineMode = PipelineMode.ABOX, parser_args: ~typing.Dict[str, ~typing.Any] = {})[source]#
Bases: BaseFileParser
Parser for Excel files
- property media_type: str#
IANA Media type definition of the resource to be parsed.
- model_config: ClassVar[ConfigDict] = {}#
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_post_init(context: Any, /) → None#
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- class data2rdf.parsers.excel.ExcelTBoxParser(*, raw_data: str | bytes | ~typing.Dict[str, ~typing.Any] | ~typing.List[~typing.Dict[str, ~typing.Any]], mapping: str | ~typing.List[~data2rdf.models.mapping.TBoxBaseMapping], dropna: bool = False, config: ~data2rdf.config.Config = <factory>, suffix_location: str, rdfs_type_location: str | None = None, version_info: str | None = None, ontology_iri: str | ~pydantic.networks.AnyUrl | None = None, ontology_title: str | None = None, authors: ~typing.List[str] | None = None, fillna: ~typing.Any | None = '', sheet: str, header_length: int = 1)[source]#
Bases: TBoxBaseParser
Parses a data file of type Excel in TBox mode
- header_length: int#
- property json_ld: Dict[str, Any]#
Return the JSON-LD if the pipeline is in TBox mode
- mapping: str | List[TBoxBaseMapping]#
- property mapping_model: TBoxBaseMapping#
TBox Mapping Model
- model_config: ClassVar[ConfigDict] = {}#
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_post_init(context: Any, /) → None#
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- sheet: str#
Json parser module#
Data2rdf JSON parser
- class data2rdf.parsers.json.JsonABoxParser(*, raw_data: str | bytes | ~typing.Dict[str, ~typing.Any] | ~typing.List[~typing.Dict[str, ~typing.Any]], mapping: str | ~typing.List[~data2rdf.models.mapping.ABoxBaseMapping], dropna: bool = False, config: ~data2rdf.config.Config = <factory>, expand_array: bool = False)[source]#
Bases: ABoxBaseParser
Parser for JSON in ABox mode
- expand_array: bool#
- property json_ld: Dict[str, Any]#
Returns the JSON-LD representation of the parser’s data.
This method generates the JSON-LD representation of the parser’s data, including the context, id, type, and members. The members are generated based on the general metadata and dataframe metadata.
The method returns a dictionary containing the JSON-LD representation.
- Returns:
A dictionary containing the JSON-LD representation.
- Return type:
Dict[str, Any]
- mapping: str | List[ABoxBaseMapping]#
- property mapping_model: ABoxBaseMapping#
ABox mapping model
- model_config: ClassVar[ConfigDict] = {}#
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_post_init(context: Any, /) → None#
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
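The `json_ld` property of `JsonABoxParser` is described as a dictionary with a context, an id, a type and members built from the general and dataframe metadata. The sketch below shows what such a JSON-LD document could look like in general terms. Only the `@context`, `@id` and `@type` keywords come from the JSON-LD standard; the `ex:` vocabulary, the `ex:hasMember` key, the function `build_json_ld` and all IRIs are placeholders assumed for this example and do not reflect the parser's actual output.

```python
import json
from typing import Any, Dict, List


def build_json_ld(
    graph_id: str,
    general_metadata: List[Dict[str, Any]],
    dataframe_metadata: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Assemble a JSON-LD dictionary from two lists of member nodes."""
    return {
        # Assumption: a placeholder vocabulary, not the one used by data2rdf.
        "@context": {"ex": "https://example.org/vocab/"},
        "@id": graph_id,
        "@type": "ex:Dataset",
        # Members: general metadata first, then dataframe (column) metadata.
        "ex:hasMember": general_metadata + dataframe_metadata,
    }


document = build_json_ld(
    "https://example.org/data/1",
    general_metadata=[
        {"@id": "https://example.org/data/1/Operator", "@type": "ex:Operator"}
    ],
    dataframe_metadata=[
        {"@id": "https://example.org/data/1/Force", "@type": "ex:Force"}
    ],
)
serialised = json.dumps(document, indent=2)
```

Because the result is a plain dictionary, it can be serialised with `json.dumps` and handed to any JSON-LD aware tool.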
- class data2rdf.parsers.json.JsonParser(*, raw_data: str | bytes | ~typing.Dict[str, ~typing.Any] | ~typing.List[~typing.Dict[str, ~typing.Any]], mapping: str | ~typing.List[~typing.Any], dropna: bool = False, config: ~data2rdf.config.Config = <factory>, mode: ~data2rdf.modes.PipelineMode = PipelineMode.ABOX, parser_args: ~typing.Dict[str, ~typing.Any] = {})[source]#
Bases: BaseFileParser
Parses a data file of type JSON
- property media_type: str#
IANA Media type definition of the resource to be parsed.
- model_config: ClassVar[ConfigDict] = {}#
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_post_init(context: Any, /) → None#
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- class data2rdf.parsers.json.JsonTBoxParser(*, raw_data: str | bytes | ~typing.Dict[str, ~typing.Any] | ~typing.List[~typing.Dict[str, ~typing.Any]], mapping: str | ~typing.List[~data2rdf.models.mapping.TBoxBaseMapping], dropna: bool = False, config: ~data2rdf.config.Config = <factory>, suffix_location: str, rdfs_type_location: str | None = None, version_info: str | None = None, ontology_iri: str | ~pydantic.networks.AnyUrl | None = None, ontology_title: str | None = None, authors: ~typing.List[str] | None = None, fillna: ~typing.Any | None = '')[source]#
Bases: TBoxBaseParser
Parser for JSON in TBox mode
- property json_ld: Dict[str, Any]#
Return JSON-LD in TBox mode
- mapping: str | List[TBoxBaseMapping]#
- property mapping_model: TBoxBaseMapping#
TBox mapping model
- model_config: ClassVar[ConfigDict] = {}#
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_post_init(context: Any, /) → None#
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.