gen3_validator.parsers package

Submodules

gen3_validator.parsers.parse_data module

class gen3_validator.parsers.parse_data.ParseData(data_folder_path: str | None = None, data_file_path: str | None = None, link_suffix: str = 's')

Bases: object

Parses JSON data from a specified folder or a single file, and constructs a dictionary representation of the data.

get_node_names() list

Retrieves the names of nodes from the JSON files.

This method iterates over the list of file paths and extracts the node names by removing the ‘.json’ extension from each file name.

Returns: - list: A list of node names extracted from the JSON file paths.
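
A minimal sketch of the name extraction described above, assuming a node name is the file's base name with the '.json' extension removed. The helper node_names_from_paths and the example paths are hypothetical and not part of gen3_validator.

```python
import os


def node_names_from_paths(json_paths):
    """Return the base name of each path with a trailing '.json' removed."""
    names = []
    for path in json_paths:
        base = os.path.basename(path)
        # Only strip the extension when it is actually present.
        if base.endswith(".json"):
            base = base[: -len(".json")]
        names.append(base)
    return names


names = node_names_from_paths(["/data/subject.json", "/data/sample.json"])
print(names)
```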

list_data_files() list

Lists all JSON data files in the specified folder or returns the single file path.

This method checks if a folder path is provided. If so, it lists all files in the folder that have a ‘.json’ extension and returns their absolute paths. If no folder path is provided, it returns the single file path specified during initialization.

Returns: - list: A list of absolute file paths to JSON files.
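
The folder-or-file behaviour can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's implementation: the function name list_json_files is hypothetical, the results are sorted only to make the output deterministic, and the single-file case is assumed to be returned as a one-element list.

```python
import os
import tempfile


def list_json_files(folder_path=None, file_path=None):
    """List absolute paths of '.json' files in a folder, or fall back to one file."""
    if folder_path:
        return sorted(
            os.path.abspath(os.path.join(folder_path, name))
            for name in os.listdir(folder_path)
            if name.endswith(".json")
        )
    # Assumption: with no folder, the single file path is wrapped in a list.
    return [file_path]


# Build a throwaway folder with two JSON files and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("subject.json", "sample.json", "notes.txt"):
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as handle:
            handle.write("[]")
    found = [os.path.basename(p) for p in list_json_files(folder_path=tmp)]

print(found)
```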

load_json_data(json_paths: list, link_suffix: str = 's') dict
read_json(path: str) dict
return_data(node: str) dict

Retrieves data for a specified node.

This method accesses the data dictionary and returns the data associated with the given node name.

Parameters: - node (str): The name of the node for which data is to be retrieved.

Returns: - dict: A dictionary containing the data for the specified node.

gen3_validator.parsers.parse_xlsx module

class gen3_validator.parsers.parse_xlsx.ParseXlsxMetadata(xlsx_path: str, link_suffix: str = 's', skip_rows: int = 0)

Bases: object

Converts a specified sheet from the metadata dictionary (generated by https://github.com/AustralianBioCommons/gen3-metadata-templates) to a JSON file. Also formats and renames the primary and foreign keys into a Gen3-compatible format.

xlsx_path

The path to the Excel file containing metadata templates.

Type:

str

link_suffix

A suffix to append to link identifiers; the default is ‘s’. For example, if you name your links “nodeName_link”, set this to “_link”.

Type:

str
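
To illustrate the link_suffix setting, the sketch below assumes a link identifier is formed by appending the suffix to a node name, as the “nodeName_link” example suggests. The helper link_name and the node name "sample" are hypothetical; the actual renaming logic inside ParseXlsxMetadata may differ.

```python
def link_name(node_name, link_suffix="s"):
    """Build a link identifier by appending the suffix to the node name."""
    return f"{node_name}{link_suffix}"


default_link = link_name("sample")          # uses the default suffix 's'
custom_link = link_name("sample", "_link")  # uses a custom suffix
print(default_link, custom_link)
```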

format_pd_to_json(xlsx_data_dict: dict, sheet_name: str) list

Formats the pandas DataFrame into a specific JSON format.

Parameters:
  • xlsx_data_dict (dict) – A dictionary where each key is a sheet name and each value is a DataFrame.

  • sheet_name (str) – The name of the sheet to convert to JSON.

Returns:

A list of JSON objects with a specific key-value pair structure.

Return type:

list

get_pk_fk_pairs(xlsx_data_dict: dict, sheet_name: str) tuple

Extracts the primary key (PK) and foreign key (FK) column names from a specified sheet.

This method retrieves the first two column names from the given sheet in the Excel data dictionary, assuming the first column is the primary key and the second column is the foreign key.

Parameters:
  • xlsx_data_dict (dict) – A dictionary where each key is a sheet name and each value is a DataFrame.

  • sheet_name (str) – The name of the sheet from which to extract the PK and FK.

Returns:

A tuple containing the primary key and foreign key column names.

Return type:

tuple
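
The positional convention described above (first column is the PK, second is the FK) can be sketched as follows. A plain list of column names stands in for a DataFrame's columns; the helper first_two_columns, its error handling, and the column names are assumptions made for illustration.

```python
def first_two_columns(columns):
    """Return (primary_key, foreign_key) as the first and second column names."""
    if len(columns) < 2:
        # Assumption: a sheet with fewer than two columns is treated as an error.
        raise ValueError("expected at least two columns")
    return columns[0], columns[1]


pk, fk = first_two_columns(["sample_id", "subject_id", "tissue_type"])
print(pk, fk)
```

Because the pairing is purely positional, reordering the columns in a sheet would change which names are treated as the keys.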

get_sheet_names() list

Retrieves the names of all sheets in the Excel file.

Returns:

A list of sheet names present in the Excel file.

Return type:

list

parse_metadata_template() dict

Parses an Excel file and converts each sheet into a DataFrame.

This function reads an Excel file specified by the xlsx_path and loads each sheet into a dictionary where the keys are the sheet names and the values are the DataFrames representing the data in those sheets. The first few rows of each DataFrame are removed based on the skip_rows attribute.

Returns:

A dictionary where each key is a sheet name and each value is a DataFrame containing the data from that sheet, with the specified number of rows removed.

Return type:

dict
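
The effect of skip_rows can be sketched without reading a real workbook. Here each sheet is a plain list of rows standing in for a DataFrame (with pandas, the comparable operation would be a positional slice such as df.iloc[skip_rows:]). The helper drop_leading_rows and the sample sheet contents are hypothetical.

```python
def drop_leading_rows(sheets, skip_rows=0):
    """Return a new mapping with the first `skip_rows` rows removed from every sheet."""
    return {name: rows[skip_rows:] for name, rows in sheets.items()}


sheets = {
    "subject": [["description row"], ["subject_001"], ["subject_002"]],
    "sample": [["description row"], ["sample_001"]],
}
trimmed = drop_leading_rows(sheets, skip_rows=1)
print(trimmed)
```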

pd_to_json(xlsx_data_dict: dict, sheet_name: str, json_path: str) None

Writes a list of JSON objects to a JSON file.

Parameters:
  • xlsx_data_dict (dict) – A dictionary where each key is a sheet name and each value is a DataFrame.

  • sheet_name (str) – The name of the sheet to convert to JSON.

  • json_path (str) – The path to the JSON file to be saved.

Returns:

None
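
Writing a list of JSON objects to a file can be sketched with the standard json module. This shows the final write step only, under the assumption that the records have already been formatted; the helper write_records, the field names, and the file name are hypothetical.

```python
import json
import os
import tempfile


def write_records(records, json_path):
    """Serialise a list of dictionaries to a JSON file."""
    with open(json_path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2)


records = [
    {"submitter_id": "sample_001", "tissue_type": "blood"},
    {"submitter_id": "sample_002", "tissue_type": "saliva"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.json")
    write_records(records, path)
    # Read the file back to confirm the records survive a round trip.
    with open(path, encoding="utf-8") as handle:
        round_trip = json.load(handle)

print(round_trip == records)
```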

write_dict_to_json(xlsx_data_dict: dict, output_dir: str) None

Writes a dictionary of pandas DataFrames to JSON files.

Parameters:
  • xlsx_data_dict (dict) – The dictionary containing DataFrames to be written to JSON files.

  • output_dir (str) – The directory where JSON files will be created.

Returns:

None

Module contents