gen3_validator.parsers package
Submodules
gen3_validator.parsers.parse_data module
- class gen3_validator.parsers.parse_data.ParseData(data_folder_path: str | None = None, data_file_path: str | None = None, link_suffix: str = 's')
Bases: object
Parses JSON data from a specified folder or a single file, and constructs a dictionary representation of the data.
- get_node_names() → list
Retrieves the names of nodes from the JSON files.
This method iterates over the list of file paths and extracts each node name by removing the ‘.json’ extension from the file name.
- Returns:
A list of node names extracted from the JSON file paths.
- Return type:
list
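The extension stripping described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the library's implementation: the helper name `node_names_from_paths` and the sample paths are hypothetical.

```python
import os


def node_names_from_paths(json_paths):
    """Return node names by dropping the directory and the '.json' extension."""
    names = []
    for path in json_paths:
        file_name = os.path.basename(path)
        # os.path.splitext splits "subject.json" into ("subject", ".json").
        stem, ext = os.path.splitext(file_name)
        # Leave names without a '.json' extension unchanged (an assumption).
        names.append(stem if ext == ".json" else file_name)
    return names


print(node_names_from_paths(["/data/subject.json", "/data/sample.json"]))
```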
- list_data_files() → list
Lists all JSON data files in the specified folder or returns the single file path.
This method checks whether a folder path was provided. If so, it lists all files in the folder that have a ‘.json’ extension and returns their absolute paths. If no folder path was provided, it returns the single file path specified during initialization.
- Returns:
A list of absolute file paths to JSON files.
- Return type:
list
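The folder-or-file behaviour described above might look roughly like the following. This is a sketch under assumptions: the function name `list_json_files` is hypothetical, sorting the results is an addition made here for predictable output, and wrapping the single file path in a one-element list is assumed from the documented return type.

```python
import os


def list_json_files(data_folder_path=None, data_file_path=None):
    """List absolute paths of '.json' files in a folder, or fall back to one file."""
    if data_folder_path:
        return sorted(
            os.path.abspath(os.path.join(data_folder_path, name))
            for name in os.listdir(data_folder_path)
            if name.endswith(".json")
        )
    # No folder given: return the single file path as a one-element list.
    return [data_file_path]
```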
- load_json_data(json_paths: list, link_suffix: str = 's') → dict
- read_json(path: str) → dict
- return_data(node: str) → dict
Retrieves data for a specified node.
This method accesses the data dictionary and returns the data associated with the given node name.
- Parameters:
node (str) – The name of the node for which data is to be retrieved.
- Returns:
A dictionary containing the data for the specified node.
- Return type:
dict
gen3_validator.parsers.parse_xlsx module
- class gen3_validator.parsers.parse_xlsx.ParseXlsxMetadata(xlsx_path: str, link_suffix: str = 's', skip_rows: int = 0)
Bases: object
Converts a specified sheet from the metadata dictionary (generated by https://github.com/AustralianBioCommons/gen3-metadata-templates) to a JSON file. Also formats and renames the primary and foreign keys into a Gen3-compatible format.
- xlsx_path
The path to the Excel file containing metadata templates.
- Type:
str
- link_suffix
A suffix to append to link identifiers; the default is ‘s’. For example, if you name your links “nodeName_link”, set this to “_link”.
- Type:
str
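To make the effect of link_suffix concrete, the sketch below builds a link identifier by appending the suffix to a node name. The helper `link_name` and the node name "subject" are hypothetical, and how the library applies the suffix internally is an assumption based only on the description above.

```python
def link_name(parent_node, link_suffix="s"):
    """Build a link identifier by appending the suffix to the parent node name."""
    return f"{parent_node}{link_suffix}"


print(link_name("subject"))           # subjects
print(link_name("subject", "_link"))  # subject_link
```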
- format_pd_to_json(xlsx_data_dict: dict, sheet_name: str) → list
Formats a pandas DataFrame into a specific JSON format.
- Parameters:
xlsx_data_dict (dict) – A dictionary where each key is a sheet name and each value is a DataFrame.
sheet_name (str) – The name of the sheet to convert to JSON.
- Returns:
A list of JSON objects with a specific key-value pair structure.
- Return type:
list
- get_pk_fk_pairs(xlsx_data_dict: dict, sheet_name: str) → tuple
Extracts the primary key (PK) and foreign key (FK) column names from a specified sheet.
This method retrieves the first two column names from the given sheet in the Excel data dictionary, assuming the first column is the primary key and the second column is the foreign key.
- Parameters:
xlsx_data_dict (dict) – A dictionary where each key is a sheet name and each value is a DataFrame.
sheet_name (str) – The name of the sheet from which to extract the PK and FK.
- Returns:
A tuple containing the primary key and foreign key column names.
- Return type:
tuple
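The first-two-columns convention described above can be illustrated with a small pandas sketch. It is not the library's code: the function name `first_two_columns`, the sheet name, and the column names are hypothetical, and the length check is an addition made here.

```python
import pandas as pd


def first_two_columns(xlsx_data_dict, sheet_name):
    """Return the first two column names of a sheet as (pk, fk)."""
    columns = list(xlsx_data_dict[sheet_name].columns)
    if len(columns) < 2:
        raise ValueError(f"Sheet '{sheet_name}' needs at least two columns")
    return columns[0], columns[1]


# Hypothetical sheet: the first column is the PK, the second is the FK.
sheets = {
    "sample": pd.DataFrame(
        {"sample_id": ["s1"], "subject_id": ["p1"], "tissue": ["blood"]}
    )
}
print(first_two_columns(sheets, "sample"))  # ('sample_id', 'subject_id')
```

Because the result depends only on column order, a template whose key columns are not the first two would yield the wrong pair.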
- get_sheet_names() → list
Retrieves the names of all sheets in the Excel file.
- Returns:
A list of sheet names present in the Excel file.
- Return type:
list
- parse_metadata_template() → dict
Parses an Excel file and converts each sheet into a DataFrame.
This method reads the Excel file specified by xlsx_path and loads each sheet into a dictionary where the keys are the sheet names and the values are the DataFrames representing the data in those sheets. The number of leading rows removed from each DataFrame is set by the skip_rows attribute.
- Returns:
A dictionary where each key is a sheet name and each value is a DataFrame containing the data from that sheet, with the specified number of rows removed.
- Return type:
dict
- pd_to_json(xlsx_data_dict: dict, sheet_name: str, json_path: str) → None
Writes a list of JSON objects to a JSON file.
- Parameters:
xlsx_data_dict (dict) – A dictionary where each key is a sheet name and each value is a DataFrame.
sheet_name (str) – The name of the sheet to convert to JSON.
json_path (str) – The path to the JSON file to be saved.
- Returns:
None
- write_dict_to_json(xlsx_data_dict: dict, output_dir: str) → None
Writes a dictionary of pandas DataFrames to JSON files.
- Parameters:
xlsx_data_dict (dict) – The dictionary containing DataFrames to be written to JSON files.
output_dir (str) – The directory where JSON files will be created.
- Returns:
None
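The general shape of writing one JSON file per sheet can be sketched with the standard library alone. Several things here are assumptions rather than documented behaviour: plain lists of dictionaries stand in for the DataFrames, the function name `write_records_to_json` is hypothetical, and naming each file `<sheet_name>.json` is a guess at the convention.

```python
import json
import os
import tempfile


def write_records_to_json(records_by_sheet, output_dir):
    """Write one '<sheet_name>.json' file per entry and return the paths written."""
    os.makedirs(output_dir, exist_ok=True)
    written = []
    for sheet_name, records in records_by_sheet.items():
        json_path = os.path.join(output_dir, f"{sheet_name}.json")
        with open(json_path, "w", encoding="utf-8") as handle:
            json.dump(records, handle, indent=2)
        written.append(json_path)
    return written


# Usage: write a hypothetical 'sample' sheet into a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    paths = write_records_to_json({"sample": [{"sample_id": "s1"}]}, tmp_dir)
    print([os.path.basename(p) for p in paths])  # ['sample.json']
```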