# Utility Functions
This script provides a collection of common helper functions used by various automation and processing scripts within the project. It includes functions for file I/O, JSON handling, interacting with Large Language Models (LLMs) via Ollama, processing LLM responses, managing command-line arguments, and generating metadata.
### load_json

Loads data from a specified JSON file.

- Purpose: Reads a JSON file from the given `filepath` and returns the parsed Python object (typically a dictionary or list).
- Usage:

```python
data = load_json("path/to/input.json")
```
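A minimal sketch of what this helper likely does; the UTF-8 encoding is an assumption:

```python
import json

def load_json(filepath):
    # Read and parse the file; raises FileNotFoundError or
    # json.JSONDecodeError if the file is missing or malformed.
    with open(filepath, "r", encoding="utf-8") as f:
        return json.load(f)
```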
### save_output

Saves Python data structures to a JSON file.

- Purpose: Writes the provided `output_data` to the specified `output_filepath` in JSON format with indentation. It automatically creates the necessary output directory if it doesn't exist.
- Usage:

```python
save_output(my_data, "output/results.json")
```
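A sketch of the likely implementation; the indent width of 4 is an assumption:

```python
import json
import os

def save_output(output_data, output_filepath):
    # Create the parent directory if it does not already exist.
    os.makedirs(os.path.dirname(output_filepath) or ".", exist_ok=True)
    with open(output_filepath, "w", encoding="utf-8") as f:
        json.dump(output_data, f, indent=4)
```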
### parse_embedded_json

Recursively parses JSON strings embedded within node structures.

- Purpose: Checks whether the `step` field of a dictionary (node) contains a string that looks like a JSON array or object. If it does, it parses the JSON and replaces the node's `children` field with the parsed content. This is useful for handling nested structures generated by LLMs, where steps might contain further sub-steps defined as JSON. If the node doesn't have a `title`, the original `step` string is used as the title after parsing.
- Usage:

```python
processed_node = parse_embedded_json(node_with_potential_json_step)
```
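One way this behavior could be implemented; the recursion into existing children and the exact title handling are assumptions about the real helper:

```python
import json

def parse_embedded_json(node):
    step = node.get("step", "")
    # A string starting with "[" or "{" is treated as embedded JSON.
    if isinstance(step, str) and step.strip().startswith(("[", "{")):
        try:
            parsed = json.loads(step)
        except json.JSONDecodeError:
            parsed = None
        if parsed is not None:
            # Use the original step text as the title if none is set.
            node.setdefault("title", step)
            node["children"] = parsed if isinstance(parsed, list) else [parsed]
    # Recurse into child nodes so nested embedded JSON is also expanded.
    for child in node.get("children", []):
        if isinstance(child, dict):
            parse_embedded_json(child)
    return node
```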
### chat_with_llm

Interacts with a specified Ollama LLM.

- Purpose: Provides a generic interface to send a system message and a user message to an LLM (specified by `model`) using the `ollama` library. It accepts an optional `parameters` dictionary for the Ollama API call.
- Returns: The content of the LLM's response as a cleaned string.
- Usage:

```python
response = chat_with_llm("gemma3", "System prompt", "User query", {"temperature": 0.7})
```
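A sketch of the likely call into the `ollama` Python library; forwarding `parameters` through `options` and stripping the reply are assumptions:

```python
import ollama

def chat_with_llm(model, system_message, user_message, parameters=None):
    # Send one system + one user message; "options" carries sampling
    # parameters such as temperature.
    response = ollama.chat(
        model=model,
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_message},
        ],
        options=parameters or {},
    )
    return response["message"]["content"].strip()
```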
### clean_llm_json_response

Cleans up potential JSON responses from an LLM.

- Purpose: Attempts to extract a valid JSON object or array from a raw LLM response string. It removes common artifacts such as Markdown code fences (`` ```json ``, `` ``` ``).
- Returns: The extracted JSON string if found, otherwise the cleaned-up text.
- Usage:

```python
json_string = clean_llm_json_response(raw_llm_output)
```
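A regex-based sketch of this kind of cleanup; the actual extraction strategy may differ:

```python
import re

def clean_llm_json_response(raw_text):
    # Strip Markdown code fences, then pull out the first {...} or [...]
    # span if one is present.
    text = re.sub(r"```(?:json)?", "", raw_text).strip()
    match = re.search(r"(\{.*\}|\[.*\])", text, re.DOTALL)
    return match.group(1) if match else text
```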
### parse_llm_json_response

Parses JSON from an LLM response with fallback handling.

- Purpose: Takes raw LLM response text, cleans it using `clean_llm_json_response`, and attempts to parse it as JSON.
- Fallback: If JSON parsing fails, it splits the cleaned text into lines and returns a list of dictionaries.
  - If `include_children` is `True`, each dictionary has the structure `{"step": "line content", "children": []}`.
  - If `include_children` is `False` (the default), each dictionary has the structure `{"step": "line content"}`.
- Usage:

```python
parsed_data = parse_llm_json_response(raw_llm_output, include_children=True)
```
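A sketch of the parse-then-fallback logic, reusing the `clean_llm_json_response` helper above; skipping empty lines in the fallback is an assumption:

```python
import json

def parse_llm_json_response(raw_text, include_children=False):
    cleaned = clean_llm_json_response(raw_text)
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        # Fallback: treat each non-empty line as one step.
        steps = []
        for line in cleaned.splitlines():
            line = line.strip()
            if not line:
                continue
            entry = {"step": line}
            if include_children:
                entry["children"] = []
            steps.append(entry)
        return steps
```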
### create_output_metadata

Generates standard metadata for output files.

- Purpose: Creates a dictionary containing common metadata associated with a task's output, including a UUID (`output_uuid`), a creation timestamp, the `task_name`, and the time taken (calculated from the `start_time`).
- Usage:

```python
metadata = create_output_metadata("Data Processing", start_timestamp, generated_uuid)
```
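A sketch assuming `start_time` is a `time.time()` float; the exact key names in the returned dictionary are assumptions:

```python
import time
from datetime import datetime, timezone

def create_output_metadata(task_name, start_time, output_uuid):
    return {
        "uuid": str(output_uuid),
        "task_name": task_name,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "time_taken_seconds": round(time.time() - start_time, 2),
    }
```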
### get_output_filepath

Determines the final path for an output file.

- Purpose: Returns the appropriate file path for saving output. If `specified_path` is provided, it is used directly. Otherwise, it constructs a path within the `output/{output_dir}` directory using a provided or newly generated `output_uuid`. It ensures the output directory exists.
- Returns: A tuple containing the determined filepath and the UUID used.
- Usage:

```python
filepath, uuid = get_output_filepath("processed_data", output_uuid=my_uuid)
# or
filepath, uuid = get_output_filepath("results", specified_path="custom/output/final.json")
```
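A sketch of the path-selection logic; the `{uuid}.json` filename pattern is an assumption:

```python
import os
import uuid as uuid_lib

def get_output_filepath(output_dir, output_uuid=None, specified_path=None):
    # Ensure there is always a UUID to report back to the caller.
    output_uuid = output_uuid or str(uuid_lib.uuid4())
    if specified_path:
        # An explicit path wins; still make sure its directory exists.
        os.makedirs(os.path.dirname(specified_path) or ".", exist_ok=True)
        return specified_path, output_uuid
    directory = os.path.join("output", output_dir)
    os.makedirs(directory, exist_ok=True)
    return os.path.join(directory, f"{output_uuid}.json"), output_uuid
```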
### handle_command_args

Parses and validates common command-line arguments.

- Purpose: Processes `sys.argv` to extract input/output file paths and common flags such as `-saveInputs` (for debugging prompts), `-uuid=<value>` (to specify an output UUID), and `-flow_uuid=<value>`. It validates the number of positional arguments against `min_args` and `max_args`.
- Returns: A tuple containing `input_filepath`, `output_filepath`, `save_inputs` (boolean), `custom_uuid`, and `flow_uuid`.
- Usage:

```python
input_path, output_path, save_flag, uuid, flow_id = handle_command_args("Usage: script.py <input> [output]", min_args=1, max_args=2)
```
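A sketch of the flag/positional split; the flag spellings follow the description above, and exiting with the usage string on a count mismatch is an assumption:

```python
import sys

def handle_command_args(usage, min_args=1, max_args=2):
    save_inputs, custom_uuid, flow_uuid = False, None, None
    positional = []
    for arg in sys.argv[1:]:
        if arg == "-saveInputs":
            save_inputs = True
        elif arg.startswith("-uuid="):
            custom_uuid = arg.split("=", 1)[1]
        elif arg.startswith("-flow_uuid="):
            flow_uuid = arg.split("=", 1)[1]
        else:
            positional.append(arg)
    # Enforce the allowed number of positional arguments.
    if not (min_args <= len(positional) <= max_args):
        sys.exit(usage)
    input_filepath = positional[0]
    output_filepath = positional[1] if len(positional) > 1 else None
    return input_filepath, output_filepath, save_inputs, custom_uuid, flow_uuid
```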
### saveToFile

Saves LLM prompts to a file for debugging.

- Purpose: Used specifically when the `-saveInputs` flag is detected by `handle_command_args`. It saves the `system_message` and `user_message` sent to the LLM, along with a timestamp, into the specified JSON `filepath`.
- Usage: Typically called internally after parsing arguments if `save_inputs` is true.

```python
saveToFile(system_prompt, user_prompt, "debug/prompts/prompt_abc.json")
```
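A sketch of the debug dump; the key names in the saved record are assumptions:

```python
import json
import os
from datetime import datetime, timezone

def saveToFile(system_message, user_message, filepath):
    # Persist the exact prompts sent to the LLM for later inspection.
    os.makedirs(os.path.dirname(filepath) or ".", exist_ok=True)
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "system_message": system_message,
        "user_message": user_message,
    }
    with open(filepath, "w", encoding="utf-8") as f:
        json.dump(record, f, indent=4)
```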
### translate_to_basic_english

Translates text into simple Basic English using an LLM.

- Purpose: Takes input `text` and uses the `chat_with_llm` function to request a translation into very short, simple Basic English (drawing on the 850-word Basic English list). The output is cleaned and truncated so it is suitable for use as a file or folder name.
- Usage:

```python
folder_name_part = translate_to_basic_english("Complex technical description")
```
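A sketch built on the `chat_with_llm` helper above; the prompt wording, the `gemma3` model choice, and the 40-character cap are illustrative assumptions:

```python
import re

def translate_to_basic_english(text, model="gemma3", max_length=40):
    system = ("Translate the user's text into very short, simple Basic "
              "English using only the 850-word Basic English list. "
              "Reply with the translation only.")
    response = chat_with_llm(model, system, text)
    # Keep only filename-safe characters, then truncate.
    cleaned = re.sub(r"[^a-zA-Z0-9 _-]", "", response).strip().lower()
    return cleaned.replace(" ", "_")[:max_length]
```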