Utility Functions

This script provides a collection of common helper functions used by various automation and processing scripts within the project. It includes functions for file I/O, JSON handling, interacting with Large Language Models (LLMs) via Ollama, processing LLM responses, managing command-line arguments, and generating metadata.

load_json

Loads data from a specified JSON file.

  • Purpose: Reads a JSON file from the given filepath and returns the parsed Python object (typically a dictionary or list).
  • Usage: data = load_json("path/to/input.json")
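
A minimal sketch of load_json, assuming UTF-8 encoded files:

```python
import json

def load_json(filepath):
    """Read a JSON file and return the parsed object (typically a dict or list)."""
    with open(filepath, "r", encoding="utf-8") as f:
        return json.load(f)
```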

save_output

Saves Python data structures to a JSON file.

  • Purpose: Writes the provided output_data to the specified output_filepath in JSON format with indentation. It automatically creates the necessary output directory if it doesn’t exist.
  • Usage: save_output(my_data, "output/results.json")
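
A minimal sketch of save_output; the indentation width and encoding are assumptions:

```python
import json
import os

def save_output(output_data, output_filepath):
    """Write output_data as indented JSON, creating the parent directory first."""
    directory = os.path.dirname(output_filepath)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(output_filepath, "w", encoding="utf-8") as f:
        json.dump(output_data, f, indent=4)  # indent=4 is an assumption
```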

parse_embedded_json

Recursively parses JSON strings embedded within node structures.

  • Purpose: Checks whether the step field of a dictionary (node) contains a string that looks like a JSON array or object. If so, it parses the JSON and replaces the node’s children field with the parsed content. This is useful for handling nested structures generated by LLMs, where a step may contain further sub-steps defined as JSON. If the node has no title, the original step string becomes its title after parsing.
  • Usage: processed_node = parse_embedded_json(node_with_potential_json_step)
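
One plausible implementation of the behaviour described above; the detection heuristic, the error handling, and the recursion into children are assumptions:

```python
import json

def parse_embedded_json(node):
    """Parse a JSON string found in node["step"] into node["children"], recursively."""
    step = node.get("step")
    if isinstance(step, str) and step.strip().startswith(("[", "{")):
        try:
            parsed = json.loads(step.strip())
        except json.JSONDecodeError:
            return node  # malformed JSON: leave the node unchanged
        node["children"] = parsed
        if "title" not in node:
            node["title"] = step  # the original step string becomes the title
    # Recurse into child dictionaries that may embed further JSON.
    for child in node.get("children", []):
        if isinstance(child, dict):
            parse_embedded_json(child)
    return node
```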

chat_with_llm

Interacts with a specified Ollama LLM.

  • Purpose: Provides a generic interface for sending a system message and a user message to an LLM (specified by model) via the ollama library. It also accepts an optional dictionary of model options (such as temperature) that is passed through to the Ollama API call; see the sketch below.
  • Returns: The content of the LLM’s response as a cleaned string.
  • Usage: response = chat_with_llm("gemma3", "System prompt", "User query", {"temperature": 0.7})
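
A sketch of chat_with_llm built on the ollama Python library; the options parameter name and the final strip() are assumptions:

```python
import ollama

def chat_with_llm(model, system_message, user_message, options=None):
    """Send a system + user message pair to an Ollama model and return the reply text."""
    response = ollama.chat(
        model=model,
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_message},
        ],
        options=options or {},
    )
    return response["message"]["content"].strip()
```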

clean_llm_json_response

Cleans up potential JSON responses from an LLM.

  • Purpose: Attempts to extract a valid JSON object or array from a raw LLM response string. It removes common artifacts like Markdown code fences (```json, ```).
  • Returns: The extracted JSON string if found, otherwise the cleaned-up text.
  • Usage: json_string = clean_llm_json_response(raw_llm_output)
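
A regex-based sketch of the cleanup step; treating the first brace- or bracket-delimited span as the payload is an assumption:

```python
import re

def clean_llm_json_response(response_text):
    """Strip Markdown code fences and isolate the first JSON object or array."""
    # `{3} matches a run of three backticks, optionally followed by "json".
    text = re.sub(r"`{3}(?:json)?", "", response_text).strip()
    match = re.search(r"\{.*\}|\[.*\]", text, re.DOTALL)
    return match.group(0) if match else text
```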

parse_llm_json_response

Parses JSON from an LLM response with fallback handling.

  • Purpose: Takes raw LLM response text, cleans it using clean_llm_json_response, and attempts to parse it as JSON.
  • Fallback: If JSON parsing fails, it splits the cleaned text into lines and returns a list of dictionaries, one per line.
    • If include_children is True, each dictionary will have the structure {"step": "line content", "children": []}.
    • If include_children is False (default), each dictionary will have the structure {"step": "line content"}.
  • Usage: parsed_data = parse_llm_json_response(raw_llm_output, include_children=True)
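
A sketch of the parse-with-fallback logic, reusing the clean_llm_json_response sketch above; skipping blank lines in the fallback is an assumption:

```python
import json

def parse_llm_json_response(response_text, include_children=False):
    """Parse an LLM response as JSON, falling back to one dict per line."""
    cleaned = clean_llm_json_response(response_text)  # helper sketched above
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        steps = []
        for line in cleaned.splitlines():
            line = line.strip()
            if not line:
                continue  # assumption: blank lines are dropped
            steps.append({"step": line, "children": []} if include_children
                         else {"step": line})
        return steps
```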

create_output_metadata

Generates standard metadata for output files.

  • Purpose: Creates a dictionary containing common metadata associated with a task’s output, including a UUID (output_uuid), creation timestamp, the task_name, and the time taken (calculated from the start_time).
  • Usage: metadata = create_output_metadata("Data Processing", start_timestamp, generated_uuid)
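
A sketch of the metadata builder; the exact key names and timestamp format are assumptions:

```python
import time
import uuid
from datetime import datetime

def create_output_metadata(task_name, start_time, output_uuid=None):
    """Build the standard metadata dictionary for a task's output."""
    return {
        "uuid": output_uuid or str(uuid.uuid4()),    # key name is an assumption
        "date_created": datetime.now().isoformat(),
        "task_name": task_name,
        "time_taken": time.time() - start_time,      # assumes a time.time() start
    }
```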

get_output_filepath

Determines the final path for an output file.

  • Purpose: Returns the appropriate file path for saving output. If specified_path is provided, it’s used directly. Otherwise, it constructs a path within the output/{output_dir} directory using a provided or newly generated output_uuid. It ensures the output directory exists.
  • Returns: A tuple containing the determined filepath and the UUID used.
  • Usage: filepath, uuid = get_output_filepath("processed_data", output_uuid=my_uuid) or filepath, uuid = get_output_filepath("results", specified_path="custom/output/final.json")
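
A sketch of the path resolution; the {uuid}.json filename pattern is an assumption:

```python
import os
import uuid

def get_output_filepath(output_dir, output_uuid=None, specified_path=None):
    """Return (filepath, uuid), preferring specified_path when it is given."""
    output_uuid = output_uuid or str(uuid.uuid4())
    if specified_path:
        parent = os.path.dirname(specified_path)
        if parent:
            os.makedirs(parent, exist_ok=True)
        return specified_path, output_uuid
    directory = os.path.join("output", output_dir)
    os.makedirs(directory, exist_ok=True)
    return os.path.join(directory, f"{output_uuid}.json"), output_uuid
```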

handle_command_args

Parses and validates common command-line arguments.

  • Purpose: Processes sys.argv to extract input/output file paths and common flags like -saveInputs (for debugging prompts), -uuid=<value> (to specify an output UUID), and -flow_uuid=<value>. It validates the number of positional arguments against min_args and max_args.
  • Returns: A tuple containing input_filepath, output_filepath, save_inputs (boolean), custom_uuid, and flow_uuid.
  • Usage: input_path, output_path, save_flag, uuid, flow_id = handle_command_args("Usage: script.py <input> [output]", min_args=1, max_args=2)
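
A sketch of the argument handling; printing the usage string and exiting on a positional-count mismatch is an assumption:

```python
import sys

def handle_command_args(usage, min_args=1, max_args=2):
    """Split sys.argv into positional paths and the recognised flags."""
    save_inputs, custom_uuid, flow_uuid = False, None, None
    positional = []
    for arg in sys.argv[1:]:
        if arg == "-saveInputs":
            save_inputs = True
        elif arg.startswith("-uuid="):
            custom_uuid = arg.split("=", 1)[1]
        elif arg.startswith("-flow_uuid="):
            flow_uuid = arg.split("=", 1)[1]
        else:
            positional.append(arg)
    if not (min_args <= len(positional) <= max_args):
        print(usage)
        sys.exit(1)
    input_filepath = positional[0]
    output_filepath = positional[1] if len(positional) > 1 else None
    return input_filepath, output_filepath, save_inputs, custom_uuid, flow_uuid
```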

saveToFile

Saves LLM prompts to a file for debugging.

  • Purpose: Used specifically when the -saveInputs flag is detected by handle_command_args. It saves the system_message and user_message sent to the LLM, along with a timestamp, into a specified JSON filepath.
  • Usage: Typically called internally after parsing arguments if save_inputs is true. saveToFile(system_prompt, user_prompt, "debug/prompts/prompt_abc.json")
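
A sketch of the debug dump; the JSON key names are assumptions:

```python
import json
import os
from datetime import datetime

def saveToFile(system_message, user_message, filepath):
    """Save the prompt pair plus a timestamp to a JSON file for debugging."""
    parent = os.path.dirname(filepath)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(filepath, "w", encoding="utf-8") as f:
        json.dump(
            {
                "timestamp": datetime.now().isoformat(),
                "system_message": system_message,
                "user_message": user_message,
            },
            f,
            indent=4,
        )
```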

translate_to_basic_english

Translates text into simple Basic English using an LLM.

  • Purpose: Takes input text and uses the chat_with_llm function to request a translation into very short, simple Basic English (using the 850-word list). The output is cleaned and truncated to be suitable for use as a file or folder name.
  • Usage: folder_name_part = translate_to_basic_english("Complex technical description")
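
A sketch built on the chat_with_llm sketch above; the default model, the prompt wording, the character whitelist, and the length cap are all assumptions:

```python
import re

def translate_to_basic_english(text, model="gemma3", max_length=50):
    """Ask the LLM for a short Basic English rendering usable as a file name."""
    system_message = (
        "Translate the user's text into very short, simple Basic English, "
        "using only the 850-word Basic English list. Reply with the "
        "translation only."
    )
    response = chat_with_llm(model, system_message, text)
    # Keep only filesystem-friendly characters, then truncate.
    cleaned = re.sub(r"[^a-zA-Z0-9 _-]", "", response).strip().lower()
    return cleaned[:max_length]
```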