Page Metadata Generation
This script generates standardized metadata for a technical topic page. It can either use pre-existing metadata provided in the input or leverage a Large Language Model (LLM) to generate new metadata based on the topic name.
Purpose
The primary goal of generate-metadata.py is to create consistent and informative metadata for pages within a knowledge base or wiki, specifically focusing on automation topics. This metadata includes a title, subtitle, automation status, progress percentage, and a descriptive explanation.
Usage
The script is executed from the command line:
python generate-metadata.py <input_json> [output_json] [-saveInputs] [-uuid="UUID"] [-flow_uuid="FLOW-UUID"]
- <input_json>: (Required) Path to the input JSON file containing topic information.
- [output_json]: (Optional) Path where the output JSON containing the generated metadata should be saved. If not provided, a path is generated automatically from the process name and a UUID.
- -saveInputs: (Optional) Flag to save the system and user messages sent to the LLM into the flow/<flowUUID>/inputs/ directory. Requires -flow_uuid to be set.
- -uuid="UUID": (Optional) Custom UUID used for output file naming and process metadata.
- -flow_uuid="FLOW-UUID": (Optional) UUID for the overall flow, used to organize saved inputs when -saveInputs is active.
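The argument handling described above can be sketched as follows. This is a minimal illustration only: the real parsing lives in handle_command_args, whose implementation is not shown here, so the function name parse_cli_args and the returned dictionary keys are hypothetical.

```python
def parse_cli_args(argv):
    """Hypothetical sketch: split argv into positional paths and optional flags."""
    options = {"save_inputs": False, "uuid": None, "flow_uuid": None}
    positional = []
    for arg in argv:
        if arg == "-saveInputs":
            options["save_inputs"] = True
        elif arg.startswith("-uuid="):
            # Drop surrounding quotes in case the shell passed them through.
            options["uuid"] = arg.split("=", 1)[1].strip('"')
        elif arg.startswith("-flow_uuid="):
            options["flow_uuid"] = arg.split("=", 1)[1].strip('"')
        else:
            positional.append(arg)
    if not positional:
        raise ValueError("an input JSON path is required")
    options["input_json"] = positional[0]
    # The output path is optional; None means "generate one automatically".
    options["output_json"] = positional[1] if len(positional) > 1 else None
    return options
```

For example, parse_cli_args(["topic.json", "-saveInputs", '-flow_uuid="abc"']) would yield a dictionary with save_inputs set to True, flow_uuid set to "abc", and output_json left as None.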
Input Files
The script expects an input JSON file (<input_json>) with the following structure:
{
  "topic": "Name of the Technical Topic",
  "model": "gemma3",       // Optional: LLM model to use (defaults if not provided)
  "parameters": {},        // Optional: Parameters for the LLM call
  "metadata": {            // Optional: Pre-existing metadata object
    "title": "Existing Title",
    "subtitle": "Existing Subtitle",
    "automation_status": "Some Automation",
    "automation_percentage": "40%",
    "explanation": "Existing explanation text..."
  }
}
If the metadata key exists in the input JSON, its value will be used directly, bypassing the LLM generation step.
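As a rough sketch of how such an input might be read, the snippet below parses the JSON text and fills in the optional keys. Note that the // annotations in the structure above are explanatory only; standard JSON does not allow comments, so a real input file must omit them. The function name read_input is hypothetical, the default model is passed in because its actual value is not stated here, and treating topic as required on the LLM path is an assumption.

```python
import json


def read_input(text, default_model):
    """Hypothetical sketch: parse input JSON text and apply optional-key defaults."""
    data = json.loads(text)
    # Assumption: a topic is only needed when no pre-existing metadata is given.
    if "metadata" not in data and not data.get("topic"):
        raise ValueError("input needs a 'topic' unless 'metadata' is supplied")
    return {
        "topic": data.get("topic"),
        "model": data.get("model", default_model),
        "parameters": data.get("parameters", {}),
        "metadata": data.get("metadata"),
    }
```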
Key Functions
- generate_page_metadata(input_data, save_inputs=False):
  - Checks if input_data contains a metadata key. If yes, returns that metadata.
  - If there is no pre-existing metadata, constructs system and user prompts based on the topic.
  - Optionally saves the prompts using saveToFile if save_inputs is True and flowUUID is set.
  - Calls the LLM using chat_with_llm.
  - Parses the LLM's JSON response using parse_llm_json_response.
  - Returns the parsed metadata dictionary.
- main():
  - Handles command-line arguments using handle_command_args.
  - Sets the global flowUUID.
  - Loads the input JSON using load_json.
  - Calls generate_page_metadata to get the metadata.
  - Determines the output file path and UUID using get_output_filepath.
  - Creates process metadata (script name, start time, UUID) using create_output_metadata.
  - Combines process metadata and page metadata into a final dictionary.
  - Saves the output JSON using save_output.
- Imported utils functions:
  - load_json, save_output: Handle file I/O for JSON data.
  - chat_with_llm: Interacts with the specified LLM.
  - parse_llm_json_response: Extracts JSON content from the LLM response string.
  - create_output_metadata: Generates standard metadata about the script execution.
  - get_output_filepath: Determines the appropriate path for saving output files.
  - handle_command_args: Parses arguments passed via the command line.
  - saveToFile: Saves content (like LLM prompts) to a specified file path.
Metadata Generation Logic
The script follows two main paths for obtaining metadata:
- Pre-existing Metadata: If the input JSON file contains a top-level key named "metadata", the script assumes this object contains the required metadata fields and uses it directly. The LLM is not called in this case.
- LLM Generation: If the "metadata" key is not present in the input JSON, the script generates the metadata using an LLM:
  - It constructs a system prompt instructing the AI to act as a metadata specialist for technical topics.
  - It constructs a user prompt containing the specific topic from the input file.
  - It calls the chat_with_llm function with these prompts, the specified model, and any parameters.
  - The response from the LLM, expected to be a JSON object, is parsed using parse_llm_json_response.
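The two paths can be sketched as a single function. This is an illustration under stated assumptions, not the script's actual code: the chat argument is a stand-in for chat_with_llm (whose real signature is not documented here), the prompt wording is invented, and json.loads stands in for parse_llm_json_response.

```python
import json


def generate_page_metadata_sketch(input_data, chat):
    """Hypothetical sketch of the two metadata paths.

    `chat` is any callable taking (system_message, user_message, model,
    parameters) and returning the model's reply as a string.
    """
    # Path 1: pre-existing metadata is returned as-is; the LLM is never called.
    if "metadata" in input_data:
        return input_data["metadata"]

    # Path 2: build prompts from the topic and ask the LLM for a JSON object.
    system_message = (
        "You are a metadata specialist for technical topics. "
        "Reply with a single JSON object."
    )
    user_message = f"Generate page metadata for the topic: {input_data['topic']}"
    reply = chat(
        system_message,
        user_message,
        input_data.get("model"),
        input_data.get("parameters", {}),
    )
    return json.loads(reply)
```

Passing the LLM call in as a parameter keeps the sketch testable without network access; a fake callable that returns a fixed JSON string is enough to exercise both branches.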
LLM Interaction
When generating metadata via the LLM, the script specifically requests the following fields in JSON format:
- title: A short (2-3 words max), descriptive title based on the topic. No subtitles (text after a semicolon) should be included.
- subtitle: A brief explanation of the topic's scope.
- automation_status: The current level of automation, chosen from a predefined list (No Automation, Very Early Automation, Early Automation, Some Automation, Partially Fully Automated, Mostly Fully Automated, Fully Automated).
- automation_percentage: An estimated percentage (e.g., "25%") representing progress towards full automation. The prompt emphasizes being critical and avoiding exaggeration.
- explanation: 2-3 full paragraphs describing the topic and its automation journey.
The interaction relies on the chat_with_llm utility function for the API call and parse_llm_json_response to reliably extract the JSON block from the potentially verbose LLM response.
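To make the extraction step concrete, here is a minimal sketch of pulling a JSON object out of a verbose reply and checking it against the requested fields. It is not the implementation of parse_llm_json_response, which is not shown here; the function names extract_json_object and find_problems are hypothetical, and the brace-scanning approach is just one simple way to do it.

```python
import json
import re

# Values taken from the predefined list described above.
AUTOMATION_STATUSES = (
    "No Automation", "Very Early Automation", "Early Automation",
    "Some Automation", "Partially Fully Automated",
    "Mostly Fully Automated", "Fully Automated",
)
REQUIRED_FIELDS = (
    "title", "subtitle", "automation_status",
    "automation_percentage", "explanation",
)


def extract_json_object(response):
    """Parse the text between the first '{' and the last '}' as JSON."""
    start = response.find("{")
    end = response.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in the response")
    return json.loads(response[start:end + 1])


def find_problems(metadata):
    """Return a list of human-readable problems; empty means the fields look valid."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS
                if name not in metadata]
    status = metadata.get("automation_status")
    if status is not None and status not in AUTOMATION_STATUSES:
        problems.append(f"unknown automation_status: {status}")
    percentage = metadata.get("automation_percentage")
    if percentage is not None and not re.fullmatch(r"(100|\d{1,2})%", str(percentage)):
        problems.append(f"malformed automation_percentage: {percentage}")
    return problems
```

Scanning from the first opening brace to the last closing brace tolerates chatty text before and after the object, but it would fail if the surrounding prose itself contained braces; a stricter parser may be preferable in practice.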
Output
The script generates a JSON output file (at the path specified by [output_json] or an auto-generated path). This file contains:
- Process Metadata: Information about the script execution itself (e.g., script name, timestamp, UUID), generated by create_output_metadata.
- Page Metadata: The core metadata for the topic, under the key "page_metadata". This is either the metadata object passed directly from the input file or the JSON object generated by the LLM.
Example Output Structure:
```json
{
  "process_name": "Page Metadata Generation",
  "start_time": "2023-10-27T10:00:00.123456",
  "end_time": "2023-10-27T10:00:05.678910",
  "duration_seconds": 5.555,
  "uuid": "generated-or-specified-uuid",
  "page_metadata": {
    "title": "Topic Title",
    "subtitle": "Scope explanation.",
    "automation_status": "Some Automation",
    "automation_percentage": "30%",
    "explanation": "Paragraph 1 describing the topic…\n\nParagraph 2 detailing automation journey…"
  }
}
```
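A structure like the example output could be assembled roughly as follows. This is a sketch under assumptions: the field names are copied from the example, but the function build_output is hypothetical and does not reflect how create_output_metadata or save_output are actually implemented.

```python
import uuid
from datetime import datetime


def build_output(page_metadata, start_time, end_time, process_uuid=None):
    """Hypothetical sketch: combine process metadata with the page metadata."""
    return {
        "process_name": "Page Metadata Generation",
        "start_time": start_time.isoformat(),
        "end_time": end_time.isoformat(),
        "duration_seconds": (end_time - start_time).total_seconds(),
        # Use the caller's UUID if one was given, otherwise generate one.
        "uuid": process_uuid or str(uuid.uuid4()),
        "page_metadata": page_metadata,
    }


if __name__ == "__main__":
    import json

    started = datetime.now()
    finished = datetime.now()
    print(json.dumps(build_output({"title": "Topic Title"}, started, finished), indent=2))
```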