# Search Query Generation
This script utilizes a Large Language Model (LLM) to generate a diverse list of search engine queries based on a user-provided topic. It aims to produce queries that facilitate comprehensive and relevant information retrieval.
## Purpose
The primary goal of search-queries.py is to automate the creation of varied search queries for a given subject. This includes general overview queries, specific queries targeting authoritative sources, question-based queries, alternative phrasings, and queries using advanced search operators.
## Usage
Run the script from the command line:
```bash
python search-queries.py <input_json> [output_json]
```
- `<input_json>`: (Required) Path to the input JSON file containing the topic and configuration.
- `[output_json]`: (Optional) Path to save the output JSON file. If not provided, a path is generated automatically based on the script name and a UUID.
The script uses the `handle_command_args` utility function to parse these arguments.
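The documentation doesn't show `handle_command_args` itself; the following is a minimal sketch of what such a parser might look like, assuming a plain `sys.argv`-based implementation (the real function in `utils.py` may differ):

```python
import sys

def handle_command_args(usage: str) -> tuple[str, str | None]:
    """Return (input_path, output_path); output_path is None when omitted."""
    if len(sys.argv) < 2:
        print(f"Usage: {usage}")
        sys.exit(1)
    input_path = sys.argv[1]
    output_path = sys.argv[2] if len(sys.argv) > 2 else None
    return input_path, output_path
```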
## Input Files
The script expects a JSON input file with the following structure:
- `topic`: (String, Required) The subject for which to generate search queries.
- `model`: (String, Optional) The identifier for the LLM to use. Defaults to `"gemma3"` inside the script if not provided.
- `parameters`: (Object, Optional) Any additional parameters to pass to the LLM during the chat interaction (e.g., `temperature`, `top_p`).
Example (`examples/search-queries-in.json`):

```json
{
  "topic": "Sustainable agriculture practices",
  "model": "gemma3",
  "parameters": {
    "temperature": 0.7
  }
}
```
## Key Functions

- `generate_search_queries(input_data)`: Takes the loaded input data, constructs prompts, interacts with the LLM via `chat_with_llm`, processes the response using `parse_llm_json_response`, and returns a list of generated queries.
- `main()`: Handles command-line arguments using `handle_command_args`, loads the input JSON using `load_json`, calls `generate_search_queries`, prepares metadata using `create_output_metadata` and `get_output_filepath`, structures the final output data, and saves it using `save_output` (see the sketch after this list).
- Utility functions (from `utils.py`):
  - `load_json`: Loads data from a JSON file.
  - `save_output`: Saves data to a JSON file.
  - `chat_with_llm`: Manages the interaction with the specified LLM.
  - `parse_llm_json_response`: Parses the LLM's response, attempting to interpret it as JSON or splitting it into lines if parsing fails.
  - `create_output_metadata`: Generates standard metadata (script name, timestamp, UUID).
  - `get_output_filepath`: Determines the appropriate output file path.
  - `handle_command_args`: Parses command-line arguments for input and output file paths.
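The description of `main()` implies roughly the following orchestration. This is a hedged sketch, not the actual source: the exact signatures and argument order of `handle_command_args`, `create_output_metadata`, and `get_output_filepath` are assumptions.

```python
from utils import (
    create_output_metadata,
    get_output_filepath,
    handle_command_args,
    load_json,
    save_output,
)

def main():
    # Argument parsing: <input_json> is required, [output_json] optional.
    input_path, specified_output = handle_command_args(
        "python search-queries.py <input_json> [output_json]"
    )

    input_data = load_json(input_path)
    queries = generate_search_queries(input_data)  # defined in this script

    # Metadata carries the script name, a timestamp, and a UUID; the output
    # path falls back to an auto-generated one when none was supplied.
    metadata = create_output_metadata("Search Queries")
    output_path = get_output_filepath("search-queries", specified_output)

    save_output(
        {"metadata": metadata, "topic": input_data["topic"], "queries": queries},
        output_path,
    )

if __name__ == "__main__":
    main()
```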
## LLM Interaction
- System Prompt Construction: A detailed system message instructs the LLM to act as a search query generation assistant. It specifies the goal: generate multiple high-quality queries for a given topic, covering various types (general, specific, question-based, alternative phrasings, and advanced operators like `site:`, `filetype:`, `intitle:`).
- Output Format Request: The system prompt explicitly asks the LLM to output the queries in a simple list format, with each query separated by a single newline, and without any numbers, symbols, or extra formatting.
- User Prompt Construction: A simple user message is created, providing the `topic` from the input file to the LLM.
- LLM Call: The `chat_with_llm` function is called with the specified `model`, the constructed system and user messages, and any optional `parameters`, as shown in the sketch below.
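Putting those steps together, the prompt construction might look something like this. The exact prompt wording and the `chat_with_llm` signature are assumptions, not the script's actual text:

```python
from utils import chat_with_llm

def generate_search_queries(input_data: dict):
    topic = input_data["topic"]
    model = input_data.get("model", "gemma3")      # documented default
    parameters = input_data.get("parameters", {})  # e.g. temperature, top_p

    system_message = (
        "You are a search query generation assistant. For the given topic, "
        "generate multiple high-quality search queries: general overviews, "
        "queries targeting authoritative sources, question-based queries, "
        "alternative phrasings, and queries using advanced operators such as "
        "site:, filetype:, and intitle:. Output one query per line, with no "
        "numbers, symbols, or extra formatting."
    )
    user_message = f"Topic: {topic}"

    response = chat_with_llm(model, system_message, user_message, parameters)
    # The response is processed as described under "Output Processing" below.
    ...
```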
## Output Processing
- Parsing: The raw text response from the LLM is passed to the `parse_llm_json_response` utility function with `include_children=False`. Based on typical utility function behavior, it likely tries to parse the response as a JSON list first; if that fails, it may split the raw text by newlines to create a list of strings, where each string is a query.
- Validation & Fallback: The script checks if the result from `parse_llm_json_response` is a list. If it is not (indicating parsing or processing failed to produce the expected structure), it defaults to a list containing a single fallback message: `["No valid search queries could be generated"]`. A sketch of this step follows.
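Continuing the sketch above, the parsing and fallback step might look like this; the behavior of `parse_llm_json_response` is inferred, as noted:

```python
    queries = parse_llm_json_response(response, include_children=False)

    # Fallback: if parsing did not produce a list, substitute the single
    # message described above so downstream consumers always receive a list.
    if not isinstance(queries, list):
        queries = ["No valid search queries could be generated"]
    return queries
```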
## Output
The script generates a JSON output file containing:
- `metadata`: An object with information about the script execution (script name, timestamp, UUID).
- `topic`: The original topic string provided in the input file.
- `queries`: A list containing the search queries generated by the LLM and processed by `parse_llm_json_response`. If generation or parsing failed, this list will contain the fallback message.
Example (`examples/search-queries-out.json` structure):

```json
{
  "metadata": {
    "script": "Search Queries",
    "timestamp": "YYYY-MM-DDTHH:MM:SS.ffffff",
    "uuid": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
  },
  "topic": "Sustainable agriculture practices",
  "queries": [
    "sustainable agriculture definition",
    "benefits of sustainable farming",
    "types of sustainable agriculture practices",
    "organic farming vs sustainable agriculture",
    "site:fao.org sustainable agriculture",
    "filetype:pdf sustainable agriculture techniques",
    "\"regenerative agriculture\" principles",
    "how does sustainable agriculture help the environment?",
    "challenges facing sustainable agriculture",
    "intitle:\"sustainable farming\" case studies"
    // … more queries
  ]
}
```