BASIC English Conversion
This script converts input text into BASIC English using a Large Language Model (LLM). It processes a JSON input file containing the text and configuration, interacts with the LLM according to specific constraints (using only the 850-word BASIC English vocabulary, short sentences, etc.), and saves the simplified text along with metadata to an output JSON file.
Purpose
The primary goal of this script is to simplify complex English text into BASIC English, making it more accessible and easier to understand. It leverages an LLM to perform the translation while adhering to the strict rules of BASIC English.
Usage
Run the script from the command line:
python basic-english.py <input_json> [output_json]
- <input_json>: (Required) Path to the input JSON file containing the text and configuration.
- [output_json]: (Optional) Path where the output JSON file should be saved. If not provided, a path is generated automatically in the output/ directory based on the script name and a UUID.
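For example, with an input file at the hypothetical path input/example.json:
python basic-english.py input/example.json output/example-basic.json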
Note: The script uses the handle_command_args function from utils.py for argument parsing. As it is called in main, it only expects the input and optional output file paths and does not appear to use the other flags that handle_command_args can process (such as -uuid or -saveInputs).
Input Files
The script expects an input JSON file with the following structure:
```json
{
    "input_text": [
        "Line 1 of the text to be converted.",
        "Line 2 of the text."
    ],
    "output_format": "Desired format description (e.g., 'plain text list')",
    "model": "Name of the LLM model to use (e.g., 'ollama/llama3')",
    "parameters": {
        // Optional: LLM parameters like temperature, top_p, etc.
    },
    "success_criteria": {
        // Optional: Criteria for successful conversion
    }
}
```
- input_text: A list of strings, where each string is a line or segment of the text to convert.
- output_format: A string describing the desired output format (used in the prompt).
- model: The identifier for the LLM to be used via the chat_with_llm function.
- parameters: (Optional) A dictionary of parameters to pass to the LLM.
- success_criteria: (Optional) A dictionary defining success criteria, which will be included in the prompt if present.
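For reference, an input file matching this schema can be written with the standard json module. The text, model name, file path, and success_criteria keys below are placeholders invented for illustration, not values taken from the script:

```python
import json

# Placeholder input document; swap in your own text, model, and criteria.
input_data = {
    "input_text": [
        "The municipality promulgated an ordinance restricting vehicular access.",
        "Residents expressed considerable apprehension regarding the measure.",
    ],
    "output_format": "plain text list",
    "model": "ollama/llama3",
    "parameters": {"temperature": 0.2},
    "success_criteria": {"note": "keep sentences under 15 words"},  # invented example
}

with open("input/example.json", "w", encoding="utf-8") as f:
    json.dump(input_data, f, indent=2)
```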
Key Functions
- translate_basic_english(input_data):
  - Constructs the system and user prompts for the LLM.
  - Calls the chat_with_llm utility function to interact with the specified LLM.
  - Returns the raw content string received from the LLM.
- main(): (see the sketch after this list)
  - Handles command-line arguments using utils.handle_command_args.
  - Loads the input JSON data using utils.load_json.
  - Calls translate_basic_english to get the converted text.
  - Determines the output file path using utils.get_output_filepath.
  - Creates metadata (including script name, start time, UUID) using utils.create_output_metadata.
  - Combines the metadata and the converted text (split into lines) into a final dictionary.
  - Saves the output dictionary to a JSON file using utils.save_output.
- Utility Functions (from utils.py):
  - load_json: Loads data from a JSON file.
  - save_output: Saves data to a JSON file.
  - chat_with_llm: Handles the interaction with the LLM.
  - create_output_metadata: Generates standard metadata for output files.
  - get_output_filepath: Determines the appropriate output file path.
  - handle_command_args: Parses command-line arguments.
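To make the flow above concrete, here is a minimal sketch of main(), assuming plausible signatures for the utils.py helpers; their exact parameters are not documented here, so every call below is illustrative rather than the script's real code. translate_basic_english itself is sketched in the LLM Interaction section below.

```python
# Rough sketch only: the utils.py helper signatures below are assumptions,
# not the actual API of this repository.
from utils import (load_json, save_output, create_output_metadata,
                   get_output_filepath, handle_command_args)


def main():
    # Assumed to return the input path and an optional output path.
    input_path, output_path = handle_command_args()

    # Load the input JSON described in the "Input Files" section.
    input_data = load_json(input_path)

    # Ask the LLM for the BASIC English version (raw string).
    converted = translate_basic_english(input_data)

    # Fall back to an auto-generated output path if none was given (assumed call).
    if not output_path:
        output_path = get_output_filepath("basic-english")

    # Standard metadata: routine name, timestamp, UUID, etc. (assumed call).
    output = create_output_metadata("BASIC English conversion")

    # Split the LLM response into lines and attach it to the metadata.
    output["output_text"] = converted.split("\n")

    save_output(output, output_path)  # assumed argument order
```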
LLM Interaction
The script interacts with an LLM via the chat_with_llm function.
- System Prompt: Instructs the LLM to:
  - Convert the text into BASIC English.
  - Use only words from the 850-word BASIC English list.
  - Make sentences short, clear, and simple.
  - Avoid difficult words, explaining concepts with easy words if necessary.
  - Keep numbers and measurements clear.
  - Leave already simple sentences unchanged.
- User Prompt: Contains:
  - The input text joined into a single string.
  - The specified output_format.
  - Optional success_criteria if provided in the input JSON.
- LLM Call: The chat_with_llm function sends these prompts to the specified model along with any provided parameters (a rough sketch of this call follows the list).
- Expected Output: The script expects the LLM to return the converted text as a plain string.
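The prompt construction in translate_basic_english can be sketched as follows. The exact prompt wording and the chat_with_llm signature are not reproduced in this documentation, so both are assumptions for illustration:

```python
# Illustrative sketch: the prompt text and the chat_with_llm signature are
# assumptions, not copied from the actual script.
from utils import chat_with_llm


def translate_basic_english(input_data):
    system_prompt = (
        "Convert the following text into BASIC English. "
        "Use only words from the 850-word BASIC English list. "
        "Make sentences short, clear, and simple. "
        "Avoid difficult words; explain ideas with easy words if needed. "
        "Keep numbers and measurements clear. "
        "If a sentence is already simple, leave it unchanged."
    )

    # Join the input lines and append the requested output format.
    user_prompt = "\n".join(input_data["input_text"])
    user_prompt += f"\n\nOutput format: {input_data['output_format']}"

    # Include success criteria only when the input JSON provides them.
    if input_data.get("success_criteria"):
        user_prompt += f"\nSuccess criteria: {input_data['success_criteria']}"

    # Assumed signature: (model, system_prompt, user_prompt, parameters).
    return chat_with_llm(
        input_data["model"],
        system_prompt,
        user_prompt,
        input_data.get("parameters", {}),
    )
```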
Output
The script generates a JSON file containing the results.
- File Path: Determined by the optional output_json argument or generated automatically (e.g., output/basic-english_<uuid>.json).
- Content: The JSON file includes:
  - Metadata generated by create_output_metadata (script name, timestamp, UUID, etc.).
  - output_text: A list of strings representing the BASIC English version of the input text, split by newline characters.
```json
{
  "routine": "BASIC English conversion",
  "timestamp": "YYYY-MM-DDTHH:MM:SS.ffffff",
  "uuid": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  // ... other metadata fields ...
  "output_text": [
    "This is the first line in BASIC English.",
    "This is the second line."
  ]
}
```
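To work with the result in code, the output file can simply be read back with the standard json module; the file name here is hypothetical:

```python
import json

# Load a generated output file (hypothetical path) and print the converted lines.
with open("output/basic-english_example.json", encoding="utf-8") as f:
    result = json.load(f)

for line in result["output_text"]:
    print(line)
```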