BASIC English Conversion

This script converts input text into BASIC English using a Large Language Model (LLM). It processes a JSON input file containing the text and configuration, interacts with the LLM according to specific constraints (using only the 850-word BASIC English vocabulary, short sentences, etc.), and saves the simplified text along with metadata to an output JSON file.

Purpose

The primary goal of this script is to simplify complex English text into BASIC English, making it more accessible and easier to understand. It leverages an LLM to perform the translation while adhering to the strict rules of BASIC English.

Usage

Run the script from the command line:

python basic-english.py <input_json> [output_json]
  • <input_json>: (Required) Path to the input JSON file containing the text and configuration.
  • [output_json]: (Optional) Path where the output JSON file should be saved. If not provided, a path will be generated automatically in the output/ directory based on the script name and a UUID.

Note: Arguments are parsed by the handle_command_args function from utils.py. As used in main, the script reads only the input path and the optional output path. Other flags that handle_command_args may accept (such as -uuid or -saveInputs) do not appear to be used.
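As a rough illustration of the argument handling described above, the sketch below resolves the two paths by hand. It does not reproduce handle_command_args, whose signature is not shown in this document. The helper name resolve_paths and the exact default filename pattern are assumptions based on the description of the generated path.

```python
import uuid
from pathlib import Path


def resolve_paths(argv, script_name="basic-english"):
    """Return (input_path, output_path) from a list of CLI arguments.

    Hypothetical helper: the real script delegates this work to utils.py.
    """
    if not argv:
        raise SystemExit(
            f"usage: python {script_name}.py <input_json> [output_json]"
        )
    input_path = Path(argv[0])
    if len(argv) > 1:
        # An explicit output path was given, so use it unchanged.
        output_path = Path(argv[1])
    else:
        # Assumed default: output/<script name>_<uuid>.json
        output_path = Path("output") / f"{script_name}_{uuid.uuid4()}.json"
    return input_path, output_path
```

In a script this would typically be called as resolve_paths(sys.argv[1:]).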

Input Files

The script expects an input JSON file with the following structure:

{
  "input_text": [
    "Line 1 of the text to be converted.",
    "Line 2 of the text."
  ],
  "output_format": "Desired format description (e.g., 'plain text list')",
  "model": "Name of the LLM model to use (e.g., 'ollama/llama3')",
  "parameters": {
    // Optional: LLM parameters like temperature, top_p, etc.
  },
  "success_criteria": {
    // Optional: Criteria for successful conversion
  }
}
  • input_text: A list of strings, where each string is a line or segment of the text to convert.
  • output_format: A string describing the desired output format (used in the prompt).
  • model: The identifier for the LLM to be used via the chat_with_llm function.
  • parameters: (Optional) A dictionary of parameters to pass to the LLM.
  • success_criteria: (Optional) A dictionary defining success criteria, which will be included in the prompt if present.
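The structure shown above uses // comments for explanation, which strict JSON parsers reject. The following sketch builds an equivalent input document in Python and serializes it to valid JSON. The temperature value and the validate_input helper are illustrative assumptions, not part of the script.

```python
import json

# Example input mirroring the documented structure.
input_data = {
    "input_text": [
        "Line 1 of the text to be converted.",
        "Line 2 of the text.",
    ],
    "output_format": "plain text list",
    "model": "ollama/llama3",
    "parameters": {"temperature": 0.2},  # illustrative value only
    # "success_criteria" is optional and omitted here.
}

# Keys the documentation does not mark as optional.
REQUIRED_KEYS = ("input_text", "output_format", "model")


def validate_input(data):
    """Return the required keys missing from data (hypothetical check)."""
    return [key for key in REQUIRED_KEYS if key not in data]


# Serialize to a JSON string that can be written to the input file.
serialized = json.dumps(input_data, indent=2)
```

Writing serialized to a file and passing that file as <input_json> would give the script an input in the documented shape.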

Key Functions

  • translate_basic_english(input_data):
    • Constructs the system and user prompts for the LLM.
    • Calls the chat_with_llm utility function to interact with the specified LLM.
    • Returns the raw content string received from the LLM.
  • main():
    • Handles command-line arguments using utils.handle_command_args.
    • Loads the input JSON data using utils.load_json.
    • Calls translate_basic_english to get the converted text.
    • Determines the output file path using utils.get_output_filepath.
    • Creates metadata (including script name, start time, UUID) using utils.create_output_metadata.
    • Combines the metadata and the converted text (split into lines) into a final dictionary.
    • Saves the output dictionary to a JSON file using utils.save_output.
  • Utility Functions (from utils.py):
    • load_json: Loads data from a JSON file.
    • save_output: Saves data to a JSON file.
    • chat_with_llm: Handles the interaction with the LLM.
    • create_output_metadata: Generates standard metadata for output files.
    • get_output_filepath: Determines the appropriate output file path.
    • handle_command_args: Parses command-line arguments.

LLM Interaction

The script interacts with an LLM via the chat_with_llm function.

  • System Prompt: Instructs the LLM to:
    • Convert the text into BASIC English.
    • Use only words from the 850-word BASIC English list.
    • Make sentences short, clear, and simple.
    • Avoid difficult words, explaining concepts with easy words if necessary.
    • Keep numbers and measurements clear.
    • Leave already simple sentences unchanged.
  • User Prompt: Contains:
    • The input text joined into a single string.
    • The specified output_format.
    • Optional success_criteria if provided in the input JSON.
  • LLM Call: The chat_with_llm function sends these prompts to the specified model along with any provided parameters.
  • Expected Output: The script expects the LLM to return the converted text as a plain string.
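The prompt construction described above could look roughly like the sketch below. The exact wording of the script's prompts is not given in this document, so the prompt text, the newline separator used to join input_text, and the helper name build_user_prompt are all assumptions. The call to chat_with_llm is left out because its signature is not shown here.

```python
import json

# Assumed wording, paraphrasing the documented system prompt rules.
SYSTEM_PROMPT = (
    "Convert the text into BASIC English. "
    "Use only words from the 850-word BASIC English list. "
    "Make sentences short, clear, and simple. "
    "Avoid difficult words; explain ideas with easy words if needed. "
    "Keep numbers and measurements clear. "
    "Leave sentences that are already simple unchanged."
)


def build_user_prompt(input_data):
    """Assemble the user prompt from the input JSON (hypothetical helper)."""
    parts = [
        # Joining with newlines is an assumption; the source only says
        # the input text is joined into a single string.
        "Text:\n" + "\n".join(input_data["input_text"]),
        "Output format: " + input_data["output_format"],
    ]
    criteria = input_data.get("success_criteria")
    if criteria:
        # Only included when the input JSON provides success criteria.
        parts.append("Success criteria: " + json.dumps(criteria))
    return "\n\n".join(parts)
```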

Output

The script generates a JSON file containing the results.

  • File Path: Determined by the optional output_json argument or generated automatically (e.g., output/basic-english_<uuid>.json).
  • Content: The JSON file includes:
    • Metadata generated by create_output_metadata (script name, timestamp, UUID, etc.).
    • output_text: A list of strings holding the BASIC English version of the input text, split on newline characters.
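A minimal sketch of how the output dictionary might be assembled, assuming the metadata fields named in this section (routine, timestamp, UUID). It stands in for create_output_metadata, whose actual fields and signature are not shown here, and build_output is a hypothetical name.

```python
import uuid
from datetime import datetime


def build_output(converted_text, routine="BASIC English conversion"):
    """Combine stand-in metadata with the converted text, split into lines."""
    metadata = {
        "routine": routine,
        "timestamp": datetime.now().isoformat(),
        "uuid": str(uuid.uuid4()),
    }
    # Plain split on newline characters. Whether the script also drops
    # blank lines is not stated, so none are removed here.
    return {**metadata, "output_text": converted_text.split("\n")}
```

The resulting dictionary can be written with json.dump, which is presumably what save_output does.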

```json
{
  "routine": "BASIC English conversion",
  "timestamp": "YYYY-MM-DDTHH:MM:SS.ffffff",
  "uuid": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  // … other metadata fields …
  "output_text": [
    "This is the first line in BASIC English.",
    "This is the second line."
  ]
}
```