diff --git a/AGENTS.md b/AGENTS.md index 86634ebf93..5fde1d7798 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,23 +1,38 @@ # Repository Guidelines ## Project Structure & Module Organization + The cookbook is organized around runnable examples and reference articles for OpenAI APIs. Place notebooks and Python scripts under `examples//`, grouping related assets inside topic subfolders (for example, `examples/agents_sdk/`). Narrative guides and long-form docs live in `articles/`, and shared diagrams or screenshots belong in `images/`. Update `registry.yaml` whenever you add content so it appears on cookbook.openai.com, and add new author metadata in `authors.yaml` if you want custom attribution. Keep large datasets outside the repo; instead, document how to fetch them in the notebook. ## Build, Test, and Development Commands + Use a virtual environment to isolate dependencies: + - `python -m venv .venv && source .venv/bin/activate` - `pip install -r examples//requirements.txt` (each sample lists only what it needs) - `jupyter lab` or `jupyter notebook` to develop interactively - `python .github/scripts/check_notebooks.py` to validate notebook structure before pushing ## Coding Style & Naming Conventions + Write Python to PEP 8 with four-space indentation, descriptive variable names, and concise docstrings that explain API usage choices. Name new notebooks with lowercase, dash-or-underscore-separated phrases that match their directory—for example `examples/gpt-5/prompt-optimization-cookbook.ipynb`. Keep markdown cells focused and prefer numbered steps for multi-part workflows. Store secrets in environment variables such as `OPENAI_API_KEY`; never hard-code keys inside notebooks. ## Testing Guidelines + Execute notebooks top-to-bottom after installing dependencies and clear lingering execution counts before committing. 
For Python modules or utilities, include self-check cells or lightweight `pytest` snippets and show how to run them (for example, `pytest examples/object_oriented_agentic_approach/tests`). When contributions depend on external services, mock responses or gate the cells behind clearly labeled opt-in flags. ## Commit & Pull Request Guidelines + Use concise, imperative commit messages that describe the change scope (e.g., "Add agent portfolio collaboration demo"). Every PR should provide a summary, motivation, and self-review, and must tick the registry and authors checklist from `.github/pull_request_template.md`. Link issues when applicable and attach screenshots or output snippets for UI-heavy content. Confirm CI notebook validation passes locally before requesting review. ## Metadata & Publication Workflow + New or relocated content must have an entry in `registry.yaml` with an accurate path, date, and tag set so the static site generator includes it. When collaborating, coordinate author slugs in `authors.yaml` to avoid duplicates, and run `python -m yaml lint registry.yaml` (or your preferred YAML linter) to catch syntax errors before submitting. + +## Review Guidelines + +- Verify file, function, and notebook names follow the repo's naming conventions and clearly describe their purpose. +- Scan prose and markdown for typos, broken links, and inconsistent formatting before approving. +- Check that code identifiers remain descriptive (no leftover placeholder names) and that repeated values are factored into constants when practical. +- Ensure notebooks or scripts document any required environment variables instead of hard-coding secrets or keys. +- Confirm metadata files (`registry.yaml`, `authors.yaml`) stay in sync with new or relocated content. 
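The naming and "clear lingering execution counts" rules in the guidelines above could be spot-checked locally with a short script. The sketch below is a hypothetical helper, not the contents of `.github/scripts/check_notebooks.py`. The regex `NOTEBOOK_NAME_PATTERN` and all function names are assumptions for illustration. It reads only the standard `.ipynb` JSON fields `cells`, `cell_type`, and `execution_count`.

```python
import json
import re
from pathlib import Path

# Hypothetical encoding of the naming rule: lowercase words separated by
# dashes or underscores, ending in .ipynb (e.g. prompt-optimization-cookbook.ipynb).
NOTEBOOK_NAME_PATTERN = re.compile(r"^[a-z0-9]+(?:[-_][a-z0-9]+)*\.ipynb$")


def has_valid_notebook_name(path: str) -> bool:
    """Return True if the file name (ignoring its directory) follows the rule."""
    return bool(NOTEBOOK_NAME_PATTERN.match(Path(path).name))


def cells_with_execution_counts(notebook: dict) -> list[int]:
    """Return indices of code cells whose execution_count was not cleared."""
    return [
        index
        for index, cell in enumerate(notebook.get("cells", []))
        if cell.get("cell_type") == "code" and cell.get("execution_count") is not None
    ]


def check_notebook_file(path: str) -> list[str]:
    """Collect human-readable problems for one notebook on disk."""
    problems = []
    if not has_valid_notebook_name(path):
        problems.append(f"{path}: name is not lowercase dash/underscore-separated")
    with open(path, encoding="utf-8") as handle:
        notebook = json.load(handle)
    for index in cells_with_execution_counts(notebook):
        problems.append(f"{path}: cell {index} still has an execution count")
    return problems
```

Keeping the name check and the cell check as separate pure functions means they can be exercised with plain dictionaries in a lightweight `pytest` file, in line with the testing guidelines.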
diff --git a/authors.yaml b/authors.yaml index aab9008ebe..7b9cb4063f 100644 --- a/authors.yaml +++ b/authors.yaml @@ -460,7 +460,7 @@ daisyshe-oai: dkundel-openai: name: "Dominik Kundel" - website: "https://www.linkedin.com/in/dominik-kundel/" + website: "https://www.linkedin.com/in/dkundel/" avatar: "https://avatars.githubusercontent.com/u/200841172?v=4" edbeeching: diff --git a/examples/codex/build_code_review_with_codex_sdk.md b/examples/codex/build_code_review_with_codex_sdk.md index d3e463978a..4f261357f2 100644 --- a/examples/codex/build_code_review_with_codex_sdk.md +++ b/examples/codex/build_code_review_with_codex_sdk.md @@ -1,20 +1,22 @@ # Build Code Review with the Codex SDK -With [Code Review](https://chatgpt.com/codex/settings/code-review) in Codex Cloud, you can connect your team's cloud hosted Github repository to Codex and receive automated code reviews on every PR. But what if your code is hosted on-prem, or you don't have Github as an SCM? +With [Code Review](https://chatgpt.com/codex/settings/code-review) in Codex Cloud, you can connect your team's cloud-hosted GitHub repository to Codex and receive automated code reviews on every PR. But what if your code is hosted on-prem, or you don't have GitHub as an SCM? -Luckily, we can replicate Codex's cloud hosted review process in our own CI/CD runners. In this guide, we'll build our own Code Review action using the Codex CLI headless mode with both Github actions and Jenkins. +Luckily, we can replicate Codex's cloud-hosted review process in our own CI/CD runners. In this guide, we'll build our own Code Review action using the Codex CLI headless mode with both GitHub Actions and Jenkins. To build our own Code review, we'll take the following steps: + 1. Install the Codex CLI in our CI/CD runner -1. Prompt Codex in headless (exec) mode with the Code Review prompt that ships with the CLI -1. Specify a structured output JSON schema for Codex -1. 
Parse the JSON result and use it to make API calls to our SCM to create review comments +2. Prompt Codex in headless (exec) mode with the Code Review prompt that ships with the CLI +3. Specify a structured output JSON schema for Codex +4. Parse the JSON result and use it to make API calls to our SCM to create review comments Once implemented, Codex will be able to leave inline code review comments: -Codex Code Review in Github +Codex Code Review in GitHub ## The Code Review Prompt -GPT-5-Codex has received specific training to improve is code review abilities. You can steer GPT-5-Codex to conduct a code review with the following prompt: + +GPT-5-Codex has received specific training to improve its code review abilities. You can steer GPT-5-Codex to conduct a code review with the following prompt: ``` You are acting as a reviewer for a proposed code change made by another engineer. @@ -25,7 +27,9 @@ Prioritize severe issues and avoid nit-level comments unless they block understa After listing findings, produce an overall correctness verdict (\"patch is correct\" or \"patch is incorrect\") with a concise justification and a confidence score between 0 and 1. Ensure that file citations and line numbers are exactly correct using the tools available; if they are incorrect your comments will be rejected. ``` + ## Codex Structured Outputs + In order to make comments on code ranges in our pull request, we need to receive Codex's response in a specific format. To do that we can create a file called `codex-output-schema.json` that conforms to OpenAI's [structured outputs](https://platform.openai.com/docs/guides/structured-outputs) format. To use this file in our workflow YAML, we can call Codex with the `output-schema-file` argument like this: @@ -49,8 +53,10 @@ You can also pass a similar argument to `codex exec` for example: codex exec "Review my pull request!" --output-schema codex-output-schema.json ``` -## Github Actions Example -Let's put it all together. 
If you're using Github actions in an on-prem environment, you can tailor this example to your specific workflow. Inline comments highlight the key steps. +## GitHub Actions Example + +Let's put it all together. If you're using GitHub Actions in an on-prem environment, you can tailor this example to your specific workflow. Inline comments highlight the key steps. + ```yaml name: Codex Code Review @@ -331,6 +337,7 @@ jobs: ``` ## Jenkins Example + We can use the same approach to scripting a job with Jenkins. Once again, comments highlight key stages of the workflow: ```groovy @@ -650,5 +657,7 @@ pipeline { } } ``` + # Wrap Up -With the Codex SDK, you can build your own Github Code Review in on-prem environments. However, the pattern of triggering Codex with a prompt, receiving a structured output, and then acting on that output with an API call extends far beyond Code Review. For example, we could use this pattern to trigger a root-cause analysis when an incident is created and post a structured report into a slack channel. Or we could create a code quality report on each PR and post results into a dashboard. \ No newline at end of file + +With the Codex SDK, you can build your own GitHub Code Review in on-prem environments. However, the pattern of triggering Codex with a prompt, receiving a structured output, and then acting on that output with an API call extends far beyond Code Review. For example, we could use this pattern to trigger a root-cause analysis when an incident is created and post a structured report into a Slack channel. Or we could create a code quality report on each PR and post results into a dashboard. 
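Step 4 of the guide, parsing Codex's JSON result and turning it into SCM review comments, might look roughly like the sketch below. The contents of `codex-output-schema.json` are not reproduced in this excerpt, so the field names (`findings`, `title`, `body`, `file`, `start_line`, `end_line`, `overall_correctness`, `confidence_score`) and the sample payload are hypothetical. The output dictionaries use the parameter names of GitHub's REST endpoint for creating pull request review comments (`body`, `commit_id`, `path`, `line`, `side`, `start_line`, `start_side`). Actually sending the HTTP requests is left out.

```python
import json
from typing import Any

# Hypothetical example of what Codex might return under a custom output schema.
# The field names are assumptions; align them with your codex-output-schema.json.
SAMPLE_CODEX_OUTPUT = """
{
  "findings": [
    {"title": "Unchecked None", "body": "user may be None here.",
     "file": "app/views.py", "start_line": 40, "end_line": 42}
  ],
  "overall_correctness": "patch is incorrect",
  "confidence_score": 0.7
}
"""


def to_review_comments(raw_output: str, commit_id: str) -> list[dict[str, Any]]:
    """Convert Codex's structured JSON into GitHub-style review comment payloads."""
    result = json.loads(raw_output)
    comments = []
    for finding in result.get("findings", []):
        comment = {
            "body": f"**{finding['title']}**\n\n{finding['body']}",
            "commit_id": commit_id,
            "path": finding["file"],
            "line": finding["end_line"],  # last line of the commented range
            "side": "RIGHT",  # comment on the new version of the file
        }
        # Multi-line ranges also carry start_line/start_side.
        if finding["start_line"] < finding["end_line"]:
            comment["start_line"] = finding["start_line"]
            comment["start_side"] = "RIGHT"
        comments.append(comment)
    return comments


def summarize_verdict(raw_output: str) -> str:
    """Build a one-line summary suitable for the top-level review body."""
    result = json.loads(raw_output)
    return (
        f"Codex verdict: {result['overall_correctness']} "
        f"(confidence {result['confidence_score']:.2f})"
    )
```

Keeping the conversion as a pure function makes it testable without a live SCM. The same dictionaries could be remapped to whatever comment API a non-GitHub SCM exposes.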
diff --git a/examples/partners/self_evolving_agents/autonomous_agent_retraining.ipynb b/examples/partners/self_evolving_agents/autonomous_agent_retraining.ipynb new file mode 100644 index 0000000000..d4c6584df4 --- /dev/null +++ b/examples/partners/self_evolving_agents/autonomous_agent_retraining.ipynb @@ -0,0 +1,2039 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Self-Evolving Agents: A Cookbook for Autonomous Agent Retraining\n", + "\n", + "## Overview\n", + "\n", + "Agentic systems often reach a plateau after proof-of-concept because they depend on humans to diagnose edge cases and correct failures. This cookbook introduces a repeatable retraining loop that captures those issues, learns from the feedback, and promotes improvements back into production-like workflows. We ground the approach in a regulated healthcare documentation task, but the patterns generalize to any domain that demands accuracy, auditability, and rapid iteration.\n", + "\n", + "### What You Will Learn\n", + "- Diagnose why an autonomous agent falls short of production readiness and instrument it with measurable feedback signals.\n", + "- Compare three prompt-optimization strategies—from quick manual iteration to fully automated loops—and understand when to reach for each.\n", + "- Assemble a self-healing workflow that combines human review, LLM-as-judge evals, and iterative prompt refinement.\n", + "\n", + "### Who This Notebook Is For\n", + "- ML/AI engineers and solution architects who need to move beyond toy demos.\n", + "- Product and delivery teams looking for executable artifacts they can adapt into internal tooling or production pipelines.\n", + "\n", + "### How to Work Through This Notebook\n", + "1. Start with Section 1 to understand the healthcare use case, baseline agent, and system architecture.\n", + "2. Use Section 2 to practice prompt optimization within the OpenAI Evals interface and collect structured feedback.\n", + "3. 
Run Section 3 to automate the optimization loop with graders, evals, and retraining logic.\n", + "4. Reference the appendix for reusable prompts, configurations, and evaluation templates as you tailor the workflow to your environment.\n", + "\n", + "The notebook is modular—feel free to run sections independently or sequentially as you adapt the retraining loop to your own agents." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Use Case Overview: Self-Evolving Agents in Healthcare\n", + "\n", + "### Problem Definition\n", + "\n", + "For this cookbook, we focus on a **real-world use case**: drafting regulatory documents for pharmaceutical companies. These organizations must prepare and submit extensive documentation to regulatory authorities (e.g., the U.S. Food and Drug Administration) to obtain approval for new drugs. The accuracy and speed of these submissions are critical, as they directly impact how quickly life-saving treatments can reach patients. \n", + "\n", + "Regulatory document drafting is a highly complex, iterative, and precision-driven process that requires deep scientific, medical, and compliance expertise. Despite the availability of advanced authoring tools, it remains labor-intensive and prone to human error. **Agentic systems offer substantial leverage** by assisting with research synthesis, content generation, and document structuring, yet human experts are still needed to ensure factual accuracy and regulatory compliance. \n", + "\n", + "The key challenge is to design a feedback loop that enables these agentic systems to learn iteratively and refine model behavior over time. Such a system can gradually shift human effort from detailed correction to high-level oversight, improving efficiency while maintaining the rigorous standards required for regulatory submissions. 
\n", + "\n", + "### Self-evolving Agent\n", + "\n", + "The diagram below illustrates the iterative process for continuously improving an AI agent through feedback, meta prompting, and evaluation. The loop combines human judgment or automated feedback using an LLM-as-a-judge to iteratively enhance performance. \n", + "\n", + "\"Self-evolving\n", + "
Figure 1 - Diagram showing the self-evolving loop for automated agent improvement.\n", + "\n", + "The process consists of the following steps: \n", + "\n", + "1. **Baseline Agent** \n", + " The process begins with a baseline agent. In this notebook, we use a deliberately simple example (an agent that summarizes sections of a document) to illustrate the iterative improvement loop. In real-world or enterprise settings, the baseline agent could be much more complex. The summaries it produces serve as the initial benchmark for subsequent evaluation and refinement.\n", + "\n", + "2. **Human Feedback (or LLM-as-judge)** \n", + " The baseline agent’s outputs are then evaluated by human reviewers (e.g., in production environments), by an automated **LLM-as-judge** system, or both. This step gathers both quantitative and qualitative feedback that indicates how well the agent meets its goals. For instance, if we are testing the length of the summary, the feedback might be “the summary is too long” or a numerical score (generally between `0` and `1`) generated by an eval that checks whether the summary is under 500 words.\n", + "\n", + "3. **Evals and Aggregated Score** \n", + " Based on the collected feedback, new prompts are generated and tested through evaluations (**Evals**). These tests measure performance against predefined criteria, and the outcomes are combined into an aggregated score that reflects the overall performance. The loop continues until the score exceeds a target threshold (e.g., `0.8`) or the maximum number of retries is reached (e.g., `max_retry = 10`). If the retry limit is hit, engineers are alerted that manual improvements are required.\n", + "\n", + "4. **Updated Baseline Agent** \n", + " Once an improved version achieves the target performance, it replaces the original baseline agent. 
This updated agent becomes the foundation for the next iteration, supporting a continuous cycle of learning, feedback, and optimization.\n", + "\n", + "\n", + "\n", + "### Dataset Overview\n", + "\n", + "The dataset used for evaluation comprises ~70 sections extracted from the _Sample CMC Section for Hyperpolarized Pyruvate (13C) Injection_, publicly available [here](https://dctd.cancer.gov/drug-discovery-development/reagents-materials/imaging-ind-resources/documentation/13c-pyruvate-cmc.pdf). This dataset provides realistic, domain-specific content suitable for testing both scientific summarization and regulatory compliance behavior. \n", + "\n", + "\n", + "### Baseline Agent Overview\n", + "\n", + "To keep this cookbook self-contained and easily reproducible, we simplified the regulatory drafting use case while retaining its essential complexity. In production, a typical regulatory authoring agent comprises multiple specialized sub-agents responsible for tasks such as drafting, data analysis, compliance checking, citation generation, and fact verification.\n", + "\n", + "For this guide, we narrow the scope of the regulatory authoring agent to focus on the self-healing aspect of the system. Our regulatory authoring agent consists of two sub-agents:\n", + "- **A summarizer**: creating scientific and concise summaries.\n", + "- **A compliance checker**: evaluating each summary against key regulatory requirements (e.g., FDA 21 CFR Part 11). \n", + "\n", + "\"Baseline\n", + "
Figure 2 - The baseline agent as created in the AgentBuilder UI.\n", + "\n", + "For the remainder of this cookbook, we implement a simplified version of the Summarizer agent (see the section **Agent Setup** below). Alternatively, you can reuse the code for the agent created with AgentBuilder. If you’d like to reproduce the agent directly from the AgentBuilder UI, here are the key prompts and parameters used:\n", + "\n", + "- **Summarizer agent:** This agent used the file search tool, with the [CMC PDF](data/c13_pyruvate_sample_CMC_from_UCSF.pdf) uploaded to the vector store.\n", + "> _Prompt:_ \"Summarize section {{workflow.input_as_text}} from {{state.cmc_pdf}} uploaded to the vector store.\"\n", + "\n", + "- **Compliance Checker agent:**\n", + "> _Prompt:_ \"Verify that the summary below is compliant with FDA 21 CFR Part 11: {{input.output_text}}. If the summary is compliant, return _Compliant_. Otherwise, return _This section needs to be manually summarized_.\" \n", + "\n", + "Both agents were configured with the default parameters: GPT-5, low reasoning effort, and text as the output format.\n", + "\n", + "### Evaluation Approach\n", + "\n", + "To evaluate the baseline agent, there are two main approaches:\n", + "\n", + "1. **Collecting Human Feedback.** This approach involves gathering feedback from human users through the OpenAI Evals platform (or a custom UI built for a specific application). It is best suited for production settings or when piloting a tool where subject matter experts (SMEs) interact with the tool in real-world scenarios. This method helps uncover edge cases that may not have been identified during development. On the Evals platform, users can provide thumbs-up or thumbs-down ratings and share qualitative feedback about the summaries. \n", + "\n", + "\n", + "2. **Using an LLM-as-a-Judge.** This option is typically used during the development phase, enabling fast feedback loops without requiring SMEs' time. 
An **LLM-as-a-judge** uses an LLM to automatically evaluate and score the agent’s outputs based on predefined criteria. It can also be used for monitoring model drift (e.g., in production) or validating changes across models and model versions (e.g., switching between `gpt-5` and `gpt-5-mini`).\n", + "\n", + "\n", + "This cookbook demonstrates both approaches:\n", + "- **Section 2** shows the platform UI approach for manual prompt optimization\n", + "- **Section 3** implements the fully automated API approach using LLM-as-a-judge\n", + "\n", + "_Note: The Evals platform does not yet provide an API to retrieve user feedback programmatically._\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Using the OpenAI Evals Platform\n", + "\n", + "The OpenAI Evals platform provides an intuitive interface for prompt optimization and evaluation. This section demonstrates the complete workflow from dataset upload through iterative prompt improvement, showing how you can leverage the platform's visual interface to optimize your prompts before implementing automated solutions.\n", + "\n", + "### Step 1: Upload Dataset\n", + "\n", + "To begin using the OpenAI Evals platform, you'll first need to upload your dataset:\n", + "\n", + "1. Click the **+ Create** button\n", + "2. Define the dataset name\n", + "3. Upload a CSV file and select the columns to keep\n", + "4. Upload\n", + "\n", + "Your dataset should contain the documents or document sections that need to be summarized. Each row represents one input that will be processed by your system.\n", + "\n", + "### Step 2: Explore Your Data\n", + "\n", + "Once uploaded, you can explore your dataset. Click the dataset name to explore the uploaded data. 
This allows you to verify that your data is properly formatted and contains the expected content before proceeding with prompt configuration.\n", + "\n", + "### Step 3: Configure Initial Prompt\n", + "\n", + "This is where you define your initial system prompt and configure how data flows through your model. \n", + "\n", + "\"Platform\n", + "
Figure 3 - The platform's \"New prompt\" interface showing model configuration, variables, and system message settings.\n", + "\n", + "\n", + "#### Configuration Steps\n", + "\n", + "1. **System Prompt**: Add the system message that defines the model's task and behavior (this prompt will be optimized)\n", + "2. **User Prompt Template**: Add the prompt message template for user messages, using variables such as `{{}}` that get replaced with actual data from your dataset\n", + "3. **Model Selection**: Choose the model for generation (e.g., gpt-3.5-turbo, gpt-4)\n", + "4. **Temperature**: Configure creativity vs. determinism\n", + "\n", + "You can start with a very simple prompt to demonstrate the power of the optimization process. For example, beginning with just \"summarize\" shows how the system can evolve from a minimal starting point.\n", + "\n", + "### Step 4: Generate Outputs\n", + "\n", + "Once your prompt is configured, you're ready to generate outputs across your dataset. The prompt will run once per row and output will be generated on a new **output** column.\n", + "\n", + "1. Click **\"Generate Output\"**\n", + "2. The platform runs your prompt against all samples\n", + "3. Results appear in a new **Output** column\n", + "\n", + "The platform will process each row in your dataset, replacing template variables with actual values and calling the model with your system prompt. This creates a baseline of outputs that you can evaluate.\n", + "\n", + "### Step 5: Review and Evaluate\n", + "\n", + "Evaluation is where you provide structured feedback to guide prompt improvement.\n", + "\n", + "#### Review Outputs\n", + "\n", + "1. **Add Evaluation Columns** if not automatically added - Click \"Columns\" → \"Annotations\" → \"Add\":\n", + " - **Rating** - Binary (good/bad) or numeric ratings\n", + " - **Feedback** - Text describing what needs improvement\n", + "\n", + "2. **Provide Rating and Feedback** - Add your assessment for each output. 
\n", + "\n", + " Depending on the quality of the output, you may select a good or bad rating and explain your score based on how you would like the answer to be improved. For example:\n", + "\n", + " > (Rating) | Feedback\n", + " > - (Good) Good, but only the answer should be provided. The output should not include headers or any text other than the answer.\n", + " > - (Bad) The information is good, but it should be presented as bullet points.\n", + " > - (Good) Good summary; it is clear.\n", + " > - (Bad) Use bullet points when answering to improve readability. Summarize each sub-section individually.\n", + "\n", + "3. **Save Annotations** - Your feedback is saved with the evaluation run\n", + "\n", + "\"Platform\n", + "
Figure 4 - The evaluation interface showing generated outputs with rating and feedback columns for annotation.\n", + "\n", + "This structured feedback becomes the foundation for automatic prompt optimization.\n", + "\n", + "### Step 6: Optimize Prompt\n", + "\n", + "After collecting feedback, the platform can automatically generate an improved prompt.\n", + "\n", + "1. Click **\"Optimize\"**\n", + "2. A new prompt version is generated in a new tab\n", + "3. Click **\"View Prompt\"** to see the improved version\n", + "\n", + "\"Platform\n", + "
Figure 5 - The improved prompt generated by the platform, showing detailed instructions and requirements.\n", + "\n", + "### Step 7: Iterate and Compare\n", + "\n", + "With your improved prompt ready, start a new iteration to measure improvement.\n", + "\n", + "1. Click **\"Generate Output\"**\n", + "2. Review the new results and provide feedback on any remaining issues\n", + "3. Click **\"Optimize\"** again if needed\n", + "4. Repeat until satisfied\n", + "\n", + "The platform's tab structure allows you to compare performance across iterations. You can easily see how outputs evolved from your initial prompt to the optimized versions.\n", + "\n", + "\"Platform\n", + "
Figure 6 - Feedback and evaluation results for the optimized prompt, showing improvements in output quality.\n", + "\n", + "#### When to Stop Iterating\n", + "\n", + "Continue the optimization cycle until:\n", + "- **Quality threshold reached**: >80% of outputs receive positive feedback\n", + "- **Diminishing returns**: New iterations show minimal improvement\n", + "- **Specific issues resolved**: All identified failure modes are addressed\n", + "\n", + "This platform-based approach provides an excellent foundation for understanding prompt optimization before moving to automated implementations. The visual interface makes it easy to see the impact of changes and understand the optimization process.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Self-evolving Loop with LLM-as-a-Judge\n", + "\n", + "This section introduces a fully automated evaluation workflow using an LLM-as-a-Judge through the OpenAI API, eliminating the need for any user interface. This approach enables scalable, programmatic assessment of agent performance, supporting rapid iteration and continuous model monitoring in production." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# gepa and litellm are only required for Section 4.b (prompt optimization with GEPA)\n", + "%pip install --upgrade openai openai-agents pydantic pandas gepa litellm python-dotenv -qqq \n", + "%load_ext dotenv\n", + "%dotenv\n", + "\n", + "# Place your API key in a file called .env\n", + "# OPENAI_API_KEY=sk-...\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Eval Creation\n", + "\n", + "To evaluate the baseline summarization agent, we use four complementary graders that balance deterministic checks with semantic judgment.\n", + "\n", + "| Grader | Type | Pass threshold | What it checks | Why |\n", + "|---|---|---:|---|---|\n", + "| Chemical string name | `python` | 0.8 | Fraction of the exact chemical names found in the section that also appear verbatim in the summary. | Forces preservation of critical domain entities so summaries don’t omit chemically meaningful terms. |\n", + "| Summarization length | `python` | 0.85 | Full score within ±20% of a 100-word target; the score decays linearly outside that band. | Keeps summaries concise and comparable, reducing verbosity that can mask poor content. |\n", + "| Cosine similarity | `text_similarity` | 0.85 | Cosine similarity between section and summary texts. | Ensures the summary stays anchored to the source content rather than drifting semantically. |\n", + "| LLM-as-judge | `score_model` | 0.85 | A rubric-driven score from a model acting as an evaluator. | Captures nuanced quality signals that rule-based metrics miss, improving overall robustness. 
|\n", + "\n", + "**Notes**\n", + "- The two Python graders catch domain fidelity and length discipline early, which stabilizes optimization before semantic tuning.\n", + "- Text similarity guards against superficial rephrasing that strays from the source.\n", + "- The LLM judge provides a holistic failsafe when edge cases slip past deterministic checks." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "from openai import OpenAI\n", + "\n", + "client = OpenAI(api_key=os.getenv(\"OPENAI_API_KEY\"))\n", + "\n", + "data_source_config = {\n", + " \"type\": \"custom\",\n", + " \"item_schema\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\"section\": {\"type\": \"string\"}, \"summary\": {\"type\": \"string\"}},\n", + " \"required\": [\"section\", \"summary\"],\n", + " },\n", + " \"include_sample_schema\": False,\n", + "}\n", + "\n", + "testing_criteria = [\n", + " {\n", + " \"type\": \"python\",\n", + " \"name\": \"chemical_name_grader\",\n", + " \"image_tag\": \"2025-05-08\",\n", + " \"pass_threshold\": 0.8,\n", + " \"source\": r\"\"\"def grade(sample: dict, item: dict) -> float:\n", + " section = item[\"section\"]\n", + " summary = item[\"summary\"]\n", + " CHEMICALS_MASTER = [\"[1-¹³C]Pyruvic acid\",\"[1-¹³C]Pyruvate\",\"¹²C Pyruvic acid\",\"Sodium [1-¹³C]pyruvate\",\"Sodium pyruvate (¹²C)\",\"AH111501 (Trityl radical)\",\"Tris{8-carboxyl-2,2,6,6-tetra[2-(1-methoxyethyl)]-benzo(1,2-d:4,5-d’)bis(1,3)dithiole-4-yl}methyl acid\",\"AH111501 sodium salt\",\"Methyl, tris[8-carboxy-2,2,6,6-tetrakis(2-methoxyethyl)benzo[1,2-d:4,5-d’]bis[1,3]dithiol-4-yl]-, trisodium salt\",\"AH111501 trisodium salt\",\"AH111576\",\"2,2′,2″,2‴-(4,8-Dibromobenzo[1,2-d:4,5-d′]bis([1,3]dithiole)-2,2,6,6-tetrayl)tetraethanol\",\"AH111586\",\"4,8-Dibromo-2,2,6,6-tetrakis(2-methoxyethyl)benzo[1,2-d:4,5-d′]bis([1,3]dithiole)\",\"AH111709\",\"AH111743\",\"AH112615\",\"4,4-Bis-hydroxymethyl-2-methyl-oxazolidine-2-carboxylic acid\",\"AH112623\",\"Parapyruvate\",\"2-Hydroxy-2-methyl-4-oxo-pentanedioic acid\",\"AH113127\",\"(4-Hydroxymethyl-oxazolidin-4-yl)-methanol\",\"AH113462/E\",\"Enol lactone\",\"AH113462/K\",\"Keto lactone\",\"Acetyl bromide\",\"Methanol\",\"Dimethyl sulfoxide\",\"DMSO\",\"Tetrahydrofuran\",\"THF\",\"Acetonitrile\",\"ACN\",\"Diethyl 
ether\",\"Et₂O\",\"N,N-Dimethylacetamide\",\"DMA\",\"1,3-Dimethyl-2-imidazolidinone\",\"DMI\",\"Hydrochloric acid\",\"HCl\",\"Sodium hydroxide\",\"NaOH\",\"Disodium ethylenediaminetetraacetate\",\"Na₂EDTA\",\"Ethylenediaminetetraacetic acid\",\"EDTA\",\"Tris(hydroxymethyl)aminomethane\",\"TRIS\",\"Trometamol\",\"Trifluoroacetic acid\",\"TFA\",\"Toluene\",\"Heptane\",\"Ethyl acetate\",\"Ethanol\",\"Water\",\"H₂O\",\"Sodium chloride\",\"NaCl\",\"Cuprous [1-¹³C]cyanide\",\"Cu¹³CN\",\"Gadolinium\",\"Gd\",\"Tin\",\"Sn\",\"Phosphorus\",\"P\",\"Carbon dioxide\",\"CO₂\",\"Sodium [1-13C]pyruvate\",\"[1-13C]Pyruvic acid\",\"1-13C pyruvate\"]\n", + "\n", + " # Identify the chemicals present in the section\n", + " present = [chem for chem in CHEMICALS_MASTER if chem in section]\n", + "\n", + " # If no chemicals present, consider it satisfied\n", + " if not present:\n", + " return 1.0\n", + "\n", + " correct = 0\n", + " for chem in present:\n", + " # Only count as correct if the exact chemical string appears in the summary\n", + " if chem in summary:\n", + " correct += 1\n", + "\n", + " return correct / len(present)\"\"\",\n", + " },\n", + " {\n", + " \"type\": \"python\",\n", + " \"name\": \"word_length_deviation_grader\",\n", + " \"image_tag\": \"2025-05-08\",\n", + " \"pass_threshold\": 0.85,\n", + " \"source\": r\"\"\"\n", + "def grade(sample: dict, item: dict) -> float:\n", + " summary = item[\"summary\"]\n", + " word_count = len(summary.split())\n", + " \n", + " expected_summary_length = 100\n", + " tolerance = 0.2 # 20% band around target\n", + " \n", + " # relative deviation\n", + " deviation = abs(word_count - expected_summary_length) / expected_summary_length\n", + " \n", + " # If within tolerance band → full score\n", + " if deviation <= tolerance:\n", + " return 1.0\n", + " \n", + " # Outside band → score decays linearly, capped at 0\n", + " # e.g., deviation 0.3 → score 0.8, deviation 1.0+ → 0.0\n", + " score = 1.0 - (deviation - tolerance)\n", + " return max(0.0, 
score)\n", + "\"\"\",\n", + "},\n", + " {\n", + " \"name\": \"cosine_similarity\",\n", + " \"type\": \"text_similarity\",\n", + " \"input\": \"{{ item.summary }}\",\n", + " \"reference\": \"{{ item.section }}\",\n", + " \"evaluation_metric\": \"cosine\",\n", + " \"pass_threshold\": 0.85,\n", + " },\n", + " {\n", + " \"name\": \"llm_as_judge\",\n", + " \"type\": \"score_model\",\n", + " \"model\": \"gpt-4.1\",\n", + " \"input\": [\n", + " {\n", + " \"role\": \"system\",\n", + " \"content\": (\n", + " \"You are an expert technical summarization evaluator. \"\n", + " \"Evaluate whether the summary captures and preserves the important technical facts and specific details from the section, allowing for occasional minor rewording or omissions of less important points, but not major technical inaccuracies or information loss.\\n\\n\"\n", + " \"Scoring Guidelines:\\n\"\n", + " \"- Return a numerical score between 0 and 1 (with up to two decimal places).\\n\"\n", + " \"- A score of 1 means the summary is almost flawless: it is comprehensive, highly faithful, and technically accurate, with virtually no important or meaningful details missing, and no significant misstatements or distortions.\\n\"\n", + " \"- 0.75-0.99 indicates excellent work: all main facts are represented, but there may be trivial omissions or very minor rewording that do not materially affect understanding.\\n\"\n", + " \"- 0.5-0.75 indicates good but imperfect: most technical information is retained and correctly presented, some less critical details might be missing or slightly rephrased, but overall fidelity is preserved.\\n\"\n", + " \"- 0.3-0.5 means significant information is missing, or some technical inaccuracies are present, but the summary retains a reasonable portion of key facts.\\n\"\n", + " \"- 0.0-0.3 means there are major omissions, misunderstandings, or a failure to capture the most important technical content.\\n\\n\"\n", + " \"Respond only with a single number between 0 and 1 indicating 
summary quality by these criteria.\"\n", + " ),\n", + " },\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": (\n", + " \"Section:\\n{{item.section}}\\n\"\n", + " \"Summary:\\n{{sample.output_text}}\"\n", + " ),\n", + " },\n", + " ],\n", + " \"range\": [0, 1],\n", + " \"pass_threshold\": 0.85,\n", + " },\n", + "]\n", + "\n", + "eval = client.evals.create(\n", + " name=\"self_evolving_eval\",\n", + " data_source_config=data_source_config,\n", + " testing_criteria=testing_criteria,\n", + ")\n", + "print(f\"Created Eval: {eval.id}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You should see an eval ID in the output, e.g. `eval_...`. This is the ID of the eval we just created (as shown below).\n", + "\n", + "\"Platform\n", + "
Figure 7 - The platform's Eval interface showing data source configuration and test criteria settings." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Grader Scoring and Parsing\n", + "\n", + "Next, we need to run the evals on the summarization agent's output and parse the results for the eval's grader scores. To do this, we'll use a few helper functions:\n", + "- `run_eval`: A simple runner that calls the evals API with the expected data source format\n", + "- `poll_eval_run`: A polling utility that waits for the scheduled eval run to complete\n", + "- `parse_eval_run_output`: Parses the eval run output items into structured results for the feedback loop" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import time\n", + "import json\n", + "\n", + "def run_eval(eval_id: str, section: str, summary: str):\n", + " \"\"\"Creates a run of the eval with the input section and output summary.\"\"\"\n", + " return client.evals.runs.create(\n", + " eval_id=eval_id,\n", + " name=\"self-evolving-eval\",\n", + " data_source={\n", + " \"type\": \"jsonl\",\n", + " \"source\": {\n", + " \"type\": \"file_content\",\n", + " \"content\": [\n", + " {\n", + " \"item\": {\n", + " \"section\": section,\n", + " \"summary\": summary,\n", + " }\n", + " }\n", + " ],\n", + " },\n", + " },\n", + " )\n", + "\n", + "\n", + "def poll_eval_run(eval_id: str, run_id: str, max_polls: int = 10):\n", + " \"\"\"\n", + " Polls the evaluation run until completion or timeout.\n", + "\n", + " This function exists to handle asynchronous behavior in the eval service by\n", + " periodically checking run status. It balances responsiveness and resource use by\n", + " polling at fixed intervals rather than blocking indefinitely. 
The retry limit\n", + " prevents runaway loops in cases where the service never returns a completed status.\n", + " \"\"\"\n", + " run = None\n", + " for attempt in range(1, max_polls + 1):\n", + " run = client.evals.runs.retrieve(eval_id=eval_id, run_id=run_id)\n", + " if run.status == \"completed\":\n", + " break\n", + " if attempt == max_polls:\n", + " print(\"Exceeded retries, aborting\")\n", + " break\n", + "\n", + " time.sleep(5)\n", + "\n", + " run_output_items = client.evals.runs.output_items.list(\n", + " eval_id=eval_id, run_id=run_id\n", + " )\n", + " return run_output_items\n", + "\n", + "\n", + "def parse_eval_run_output(items):\n", + " \"\"\"Extract all grader scores and any available conclusion outputs.\"\"\"\n", + " all_results = []\n", + "\n", + " for item in items.data:\n", + " for result in item.results:\n", + " grader_name_full = result.name\n", + " score = result.score\n", + " passed = result.passed\n", + " reasoning = None\n", + " try:\n", + " sample = result.sample\n", + " if sample:\n", + " content = sample[\"output\"][0][\"content\"]\n", + " content_json = json.loads(content)\n", + " steps = content_json[\"steps\"]\n", + " reasoning = \" \".join([step[\"conclusion\"] for step in steps])\n", + " except Exception:\n", + " # Graders without structured JSON reasoning keep reasoning=None.\n", + " pass\n", + "\n", + " all_results.append(\n", + " {\n", + " \"grader_name\": grader_name_full,\n", + " \"score\": score,\n", + " \"passed\": passed,\n", + " \"reasoning\": reasoning,\n", + " }\n", + " )\n", + "\n", + " return all_results" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we can use the created eval ID from earlier and run the graders against an arbitrary input section and summary output. This forms the backbone of the feedback loop, which will kick off the prompt optimization routine.\n", + "\n", + "### Eval execution run\n", + "\n", + "Let's test our evals by providing a section and a generated summary directly."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "EVAL_ID = eval.id #Created eval ID from above cell\n", + "SECTION = \"3.2.S.1 General Information ([1-13C]pyruvic acid) The active ingredient in Hyperpolarized Pyruvate (13C) Injection is hyperpolarized [1-13C]pyruvate. The drug substance is defined as [13C]pyruvic acid, which is neutralized to [1-13C]pyruvate during the compounding process. In several pre-clinical and clinical studies and during evaluation of stability, pyruvic acid has been used instead of [1-13C]pyruvic acid (see Sections 3.2.P.2.2.1 Formulation Development for Hyperpolarized Pyruvate (13C) Injection and Section 8.1 Introduction for Item 8 Pharmacology and Toxicology Info). In the Section 3.2.S Drug Substance, data are presented for both pyruvic acid and for [1-13C]pyruvic acid. For simplicity, the terminology used in headings and captions is [1-13C]pyruvic acid. Batches containing pyruvic acid are specified by footnotes. 3.2.S.1.1 Nomenclature ([1-13C]pyruvic acid) The drug substance used for compounding of Hyperpolarized Pyruvate (13C) Injection is [1-13C]pyruvic acid. Company code: W6578 Chemical name: [1-13C]pyruvic acid CAS registry number: 127-17-3 3.2.S.1.2 Structure ([1-13C]pyruvic acid) Figure 1 Structure of [1-13C]pyruvic acid Molecular formula: C H O 3 4 3 Molecular weight: 89.06 3.2.S.1.3 General Properties ([1-13C]pyruvic acid) Appearance: Colorless to yellow, clear, viscous liquid pKa:Ka:aranWater solubility: Complete The structure of [1-13C]pyruvic acid has been confirmed by spectroscopic analysis (see Section 3.2.S.3.1 Elucidation of Structure and other Characteristics).\"\n", + "SUMMARY = \"The active ingredient in Hyperpolarized Pyruvate (13C) Injection is hyperpolarized [1-13C]pyruvate, derived from [1-13C]pyruvic acid (neutralized during compounding). 
Both pyruvic acid and [1-13C]pyruvic acid were used in studies and stability evaluations, but the documentation refers to [1-13C]pyruvic acid unless otherwise noted. The drug substance ([1-13C]pyruvic acid, CAS 127-17-3) is a colorless to yellow, clear, viscous liquid with a molecular formula C3H4O3 and molecular weight 89.06. Its structure has been confirmed by spectroscopic analysis, and it is completely soluble in water.\"\n", + "\n", + "eval_run = run_eval(EVAL_ID, section=SECTION, summary=SUMMARY)\n", + "run_output = poll_eval_run(eval_id=EVAL_ID, run_id=eval_run.id)\n", + "\n", + "grader_scores = parse_eval_run_output(run_output)\n", + "print(grader_scores)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "vscode": { + "languageId": "shellscript" + } + }, + "source": [ + "You should see a list of grader scores in the output, e.g.\n", + "\n", + "```[{'grader_name': 'chemical_name_grader-', 'score': 0.5, 'passed': False, 'reasoning': None}, {'grader_name': 'word_length_deviation_grader-', 'score': 0.8, 'passed': True, 'reasoning': None}, {'grader_name': 'cosine_similarity-', 'score': 0.9104484223477793, 'passed': True, 'reasoning': None}, {'grader_name': 'llm_as_judge-', 'score': 0.8, 'passed': True, 'reasoning': 'The summary needs to include specific details from the section. Part of the essential information is captured. Key pieces of information are missing. Not all relevant structural information is included.'}]```\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Running this script we can see that most of our graders are passing except the `chemical_name_grader`. Next we'll programmatically recognize this opportunity to improve the summarization agent.\n", + "\n", + "_Note: When you run it locally, graders other than `chemical_name_grader` may fail at first. This is normal, as graders can initially fail, but the results should improve through the feedback loop. 
Early failures are the signal the loop uses to revise the summarization prompt until its summaries converge on more accurate results._\n", + "\n", + "\n", + "### Dashboard Observability\n", + "Eval runs and results can also be viewed in the OpenAI Dashboard:\n", + "\n", + "\"Eval\n", + "
Figure 8 - Eval dashboard showing evaluation runs and results.\n", + "\n", + "\n", + "We can also drill down into a specific eval run: \n", + "\"Eval\n", + "
Figure 9 - Detailed eval run results showing grader scores and performance metrics.\n", + "\n", + "\n", + "## Agent Setup\n", + "\n", + "Now that we have our evals and graders set up, we can go back to our summarization agent.\n", + "For simplicity, we provide the code for a simple agent below. You could also use `AgentBuilder`, as shown in Figure 2, and export the code from the UI.\n", + "\n", + "\n", + "We also need a metaprompt optimization agent to optimize our prompt, as well as some simple utilities to handle prompt versions:\n", + "- `PromptVersionEntry`: A Pydantic model used to track the prompt and metadata as it changes in production\n", + "- `VersionedPrompt`: A utility class to track prompt versions; this is important in production for analyzing how the prompt evolves and for keeping a fallback history in case of a regression" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from datetime import datetime, timezone\n", + "from typing import Any, Optional\n", + "\n", + "from pydantic import BaseModel, Field, ConfigDict, field_validator\n", + "\n", + "class PromptVersionEntry(BaseModel):\n", + " \"\"\"Data model for a prompt and associated data for observability.\"\"\"\n", + " version: int = Field(\n", + " ..., ge=0, description=\"Version number of the prompt (increments)\"\n", + " )\n", + " model: str = Field(\n", + " \"gpt-5\",\n", + " min_length=1,\n", + " description=\"The model version to use for this version of the prompt, defaults to gpt-5\",\n", + " )\n", + " prompt: str = Field(\n", + " ..., min_length=1, description=\"The prompt text for this version\"\n", + " )\n", + " timestamp: datetime = Field(\n", + " # datetime.utcnow() is deprecated since Python 3.12; store a timezone-aware UTC timestamp.\n", + " default_factory=lambda: datetime.now(timezone.utc),\n", + " description=\"UTC timestamp when this version was created\",\n", + " )\n", + " eval_id: Optional[str] = Field(\n", + " None, description=\"ID of the evaluation associated with this prompt version\"\n", + " )\n", +
" run_id: Optional[str] = Field(\n", + " None, description=\"ID of the run associated with this prompt version\"\n", + " )\n", + " metadata: Optional[dict[str, Any]] = Field(\n", + " None, description=\"Free-form metadata dict (e.g., section, summary)\"\n", + " )\n", + "\n", + " model_config = ConfigDict(\n", + " str_strip_whitespace=True, validate_assignment=True, extra=\"forbid\"\n", + " )\n", + "\n", + " @field_validator(\"prompt\")\n", + " @classmethod\n", + " def prompt_not_blank(cls, v: str) -> str:\n", + " if not v.strip():\n", + " raise ValueError(\"prompt must not be blank or only whitespace\")\n", + " return v\n", + "\n", + "\n", + "class VersionedPrompt:\n", + " \"\"\"Manages a collection of prompt versions and provides controlled updates and rollbacks.\"\"\"\n", + " def __init__(\n", + " self,\n", + " initial_prompt: str,\n", + " model: str = \"gpt-5\",\n", + " eval_id: Optional[str] = None,\n", + " run_id: Optional[str] = None,\n", + " metadata: Optional[dict[str, Any]] = None,\n", + " ):\n", + " if not initial_prompt or not initial_prompt.strip():\n", + " raise ValueError(\"initial_prompt must be non-empty\")\n", + " self._versions: list[PromptVersionEntry] = []\n", + " first_entry = PromptVersionEntry(\n", + " version=0,\n", + " prompt=initial_prompt,\n", + " model=model,\n", + " eval_id=eval_id,\n", + " run_id=run_id,\n", + " metadata=metadata,\n", + " )\n", + " self._versions.append(first_entry)\n", + "\n", + " def update(\n", + " self,\n", + " new_prompt: str,\n", + " model: str = \"gpt-5\",\n", + " eval_id: Optional[str] = None,\n", + " run_id: Optional[str] = None,\n", + " metadata: Optional[dict[str, Any]] = None,\n", + " ) -> PromptVersionEntry:\n", + " if not new_prompt or not new_prompt.strip():\n", + " raise ValueError(\"new_prompt must be non-empty\")\n", + "\n", + " version = self.current().version + 1\n", + " entry = PromptVersionEntry(\n", + " version=version,\n", + " prompt=new_prompt,\n", + " model=model,\n", + "
eval_id=eval_id,\n", + " run_id=run_id,\n", + " metadata=metadata,\n", + " )\n", + " self._versions.append(entry)\n", + " return entry\n", + "\n", + " def current(self) -> PromptVersionEntry:\n", + " return self._versions[-1]\n", + "\n", + " def revert_to_version(self, version: int) -> PromptVersionEntry:\n", + " idx = None\n", + " for i, entry in enumerate(self._versions):\n", + " if entry.version == version:\n", + " idx = i\n", + " break\n", + "\n", + " if idx is None:\n", + " raise ValueError(f\"No version found with version={version}\")\n", + "\n", + " self._versions = self._versions[: idx + 1]\n", + " return self._versions[-1]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, we'll create the starting summarization and prompt optimization agents.\n", + "\n", + "_Note: We created a wrapper to track prompt changes in the summarization agent because its prompt is expected to evolve in production. The metaprompt agent's prompt stays static for the purposes of this cookbook._" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "from agents import Agent\n", + "\n", + "METAPROMPT_TEMPLATE = \"\"\"\n", + "# Context:\n", + "## Original prompt:\n", + "{original_prompt}\n", + "\n", + "## Section:\n", + "{section}\n", + "\n", + "## Summary:\n", + "{summary}\n", + "\n", + "## Reason to improve the prompt:\n", + "{reasoning}\n", + "\n", + "# Task:\n", + "Write a new summarization prompt that is significantly improved and more specific than the original. \n", + "The new prompt should instruct the model to produce concise yet comprehensive technical summaries that precisely preserve all explicit information from the source text. It should emphasize the inclusion of all named entities, quantities, compounds, and technical terminology without paraphrasing or omission. 
The resulting prompt should read like a clear, directive system message for a technical summarization assistant—structured, unambiguous, and generalizable across scientific or regulatory document sections.\n", + "\"\"\"\n", + "\n", + "metaprompt_agent = Agent(\n", + " name=\"MetapromptAgent\", instructions=\"You are a prompt optimizer.\"\n", + ")\n", + "\n", + "summarization_prompt = VersionedPrompt(\n", + " initial_prompt=\"\"\"You are a summarization assistant.\n", + "Given a section of text, produce a summary.\"\"\"\n", + ")\n", + "\n", + "def make_summarization_agent(prompt_entry: PromptVersionEntry) -> Agent:\n", + " return Agent(\n", + " name=\"SummarizationAgent\",\n", + " instructions=prompt_entry.prompt,\n", + " model=prompt_entry.model,\n", + " )\n", + "\n", + "summarization_agent = make_summarization_agent(summarization_prompt.current())\n", + "\n", + "# Cache eval results by section + summary so repeated attempts do not trigger redundant grader runs.\n", + "eval_cache: dict[tuple[str, str], list[dict[str, Any]]] = {}\n", + "\n", + "# Track the most recent candidate that passes the lenient score threshold.\n", + "best_candidate: dict[str, Any] = {\n", + " \"score\": float(\"-inf\"),\n", + " \"prompt\": summarization_prompt.current().prompt,\n", + " \"model\": summarization_prompt.current().model,\n", + " \"summary\": None,\n", + " \"metadata\": None,\n", + " \"version\": summarization_prompt.current().version,\n", + " \"passed_lenient\": False,\n", + " \"total_score\": float(\"-inf\"),\n", + "}\n", + "\n", + "# Aggregate per-version performance so we can pick the strongest total scorer at the end.\n", + "aggregate_prompt_stats: dict[int, dict[str, Any]] = {}\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Orchestration and Monitoring\n", + "\n", + "Here is what we've built so far:\n", + "- Evals with 4 graders that will assess the outputs and produce a score for each grader\n", + "- A 
summarization agent with a versioned prompt class to track changes to the prompt and model\n", + "- A metaprompt optimization agent that will attempt to update the prompt based on grader feedback\n", + "\n", + "Next, we compose these pieces to orchestrate the self-evolving loop, with Agent tracing visible in the OpenAI dashboard.\n", + "\n", + "Keep in mind that this is a simplified example. In a real-world scenario, you'd want to ensure you have guardrails for optimization attempts and that an alert notifies a human when a guardrail is triggered.\n", + "\n", + "_Note: Due to the practical limitations of a cookbook, we simulate a stream of data by feeding in a static dataset and use `print` statements in place of true observability._\n", + "\n", + "### Orchestration Utilities\n", + "\n", + "As in previous sections, we'll create some utilities to manage the orchestration logic of the feedback loop." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import asyncio\n", + "from typing import Any, Optional\n", + "from agents import Runner\n", + "\n", + "LENIENT_PASS_RATIO = 0.75  # Lenient pass if at least 75% of graders pass (binary)...\n", + "LENIENT_AVERAGE_THRESHOLD = 0.85  # ...or if the average grader score is at least 0.85\n", + "\n", + "def reset_best_candidate() -> None:\n", + " \"\"\"Reset the best candidate tracker for a new optimization run.\"\"\"\n", + " global best_candidate\n", + "\n", + " current = summarization_prompt.current()\n", + " best_candidate = {\n", + " \"score\": float(\"-inf\"),\n", + " \"prompt\": current.prompt,\n", + " \"model\": current.model,\n", + " \"summary\": None,\n", + " \"metadata\": None,\n", + " \"version\": current.version,\n", + " \"passed_lenient\": False,\n", + " \"total_score\": float(\"-inf\"),\n", + " }\n", + "\n", + "def reset_best_trackers() -> None:\n", + " \"\"\"Reset both the best-candidate tracker and aggregate stats.\"\"\"\n", + " reset_best_candidate()\n", + " aggregate_prompt_stats.clear()\n", + "\n", + "\n", + "def update_best_candidate(\n", +
" *,\n", + " average_score: Optional[float] = None,\n", + " prompt_text: str,\n", + " model_name: str,\n", + " summary_text: Optional[str] = None,\n", + " metadata: Optional[dict[str, Any]] = None,\n", + " lenient_passed: bool = False,\n", + " prompt_version: Optional[int] = None,\n", + " total_score: Optional[float] = None,\n", + " score: Optional[float] = None,\n", + ") -> None:\n", + " \"\"\"Record the most recent lenient-passing candidate (not necessarily the highest-scoring one).\"\"\"\n", + " global best_candidate\n", + "\n", + " if prompt_version is None:\n", + " prompt_version = summarization_prompt.current().version\n", + "\n", + " if average_score is None:\n", + " average_score = score\n", + "\n", + " if average_score is None:\n", + " return\n", + "\n", + " if lenient_passed:\n", + " best_candidate.update(\n", + " {\n", + " \"score\": average_score,\n", + " \"prompt\": prompt_text,\n", + " \"model\": model_name,\n", + " \"summary\": summary_text,\n", + " \"metadata\": metadata,\n", + " \"version\": prompt_version,\n", + " \"passed_lenient\": True,\n", + " \"total_score\": total_score if total_score is not None else average_score,\n", + " }\n", + " )\n", + "\n", + "\n", + "def apply_best_candidate_if_needed() -> Agent:\n", + " \"\"\"Switch summarization_prompt to the tracked best candidate if it differs, then return a fresh agent.\"\"\"\n", + " if best_candidate[\"score\"] > float(\"-inf\"):\n", + " current = summarization_prompt.current()\n", + " target = best_candidate\n", + " # Only add a new version if the best candidate differs from the current prompt\n", + " if (\n", + " current.prompt != target[\"prompt\"]\n", + " or current.model != target[\"model\"]\n", + " or current.version != target.get(\"version\")\n", + " ):\n", + " summarization_prompt.update(\n", + " new_prompt=target[\"prompt\"],\n", + " model=target[\"model\"],\n", + " metadata=target.get(\"metadata\"),\n", + " )\n", + " target[\"version\"] = summarization_prompt.current().version\n", + "\n", + " return make_summarization_agent(summarization_prompt.current())\n", + "\n", + "\n", + "def 
record_aggregate_prompt_score(\n", + " *,\n", + " prompt_version: int,\n", + " prompt_text: str,\n", + " model_name: str,\n", + " average_score: float,\n", + " total_score: Optional[float] = None,\n", + ") -> None:\n", + " \"\"\"Accumulate per-version grader scores for aggregate selection.\"\"\"\n", + " stats = aggregate_prompt_stats.setdefault(\n", + " prompt_version,\n", + " {\n", + " \"version\": prompt_version,\n", + " \"prompt\": prompt_text,\n", + " \"model\": model_name,\n", + " \"total_score\": 0.0,\n", + " \"total_average\": 0.0,\n", + " \"count\": 0,\n", + " },\n", + " )\n", + " stats[\"total_score\"] += total_score if total_score is not None else average_score\n", + " stats[\"total_average\"] += average_score\n", + " stats[\"count\"] += 1\n", + " stats[\"prompt\"] = prompt_text\n", + " stats[\"model\"] = model_name\n", + "\n", + "\n", + "def select_best_aggregate_prompt() -> Optional[dict[str, Any]]:\n", + " \"\"\"Return the prompt version with the highest cumulative score.\"\"\"\n", + " if not aggregate_prompt_stats:\n", + " return None\n", + " return max(\n", + " aggregate_prompt_stats.values(),\n", + " key=lambda entry: (\n", + " entry.get(\"total_score\", float(\"-inf\")),\n", + " entry.get(\"version\", -1),\n", + " ),\n", + " )\n", + "\n", + "\n", + "async def get_eval_grader_score(eval_id: str, section: str, summary: str):\n", + " \"\"\"Retrieve grader scores for a section-summary pair with caching.\"\"\"\n", + " cache_key = (section, summary)\n", + " if cache_key in eval_cache:\n", + " return eval_cache[cache_key]\n", + "\n", + " eval_run = run_eval(eval_id=eval_id, section=section, summary=summary)\n", + " run_output = poll_eval_run(eval_id=eval_id, run_id=eval_run.id)\n", + " results = parse_eval_run_output(run_output)\n", + " eval_cache[cache_key] = results\n", + " return results\n", + "\n", + "\n", + "def calculate_grader_score(grader_scores):\n", + " \"\"\"Simple average score of all graders from the eval.\"\"\"\n", + " if not 
grader_scores:\n", + " return 0.0\n", + "\n", + " score_sum = 0.0\n", + " for entry in grader_scores:\n", + " score_sum += entry.get(\"score\", 0.0)\n", + "\n", + " return score_sum / len(grader_scores)\n", + "\n", + "\n", + "\n", + "def calculate_total_grader_score(grader_scores):\n", + " \"\"\"Sum of all grader scores for aggregate tracking.\"\"\"\n", + " if not grader_scores:\n", + " return 0.0\n", + "\n", + " return sum(entry.get(\"score\", 0.0) for entry in grader_scores)\n", + "\n", + "\n", + "DEFAULT_PASSING_FEEDBACK = (\n", + " \"All graders passed; tighten factual coverage, chemical completeness, and conciseness.\"\n", + ")\n", + "\n", + "\n", + "def is_lenient_pass(grader_scores, average_score: float) -> bool:\n", + " if not grader_scores:\n", + " return False\n", + "\n", + " passed_count = sum(1 for entry in grader_scores if entry.get(\"passed\"))\n", + " total_graders = len(grader_scores)\n", + "\n", + " if total_graders and (passed_count / total_graders) >= LENIENT_PASS_RATIO:\n", + " return True\n", + " return average_score >= LENIENT_AVERAGE_THRESHOLD\n", + "\n", + "\n", + "def collect_grader_feedback(grader_scores):\n", + " \"\"\"Consolidate grader reasoning into actionable feedback for the metaprompt agent.\"\"\"\n", + " feedback_lines = []\n", + "\n", + " for entry in grader_scores:\n", + " grader = entry.get(\"grader_name\", \"\")\n", + " passed = entry.get(\"passed\", False)\n", + " reasoning = entry.get(\"reasoning\")\n", + "\n", + " if not passed:\n", + " if grader.startswith(\"chemical_name_grader\"):\n", + " feedback_lines.append(\n", + " \"Not all chemical names in the input section were included in the summary.\"\n", + " )\n", + " elif grader.startswith(\"word_length_deviation_grader\"):\n", + " feedback_lines.append(\n", + " \"The summary length deviates too much from the expected length.\"\n", + " )\n", + " elif grader.startswith(\"cosine_similarity\"):\n", + " feedback_lines.append(\n", + " \"The summary is not sufficiently similar to 
the source section (cosine similarity too low).\"\n", + " )\n", + " elif grader.startswith(\"llm_as_judge\") and reasoning:\n", + " feedback_lines.append(reasoning)\n", + "\n", + " if not feedback_lines:\n", + " feedback_lines.append(DEFAULT_PASSING_FEEDBACK)\n", + "\n", + " # Separate feedback items so they don't run together in the metaprompt.\n", + " return \"\\n\".join(feedback_lines)\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Self-evolving loop\n", + "\n", + "Now, to simulate a stream of summarization requests, we'll feed in a prepared dataset and watch the prompt evolve from a naive starting point.\n", + "\n", + "> The referenced `data/dataset.csv` can be found in the GitHub repository." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "\n", + "from agents import Agent, trace\n", + "\n", + "EVAL_ID = eval.id  # Eval ID created in the cell above\n", + "MAX_OPTIMIZATION_RETRIES = 3\n", + "\n", + "async def self_evolving_loop(summarization_agent: Agent) -> Agent:\n", + " print(f\"Starting self-evolving loop | Initial prompt v{summarization_prompt.current().version}\")\n", + " print(f\"Prompt:{summarization_prompt.current().prompt}\")\n", + " print(\"-\" * 80)\n", + "\n", + " reset_best_trackers()\n", + " df = pd.read_csv(\"data/dataset.csv\")\n", + "\n", + " with trace(\"Self-evolving Optimization Workflow\"):\n", + " for _, row in df.head().iterrows():\n", + " content = row.get(\"content\")\n", + " if pd.isna(content) or (isinstance(content, str) and not content.strip()):\n", + " continue\n", + "\n", + " section_number = str(row[\"section_number\"])\n", + " section = str(content)\n", + " current_version = summarization_prompt.current().version\n", + "\n", + " print(f\"[Section {section_number}] Using prompt v{current_version}\")\n", + "\n", + " optimization_success = False\n", + "\n", + " for attempt in range(1, MAX_OPTIMIZATION_RETRIES + 1):\n", + " print(f\" Attempt {attempt}: evaluating summary...\")\n", + "\n", 
+ " summary_result = await Runner.run(summarization_agent, section)\n", + " summary = summary_result.final_output\n", + "\n", + " grader_scores = await get_eval_grader_score(eval_id=EVAL_ID, summary=summary, section=section)\n", + " average_score = calculate_grader_score(grader_scores)\n", + " total_score = calculate_total_grader_score(grader_scores)\n", + " lenient_passed = is_lenient_pass(grader_scores, average_score)\n", + " print(\n", + " f\"\tScores — avg={average_score:.3f}, total={total_score:.3f}, lenient_passed={lenient_passed}\"\n", + " )\n", + "\n", + " record_aggregate_prompt_score(\n", + " prompt_version=summarization_prompt.current().version,\n", + " prompt_text=summarization_prompt.current().prompt,\n", + " model_name=summarization_prompt.current().model,\n", + " average_score=average_score,\n", + " total_score=total_score,\n", + " )\n", + "\n", + " update_best_candidate(\n", + " average_score=average_score,\n", + " prompt_text=summarization_prompt.current().prompt,\n", + " model_name=summarization_prompt.current().model,\n", + " summary_text=summary,\n", + " metadata={\n", + " \"section\": section_number,\n", + " \"average_score\": average_score,\n", + " \"grader_results\": grader_scores,\n", + " \"prompt_version\": summarization_prompt.current().version,\n", + " },\n", + " lenient_passed=lenient_passed,\n", + " prompt_version=summarization_prompt.current().version,\n", + " )\n", + "\n", + " if lenient_passed:\n", + " optimization_success = True\n", + " print(f\"\tPassed with prompt v{summarization_prompt.current().version}\")\n", + " break\n", + "\n", + " print(\"\tFailed eval. 
Improving prompt...\")\n", + " eval_feedback = collect_grader_feedback(grader_scores)\n", + "\n", + " metaprompt_result = await Runner.run(\n", + " metaprompt_agent,\n", + " input=METAPROMPT_TEMPLATE.format(\n", + " original_prompt=summarization_prompt.current().prompt,\n", + " section=section,\n", + " summary=summary,\n", + " reasoning=eval_feedback,\n", + " ),\n", + " )\n", + " improved_prompt = metaprompt_result.final_output\n", + " summarization_prompt.update(\n", + " new_prompt=improved_prompt,\n", + " metadata={\"section\": section, \"summary\": summary},\n", + " )\n", + " summarization_agent = make_summarization_agent(summarization_prompt.current())\n", + "\n", + " print(f\"\tPrompt improved → v{summarization_prompt.current().version}\")\n", + "\n", + " if not optimization_success:\n", + " print(\n", + " \"\tAll attempts failed; keeping latest prompt version \"\n", + " f\"v{summarization_prompt.current().version} for the next section.\"\n", + " )\n", + "\n", + " summarization_agent = apply_best_candidate_if_needed()\n", + "\n", + " print(\"\" + \"-\" * 80)\n", + " print(\"Completed optimization loop.\")\n", + " print(f\"Final prompt version: v{summarization_prompt.current().version}\")\n", + " if best_candidate[\"score\"] > float(\"-inf\"):\n", + " print(\n", + " f\"Best lenient prompt: v{best_candidate.get('version')} (avg={best_candidate['score']:.3f})\"\n", + " )\n", + "\n", + " aggregate_best = select_best_aggregate_prompt()\n", + " if aggregate_best:\n", + " per_section = (\n", + " aggregate_best.get(\"total_average\", 0.0) / aggregate_best.get(\"count\", 1)\n", + " if aggregate_best.get(\"count\")\n", + " else 0.0\n", + " )\n", + " print(\n", + " f\"Aggregate best prompt: v{aggregate_best.get('version')} \"\n", + " f\"(total={aggregate_best.get('total_score', 0.0):.3f}, avg/section={per_section:.3f}, model={aggregate_best.get('model', 'unknown')})\"\n", + " )\n", + "\n", + " print(f\"Final prompt:{summarization_prompt.current().prompt}\")\n", + " 
return summarization_agent\n", + "\n", + "summarization_agent = await self_evolving_loop(summarization_agent)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**How the final prompt is chosen**\n", + "\n", + "- Every evaluation logs the average grader score, the total score across graders, and whether the attempt passed the lenient criteria.\n", + "- `best_candidate` tracks the most recent attempt that passed the lenient criteria. After each section, `apply_best_candidate_if_needed` switches the summarization agent to that prompt if it differs from the current version.\n", + "- `aggregate_prompt_stats` accumulates each version's scores across sections. When the loop ends, `select_best_aggregate_prompt` reports the version with the highest cumulative grader score (ties favor the latest version), so you can compare it with the final prompt and roll back with `revert_to_version` if needed.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here is an example (abridged) output for the code above.\n", + "\n", + "Inspecting the output shows that the self-evolving prompt worked. There are a few takeaways to keep in mind:\n", + "1. The optimization is not always successful, so being able to roll back the prompt version is important\n", + "2. The fidelity of the information from the graders is crucial to ensuring a quality optimization" + ] + }, + { + "cell_type": "raw", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "Starting self-evolving loop | Initial prompt v0\n", + "Prompt:You are a summarization assistant.\n", + "Given a section of text, produce a summary.\n", + "--------------------------------------------------------------------------------\n", + "[Section 7.1] Using prompt v0\n", + " Attempt 1: evaluating summary...\n", + "\tScores — avg=0.805, total=3.218, lenient_passed=False\n", + "\tFailed eval. Improving prompt...\n", + "\tPrompt improved → v1\n", + " Attempt 2: evaluating summary...\n", + "\tScores — avg=0.720, total=2.881, lenient_passed=False\n", + "\tFailed eval. 
Improving prompt...\n", + "\tPrompt improved → v2\n", + " Attempt 3: evaluating summary...\n", + "\tScores — avg=0.762, total=3.048, lenient_passed=True\n", + "\tPassed with prompt v2\n", + "[Section 7.2] Using prompt v2\n", + " Attempt 1: evaluating summary...\n", + "\tScores — avg=0.612, total=2.450, lenient_passed=False\n", + "\tFailed eval. Improving prompt...\n", + "\tPrompt improved → v3\n", + " Attempt 2: evaluating summary...\n", + "\tScores — avg=0.915, total=3.660, lenient_passed=True\n", + "\tPassed with prompt v3\n", + "[Section 3.2.P.2.1] Using prompt v3\n", + " Attempt 1: evaluating summary...\n", + "\tScores — avg=0.684, total=2.736, lenient_passed=False\n", + "\tFailed eval. Improving prompt...\n", + "\tPrompt improved → v4\n", + " Attempt 2: evaluating summary...\n", + "\tScores — avg=0.684, total=2.736, lenient_passed=False\n", + "\tFailed eval. Improving prompt...\n", + "\tPrompt improved → v5\n", + " Attempt 3: evaluating summary...\n", + "\tScores — avg=0.920, total=3.680, lenient_passed=True\n", + "\tPassed with prompt v5\n", + "[Section 3.2.P.2.2] Using prompt v5\n", + " Attempt 1: evaluating summary...\n", + "\tScores — avg=0.737, total=2.950, lenient_passed=True\n", + "\tPassed with prompt v5\n", + "[Section 3.2.P.2.3] Using prompt v5\n", + " Attempt 1: evaluating summary...\n", + "\tScores — avg=0.750, total=3.000, lenient_passed=True\n", + "\tPassed with prompt v5\n", + "--------------------------------------------------------------------------------\n", + "Completed optimization loop.\n", + "Final prompt version: v5\n", + "Best lenient prompt: v5 (avg=0.750)\n", + "Aggregate best prompt: v5 (total=9.630, avg/section=0.802)\n", + "Final prompt:**Optimized Technical Summarization System Prompt**\n", + "\n", + "You are a technical summarization assistant specialized in scientific and regulatory documents. 
Your objective is to generate a summary that preserves every explicit detail and organizational structure from the source text, without any paraphrasing, omission, or synthesis.\n", + "\n", + "**Strict Summarization Guidelines:**\n", + "\n", + "**1. Comprehensive Detail Inclusion:** \n", + "- Transcribe all named compounds, salts, excipients, drug substances, molecular designations, batch codes, identifiers, and CAS numbers exactly as written.\n", + "- Include every stated concentration, unit, measurement, quantitative value, compositional detail, and preparatory parameter verbatim and in original format.\n", + "- Accurately replicate all descriptions of appearance, color, physical state, rationale for inclusion, and labeling or typographical conventions present in the source.\n", + "- Clearly include all section titles, headings, subsections, hierarchical numbering, referenced sections, and in-line citations or figures.\n", + "\n", + "**2. Prohibited Actions:** \n", + "- Do NOT paraphrase, summarize, interpret, synthesize, restructure, generalize, or alter any information at any level.\n", + "- Do NOT omit, compress, merge, or reorder any data point, named entity, technical term, or explicit instruction from the source.\n", + "- Do NOT introduce additional content, inference, or editorial clarification.\n", + "\n", + "**3. Structural and Formatting Requirements:** \n", + "- Maintain verbatim order, sectioning, and hierarchy from the source text, including all original lists, bullet points, numbering, or formatting.\n", + "- Reproduce every element in the precise sequence, alignment, and structure as the input, ensuring maximal traceability.\n", + "- If the source uses lists, tables, subpoints, or hierarchies, mirror them exactly.\n", + "\n", + "**4. 
Precision, Fidelity, and Reviewability:** \n", + "- Your summary must enable full regulatory or technical audit by containing every explicit detail, designation, and measurement from the original—unaltered and unabridged.\n", + "- The output must be comprehensive, exhaustive, and identical in informational content and structure to the input. Every visible explicit detail must be present.\n", + "\n", + "**Output Instruction:** \n", + "Begin summarization after this message, applying the above rules without exception. Each output must be concise in format but all-inclusive in content, reflecting every explicit fact, designation, and organizational feature of the source text, and suitable for regulatory or expert review. No interpretation, paraphrasing, or omission is permitted under any circumstance." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Agent Logs & Tracing\n", + "\n", + "We can view optimization workflow runs in the dashboard under logs: \n", + "\n", + "\"Agent\n", + "
Figure 10 - Agent log traces showing optimization workflow runs in the dashboard.\n", + "\n", + "And drill down into the different agent calls: \n", + "\n", + "\"Agent\n", + "
Figure 11 - Detailed agent trace showing individual agent calls and execution flow.\n", + "\n", + "### Continuous Monitoring\n", + "\n", + "Once the evaluation loop is complete, the system should continue to monitor new incoming data and periodically re-evaluate model performance on blind datasets. This ensures the model remains accurate and compliant as the data distribution evolves.\n", + "\n", + "To enable continuous monitoring, you can integrate a cron job or a lightweight scheduler loop that periodically checks for updates in your data source (e.g., new PDF uploads or database entries). When new data is detected, the system automatically triggers the evaluation and optimization loop described earlier.\n", + "\n", + "For example (pseudocode):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# this cell is pseudocode and not meant to be run as-is\n", + "# new_data_detected() is a placeholder for your own data-source check\n", + "\n", + "import asyncio\n", + "\n", + "async def continuous_monitoring(summarization_agent, interval_hours=24):\n", + " \"\"\"Periodically check for new data and trigger the evaluation loop.\"\"\"\n", + " while True:\n", + " print(\"Checking for new data...\")\n", + " if new_data_detected():\n", + " print(\"New data found — running evaluation and optimization loop.\")\n", + " summarization_agent = await self_evolving_loop(summarization_agent)\n", + " else:\n", + " print(\"No new data. 
Sleeping until next cycle.\")\n", + " # asyncio.sleep yields control instead of blocking the event loop\n", + " await asyncio.sleep(interval_hours * 3600)\n", + "\n", + "await continuous_monitoring(summarization_agent, interval_hours=24)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This approach lets the system keep adapting its prompt as it processes fresh data, which is key to maintaining high-quality performance in real-world use." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Going Further\n", + "\n", + "### a. 
Model Evaluation\n", + "\n", + "We now have a fully automated loop that improves our prompt with **evals** and accepts the new prompt when its score passes the defined threshold.\n", + "\n", + "In production, you could use a similar framework to monitor the performance of your agents as new user requests come in.\n", + "As mentioned above, this is a simplified example; in a real-world scenario you would want additional guardrails and a human-in-the-loop step to approve new prompts.\n", + "\n", + "Taking this concept further, we can also use evals to test different model parameter candidates, such as the model version, verbosity, and reasoning. To see the full set of parameters that could be considered, check the [ModelSettings class in the Agents SDK](https://openai.github.io/openai-agents-python/ref/model_settings/#agents.model_settings.ModelSettings).\n", + "\n", + "The `compare_model_candidates` function is an example of how to:\n", + "1. Optimize the prompt\n", + "2. Generate candidate outputs from the optimized prompt using two or more different models\n", + "3. Use evals to grade the candidate outputs and select the best candidate\n", + "\n", + "It can be worked into the `self_evolving_loop` function with minimal refactoring.\n", + "\n", + "> **NOTE:** Production testing of model versions should be limited to versions within the same model family (e.g., gpt-5, gpt-5-mini, gpt-5-nano). 
Cross-family model selection should be done before production deployment.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And here is the complete loop with model comparison, `self_evolving_loop_with_model_comparison`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import asyncio\n", + "from agents import Agent, Runner\n", + "\n", + "async def eval_agent_candidate(agent: Agent, section: str, prompt_text: str, model_name: str):\n", + " summary_result = await Runner.run(agent, section)\n", + " summary = summary_result.final_output\n", + "\n", + " scores = await get_eval_grader_score(\n", + " eval_id=EVAL_ID, summary=summary, section=section\n", + " )\n", + " average = calculate_grader_score(scores)\n", + " lenient_passed = is_lenient_pass(scores, average)\n", + " passed = all(entry.get(\"passed\") is True for entry in scores)\n", + "\n", + " update_best_candidate(\n", + " average_score=average,\n", + " prompt_text=prompt_text,\n", + " model_name=model_name,\n", + " summary_text=summary,\n", + " metadata={\n", + " \"section\": section,\n", + " \"average_score\": average,\n", + " \"grader_results\": scores,\n", + " },\n", + " lenient_passed=lenient_passed,\n", + " )\n", + "\n", + " return {\"summary\": summary, \"scores\": scores, \"average\": average, \"passed\": passed}\n", + "\n", + "async def compare_model_candidates(\n", + " summarization_prompt,\n", + " eval_feedback: str,\n", + " section: str,\n", + " summary: str,\n", + " model_candidates=None,\n", + "):\n", + " \"\"\"Improve the prompt, evaluate it across candidate models, and adopt the top performer.\"\"\"\n", + " if model_candidates is None:\n", + " model_candidates = [\"gpt-5\", \"gpt-5-mini\"]\n", + "\n", + " metaprompt_result = await Runner.run(\n", + " metaprompt_agent,\n", + " input=METAPROMPT_TEMPLATE.format(\n", + " original_prompt=summarization_prompt.current().prompt,\n", + " section=section,\n", + " 
summary=summary,\n", + " 
reasoning=eval_feedback,\n", + " ),\n", + " )\n", + " improved_prompt = metaprompt_result.final_output\n", + "\n", + " async def evaluate_model(model_name: str):\n", + " candidate_agent = Agent(\n", + " name=f\"SummarizationAgent:{model_name}\",\n", + " instructions=improved_prompt,\n", + " model=model_name,\n", + " )\n", + " result = await eval_agent_candidate(candidate_agent, section, improved_prompt, model_name)\n", + " return model_name, candidate_agent, result\n", + "\n", + " best = {\n", + " \"average\": float(\"-inf\"),\n", + " \"passed\": False,\n", + " \"agent\": None,\n", + " \"model\": None,\n", + " \"summary\": None,\n", + " }\n", + "\n", + " tasks = [asyncio.create_task(evaluate_model(model_name)) for model_name in model_candidates]\n", + " for task in asyncio.as_completed(tasks):\n", + " model_name, candidate_agent, result = await task\n", + " print(\n", + " f\"Candidate average — {model_name}: {result['average']:.4f} \"\n", + " f\"(passed={result.get('passed', False)})\"\n", + " )\n", + " if result[\"average\"] > best[\"average\"]:\n", + " best.update(\n", + " {\n", + " \"average\": result[\"average\"],\n", + " \"model\": model_name,\n", + " \"summary\": result.get(\"summary\"),\n", + " \"agent\": candidate_agent,\n", + " \"passed\": result.get(\"passed\", False),\n", + " }\n", + " )\n", + "\n", + " for task in tasks:\n", + " if not task.done():\n", + " task.cancel()\n", + "\n", + " if best[\"passed\"] and best[\"model\"]:\n", + " summarization_prompt.update(\n", + " new_prompt=improved_prompt,\n", + " model=best[\"model\"],\n", + " metadata={\"section\": section, \"summary\": best[\"summary\"]},\n", + " )\n", + " print(f\"Updated summarization_prompt with passing model: {best['model']}\")\n", + " return make_summarization_agent(summarization_prompt.current())\n", + "\n", + " print(\n", + " f\"No passing models. Best candidate (model={best['model']}, \"\n", + " f\"avg={best['average']:.4f}) did not pass. 
Prompt not updated.\"\n", + " )\n", + " return None\n", + "\n", + "async def self_evolving_loop_with_model_comparison(summarization_agent: Agent) -> Agent:\n", + " print(\n", + " f\"Starting self-evolving loop | Initial prompt v{summarization_prompt.current().version}\"\n", + " )\n", + " print(f\"Prompt: {summarization_prompt.current().prompt}\")\n", + " print(f\"Model: {summarization_prompt.current().model}\")\n", + " print(\"-\" * 80)\n", + "\n", + " reset_best_trackers()\n", + " df = pd.read_csv(\"data/dataset.csv\")\n", + "\n", + " with trace(\"Self-evolving Optimization Workflow: model comparison\"):\n", + " for _, row in df.head(5).iterrows():\n", + " content = row.get(\"content\")\n", + " if pd.isna(content) or (isinstance(content, str) and not content.strip()):\n", + " continue\n", + "\n", + " section_number = str(row[\"section_number\"])\n", + " section = str(content)\n", + " current_version = summarization_prompt.current().version\n", + "\n", + " print(f\"[Section {section_number}] Using prompt v{current_version}\")\n", + "\n", + " summary_passed = False\n", + "\n", + " for attempt in range(1, MAX_OPTIMIZATION_RETRIES + 1):\n", + " print(f\"\\tAttempt {attempt}: evaluating summary...\")\n", + "\n", + " summary_result = await Runner.run(summarization_agent, section)\n", + " summary = summary_result.final_output\n", + "\n", + " grader_scores = await get_eval_grader_score(\n", + " eval_id=EVAL_ID, summary=summary, section=section\n", + " )\n", + " average_score = calculate_grader_score(grader_scores)\n", + " total_score = calculate_total_grader_score(grader_scores)\n", + " lenient_passed = is_lenient_pass(grader_scores, average_score)\n", + " print(\n", + " f\"\\tScores — avg={average_score:.3f}, total={total_score:.3f}, lenient_passed={lenient_passed}\"\n", + " )\n", + "\n", + " record_aggregate_prompt_score(\n", + " prompt_version=summarization_prompt.current().version,\n", + " prompt_text=summarization_prompt.current().prompt,\n", + " 
model_name=summarization_prompt.current().model,\n", + " average_score=average_score,\n", + " total_score=total_score,\n", + " )\n", + "\n", + " update_best_candidate(\n", + " average_score=average_score,\n", + " total_score=total_score,\n", + " prompt_text=summarization_prompt.current().prompt,\n", + " model_name=summarization_prompt.current().model,\n", + " summary_text=summary,\n", + " metadata={\n", + " \"section\": section_number,\n", + " \"average_score\": average_score,\n", + " \"grader_results\": grader_scores,\n", + " \"prompt_version\": summarization_prompt.current().version,\n", + " },\n", + " lenient_passed=lenient_passed,\n", + " prompt_version=summarization_prompt.current().version,\n", + " )\n", + "\n", + " if lenient_passed:\n", + " summary_passed = True\n", + " print(\n", + " f\"\\tPassed with prompt v{summarization_prompt.current().version} (model={summarization_prompt.current().model})\"\n", + " )\n", + " break\n", + "\n", + " print(\"\\tFailed eval. Improving prompt...\")\n", + " eval_feedback = collect_grader_feedback(grader_scores)\n", + "\n", + " new_agent = await compare_model_candidates(\n", + " summarization_prompt=summarization_prompt,\n", + " eval_feedback=eval_feedback,\n", + " section=section,\n", + " summary=summary,\n", + " # model_candidates could be given as an argument if you want to expand options.\n", + " )\n", + "\n", + " if new_agent is None:\n", + " print(\n", + " \"\\tNo passing model found. 
Optimization failed for this section.\"\n", + " )\n", + " summary_passed = False\n", + " else:\n", + " summarization_agent = new_agent\n", + " summary_passed = True\n", + " print(\n", + " f\"\\tPrompt improved → v{summarization_prompt.current().version} \"\n", + " f\"(model={summarization_prompt.current().model})\"\n", + " )\n", + " break\n", + "\n", + " if not summary_passed:\n", + " print(\n", + " \"\\tAll attempts failed; keeping latest prompt version \"\n", + " f\"v{summarization_prompt.current().version} (model={summarization_prompt.current().model}) for the next section.\"\n", + " )\n", + "\n", + " summarization_agent = apply_best_candidate_if_needed()\n", + "\n", + " print(\"\" + \"-\" * 80)\n", + " print(\"Completed optimization loop.\")\n", + " print(f\"Final prompt version: v{summarization_prompt.current().version}\")\n", + " print(f\"Final model: {summarization_prompt.current().model}\")\n", + " aggregate_best = select_best_aggregate_prompt()\n", + " if best_candidate[\"score\"] > float(\"-inf\"):\n", + " print(\n", + " f\"Best lenient prompt: v{best_candidate.get('version')} (avg={best_candidate['score']:.3f}, model={best_candidate.get('model', 'unknown')})\"\n", + " )\n", + " if aggregate_best:\n", + " per_section = (\n", + " aggregate_best.get(\"total_average\", 0.0) / aggregate_best.get(\"count\", 1)\n", + " if aggregate_best.get(\"count\")\n", + " else 0.0\n", + " )\n", + " print(\n", + " f\"Aggregate best prompt: v{aggregate_best.get('version')} \"\n", + " f\"(total={aggregate_best.get('total_score', 0.0):.3f}, avg/section={per_section:.3f}, model={aggregate_best.get('model', 'unknown')})\"\n", + " )\n", + " print(f\"Final prompt: {summarization_prompt.current().prompt}\")\n", + " print(f\"Final model: {summarization_prompt.current().model}\")\n", + " return summarization_agent\n", + "\n", + "summarization_agent = await self_evolving_loop_with_model_comparison(summarization_agent)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + 
"source": [ + "Here we can see a very similar output with additional information on the model version scores:" + ] + }, + { + "cell_type": "raw", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "Starting self-evolving loop | Initial prompt v0\n", + "Prompt:\n", + "\tYou are a summarization assistant.\n", + "Given a section of text, produce a concise, accurate summary.\n", + "\n", + "[....]\n", + "\n", + "[Section 3.2.P.2.2] Using prompt v2\n", + "\tAttempt 1: evaluating summary...\n", + "\tFailed eval. Improving prompt...\n", + "Candidate average — gpt-5: 0.3533 (passed=False)\n", + "Candidate average — gpt-5-mini: 0.4670 (passed=False)\n", + "No passing models. Best candidate (model=gpt-5-mini, avg=0.4670) did not pass. Prompt not updated.\n", + "\tNo passing model found. Optimization failed for this section.\n", + "\tAttempt 2: evaluating summary...\n", + "Exceeded retries, aborting\n", + "\tPassed with prompt v2\n", + "\n", + "--------------------------------------------------------------------------------\n", + "Completed optimization loop.\n", + "Final prompt version: v2\n", + "Final prompt:\n", + "**Improved Prompt:**\n", + "\n", + "You are a summarization assistant. \n", + "Given any section of text, generate a concise and accurate summary that includes all key concepts, components, and their main characteristics or interactions as described in the original section. Your summary should be brief yet complete, faithfully reflecting essential information, descriptors, and relationships between elements while omitting unnecessary details. Ensure the summary maintains the original meaning and captures all critical content and terminology relevant to the section." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### b. Prompt Optimization with Genetic-Pareto (GEPA)\n", + "\n", + "We've demonstrated that the self-evolving loop works and that a prompt can be improved autonomously using Evals. 
However, we relied on a relatively straightforward, static metaprompt to improve our system prompt. In this section, we explore a more dynamic and reflective method using Genetic-Pareto (GEPA) [[1]](#citations), a framework that samples agent trajectories, reflects on them in natural language, proposes prompt revisions, and evolves the system through iterative feedback loops.\n", + "\n", + "The GEPA method, described in the paper available [here](https://doi.org/10.48550/arXiv.2507.19457), offers a compelling blueprint for continuous, self-improving prompt optimization. The code below draws generously on the GEPA GitHub repository available [here](https://github.com/gepa-ai/gepa)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import gepa\n", + "from gepa import EvaluationBatch\n", + "\n", + "# Extract sections from dataset\n", + "def read_csv_content(file_path: str) -> list[dict]:\n", + " \"\"\"Read a CSV file and return the sections to summarize, skipping missing values.\"\"\"\n", + " df = pd.read_csv(file_path)\n", + " return [{'content': content} for content in df['content'].dropna().tolist()]\n", + "\n", + "# Split dataset into disjoint training and validation sets\n", + "all_sections = read_csv_content(\"data/dataset.csv\")\n", + "val_cut = max(1, int(0.1 * len(all_sections)))\n", + "if len(all_sections) > 1:\n", + " valset = all_sections[:val_cut]\n", + " trainset = all_sections[val_cut:]\n", + "else:\n", + " trainset = valset = all_sections" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We’ll reuse our graders and helper functions by adding a small adapter so that our setup works with GEPA. GEPA’s `GEPAAdapter` makes it easy to plug into our eval framework. 
We define three hooks:\n", + "- `evaluate`: runs the summarization and grades the output with the graders defined in the previous section (i.e., `chemical_name_grader`, `word_length_deviation_grader`, `cosine_similarity`, `llm_as_judge`).\n", + "- `get_components_to_update`: returns the text fields GEPA should evolve (here, `system_prompt`).\n", + "- `make_reflective_dataset`: packages inputs, outputs, and feedback for reflection." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class EvalsBackedSummarizationAdapter:\n", + " \"\"\"\n", + " Minimal adapter for GEPA:\n", + " - evaluate(...) -> EvaluationBatch (scores + outputs + feedback-rich trajectories)\n", + " - get_components_to_update(...) returns the prompt to update\n", + " - make_reflective_dataset(...) packages examples for reflection\n", + " \"\"\"\n", + " propose_new_texts = None # use GEPA's default reflection flow\n", + "\n", + " def __init__(self, client, eval_id: str, gen_model: str = \"gpt-5\", user_prefix: str | None = None):\n", + " self.client = client\n", + " self.eval_id = eval_id\n", + " self.gen_model = gen_model\n", + " self.user_prefix = user_prefix or \"Summarize:\\n\\n\"\n", + "\n", + " # Same summarization task as the agent in the previous section, via a direct Chat Completions call\n", + " def _summarize(self, system_prompt: str, section: str) -> str:\n", + " resp = self.client.chat.completions.create(\n", + " model=self.gen_model,\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\"role\": \"user\", \"content\": f\"{self.user_prefix}{section}\"},\n", + " ],\n", + " )\n", + " # content can be None, so fall back to an empty string before stripping\n", + " return (resp.choices[0].message.content or \"\").strip()\n", + "\n", + " # Required by GEPA: run eval minibatch\n", + " def evaluate(self, inputs: list[dict], candidate: dict, capture_traces: bool = True) -> EvaluationBatch:\n", + " 
system_prompt = candidate[\"system_prompt\"]\n", + "\n", + " scores: list[float] = []\n", + " outputs: list[str] = []\n", + " trajectories: list[dict] = []\n", + 
"\n", + " for item in inputs:\n", + " section = item[\"content\"]\n", + "\n", + " # 1) Generate with the candidate prompt\n", + " summary = self._summarize(system_prompt, section)\n", + " outputs.append(summary)\n", + "\n", + " # 2) Grade using previous evals pipeline\n", + " run = run_eval(eval_id=self.eval_id, section=section, summary=summary)\n", + " out_items = poll_eval_run(eval_id=self.eval_id, run_id=run.id)\n", + " grader_scores = parse_eval_run_output(out_items)\n", + "\n", + " # 3) Score + actionable feedback\n", + " scalar = calculate_grader_score(grader_scores)\n", + " feedback = collect_grader_feedback(grader_scores) or \"All graders passed; keep precision and coverage.\"\n", + "\n", + " scores.append(float(scalar))\n", + " trajectories.append(\n", + " {\n", + " \"inputs\": {\"section\": section},\n", + " \"generated_output\": summary,\n", + " \"metrics\": {\n", + " \"combined\": float(scalar),\n", + " \"by_grader\": grader_scores, # keeping for analysis if needed\n", + " },\n", + " \"feedback\": feedback,\n", + " }\n", + " )\n", + "\n", + " return EvaluationBatch(scores=scores, outputs=outputs, trajectories=trajectories)\n", + "\n", + " # Required by GEPA: text field to evolve\n", + " def get_components_to_update(self, candidate: dict) -> list[str]:\n", + " return [\"system_prompt\"]\n", + "\n", + " # Required by GEPA: build the reflective dataset the reflection LM will read\n", + " def make_reflective_dataset(self, candidate: dict, eval_batch: EvaluationBatch, components_to_update: list[str]) -> dict:\n", + " examples = []\n", + " for traj in (eval_batch.trajectories or []):\n", + " examples.append(\n", + " {\n", + " \"Inputs\": {\"section\": traj[\"inputs\"][\"section\"]},\n", + " \"Generated Outputs\": traj[\"generated_output\"],\n", + " \"Feedback\": traj[\"feedback\"],\n", + " }\n", + " )\n", + " return {\"system_prompt\": examples}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that the adapter is ready, we can 
run GEPA using the same starting prompt (`\"You are a summarization assistant. Given a section of text, produce a summary.\"`) and model (here, `gpt-5`) as in the earlier self-evolving loop for comparison. We pass our adapter instance, seed candidate, and training/validation sets to `gepa.optimize(...)`. During optimization, GEPA repeatedly invokes the adapter to score candidates, reflects on the feedback, and ultimately returns the best evolved prompt.\n", + "\n", + "_Note: GEPA can take 10-15 minutes or longer to complete, depending on `max_metric_calls` and the dataset size._" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "seed_candidate = {\"system_prompt\": \"You are a summarization assistant. Given a section of text, produce a summary.\"}\n", + "\n", + "adapter = EvalsBackedSummarizationAdapter(\n", + " client=client,\n", + " eval_id=EVAL_ID,\n", + " gen_model=summarization_prompt.current().model,\n", + ")\n", + "\n", + "# Keeping max_metric_calls small for the cookbook. 
\n", + "# In practice, use a larger value to allow more optimization iterations.\n", + "result = gepa.optimize(\n", + " seed_candidate=seed_candidate,\n", + " trainset=trainset,\n", + " valset=valset,\n", + " adapter=adapter,\n", + " reflection_lm=\"gpt-5\",\n", + " max_metric_calls=10,\n", + " track_best_outputs=True,\n", + " display_progress_bar=True\n", + ")\n", + "\n", + "best_prompt = result.best_candidate[\"system_prompt\"]\n", + "print(\"\\n=== Best evolved instruction ===\\n\")\n", + "print(best_prompt)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here is an example (abridged) output for the code above:" + ] + }, + { + "cell_type": "raw", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "Iteration 0: Base program full valset score: 0.2183466466681351\n", + "Iteration 1: Selected program 0 score: 0.2183466466681351\n", + "Iteration 1: Proposed new text for system_prompt: \n", + "\n", + "[.......]\n", + "\n", + "Iteration 3: New subsample score 0.6592202195294341 is better than old score 0.6565039300893376. 
Continue to full eval and add to candidate pool.\n", + "GEPA Optimization: 90%|█████████ | 18/20 [39:21<04:22, 131.19s/rollouts]\n", + "Iteration 3: Full valset score for new program: 0.2225472423976205\n", + "Iteration 3: Full train_val score for new program: 0.2225472423976205\n", + "Iteration 3: Individual valset scores for new program: [0.22866548337721018, 0.21864704884895614, 0.2203291949666952]\n", + "Iteration 3: New valset pareto front scores: [0.23142100182952327, 0.2389098334382265, 0.23513790628541456]\n", + "Iteration 3: Full valset pareto front score: 0.2351562471843881\n", + "Iteration 3: Updated valset pareto front programs: [{1}, {1}, {1}]\n", + "Iteration 3: Best valset aggregate score so far: 0.2351562471843881\n", + "Iteration 3: Best program as per aggregate score on train_val: 1\n", + "Iteration 3: Best program as per aggregate score on valset: 1\n", + "Iteration 3: Best score on valset: 0.2351562471843881\n", + "Iteration 3: Best score on train_val: 0.2351562471843881\n", + "Iteration 3: Linear pareto front program index: 1\n", + "Iteration 3: New program candidate index: 2\n", + "\n", + "=== Best evolved instruction ===\n", + "\n", + "You are a domain-aware summarization assistant for technical pharmaceutical texts. Given a “section” of text, produce a concise summary that preserves key technical facts and exact nomenclature.\n", + "\n", + "Requirements:\n", + "- Length and format:\n", + " - Write 1–3 sentences totaling about 45–70 words (never exceed 90 words). 
Default to ~60 words.\n", + " - Use a single paragraph (no bullet points, headings, or heavy formatting).\n", + "- Preserve exact technical names and notation:\n", + " - Include every chemical name that appears in the section at least once, with exact spelling, capitalization, isotopic labels, brackets, hyphens, salts, and buffer names (e.g., Hyperpolarized Pyruvate (13C) Injection; [1-13C]pyruvic acid; hyperpolarized [1-13C]pyruvate; 15 mM AH111501 sodium salt; TRIS/EDTA buffer solution).\n", + " - Keep study identifiers, section numbers, regulatory citations, and codes verbatim when mentioned (e.g., GE-101-001, GE-101-003, USP <797>, 3.2.P.7, company codes, CAS numbers).\n", + "...\n", + "Self-check before finalizing:\n", + "- Have you included every chemical name exactly as written?\n", + "- Is the summary within 45–70 words (≤90 max) and a single paragraph?\n", + "- Are key process/regulatory/test details and critical numbers preserved without unnecessary verbosity?\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this cookbook, we explored three distinct approaches to prompt optimization:\n", + "\n", + "- **OpenAI Platform Optimizer:** Using the _Optimize_ button with a dataset containing manually entered human feedback (thumbs up/down and textual comments), we quickly produced a strong prompt with minimal configuration. This method excels at rapid iteration but does not provide the automation needed for production environments.\n", + "\n", + "- **Optimization using a static metaprompt:** Our loop, incorporating four different graders, enabled automated exploration and iterative self-improvement without manual intervention. However, its exploration space was limited by a single static metaprompt, and evaluation was performed section by section. 
Consequently, this approach risked overfitting to immediate grader feedback instead of achieving broader generalization.\n", + "\n", + "- **GEPA optimization:** GEPA offered a more structured search process: reflective updates were informed by both quantitative scores and textual feedback, and candidates were trained on one dataset and validated on another. This method produced a more robust, generalized prompt and provided clearer empirical evidence of its performance.\n", + "\n", + "_Note: Examples of prompts generated by each method are available in the Appendix._\n", + "\n", + "Depending on your use case, you may prioritize speed (OpenAI optimizer), lightweight automation (static metaprompt), or systematic generalization (GEPA). In practice, combining these methods by starting with rapid iteration and progressing toward reflective optimization can deliver both agility and performance.\n", + "\n", + "Happy coding!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Contributors\n", + "\n", + "This cookbook is the result of a collaboration between [Bain](https://www.bain.com) and [OpenAI](https://openai.com). \n", + "\n", + "[Calvin Maguranis](https://www.linkedin.com/in/calvin-maguranis-b9956045/) \n", + "[Fanny Perraudeau](https://www.linkedin.com/in/fanny-sabran-perraudeau-494b7573/) \n", + "[Giorgio Saladino](https://www.linkedin.com/in/giorgio-saladino-202/) \n", + "[Shikhar Kwatra](https://www.linkedin.com/in/shikharkwatra/) \n", + "[Valentina Frenkel](https://www.linkedin.com/in/valentina-frenkel/) " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Citations\n", + "\n", + "[1] _GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning_ by Lakshya A Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl-Ong, Arnav Singhvi, Herumb Shandilya, Michael J Ryan, Meng Jiang, Christopher Potts, Koushik Sen, Alexandros G. 
Dimakis, Ion Stoica, Dan Klein, Matei Zaharia, Omar Khattab - https://arxiv.org/abs/2507.19457" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Appendix\n", + "\n", + "### Examples of output prompts:\n", + "\n", + "- **Initial prompt:** \n", + "```pgsql \n", + "You are a summarization assistant. Given a section of text, produce a summary.\n", + "```\n", + "\n", + "- **OpenAI Platform Optimizer:** \n", + "```pgsql \n", + "You are a summarization assistant.\n", + "Task: Summarize the provided text concisely and accurately.\n", + "Output requirements:\n", + "- Output only the summary. Do not add titles, labels (e.g.,\n", + "\"Summary:\"), prefaces, or commentary.\n", + "- Preserve the document's structure. If multiple sections/subsections appear, summarize each one.\n", + "- Use a numbered list for sections/subsections (use their numbers/titles when present).\n", + "- Under each, use short dash bullets for key points.\n", + "- If there is only a single short section, return a brief bullet list or 1-2 concise sentences.\n", + "- Split any inline lists into separate bullets.\n", + "- Use plain, simple language. Keep bullets tight (ideally one line each). Remove redundancy.\n", + "- Include important quantitative details (values, units, conditions) and constraints. Do not invent information.\n", + "- Keep formatting simple: plain text, \"1.\" numbering and \"-\" bullets only. No tables or special markup.\n", + "- Retain exact technical terms/notation from the source (e.g., chemical names, isotopic labels).\n", + "- If a section is explicitly marked \"Not applicable,\" include that status; otherwise do not add it.\n", + "```\n", + "\n", + "- **Static metaprompt:** \n", + "```pgsql \n", + "You are a technical summarization assistant for scientific and regulatory documentation. Your task is to generate a concise, comprehensive, and fully detailed summary of any scientific, technical, or regulatory text provided. 
Strictly adhere to the following instructions:\n", + "\n", + "---\n", + "\n", + "**1. Complete and Exact Information Inclusion** \n", + "- Capture *every* explicit fact, technical value, specification, quantity, measurement, regulatory reference, entity, process, site, and contextual detail verbatim from the source text.\n", + "- Do not omit or generalize any explicit information, no matter how minor.\n", + "\n", + "**2. Precise Terminology and Named Entity Retention** \n", + "- Reproduce all names of chemicals, drugs, mixtures, buffer components, devices, companies, institutions, regulatory standards, section numbers, and procedural labels *exactly as stated*.\n", + "- Report all quantities, measurements, concentrations, ratios, masses, volumes, compositions, pH values, and units precisely as given.\n", + "- Do not paraphrase, rename, substitute, or simplify any term or value.\n", + "\n", + "**3. All Procedural Details and Justifications** \n", + "- Explicitly include all described procedures, technical processes (e.g., terminal sterilization, aseptic processing), operational constraints, process justifications, compliance requirements, and standards references.\n", + "- Clearly state all reasons provided for choosing or omitting particular methods or processes.\n", + "\n", + "**4. Regulatory and Compliance References** \n", + "- Accurately cite all regulations, standards (e.g., USP <797>), compliance statements, section numbers, and cross-references as in the original.\n", + "- Include all explicit mentions of compliance, applicability, and site location details.\n", + "\n", + "**5. Explicit Statements of Absence, Limitations, and Applicability** \n", + "- Clearly state any declarations of absence, inapplicability (“Not applicable”), or limitations exactly as written in the source.\n", + "\n", + "**6. 
Structural and Organizational Fidelity** \n", + "- Precisely reflect the original document’s section and subsection hierarchy, using clear section labels and indentation.\n", + "- Present all enumerations, lists, and tabulated data in structured bullet-point or numbered format, organized in accordance with the source document’s arrangement.\n", + "\n", + "**7. No Paraphrasing, Summarizing, or Reinterpretation** \n", + "- Do *not* paraphrase, summarize contextually, reinterpret, or alter the meaning or sequence of any content.\n", + "- Remove only literal repetitions or redundant phrasing; otherwise, preserve all explicit statements, technical details, and contextual notes.\n", + "\n", + "---\n", + "\n", + "**Summary Output Objective:** \n", + "Produce a summary that delivers the full technical, factual, and regulatory content and structure of the original text, reformatted by eliminating only redundant language. The summary must enable audit, regulatory review, or peer reference without loss of any explicit information or terminology from the source.\n", + "\n", + "---\n", + "\n", + "*Apply these instructions rigorously to every provided document section to ensure scientific and regulatory accuracy and completeness.*\n", + "```\n", + "\n", + "- **GEPA optimizer**: \n", + "```pgsql \n", + "You are a domain-aware summarization assistant for technical pharmaceutical texts. Given a “section” of text, produce a concise, single-paragraph summary that preserves key technical facts and exact nomenclature.\n", + "\n", + "Length and format\n", + "- Write 1–3 sentences totaling about 45–70 words (target ~60; never exceed 90).\n", + "- Use one paragraph; no bullets, headings, tables, or heavy formatting.\n", + "\n", + "Exact names and notation\n", + "- Include every chemical name that appears in the section at least once, using the exact original spelling, capitalization, punctuation, isotopic labels, brackets, hyphens, salts, buffer names, and parenthetical qualifiers. 
Treat distinct case/format variants as distinct names (e.g., [1-13C]pyruvic acid and [1-13C]Pyruvic acid are separate and each must appear once).\n", + "- Examples you must preserve verbatim when present: Hyperpolarized Pyruvate (13C) Injection; non-polarized Pyruvate Injection; Pyruvate (13C) Injection; hyperpolarized [1-13C]pyruvate; Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt; TRIS/EDTA buffer solution; TRIS; NaOH; Na2EDTA; [1-13C]pyruvic acid; AH111501 sodium salt.\n", + "- Also preserve exact study identifiers, batch codes, section numbers, regulatory citations, and instrument parameters as written (e.g., GE-101-001, GE-101-003, USP <797>, 3.2.P.5.2.5, FFF106/140-806, FFF106/142-806, 3T MRI, 5 degree RF pulse, TR=3s, 90 degree pulse, 64 averages, TR=10s, 10 μl Gd/ml solution).\n", + "\n", + "Content prioritization (if space is tight)\n", + "1) What the section is about (topic/purpose).\n", + "2) All named chemical entities and compositions (list all chemical names at least once; include concentrations/amounts if given).\n", + "3) Critical process/handling facts (e.g., aseptic processing vs terminal sterilization; ISO classifications; filtration specs; compounding/filling steps; temperatures/times/volumes; storage/administration limits).\n", + "4) Container/packaging specifics (e.g., cryovials, “sterile fluid path”).\n", + "5) Microbiological/testing/regulatory details (e.g., sterility/pyrogenicity testing timing; USP <797>; state board compliance; site/manufacturer if stated).\n", + "6) Overages/single-dose formulas and key quantities.\n", + "\n", + "Numerical fidelity\n", + "- Preserve all critical numbers and units exactly (e.g., 1.44 g, 27.7 mg, 15 mM, 18 mL, 1.47 g, two 0.2 μm filters, ISO 7, ISO 5, 38 mL).\n", + "- Include testing/analysis parameters when present (e.g., polarization/relaxation time (T1); number of spectra; pulse angles; TR values; MRI location relative to clean room).\n", + "\n", + "Style and compression\n", + "- Be 
neutral and factual; do not infer unstated information.\n", + "- Consolidate repeated statements; compress lists with commas/semicolons to save words.\n", + "- Mention tables/figures only to convey key data; do not reproduce them.\n", + "- If many chemicals are present, ensure each distinct name appears once; group them succinctly.\n", + "- Avoid symbols or special formatting not in the source text.\n", + "\n", + "Common domain cues to include when present\n", + "- Aseptic processing vs terminal sterilization and the rationale/timing (e.g., “tested for sterility and pyrogenicity subsequent to patient administration”).\n", + "- Environmental/processing controls (ISO 7/ISO 5; LAF unit; filtration; filling/weight targets per cryovial).\n", + "- Site/regulatory context (e.g., USP <797>; California State Board of Pharmacy; University of California, San Francisco Department of Clinical Pharmacy).\n", + "- Study/kit equivalence statements (e.g., equivalence to GE-101-001/GE-101-003 formulations).\n", + "- QC/measurement methods (e.g., capacitive threshold at Administration syringe nominal 38 mL).\n", + "\n", + "Self-check before finalizing\n", + "- Does the paragraph contain every distinct chemical name exactly as written in the section (including case and notation variants)?\n", + "- Is the summary 45–70 words (≤90), in a single paragraph?\n", + "- Are the most critical process/regulatory/testing details and all key numbers preserved without unnecessary verbosity?\n", + "```" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.4" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git
a/examples/partners/self_evolving_agents/data/c13_pyruvate_sample_CMC_from_UCSF.pdf b/examples/partners/self_evolving_agents/data/c13_pyruvate_sample_CMC_from_UCSF.pdf new file mode 100644 index 0000000000..867a530657 Binary files /dev/null and b/examples/partners/self_evolving_agents/data/c13_pyruvate_sample_CMC_from_UCSF.pdf differ diff --git a/examples/partners/self_evolving_agents/data/chemical_names.json b/examples/partners/self_evolving_agents/data/chemical_names.json new file mode 100644 index 0000000000..61c3679ce3 --- /dev/null +++ b/examples/partners/self_evolving_agents/data/chemical_names.json @@ -0,0 +1,77 @@ +[ + "[1-¹³C]Pyruvic acid", + "[1-¹³C]Pyruvate", + "¹²C Pyruvic acid", + "Sodium [1-¹³C]pyruvate", + "Sodium pyruvate (¹²C)", + "AH111501 (Trityl radical)", + "Tris{8-carboxyl-2,2,6,6-tetra[2-(1-methoxyethyl)]-benzo(1,2-d:4,5-d’)bis(1,3)dithiole-4-yl}methyl acid", + "AH111501 sodium salt", + "Methyl, tris[8-carboxy-2,2,6,6-tetrakis(2-methoxyethyl)benzo[1,2-d:4,5-d’]bis[1,3]dithiol-4-yl]-, trisodium salt", + "AH111501 trisodium salt", + "AH111576", + "2,2′,2″,2‴-(4,8-Dibromobenzo[1,2-d:4,5-d′]bis([1,3]dithiole)-2,2,6,6-tetrayl)tetraethanol", + "AH111586", + "4,8-Dibromo-2,2,6,6-tetrakis(2-methoxyethyl)benzo[1,2-d:4,5-d′]bis([1,3]dithiole)", + "AH111709", + "AH111743", + "AH112615", + "4,4-Bis-hydroxymethyl-2-methyl-oxazolidine-2-carboxylic acid", + "AH112623", + "Parapyruvate", + "2-Hydroxy-2-methyl-4-oxo-pentanedioic acid", + "AH113127", + "(4-Hydroxymethyl-oxazolidin-4-yl)-methanol", + "AH113462/E", + "Enol lactone", + "AH113462/K", + "Keto lactone", + "Acetyl bromide", + "Methanol", + "Dimethyl sulfoxide", + "DMSO", + "Tetrahydrofuran", + "THF", + "Acetonitrile", + "ACN", + "Diethyl ether", + "Et₂O", + "N,N-Dimethylacetamide", + "DMA", + "1,3-Dimethyl-2-imidazolidinone", + "DMI", + "Hydrochloric acid", + "HCl", + "Sodium hydroxide", + "NaOH", + "Disodium ethylenediaminetetraacetate", + "Na₂EDTA", + "Ethylenediaminetetraacetic acid", + "EDTA", + 
"Tris(hydroxymethyl)aminomethane", + "TRIS", + "Trometamol", + "Trifluoroacetic acid", + "TFA", + "Toluene", + "Heptane", + "Ethyl acetate", + "Ethanol", + "Water", + "H₂O", + "Sodium chloride", + "NaCl", + "Cuprous [1-¹³C]cyanide", + "Cu¹³CN", + "Gadolinium", + "Gd", + "Tin", + "Sn", + "Phosphorus", + "P", + "Carbon dioxide", + "CO₂", + "Sodium [1-13C]pyruvate", + "[1-13C]Pyruvic acid", + "1-13C pyruvate" +] diff --git a/examples/partners/self_evolving_agents/data/dataset.csv b/examples/partners/self_evolving_agents/data/dataset.csv new file mode 100644 index 0000000000..20d667ed9c --- /dev/null +++ b/examples/partners/self_evolving_agents/data/dataset.csv @@ -0,0 +1,31 @@ +section_number,toc_index,title,content +7.1,17,Drug Substance,"3.2.S.1 General Information ([1-13C]pyruvic acid) The active ingredient in Hyperpolarized Pyruvate (13C) Injection is hyperpolarized [1-13C]pyruvate. The drug substance is defined as [13C]pyruvic acid, which is neutralized to [1-13C]pyruvate during the compounding process. In several pre-clinical and clinical studies and during evaluation of stability, pyruvic acid has been used instead of [1-13C]pyruvic acid (see Sections 3.2.P.2.2.1 Formulation Development for Hyperpolarized Pyruvate (13C) Injection and Section 8.1 Introduction for Item 8 Pharmacology and Toxicology Info). In the Section 3.2.S Drug Substance, data are presented for both pyruvic acid and for [1-13C]pyruvic acid. For simplicity, the terminology used in headings and captions is [1-13C]pyruvic acid. Batches containing pyruvic acid are specified by footnotes. 3.2.S.1.1 Nomenclature ([1-13C]pyruvic acid) The drug substance used for compounding of Hyperpolarized Pyruvate (13C) Injection is [1-13C]pyruvic acid. 
Company code: W6578 Chemical name: [1-13C]pyruvic acid CAS registry number: 127-17-3 3.2.S.1.2 Structure ([1-13C]pyruvic acid) Figure 1 Structure of [1-13C]pyruvic acid Molecular formula: C H O 3 4 3 Molecular weight: 89.06 3.2.S.1.3 General Properties ([1-13C]pyruvic acid) Appearance: Colorless to yellow, clear, viscous liquid pKa:Ka:aranWater solubility: Complete The structure of [1-13C]pyruvic acid has been confirmed by spectroscopic analysis (see Section 3.2.S.3.1 Elucidation of Structure and other Characteristics)." +7.2,28,Drug Product Part 1,"3.2.P DRUG PRODUCT (STERILE FLUID PATH COMPONENTS) Hyperpolarized Pyruvate (13C) Injection (drug product) is a sterile solution for intravenous injection. The compounding of Hyperpolarized Pyruvate (13C) Injection is performed by an automated compounding device known as SpinLab. For each patient dose, SpinLab utilizes a single sterile fluid path which contains the following three drug product components: • Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt • TRIS/EDTA buffer solution • Sterile Water for Injection (WFI) The following 3.2.P sections describe the individual drug product components. For aspects related to the compounding of the drug product, Hyperpolarized Pyruvate (13C) Injection, reference is made to 3.2.P for Hyperpolarized Pyruvate (13C) Injection. Commercially available USP quality Sterile Water for Injection (Hospira Inc., USA) is provided by the clinical site. Aspects of this drug product component will therefore not be addressed." +3.2.P.2.1,29,Components of the Drug Product (Drug Product Kit Components),"3.2.P.2.1.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt (a) Drug substance The drug substance, [1−13C]pyruvic acid, is a colorless to yellow, clear, viscous liquid. [1−13C]Pyruvic acid is described in Section 3.2.S Drug Substance. Upon neutralization in the TRIS/EDTA buffer solution, the [1−13C]pyruvic acid is converted to [1−13C]pyruvate. 
(b) Excipients AH111501 sodium salt is a stable trityl radical, and is added to [1−13C]pyruvic acid to enable hyperpolarization. AH111501 sodium salt is a green to black, fine to granular powder. AH111501 sodium salt is further described in Section 3.2.A.3 Novel Excipients. 3.2.P.2.1.2 TRIS/EDTA buffer solution The TRIS/EDTA buffer solution is an aqueous solution containing 333 mM TRIS, 600 mM NaOH and 333 mg/l Na EDTA. 2 TRIS is used as buffer to stabilize the pH of the Hyperpolarized Pyruvate (13C) Injection at a physiologically acceptable level. NaOH is added to neutralize the [1−13C]pyruvic acid in Mixture of [1−13C]pyruvic acid and 15" +3.2.P.2.2,30,Drug Product (Drug Product Kit Components),"3.2.P.2.2.1 Formulation Development (a) Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt is dissolved in WFI and neutralized/buffered in TRIS/EDTA buffer solution to form a solution with a physiologically acceptable pH. The concentration of AH111501 sodium salt of 15 mM has been chosen for optimization of 13C nuclear polarization in Hyperpolarized Pyruvate (13C) Injection. For clinical trials GE-101-001 and GE-101-003, pyruvic acid was used instead of [1-13C]pyruvic acid. For these trials the Pyruvate Injection was not compounded hence; in order to mimic the maximum content of AH111501 in Hyperpolarized Pyruvate (13C) Injection, the kit component used during the clinical trials GE-101-001 and GE 101-003 was Mixture of pyruvic acid and 0.2 mM AH111501 sodium salt. In addition, some pre-clinical studies were performed using pyruvic acid instead of [1- 13C]pyruvic acid. See Section 3.2.P.2.2.1 Formulation development for Hyperpolarized Pyruvate (13C) Injection and section 8.1 Introduction for Item 8 Pharmacology and Toxicology Info) for further details. 
The amount of [1-13C]pyruvic acid and AH111501 sodium salt mixture per cryovial is 1.47 g, which upon dissolution in the total volume of WFI and TRIS/EDTA buffer solution, gives 250 mM [1- 13C]pyruvate in the final Hyperpolarized Pyruvate (13C) Injection. (b) TRIS/EDTA buffer solution The function of TRIS/EDTA buffer solution is to neutralize the [1-13C]pyruvic acid to [1- 13C]pyruvate and to assure a physiologically acceptable pH of the drug product Hyperpolarized Pyruvate (13C) Injection. TRIS/EDTA buffer solution has not been used during pre-clinical studies or during clinical trials GE-101-001 and GE-101-003. For these studies, the Mixture of [1-13C]pyruvic acid and AH111501 sodium salt was dissolved in a single, manual step in TRIS/EDTA dissolution medium. For compounding of Hyperpolarized Pyruvate (13C) Injection, the Mixture of [1- 13C]pyruvic acid and 15 mM AH111501 sodium salt will first be dissolved in WFI and then neutralized and buffered in TRIS/EDTA buffer solution. See Section 3.2.P.2.2.1 Formulation Development for Hyperpolarized Pyruvate (13C) Injection for details. The amount of [1-13C]pyruvic acid to be dissolved is 1.67 g (equivalent to 18.75 mmol). This amount of acid is neutralized and buffered with 22.5 ml of TRIS/EDTA buffer solution (equivalent to 8.33 mmol of TRIS and 15.00 mmol of NaOH) to a target pH of 7.6 (at 37°C) in the Hyperpolarized Pyruvate (13C) Injection. Sample not for submission mM AH111501 sodium salt to [1−13C]pyruvate in the Hyperpolarized Pyruvate (13C) Injection. Na EDTA has been included in the formulation as a chelating agent to capture traces of 2 paramagnetic metal ions that might be present. 
3.2.P.2.2 Drug Product (Drug Product Kit Components) 3.2.P.2.2.1 Formulation Development (a) Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt is dissolved in WFI and neutralized/buffered in TRIS/EDTA buffer solution to form a solution with a physiologically acceptable pH. The concentration of AH111501 sodium salt of 15 mM has been chosen for optimization of 13C nuclear polarization in Hyperpolarized Pyruvate (13C) Injection. For clinical trials GE-101-001 and GE-101-003, pyruvic acid was used instead of [1-13C]pyruvic acid. For these trials the Pyruvate Injection was not compounded hence; in order to mimic the maximum content of AH111501 in Hyperpolarized Pyruvate (13C) Injection, the kit component used during the clinical trials GE-101-001 and GE 101-003 was Mixture of pyruvic acid and 0.2 mM AH111501 sodium salt. In addition, some pre-clinical studies were performed using pyruvic acid instead of [1- 13C]pyruvic acid. See Section 3.2.P.2.2.1 Formulation development for Hyperpolarized Pyruvate (13C) Injection and section 8.1 Introduction for Item 8 Pharmacology and Toxicology Info) for further details. The amount of [1-13C]pyruvic acid and AH111501 sodium salt mixture per cryovial is 1.47 g, which upon dissolution in the total volume of WFI and TRIS/EDTA buffer solution, gives 250 mM [1- 13C]pyruvate in the final Hyperpolarized Pyruvate (13C) Injection. (b) TRIS/EDTA buffer solution The function of TRIS/EDTA buffer solution is to neutralize the [1-13C]pyruvic acid to [1- 13C]pyruvate and to assure a physiologically acceptable pH of the drug product Hyperpolarized Pyruvate (13C) Injection. TRIS/EDTA buffer solution has not been used during pre-clinical studies or during clinical trials GE-101-001 and GE-101-003. For these studies, the Mixture of [1-13C]pyruvic acid and AH111501 sodium salt was dissolved in a single, manual step in TRIS/EDTA dissolution medium. 
For compounding of Hyperpolarized Pyruvate (13C) Injection, the Mixture of [1- 13C]pyruvic acid and 15 mM AH111501 sodium salt will first be dissolved in WFI and then neutralized and buffered in TRIS/EDTA buffer solution. See Section 3.2.P.2.2.1 Formulation Development for Hyperpolarized Pyruvate (13C) Injection for details. The amount of [1-13C]pyruvic acid to be dissolved is 1.67 g (equivalent to 18.75 mmol). This amount of acid is neutralized and buffered with 22.5 ml of TRIS/EDTA buffer solution (equivalent to 8.33 mmol of TRIS and 15.00 mmol of NaOH) to a target pH of 7.6 (at 37°C) in the Hyperpolarized Pyruvate (13C) Injection. Sample not for submission 3.2.P.2.2.2 Overages (a) Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt There are no overages included in the Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt. (b) TRIS/EDTA buffer solution There are no overages included in the TRIS/EDTA buffer solution. 3.2.P.2.3 Manufacturing Process Development (Drug Product Kit Components) 3.2.P.2.3.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt Terminal sterilization of the Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt is not possible due to degradation of [1-13C]pyruvic acid. The current process is therefore performed by aseptic processing. 3.2.P.2.3.2 TRIS/EDTA buffer solution Terminal sterilization of TRIS/EDTA buffer solution in various container closure systems has been tested, but generation of particles occurred during sterilization. This is probably caused by the high pH of the TRIS/EDTA buffer solution. The current process is therefore performed by aseptic processing. 
3.2.P.2.4 Container Closure System (Sterile Fluid Path Components) 3.2.P.2.4.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt The compounding process for Hyperpolarized Pyruvate (13C) Injection requires a custom made container closure system, the sterile fluid path, for the Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt. This container closure system is described in more detail in Section 3.2.P.7 Container Closure System. 3.2.P.2.4.2 TRIS/EDTA buffer solution The compounding process for Hyperpolarized Pyruvate (13C) Injection requires a custom made container closure system, the sterile fluid path, for the TRIS/EDTA buffer solution. This container closure system is described in more detail in Section 3.2.P.7 Container Closure System. 3.2.P.2.5 Microbiological Attributes (Sterile Fluid Path Components) 3.2.P.2.5.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt Not applicable. The mixture of [1-13C]pyruvic acid and 15 mM AH111501 is compounded immediately prior to patient administration. A sample of the final Hyperpolarized Pyruvate (13C) Injection is tested for" +3.2.P.2.3,31,Manufacturing Process Development (Drug Product Kit Components),"3.2.P.2.3.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt Terminal sterilization of the Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt is not possible due to degradation of [1-13C]pyruvic acid. The current process is therefore performed by aseptic processing. 3.2.P.2.3.2 TRIS/EDTA buffer solution Terminal sterilization of TRIS/EDTA buffer solution in various container closure systems has been tested, but generation of particles occurred during sterilization. This is probably caused by the high pH of the TRIS/EDTA buffer solution. The current process is therefore performed by aseptic processing. 
Sample not for submission 3.2.P.2.2.2 Overages (a) Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt There are no overages included in the Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt. (b) TRIS/EDTA buffer solution There are no overages included in the TRIS/EDTA buffer solution. 3.2.P.2.3 Manufacturing Process Development (Drug Product Kit Components) 3.2.P.2.3.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt Terminal sterilization of the Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt is not possible due to degradation of [1-13C]pyruvic acid. The current process is therefore performed by aseptic processing. 3.2.P.2.3.2 TRIS/EDTA buffer solution Terminal sterilization of TRIS/EDTA buffer solution in various container closure systems has been tested, but generation of particles occurred during sterilization. This is probably caused by the high pH of the TRIS/EDTA buffer solution. The current process is therefore performed by aseptic processing. 3.2.P.2.4 Container Closure System (Sterile Fluid Path Components) 3.2.P.2.4.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt The compounding process for Hyperpolarized Pyruvate (13C) Injection requires a custom made container closure system, the sterile fluid path, for the Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt. This container closure system is described in more detail in Section 3.2.P.7 Container Closure System. 3.2.P.2.4.2 TRIS/EDTA buffer solution The compounding process for Hyperpolarized Pyruvate (13C) Injection requires a custom made container closure system, the sterile fluid path, for the TRIS/EDTA buffer solution. This container closure system is described in more detail in Section 3.2.P.7 Container Closure System. 3.2.P.2.5 Microbiological Attributes (Sterile Fluid Path Components) 3.2.P.2.5.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt Not applicable. 
The mixture of [1-13C]pyruvic acid and 15 mM AH111501 is compounded immediately prior to patient administration. A sample of the final Hyperpolarized Pyruvate (13C) Injection is tested for Sample not for submission sterility and pyrogenicity subsequent to patient administration. 3.2.P.2.5.2 TRIS/EDTA buffer solution Not applicable The TRIS/EDTA buffer solution is compounded immediately prior to patient administration. A sample of the final Hyperpolarized Pyruvate (13C) Injection is tested for sterility and pyrogenicity subsequent to patient administration. 3.2.P.3.1 Manufacturer(s) (Sterile Fluid Path Components) 3.2.P.3.1.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt The compounding of Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt for clinical use is conducted in accordance with compliance of USP <797> and the regulations promulgated by the California State Board of Pharmacy at the licensed pharmacy on the following academic campus: University of California, San Francisco Department of Clinical Pharmacy San Francisco, California 94118 3.2.P.3.1.2 TRIS/EDTA buffer solution The compounding of TRIS/EDTA buffer solution for clinical use is conducted in accordance with compliance of USP <797> and the regulations promulgated by the California State Board of Pharmacy at the licensed pharmacy on the following academic campus: University of California, San Francisco Department of Clinical Pharmacy San Francisco, California 94118 3.2.P.3.2 Single Dose Compounding Formula (Sterile Fluid Path Components) 3.2.P.3.2.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt The Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt is compounded by aseptic processing. The compounding formula for a single dose prepared immediately prior to patient administration is given in Table 1. 
Table 1 Compounding formula for Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt Ingredient Quantity per container [1-13C]pyruvic acid 1.44 g AH111501 sodium salt 27.7 mg" +3.2.P.2.4,31,Container Closure System (Sterile Fluid Path Components),"3.2.P.2.4.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt The compounding process for Hyperpolarized Pyruvate (13C) Injection requires a custom made container closure system, the sterile fluid path, for the Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt. This container closure system is described in more detail in Section Sample not for submission 3.2.P.2.2.2 Overages (a) Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt There are no overages included in the Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt. (b) TRIS/EDTA buffer solution There are no overages included in the TRIS/EDTA buffer solution. 3.2.P.2.3 Manufacturing Process Development (Drug Product Kit Components) 3.2.P.2.3.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt Terminal sterilization of the Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt is not possible due to degradation of [1-13C]pyruvic acid. The current process is therefore performed by aseptic processing. 3.2.P.2.3.2 TRIS/EDTA buffer solution Terminal sterilization of TRIS/EDTA buffer solution in various container closure systems has been tested, but generation of particles occurred during sterilization. This is probably caused by the high pH of the TRIS/EDTA buffer solution. The current process is therefore performed by aseptic processing. 3.2.P.2.4 Container Closure System (Sterile Fluid Path Components) 3.2.P.2.4.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt The compounding process for Hyperpolarized Pyruvate (13C) Injection requires a custom made container closure system, the sterile fluid path, for the Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt. 
This container closure system is described in more detail in Section 3.2.P.7 Container Closure System. 3.2.P.2.4.2 TRIS/EDTA buffer solution The compounding process for Hyperpolarized Pyruvate (13C) Injection requires a custom made container closure system, the sterile fluid path, for the TRIS/EDTA buffer solution. This container closure system is described in more detail in Section 3.2.P.7 Container Closure System. 3.2.P.2.5 Microbiological Attributes (Sterile Fluid Path Components) 3.2.P.2.5.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt Not applicable. The mixture of [1-13C]pyruvic acid and 15 mM AH111501 is compounded immediately prior to patient administration. A sample of the final Hyperpolarized Pyruvate (13C) Injection is tested for Sample not for submission sterility and pyrogenicity subsequent to patient administration. 3.2.P.2.5.2 TRIS/EDTA buffer solution Not applicable The TRIS/EDTA buffer solution is compounded immediately prior to patient administration. A sample of the final Hyperpolarized Pyruvate (13C) Injection is tested for sterility and pyrogenicity subsequent to patient administration. 
3.2.P.3.1 Manufacturer(s) (Sterile Fluid Path Components) 3.2.P.3.1.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt The compounding of Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt for clinical use is conducted in compliance with USP <797> and the regulations promulgated by the California State Board of Pharmacy, at the licensed pharmacy on the following academic campus: University of California, San Francisco, Department of Clinical Pharmacy, San Francisco, California 94118. 3.2.P.3.1.2 TRIS/EDTA buffer solution The compounding of TRIS/EDTA buffer solution for clinical use is conducted in compliance with USP <797> and the regulations promulgated by the California State Board of Pharmacy, at the licensed pharmacy on the following academic campus: University of California, San Francisco, Department of Clinical Pharmacy, San Francisco, California 94118. 3.2.P.3.2 Single Dose Compounding Formula (Sterile Fluid Path Components) 3.2.P.3.2.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt The Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt is compounded by aseptic processing. The compounding formula for a single dose prepared immediately prior to patient administration is given in Table 1. Table 1 Compounding formula for Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt. Quantity per container: [1-13C]pyruvic acid, 1.44 g; AH111501 sodium salt, 27.7 mg" +3.2.P.2.5,31,Microbiological Attributes (Sterile Fluid Path Components),"3.2.P.2.2.2 Overages (a) Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt There are no overages included in the Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt. (b) TRIS/EDTA buffer solution There are no overages included in the TRIS/EDTA buffer solution. 3.2.P.2.3 Manufacturing Process Development (Drug Product Kit Components) 3.2.P.2.3.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt Terminal sterilization of the Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt is not possible because it degrades the [1-13C]pyruvic acid. The Mixture is therefore prepared by aseptic processing. 3.2.P.2.3.2 TRIS/EDTA buffer solution Terminal sterilization of the TRIS/EDTA buffer solution in various container closure systems has been tested, but particles were generated during sterilization, probably because of the high pH of the TRIS/EDTA buffer solution. The buffer solution is therefore prepared by aseptic processing. 3.2.P.2.4 Container Closure System (Sterile Fluid Path Components) 3.2.P.2.4.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt The compounding process for Hyperpolarized Pyruvate (13C) Injection requires a custom-made container closure system, the sterile fluid path, for the Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt. This container closure system is described in more detail in Section 3.2.P.7 Container Closure System. 3.2.P.2.4.2 TRIS/EDTA buffer solution The compounding process for Hyperpolarized Pyruvate (13C) Injection requires a custom-made container closure system, the sterile fluid path, for the TRIS/EDTA buffer solution. This container closure system is described in more detail in Section 3.2.P.7 Container Closure System.
3.2.P.2.5 Microbiological Attributes (Sterile Fluid Path Components) 3.2.P.2.5.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt Not applicable. The mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt is compounded immediately prior to patient administration. A sample of the final Hyperpolarized Pyruvate (13C) Injection is tested for sterility and pyrogenicity after patient administration. 3.2.P.2.5.2 TRIS/EDTA buffer solution Not applicable. The TRIS/EDTA buffer solution is compounded immediately prior to patient administration. A sample of the final Hyperpolarized Pyruvate (13C) Injection is tested for sterility and pyrogenicity after patient administration. 3.2.P.3.1 Manufacturer(s) (Sterile Fluid Path Components) 3.2.P.3.1.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt The compounding of Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt for clinical use is conducted in compliance with USP <797> and the regulations promulgated by the California State Board of Pharmacy, at the licensed pharmacy on the following academic campus: University of California, San Francisco, Department of Clinical Pharmacy, San Francisco, California 94118. 3.2.P.3.1.2 TRIS/EDTA buffer solution The compounding of TRIS/EDTA buffer solution for clinical use is conducted in compliance with USP <797> and the regulations promulgated by the California State Board of Pharmacy, at the licensed pharmacy on the following academic campus: University of California, San Francisco, Department of Clinical Pharmacy, San Francisco, California 94118. 3.2.P.3.2 Single Dose Compounding Formula (Sterile Fluid Path Components) 3.2.P.3.2.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt The Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt is compounded by aseptic processing. The compounding formula for a single dose prepared immediately prior to patient administration is given in Table 1. Table 1 Compounding formula for Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt. Quantity per container: [1-13C]pyruvic acid, 1.44 g; AH111501 sodium salt, 27.7 mg" +3.2.P.3.1,32,Manufacturer(s) (Sterile Fluid Path Components),"3.2.P.3.1.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt The compounding of Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt for clinical use is conducted in compliance with USP <797> and the regulations promulgated by the California State Board of Pharmacy, at the licensed pharmacy on the following academic campus: University of California, San Francisco, Department of Clinical Pharmacy, San Francisco, California 94118. 3.2.P.3.1.2 TRIS/EDTA buffer solution The compounding of TRIS/EDTA buffer solution for clinical use is conducted in compliance with USP <797> and the regulations promulgated by the California State Board of Pharmacy, at the licensed pharmacy on the following academic campus: University of California, San Francisco, Department of Clinical Pharmacy, San Francisco, California 94118. 3.2.P.3.2 Single Dose Compounding Formula (Sterile Fluid Path Components) 3.2.P.3.2.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt The Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt is compounded by aseptic processing. The compounding formula for a single dose prepared immediately prior to patient administration is given in Table 1. Table 1 Compounding formula for Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt. Quantity per container: [1-13C]pyruvic acid, 1.44 g; AH111501 sodium salt, 27.7 mg. 3.2.P.3.2.2 TRIS/EDTA buffer solution The product is an aqueous solution of TRIS, NaOH, and Na2EDTA and is compounded by aseptic processing. The compounding formula for a single dose prepared immediately prior to patient administration is given in Table 2. Table 2 Compounding formula for TRIS/EDTA buffer solution ¹The quantity of sterile TRIS/EDTA buffer solution aseptically instilled into the receiving vessel of the sterile fluid path is 18 mL.
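The per-dose quantities in Table 1 and in the Table 2 footnote can be cross-checked with simple arithmetic. The following is an illustrative sketch, not part of the compounding procedure. Its assumptions are as follows. The molar mass of [1-13C]pyruvic acid (about 89.05 g/mol) is computed here from standard atomic weights and is not stated in this section. The buffer concentrations (333 mM TRIS, 600 mM NaOH, 333 mg/L Na2EDTA) are taken from Section 3.2.P.2.1.2. The helper name per_dose_amounts is hypothetical.

```python
# Illustrative cross-check of per-dose kit quantities (hypothetical helper).

# Molar mass of [1-13C]pyruvic acid (C3H4O3 with one 13C), computed from
# standard atomic weights (assumption, not stated in the source); about 89.05 g/mol.
MOLAR_MASS_1_13C_PYRUVIC_ACID = 2 * 12.011 + 13.00335 + 4 * 1.008 + 3 * 15.999

# Table 1: quantity per container.
PYRUVIC_ACID_G = 1.44
AH111501_NA_G = 27.7 / 1000  # 27.7 mg expressed in grams

# Table 2 footnote: buffer volume instilled into the receiving vessel.
BUFFER_VOLUME_L = 18 / 1000  # 18 mL expressed in litres

# TRIS/EDTA buffer composition quoted in Section 3.2.P.2.1.2.
TRIS_MMOL_PER_L = 333
NAOH_MMOL_PER_L = 600
NA2EDTA_MG_PER_L = 333


def per_dose_amounts():
    # Nominal amount of each component delivered in one single-dose fluid path.
    return {
        'mixture_fill_g': PYRUVIC_ACID_G + AH111501_NA_G,
        'pyruvic_acid_mmol': PYRUVIC_ACID_G / MOLAR_MASS_1_13C_PYRUVIC_ACID * 1000,
        'tris_mmol': TRIS_MMOL_PER_L * BUFFER_VOLUME_L,
        'naoh_mmol': NAOH_MMOL_PER_L * BUFFER_VOLUME_L,
        'na2edta_mg': NA2EDTA_MG_PER_L * BUFFER_VOLUME_L,
    }


if __name__ == '__main__':
    for name, value in per_dose_amounts().items():
        print(f'{name}: {value:.3f}')
```

The sum of the two Table 1 quantities (1.44 g + 27.7 mg, about 1.468 g) rounds to the 1.47 g cryovial fill target stated in Section 3.2.P.3.3.1.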
3.2.P.3.3 Description of Manufacturing Process and Process Controls (Drug Product Kit Components) 3.2.P.3.3.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt The Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt is prepared in an ISO 7 area. [1-13C]Pyruvic acid and AH111501 sodium salt are weighed out and added to the preparation vessel in succession. The solution is stirred to ensure homogeneity prior to filtration. As the solution is transferred from the preparation vessel in the ISO 7 area to the filling vessel in an ISO 5 area, it is filtered through two 0.2 μm sterilizing filters. Filling is performed in an ISO 5 area (LAF unit). The filling weight is calibrated to target: each cryovial shall contain 1.47 g of Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt, so the filling weight depends on the assay of the specific batch of [1-13C]pyruvic acid used. Each container is weighed during the filling operation. The compounding process is illustrated in Figure 1." +3.2.P.3.2,32,Single Dose Compounding Formula (Sterile Fluid Path Components),"3.2.P.3.2.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt The Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt is compounded by aseptic processing. The compounding formula for a single dose prepared immediately prior to patient administration is given in Table 1. Table 1 Compounding formula for Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt. Quantity per container: [1-13C]pyruvic acid, 1.44 g; AH111501 sodium salt, 27.7 mg. 3.2.P.3.2.2 TRIS/EDTA buffer solution The product is an aqueous solution of TRIS, NaOH, and Na2EDTA and is compounded by aseptic processing. The compounding formula for a single dose prepared immediately prior to patient administration is given in Table 2. Table 2 Compounding formula for TRIS/EDTA buffer solution ¹The quantity of sterile TRIS/EDTA buffer solution aseptically instilled into the receiving vessel of the sterile fluid path is 18 mL.
3.2.P.3.3 Description of Manufacturing Process and Process Controls (Drug Product Kit Components) 3.2.P.3.3.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt The Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt is prepared in an ISO 7 area. [1-13C]Pyruvic acid and AH111501 sodium salt are weighed out and added to the preparation vessel in succession. The solution is stirred to ensure homogeneity prior to filtration. As the solution is transferred from the preparation vessel in the ISO 7 area to the filling vessel in an ISO 5 area, it is filtered through two 0.2 μm sterilizing filters. Filling is performed in an ISO 5 area (LAF unit). The filling weight is calibrated to target: each cryovial shall contain 1.47 g of Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt, so the filling weight depends on the assay of the specific batch of [1-13C]pyruvic acid used. Each container is weighed during the filling operation. The compounding process is illustrated in Figure 1." +3.2.P.3.3,33,Description of Manufacturing Process and Process Controls (Drug Product Kit Components),"3.2.P.3.3.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt The Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt is prepared in an ISO 7 area. [1-13C]Pyruvic acid and AH111501 sodium salt are weighed out and added to the preparation vessel in succession. The solution is stirred to ensure homogeneity prior to filtration. As the solution is transferred from the preparation vessel in the ISO 7 area to the filling vessel in an ISO 5 area, it is filtered through two 0.2 μm sterilizing filters. Filling is performed in an ISO 5 area (LAF unit). The filling weight is calibrated to target: each cryovial shall contain 1.47 g of Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt, so the filling weight depends on the assay of the specific batch of [1-13C]pyruvic acid used. Each container is weighed during the filling operation. The compounding process is illustrated in Figure 1. Figure 1 Flow chart illustrating the manufacturing process of Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt 3.2.P.3.3.2 TRIS/EDTA buffer solution The TRIS/EDTA buffer solution is prepared in an ISO 7 area. Approximately 90% of the total amount of WFI is added to the preparation vessel. TRIS, Na2EDTA and NaOH are added successively, with sufficient stirring after each addition to allow it to dissolve completely.
The bulk solution is adjusted to its final weight by addition of WFI and stirred to ensure homogeneity prior to filtration. As the solution is transferred from the preparation vessel in the ISO 7 area to the filling vessel in an ISO 5 area, it is filtered through two 0.2 μm sterilizing filters. The TRIS/EDTA buffer solution is aseptically filled into the receiving vessel of the sterile fluid path in an ISO 5 area (LAF unit). Weight controls are taken regularly during filling to ensure an acceptable fill volume for the whole batch." +3.2.P.3.4,36,Controls of Critical Steps and Intermediates (Sterile Fluid Path Components),"3.2.P.3.4.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt A schematic representation of the process flow and the in-process controls is presented in Figure 1, Section 3.2.P.3.3.1. In addition, environmental monitoring (microbiological and non-viable particles) of the production area is performed. 3.2.P.3.4.2 TRIS/EDTA buffer solution A schematic representation of the process flow and the in-process controls is presented in Figure 2, Section 3.2.P.3.3.2. In addition, environmental monitoring (microbiological and non-viable particles) of the production area is performed." +3.2.P.3.5,36,Process Validation and/or Evaluation (Sterile Fluid Path Components),3.2.P.3.5.1 Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt The aseptic compounding process has been validated by simulation of the aseptic process using a microbial nutrient medium. No growth has been observed in any of the media fill batches. Monitoring of the clean room and personnel is carried out and controlled on a routine basis to assure an environment suitable for aseptic processing. 3.2.P.3.5.2 TRIS/EDTA buffer solution The aseptic compounding process has been validated by simulation of the aseptic process using a microbial nutrient medium. No growth has been observed in any of the media fill batches.
Monitoring of the clean room and personnel is carried out and controlled on a routine basis to assure an environment suitable for aseptic processing. +3.2.P.6,39,Reference Standards or Materials (Sterile Fluid Path Components),"automated compounding device, SpinLab, only if the procedures for aseptic compounding of the solution are satisfied. The use of SpinLab for automated compounding of the Hyperpolarized Pyruvate (13C) Injection drug product is in early development, and preliminary specifications may be developed and evaluated as the project continues in the development phase. Given the early stage of the project, and because only single doses are compounded immediately prior to patient administration by licensed pharmacy personnel, the specifications are considered justified. 3.2.P.5.6.2 TRIS/EDTA buffer solution A sterile fluid path containing the TRIS/EDTA buffer solution, which is prepared immediately prior to patient administration, will be released by a licensed pharmacist for compounding by the automated compounding device, SpinLab, only if the procedures for aseptic compounding of the solution are satisfied. The use of SpinLab for automated compounding of the Hyperpolarized Pyruvate (13C) Injection drug product is in early development, and preliminary specifications may be developed and evaluated as the project continues in the development phase. Given the early stage of the project, and because only single doses are compounded immediately prior to patient administration by licensed pharmacy personnel, the specifications are considered justified. 3.2.P.6 Reference Standards or Materials (Sterile Fluid Path Components) Not applicable.
3.2.P.7 Container Closure System (Sterile Fluid Path) The fluid path system is a single, sterile drug product container (container closure system) that provides for rapid and complete dissolution of a frozen hyperpolarized drug product. It transports the resulting hyperpolarized drug product solution from its initial location within a polarizer system to a final sterile Medrad syringe outside the polarizer system for clinical administration (injection into a patient). The empty sterile fluid path (Figure 1A) is provided in a double-bagged plastic tray with a lid. Its approximate dimensions per unit are 60 cm (L) x 35.6 cm (W) x 10.2 cm (D), or 23.6 in (L) x 14.0 in (W) x 4.0 in (D). The empty sterile fluid path is designed as a single-use drug product container closure system. Upon arrival at a licensed pharmacy, it can be aseptically manipulated so that it can be charged with the Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt, TRIS/EDTA buffer, and sterile water for injection. The key components of the empty sterile fluid path system that are in contact with the drug product are composed of USP Plastic Class V, as follows: - A Radel R plastic sample vial, which will contain a mixture of the drug substance [1-13C]pyruvic acid and the Electron Paramagnetic Agent (EPA) excipient, tris(8-carboxyl-2,2,6,6-tetra(2-(1-methoxy-2,2-d2-ethyl))-benzo[1,2-d:4,5-d']bis(dithiole-4-yl)methyl sodium salt (AH111501 Sodium Salt). - A Radel R plastic syringe, which will contain sterile water for injection. - A Radel R plastic receiver vessel, which will contain an aqueous solution of tris(hydroxymethyl)aminomethane (TRIS), disodium ethylenediaminetetraacetate (Na2EDTA) and sodium hydroxide (NaOH). - A Radel R plastic casing containing the ultrahigh molecular weight polyethylene EPA filters. - Tygon plastic tubing connecting the receiver vessel to the sterile filter.
- The rest of the assembly is composed of Radel R plastic co-axial and transfer tubes and Udel plastic valves. The dynamic seal is integrated into the empty sterile fluid path but is not in contact with the drug product. The QC appendage is also integrated into the empty sterile fluid path; however, it is not in contact with the drug product, as an aliquot of the drug product is transferred to the QC appendage. Commercially available SSQK 65/115VS Syringe Kits (Bayer Inc., USA) contain a sterile 65 mL Qwik-Fit Syringe, which is aseptically added to the sterile empty fluid path to collect the final drug product, Hyperpolarized Pyruvate (13C) Injection. This syringe is not addressed here and is depicted as the Administration syringe. Figure 1A Depiction of empty sterile fluid path in packaging. Figure 1B Basic anatomy of an empty sterile fluid path." +3.2.P.7,39,Container Closure System (Sterile Fluid Path),"The fluid path system is a single, sterile drug product container (container closure system) that provides for rapid and complete dissolution of a frozen hyperpolarized drug product. It transports the resulting hyperpolarized drug product solution from its initial location within a polarizer system to a final sterile Medrad syringe outside the polarizer system for clinical administration (injection into a patient).
The empty sterile fluid path (Figure 1A) is provided in a double-bagged plastic tray with a lid. Its approximate dimensions per unit are 60 cm (L) x 35.6 cm (W) x 10.2 cm (D), or 23.6 in (L) x 14.0 in (W) x 4.0 in (D). The empty sterile fluid path is designed as a single-use drug product container closure system. Upon arrival at a licensed pharmacy, it can be aseptically manipulated so that it can be charged with the Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt, TRIS/EDTA buffer, and sterile water for injection. The key components of the empty sterile fluid path system that are in contact with the drug product are composed of USP Plastic Class V, as follows: - A Radel R plastic sample vial, which will contain a mixture of the drug substance [1-13C]pyruvic acid and the Electron Paramagnetic Agent (EPA) excipient, tris(8-carboxyl-2,2,6,6-tetra(2-(1-methoxy-2,2-d2-ethyl))-benzo[1,2-d:4,5-d']bis(dithiole-4-yl)methyl sodium salt (AH111501 Sodium Salt). - A Radel R plastic syringe, which will contain sterile water for injection. - A Radel R plastic receiver vessel, which will contain an aqueous solution of tris(hydroxymethyl)aminomethane (TRIS), disodium ethylenediaminetetraacetate (Na2EDTA) and sodium hydroxide (NaOH). - A Radel R plastic casing containing the ultrahigh molecular weight polyethylene EPA filters. - Tygon plastic tubing connecting the receiver vessel to the sterile filter. - The rest of the assembly is composed of Radel R plastic co-axial and transfer tubes and Udel plastic valves. The dynamic seal is integrated into the empty sterile fluid path but is not in contact with the drug product. The QC appendage is also integrated into the empty sterile fluid path; however, it is not in contact with the drug product, as an aliquot of the drug product is transferred to the QC appendage.
Commercially available SSQK 65/115VS Syringe Kits (Bayer Inc., USA) contain a sterile 65 mL Qwik-Fit Syringe, which is aseptically added to the sterile empty fluid path to collect the final drug product, Hyperpolarized Pyruvate (13C) Injection. This syringe is not addressed here and is depicted as the Administration syringe. Figure 1A Depiction of empty sterile fluid path in packaging. Figure 1B Basic anatomy of an empty sterile fluid path." +7.3,41,Drug Product Part 2,"3.2.P DRUG PRODUCT (HYPERPOLARIZED PYRUVATE [13C] INJECTION) Hyperpolarized Pyruvate (13C) Injection (drug product) is a sterile solution for intravenous injection. Compounding the Hyperpolarized Pyruvate (13C) Injection requires the following drug product components: • Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt • TRIS/EDTA buffer solution • Sterile Water for Injection (WFI) Hyperpolarized Pyruvate (13C) Injection is compounded at the clinical site just prior to administration, using an automated compounding device known as SpinLab, according to USP <797> Pharmaceutical Compounding – Sterile Preparations. For each patient dose, SpinLab uses a single sterile fluid path that includes a cryovial containing the mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt. The cryovial is lowered into the polarizer and polarized for up to 120 minutes at a temperature of approximately 0.8 K. After polarization, the mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt is flushed out of the cryovial with heated, pressurized sterile WFI within the sterile fluid path. It is then passed through a mechanical filter to remove AH111501 and emptied into a receiver vessel containing sterile WFI and TRIS/EDTA buffer solution. A sample of the solution in the receiver vessel is automatically extracted for testing by an automated quality control instrument (QC System).
While the QC System processes the solution sample, the remaining solution in the receiver vessel is passed through a sterilizing filter (0.2 μm) into the final drug product container for patient administration, a 65 mL MedRad syringe. Based on the results from the QC System, the final release authorization for administration to humans will be given by a licensed pharmacist. The following 3.2.P sections describe the Hyperpolarized Pyruvate (13C) Injection. For aspects related to the drug product components required for compounding the Hyperpolarized Pyruvate (13C) Injection, reference is made to Section 3.2.P for Drug Product Kit Components." +3.2.P.2.1,42,Components of the Drug Product (Hyperpolarized Pyruvate (13C) Injection),"Table 1 Composition of Hyperpolarized Pyruvate (13C) Injection Hyperpolarized Pyruvate (13C) Injection is supplied in a sterile, disposable Medrad Qwik-Fit Syringe® for contrast media with a fill volume of 65 mL. 3.2.P.2.1 Components of the Drug Product (Hyperpolarized Pyruvate (13C) Injection) 3.2.P.2.1.1 Drug substance The drug substance, [1-13C]pyruvic acid, is a colorless to yellow, clear, viscous liquid. [1-13C]Pyruvic acid is described in Section 3.2.S Drug Substance. After neutralization in the TRIS/EDTA buffer solution, the [1-13C]pyruvic acid is present as [1-13C]pyruvate. 3.2.P.2.1.2 Excipients AH111501 sodium salt is a stable trityl radical that is added to [1-13C]pyruvic acid to enable hyperpolarization. After hyperpolarization and compounding, the solution is passed through a filter to remove the AH111501 from the drug product. AH111501 sodium salt is a green to black, fine to granular powder. It is further described in Section 3.2.A.3 Novel Excipients. The TRIS/EDTA buffer solution is an aqueous solution containing 333 mM TRIS, 600 mM NaOH and 333 mg/L Na2EDTA. TRIS is added as a buffer to stabilize the pH of the Hyperpolarized Pyruvate (13C) Injection at a physiologically acceptable level. NaOH is added to convert the [1-13C]pyruvic acid in the Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt to [1-13C]pyruvate in the Hyperpolarized Pyruvate (13C) Injection. Na2EDTA has been included in the formulation as a chelating agent to capture traces of" +3.2.P.2.2,43,Drug Product (Hyperpolarized Pyruvate (13C) Injection),"3.2.P.2.2.1 Formulation Development The drug product kit components used for compounding Hyperpolarized Pyruvate (13C) Injection in the polarizer differ slightly from the components used for pre-clinical studies and for clinical studies GE-101-001 and GE-101-003 (see Section 8.1 Introduction for Item 8 Pharmacology and Toxicology Info). These differences are explained in the following sections and are summarized in Table 1. (a) Pyruvic acid and [1-13C]pyruvic acid The drug product used for clinical studies GE-101-001 and GE-101-003 was not hyperpolarized. Because 13C-enriched material was not needed, the drug substance used was pyruvic acid, whereas the drug substance used for compounding Hyperpolarized Pyruvate (13C) Injection is [1-13C]pyruvic acid. Some pre-clinical safety studies were also conducted using pyruvic acid (see Section 8.1 Introduction for Item 8 Pharmacology and Toxicology Info). (b) Content of AH111501 sodium salt AH111501 is removed during compounding of Hyperpolarized Pyruvate (13C) Injection, and the content of this excipient in the final drug product is NMT 3.0 μM. To mimic this situation in clinical studies GE-101-001 and GE-101-003, 0.2 mM AH111501 sodium salt was added to the pyruvic acid to obtain 3.0 μM AH111501 in the Pyruvate Injection. For most of the pre-clinical studies (see Section 8.1 Introduction for Item 8 Pharmacology and Toxicology Info) and for compounding Hyperpolarized Pyruvate (13C) Injection, 15 mM AH111501 sodium salt is added to the [1-13C]pyruvic acid. (c) Content of [1-13C]pyruvate in drug product The drug product was initially formulated to contain 500 mM [1-13C]pyruvate.
For this formulation, a Mixture of [1-13C]pyruvic acid and AH111501 sodium salt, containing 2.23 g [1-13C]pyruvic acid, was dissolved in 50 ml TRIS/EDTA dissolution medium containing 360 mM NaOH, 200 mM TRIS and 100 mg/l Na2EDTA. Because pre-clinical studies using this formulation revealed cardiovascular effects (see Section 8.2.4.3 Effects on the Cardiovascular Systems (CVS) in the Pentobarbital/Fentanyl Anesthetized Dog, subsections a and b, for Item 8 Pharmacology and Toxicology Info), the product was later reformulated to contain 250 mM [1-13C]pyruvate. For this formulation, a Mixture of [1-13C]pyruvic acid and AH111501 sodium salt, containing 2.23 g [1-13C]pyruvic acid, was dissolved in 100 ml TRIS/EDTA dissolution medium containing 180 mM NaOH, 100 mM TRIS and 100 mg/l Na2EDTA. Some pre-clinical studies were performed with the formulation targeting 500 mM [1-13C]pyruvate. For most pre-clinical studies (see Section 8.1 Introduction for Item 8 Pharmacology and Toxicology Info) and all clinical studies, the Pyruvate (13C) Injection is targeted to contain 250 mM [1-13C]pyruvate. (d) TRIS/EDTA dissolution medium and TRIS/EDTA buffer solution For clinical studies GE-101-001 and GE-101-003 and pre-clinical studies, the Mixture of pyruvic acid and AH111501 sodium salt was dissolved in TRIS/EDTA dissolution medium in a single step by manual dissolution (see Section 3.2.P.2.3 Manufacturing Process Development). The" +3.2.P.2.3,45,Manufacturing Process Development (Hyperpolarized Pyruvate (13C) Injection),"The procedure for compounding of Hyperpolarized Pyruvate (13C) Injection was not used for pre-clinical studies or clinical studies GE-101-001 and GE-101-003. The Pyruvate (13C) Injection for these studies was prepared by manual mixing of the drug product kit components as described in the following section. Prior to mixing, the drug product kit components were allowed to reach ambient room temperature.
100 mL of TRIS/EDTA dissolution medium was then added to the vial containing the Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt. The vial was immediately shaken vigorously for at least 30 seconds to ensure homogeneity. The vial was then heated in an 80°C water bath for 10 minutes and cooled in cold tap water for 5 minutes. The vial was then stored in a 37°C water bath for a maximum of 4 hours before use." +3.2.P.3.3,46,Description of Manufacturing Process and Process Controls (Hyperpolarized Pyruvate (13C) Injection),"Hyperpolarized Pyruvate (13C) Injection is compounded at the clinical site prior to administration. For compounding, the Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt is hyperpolarized by Dynamic Nuclear Polarization (DNP) for approximately 60 minutes at 1.2 K. After hyperpolarization, the Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt is dissolved in sterile WFI, and the AH111501 is then removed from the hyperpolarized solution by mechanical filtration. The hyperpolarized solution is then neutralized, buffered and diluted in TRIS/EDTA buffer solution and subsequently passed through a sterilizing filter (0.2 μm) into the final drug product container, an empty sterile disposable Medrad syringe. The SpinLab system used for hyperpolarization and compounding is located in an area adjacent to the MR scanner room. All compounding process steps are in accordance with USP <797> Pharmaceutical Compounding – Sterile Preparations. Immediately after compounding, the Hyperpolarized Pyruvate (13C) Injection is sampled and tested by an automatic quality control instrument (QC System). The final release authorization for administration to humans will be given by a licensed pharmacist. After release, the Hyperpolarized Pyruvate (13C) Injection will be delivered to the adjoining MR scanner room for patient administration.
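As a rough arithmetic check of the two manual-mixing formulations in 3.2.P.2.2.1 (c), the sketch below converts 2.23 g of [1-13C]pyruvic acid into a concentration for 50 ml and 100 ml of dissolution medium. It assumes a molar mass of about 89.05 g/mol (pyruvic acid, 88.06 g/mol, with one carbon replaced by 13C) and treats the medium volume as the final volume, ignoring the small volume of the acid itself; the function name is illustrative only.

```python
# Hypothetical check of the manual-mixing targets (illustrative, not from the source).
# Assumed molar mass: pyruvic acid 88.06 g/mol plus ~0.99 g/mol for the 13C label.
MOLAR_MASS_G_PER_MOL = 89.05


def pyruvate_mM(acid_mass_g: float, medium_volume_ml: float) -> float:
    # Assumption: the dissolution-medium volume is the final volume (acid volume ignored).
    mmol = acid_mass_g / MOLAR_MASS_G_PER_MOL * 1000.0
    return mmol / (medium_volume_ml / 1000.0)  # mmol per litre == mM


for volume_ml in (50.0, 100.0):
    print(f'{volume_ml:.0f} ml: {pyruvate_mM(2.23, volume_ml):.0f} mM')
# 50 ml: 501 mM
# 100 ml: 250 mM
```

Doubling the volume halves the pyruvate concentration, which is consistent with the NaOH and TRIS concentrations also being halved (360 to 180 mM and 200 to 100 mM): the amount of base per batch is unchanged (for NaOH, 18 mmol in either case).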
3.2.P.3.3.1 Process description (a) Compounding and filling of empty sterile fluid path All process steps for compounding and filling of the empty sterile fluid path (SFP) used in SpinLab to prepare the hyperpolarized (13C) pyruvate injection are performed within an ISO 5 cleanroom area. A clean, sterile, empty SFP is aseptically removed from its packaging and placed into the ISO 5 area. Sterile water for injection (38 g) is aseptically introduced into the dissolution syringe of the SFP, and another 18.5 g of sterile water for injection is aseptically introduced into the receiving vessel. Sterile TRIS/EDTA buffer (18 g) is aseptically introduced into the receiving vessel. A mixture of [1-13C]pyruvic acid and 15 mM AH111501 is prepared and sterilized through a sterilizing filter (0.2 μm), and 1.47 g of the sterile solution is placed into the sterile cryovial. The cryovial containing the sterile mixture of [1-13C]pyruvic acid and 15 mM AH111501 is then attached to the empty sterile fluid path and sealed using a laser welder. The cryovial is then" +3.2.P.3.4,50,Controls of Critical Steps and Intermediates (Hyperpolarized Pyruvate (13C) Injection),"A schematic representation of the process flow and the in-process controls is presented in Figure 1 in Section 3.2.P.3.3.1 (a). The polarizer software application monitors and controls critical system and process functions and settings such as data communication and temperature settings. Malfunctions or settings detected outside pre-set ranges are communicated to the operators via software-generated alarms that prevent further processing. Control of the mechanical functionality of process hardware, such as valves and fittings, and of the He driving pressure, is performed manually by the operator.
The final release analyses performed by the QC System ensure that the compounding process has executed as intended and that the Hyperpolarized Pyruvate (13C) Injection is within specifications (see Table 1 in Section 3.2.P.5.1). The post-administration integrity test of the sterilizing filter assesses whether the filter was functional during use." +3.2.P.3.5,50,Process Validation and/or Evaluation (Hyperpolarized Pyruvate (13C) Injection),"3.2.P.3.5.1 IQ/OQ/PQ Program The clean room, polarizer, process equipment and QC system have gone through an extensive IQ/OQ/PQ program prior to use during clinical trials. The clean room and equipment were found to be suitable for their intended use. 3.2.P.3.5.2 Microbiological aspects The compounding process has been validated by simulation of the process using a microbial nutrient medium. No growth has been observed in any of the media fill batches. The microbiological quality has also been demonstrated by sterility and microbial endotoxin testing of repeated runs (n=6). 3.2.P.3.5.3 Compounding process consistency The consistency of the compounding process has been evaluated by repeated (n=10) compounding of Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt and TRIS/EDTA buffer solution. All batches were within specifications (see Section 3.2.P.5.4 Batch Analyses for details). As the QC System only determines a limited set of quality parameters, all batches were also analyzed for related substances of [1-13C]pyruvate and TRIS, assay of [1-13C]pyruvate, osmolality and particulate contamination. Because of limited analytical capability at the site of compounding (UCSF), samples were shipped to GE Healthcare and analyses were performed 8 to 34 days after compounding. The formation of AH112615 after compounding (see Section 3.2.P.5.5 Characterization of Impurities) causes a decrease in the assay of [1-13C]pyruvate. 
Because of this" +3.2.P.4,51,Control of Excipients (Hyperpolarized Pyruvate (13C) Injection),"There are no excipients added during compounding of Hyperpolarized Pyruvate (13C) Injection. All excipients in the drug product are attributed to the drug product kit components: the Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt, and the TRIS/EDTA buffer solution. Excipients in the drug product kit components are discussed in Section 3.2.P.4 Control of Excipients for Drug Product Kit Components and Section 3.2.A.3 Novel Excipients. effect, and because the HPLC method does not detect AH112615, the [1-13C]pyruvate content in this study was determined by quantitative 1H NMR analysis using acetate as an internal standard for calibration. It has been shown that the content of AH112615 immediately after compounding is negligible (see Section 3.2.P.5.5 Characterization of Impurities). To determine the content of [1-13C]pyruvate at time of compounding at UCSF, the AH112615 peak was therefore integrated as [1-13C]pyruvate. The formation of AH112615 also causes a decrease in osmolality with time after compounding. For this study, the osmolality at time of compounding at UCSF was therefore calculated from the measured osmolality and the content of AH112615 (determined by 1H NMR) at time of analysis. Results from these analyses are stated in Table 1. As can be seen from these results, the assay of [1-13C]pyruvate in Hyperpolarized Pyruvate (13C) Injection varied from 222 to 252 mM, with an average of 241 ± 12 mM. Although the observed assay displays a larger variance than would be expected from the drug product kit components used, the results are considered to demonstrate acceptable process consistency. It should be noted that even though the QC system does not determine the assay of [1-13C]pyruvate, the pH determination constitutes a relevant indirect control of this parameter.
The level of control provided by the pH determination has been investigated in a study in which a sample of the Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt was titrated with TRIS/EDTA buffer solution diluted in sterile WFI. Results from this study are shown in Figure 1. As can be seen from Figure 1, the pH of the solution is a well-defined function of the [1-13C]pyruvate concentration. As expected, the pKa of TRIS is observed at approximately 8.1, and depletion of the buffer capacity towards the acidic range is observed at approximately 280 mM. Estimated from the observed relationship, the pH specification (6.7 to 8.0) corresponds to approximately 210 to 270 mM pyruvate. With regard to the efficacy of the drug product, the 13C NMR signal measured by the QC system is proportional to the concentration of [1-13C]pyruvate (see Section 3.2.P.5.2.1 Analytical Procedures). As the 13C nuclear polarization reported by the QC system assumes a fixed concentration of [1-13C]pyruvate, it varies linearly with the actual concentration of [1-13C]pyruvate. Hence, this parameter provides a more relevant assurance of product efficacy than the assay of [1-13C]pyruvate alone. Osmolality varied from 484 to 513 mOsm/kg, with an average of 501 ± 12 mOsm/kg. Particulate contamination was well within the pharmacopoeial limits for all batches. The purity profile observed during this study was as expected from the purity profile of the drug product kit components. No new impurities were observed. For the purity profile of Hyperpolarized Pyruvate (13C) Injection at time of compounding at UCSF, see Section 3.2.P.5.5 Characterization of Impurities. 3.2.P.4 Control of Excipients (Hyperpolarized Pyruvate (13C) Injection) There are no excipients added during compounding of Hyperpolarized Pyruvate (13C) Injection.
All excipients in the drug product are attributed to the drug product kit components: the Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt, and the TRIS/EDTA buffer solution. Excipients in the drug product kit components are discussed in Section 3.2.P.4 Control of Excipients for Drug Product Kit Components and Section 3.2.A.3 Novel Excipients. 3.2.P.4.1 Specification (Hyperpolarized Pyruvate (13C) Injection) Not applicable." +3.2.P.4.1,51,Specification (Hyperpolarized Pyruvate (13C) Injection),"effect, and because the HPLC method does not detect AH112615, the [1-13C]pyruvate content in this study was determined by quantitative 1H NMR analysis using acetate as an internal standard for calibration. It has been shown that the content of AH112615 immediately after compounding is negligible (see Section 3.2.P.5.5 Characterization of Impurities). To determine the content of [1-13C]pyruvate at time of compounding at UCSF, the AH112615 peak was therefore integrated as [1-13C]pyruvate. The formation of AH112615 also causes a decrease in osmolality with time after compounding. For this study, the osmolality at time of compounding at UCSF was therefore calculated from the measured osmolality and the content of AH112615 (determined by 1H NMR) at time of analysis. Results from these analyses are stated in Table 1. As can be seen from these results, the assay of [1-13C]pyruvate in Hyperpolarized Pyruvate (13C) Injection varied from 222 to 252 mM, with an average of 241 ± 12 mM. Although the observed assay displays a larger variance than would be expected from the drug product kit components used, the results are considered to demonstrate acceptable process consistency. It should be noted that even though the QC system does not determine the assay of [1-13C]pyruvate, the pH determination constitutes a relevant indirect control of this parameter.
The level of control provided by the pH determination has been investigated in a study in which a sample of the Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt was titrated with TRIS/EDTA buffer solution diluted in sterile WFI. Results from this study are shown in Figure 1. As can be seen from Figure 1, the pH of the solution is a well-defined function of the [1-13C]pyruvate concentration. As expected, the pKa of TRIS is observed at approximately 8.1, and depletion of the buffer capacity towards the acidic range is observed at approximately 280 mM. Estimated from the observed relationship, the pH specification (6.7 to 8.0) corresponds to approximately 210 to 270 mM pyruvate. With regard to the efficacy of the drug product, the 13C NMR signal measured by the QC system is proportional to the concentration of [1-13C]pyruvate (see Section 3.2.P.5.2.1 Analytical Procedures). As the 13C nuclear polarization reported by the QC system assumes a fixed concentration of [1-13C]pyruvate, it varies linearly with the actual concentration of [1-13C]pyruvate. Hence, this parameter provides a more relevant assurance of product efficacy than the assay of [1-13C]pyruvate alone. Osmolality varied from 484 to 513 mOsm/kg, with an average of 501 ± 12 mOsm/kg. Particulate contamination was well within the pharmacopoeial limits for all batches. The purity profile observed during this study was as expected from the purity profile of the drug product kit components. No new impurities were observed. For the purity profile of Hyperpolarized Pyruvate (13C) Injection at time of compounding at UCSF, see Section 3.2.P.5.5 Characterization of Impurities. 3.2.P.4 Control of Excipients (Hyperpolarized Pyruvate (13C) Injection) There are no excipients added during compounding of Hyperpolarized Pyruvate (13C) Injection.
All excipients in the drug product are attributed to the drug product kit components: the Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt, and the TRIS/EDTA buffer solution. Excipients in the drug product kit components are discussed in Section 3.2.P.4 Control of Excipients for Drug Product Kit Components and Section 3.2.A.3 Novel Excipients. 3.2.P.4.1 Specification (Hyperpolarized Pyruvate (13C) Injection) Not applicable." +3.2.P.5.2,53,Analytical Procedures (Hyperpolarized Pyruvate (13C) Injection),"The 13C nuclear polarization of Hyperpolarized Pyruvate (13C) Injection decays exponentially with a time constant of approximately 69 seconds. To preserve acceptable imaging efficacy, the time between the start of dissolution and the start of administration to the patient must be NMT 50s (see Section 3.2.P.8.1 Stability Summary and Conclusions). Because of this limited user window, analyses to control Hyperpolarized Pyruvate (13C) Injection are performed using an automated analytical system (QC System) that determines a limited set of parameters within approximately 10s. The QC System was specifically developed for the analysis of Hyperpolarized Pyruvate (13C) Injection immediately prior to administration to patients. The QC accessory participates in the dissolution process by managing the state of the sterile fluid path. Specifically, the QC accessory controls the upper slide valve, which is used to isolate the receiver from the EPA filter, and the lower slide valve, which controls fluid flow to the cuvettes and the Administration syringe. After a dissolution is complete, the QC accessory closes the upper slide valve, measures the temperature of the receiver, and opens the lower slide valve to allow the mixed solution to be drawn out. Once the cuvettes and NMR bulb are filled, the QC measures the pyruvate concentration, EPA concentration, and pH. The percent polarization is also measured.
Once the Administration syringe is filled, the QC checks that the volume is above the level of a threshold sensor. All measurement results are reported to the Hyperpolarizer, where they are interpreted and displayed on the screen so that an operator can decide how to proceed. After the analysis is complete, the software compares the results with a pre-set list of specifications (see Table 1 in Section 3.2.P.5.1 Specifications) and reports whether the Hyperpolarized Pyruvate (13C) Injection complies with the specifications. Control of additional parameters is assured through testing performed on the combination of the drug product kit components: the Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt dissolved in WFI, and TRIS/EDTA buffer solution (Pyruvate (13C) Injection) (see Section" +3.2.P.5.3,55,Validation of Analytical Procedures (Hyperpolarized Pyruvate (13C) Injection),"The analytical procedures are appropriately validated for the current development phase and are suitable for their intended use. The validation performed at this stage is summarized in Table 1. Table 1 Validation of analytical procedures performed at this stage 3.2.P.5.2.5 Volume The volume measurement in the QC is a threshold measurement performed at the Administration syringe after the fluid movement is complete. This capacitive measurement was tuned by the manufacturer during system setup, with the sensor threshold centered at a nominal volume of 38 mL. 3.2.P.5.3 Validation of Analytical Procedures (Hyperpolarized Pyruvate (13C) Injection) The analytical procedures are appropriately validated for the current development phase and are suitable for their intended use. The validation performed at this stage is summarized in Table 1.
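The compliance check described for the QC System can be sketched as a comparison of each QC result against fixed limits. This is a hypothetical illustration, not the QC System software: the class, field and function names are invented, and only limits quoted elsewhere in this document are used (pH 6.7 to 8.0, 13C polarization NLT 15.0%, residual AH111501 NMT 3.0 μM, volume above the threshold sensor). Any other limits in Table 1 of Section 3.2.P.5.1 are not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass

# Limits quoted in this document; other Table 1 limits are deliberately omitted.
PH_RANGE = (6.7, 8.0)
MIN_POLARIZATION_PCT = 15.0   # NLT 15.0% 13C polarization at release
MAX_AH111501_UM = 3.0         # NMT 3.0 uM residual AH111501 (EPA)


@dataclass
class QCReading:
    # Hypothetical container for one QC run; field names are illustrative.
    ph: float
    polarization_pct: float
    ah111501_um: float
    volume_above_threshold: bool  # capacitive threshold sensor, nominal 38 mL


def check_release(reading: QCReading) -> list[str]:
    # Return the failed checks; an empty list means the reading is compliant.
    failures: list[str] = []
    low, high = PH_RANGE
    if not low <= reading.ph <= high:
        failures.append(f'pH {reading.ph} outside {low}-{high}')
    if reading.polarization_pct < MIN_POLARIZATION_PCT:
        failures.append(f'polarization {reading.polarization_pct}% below {MIN_POLARIZATION_PCT}%')
    if reading.ah111501_um > MAX_AH111501_UM:
        failures.append(f'AH111501 {reading.ah111501_um} uM above {MAX_AH111501_UM} uM')
    if not reading.volume_above_threshold:
        failures.append('volume below threshold sensor')
    return failures


print(check_release(QCReading(ph=7.5, polarization_pct=18.0, ah111501_um=1.2,
                              volume_above_threshold=True)))  # []
```

Returning the list of failed checks, rather than a single pass/fail flag, matches the description of results being displayed so that an operator can decide how to proceed.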
Table 1 Validation of analytical procedures performed at this stage 3.2.P.5.4 Batch Analyses (Hyperpolarized Pyruvate (13C) Injection) Hyperpolarized Pyruvate (13C) Injection has not been used for pre-clinical studies or clinical studies GE-101-001 and GE-101-003. For the clinical studies, non-polarized Pyruvate Injection was used. Different formulations of Pyruvate (13C) Injection have been used during non-clinical studies and clinical studies GE-101-001 and GE-101-003. The drug product kit components used for compounding of Hyperpolarized Pyruvate (13C) Injection have been formulated such that the drug product is equivalent to the drug product used for clinical studies GE-101-001 and GE-101-003, as discussed in Section 3.2.P.2.2.1 Formulation Development. Results for batches of Hyperpolarized Pyruvate (13C) Injection are presented in Table 1 and Table 2. Table 1 Batch data for Hyperpolarized Pyruvate (13C) Injection" +3.2.P.5.4,55,Batch Analyses (Hyperpolarized Pyruvate (13C) Injection),"Hyperpolarized Pyruvate (13C) Injection has not been used for pre-clinical studies or clinical studies GE-101-001 and GE-101-003. For the clinical studies, non-polarized Pyruvate Injection was used. Different formulations of Pyruvate (13C) Injection have been used during non-clinical studies and clinical studies GE-101-001 and GE-101-003. The drug product kit components used for compounding of Hyperpolarized Pyruvate (13C) Injection have been formulated such that the drug product is equivalent to the drug product used for clinical studies GE-101-001 and GE-101-003, as discussed in Section 3.2.P.2.2.1 Formulation Development. Results for batches of Hyperpolarized Pyruvate (13C) Injection are presented in Table 1 and Table 2. Table 1 Batch data for Hyperpolarized Pyruvate (13C) Injection 3.2.P.5.2.5 Volume The volume measurement in the QC is a threshold measurement performed at the Administration syringe after the fluid movement is complete.
This capacitive measurement was tuned by the manufacturer during system setup, with the sensor threshold centered at a nominal volume of 38 mL. 3.2.P.5.3 Validation of Analytical Procedures (Hyperpolarized Pyruvate (13C) Injection) The analytical procedures are appropriately validated for the current development phase and are suitable for their intended use. The validation performed at this stage is summarized in Table 1. Table 1 Validation of analytical procedures performed at this stage 3.2.P.5.4 Batch Analyses (Hyperpolarized Pyruvate (13C) Injection) Hyperpolarized Pyruvate (13C) Injection has not been used for pre-clinical studies or clinical studies GE-101-001 and GE-101-003. For the clinical studies, non-polarized Pyruvate Injection was used. Different formulations of Pyruvate (13C) Injection have been used during non-clinical studies and clinical studies GE-101-001 and GE-101-003. The drug product kit components used for compounding of Hyperpolarized Pyruvate (13C) Injection have been formulated such that the drug product is equivalent to the drug product used for clinical studies GE-101-001 and GE-101-003, as discussed in Section 3.2.P.2.2.1 Formulation Development. Results for batches of Hyperpolarized Pyruvate (13C) Injection are presented in Table 1 and Table 2. Table 1 Batch data for Hyperpolarized Pyruvate (13C) Injection" +3.2.P.5.5,56,Characterization of Impurities (Hyperpolarized Pyruvate (13C) Injection),"A determination of the impurities in Hyperpolarized Pyruvate (13C) Injection is not part of the analyses performed by the QC System. Hence, documentation and control of the impurities in the drug product rest on analyses performed during the release testing of Pyruvate (13C) Injection (see Section 3.2.P.5.5 Characterization of Impurities for Drug Product Kit Components) and on the results from process verification studies.
3.2.P.5.5.1 Differences in dissolution procedures The manual procedure for compounding Pyruvate (13C) Injection during preparation of samples for analysis is identical to the procedure used during pre-clinical safety studies and clinical studies GE-101-001 and GE-101-003. The dissolution process during compounding of Hyperpolarized Pyruvate (13C) Injection differs from the manual procedure, particularly with regard to parameters such as time, temperature, flow rates and pressure. These differences influence the purity profile, so that the impurities in manually dissolved Pyruvate (13C) Injection differ to some extent from those in Hyperpolarized Pyruvate (13C) Injection. These effects and the purity profile of Hyperpolarized Pyruvate (13C) Injection are discussed in Sections 3.2.P.5.5.2, 3.2.P.5.5.3 and 3.2.P.5.5.4. 3.2.P.5.5.2 Transformation between AH113462 and AH112623 During and after manual compounding of Pyruvate (13C) Injection, the major impurity in the drug substance, AH113462/E, transforms through AH113462/K to AH112623 (see Section 3.2.P.5.5.1 (a) Transformation of the [1-13C]pyruvic acid purity profile for Drug Product Kit Components). As the dissolution step during the semi-automated compounding of Hyperpolarized Pyruvate (13C) Injection takes place in less than 10 seconds and the product is administered within 50s of the start of dissolution, the transformation from AH113462/E to AH112623 will not be complete. The drug product used in the pre-clinical safety studies and for clinical studies GE-101-001 and GE-101-" +3.2.P.8.1,61,Stability Summary and Conclusion (Hyperpolarized Pyruvate (13C) Injection),"The stability-indicating parameter for Hyperpolarized Pyruvate (13C) Injection is the level of 13C nuclear polarization, which decays rapidly after compounding. Stability testing has therefore been limited to determination of the 13C nuclear polarization and relaxation time (T1).
3.2.P.8.1.1 Batches tested Stability testing has been performed on six samples of Hyperpolarized Pyruvate (13C) Injection compounded from Mixture of [1-13C]pyruvic acid and 15 mM AH111501 sodium salt batch FFF106/140-806 and TRIS/EDTA buffer solution batch FFF106/142-806. 3.2.P.8.1.2 Storage conditions and testing frequency Testing was performed inside an MRI scanner located next to the clean room where the sample was compounded. For testing frequency, see Section 3.2.P.8.1.3 Analytical Procedures and Specification. 3.2.P.8.1.3 Analytical procedures and specification The level of 13C nuclear polarization was determined using a 3T MRI scanner. The hyperpolarized 13C NMR signal was obtained using a 5-degree RF pulse. During the relaxation of the non-equilibrium polarization, a total of 64 spectra (5-degree pulse, TR=3s) were collected, the first of which was used for calculating the 13C polarization. The relaxation time (13C T1) was calculated by fitting these data to a mono-exponential decay curve. After relaxation to thermal equilibrium, a thermal 13C NMR spectrum was collected (90-degree pulse, 64 averages, TR=10s, after addition of 10 μl Gd/ml solution) in order to calculate the 13C polarization. No shelf-life specifications have been established for Hyperpolarized Pyruvate (13C) Injection. Assurance of quality at time of administration rests on the analyses performed before release and on the time limit for administration after completion of the dissolution step, which is stated in the imaging" +3.2.P.8.2,62,Post-approval Stability Protocol and Stability Commitment (Hyperpolarized Pyruvate (13C) Injection),"protocol. 3.2.P.8.1.4 Summary of results The stability results are presented in Section 3.2.P.8.3 Stability Data. 3.2.P.8.1.5 Conclusion With a relaxation time of 69s, the polarization decreases by 7% (relative) every 5 seconds.
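The T1 fit and polarization calculation described in 3.2.P.8.1.3 can be sketched as below. This is an illustrative sketch, not the analysis actually used. The mono-exponential fit is done as a least-squares line through ln(signal). The optional correction for magnetization consumed by each 5-degree pulse (a factor of cos(5°) per pulse at TR = 3s) is an assumption, since the text does not say whether it was applied. The thermal reference uses the Boltzmann polarization tanh(γħB0/2kT) and assumes the thermal signal is normalized per scan, full relaxation at TR = 10s, negligible dilution by the Gd solution, and a sample temperature of 298.15 K; all function names are hypothetical.

```python
import math

# Physical constants.
GAMMA_13C = 67.2828e6    # 13C gyromagnetic ratio, rad s^-1 T^-1
HBAR = 1.054571817e-34   # reduced Planck constant, J s
K_B = 1.380649e-23       # Boltzmann constant, J/K


def fit_t1(signals, tr_s=3.0, flip_deg=5.0, correct_flip_angle=True):
    # Fit ln(S) = a - rate * t by ordinary least squares; signals must be > 0.
    times = [i * tr_s for i in range(len(signals))]
    logs = [math.log(s) for s in signals]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    rate = -sxy / sxx  # apparent decay rate, 1/s
    if correct_flip_angle:
        # Each pulse leaves cos(flip) of the longitudinal magnetization, which adds
        # -ln(cos(flip)) / TR to the apparent rate; remove that contribution.
        rate += math.log(math.cos(math.radians(flip_deg))) / tr_s
    return 1.0 / rate


def thermal_polarization(b0_t=3.0, temp_k=298.15):
    # Boltzmann polarization of a spin-1/2 nucleus at thermal equilibrium.
    return math.tanh(GAMMA_13C * HBAR * b0_t / (2.0 * K_B * temp_k))


def hyperpolarization(hp_signal, thermal_signal_per_scan, hp_flip_deg=5.0,
                      thermal_flip_deg=90.0, b0_t=3.0, temp_k=298.15):
    # Normalize each signal by sin(flip angle), then scale the thermal polarization.
    hp = hp_signal / math.sin(math.radians(hp_flip_deg))
    th = thermal_signal_per_scan / math.sin(math.radians(thermal_flip_deg))
    return thermal_polarization(b0_t, temp_k) * hp / th


# Noise-free check: synthetic decay with T1 = 69 s, sampled as described above.
flip = math.radians(5.0)
sim = [math.sin(flip) * math.cos(flip) ** i * math.exp(-3.0 * i / 69.0) for i in range(64)]
print(round(fit_t1(sim), 2))  # 69.0
```

Without the flip-angle correction the same synthetic data give an apparent T1 of roughly 63s, which shows why the pulse-depletion assumption matters when small-angle spectra are used to estimate relaxation.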
To optimize the imaging signal, administration should therefore take place as quickly as practically possible. To assure the level of polarization during clinical use, and hence an adequate imaging signal, the drug product will be administered within 50s of the start of dissolution. With a release specification of NLT 15.0% and a relaxation time of 69s, this user window limit will assure a polarization of NLT 7.3% at time of administration. 3.2.P.8.2 Post-approval Stability Protocol and Stability Commitment (Hyperpolarized Pyruvate (13C) Injection) Not applicable. 3.2.P.8.3 Stability Data (Hyperpolarized Pyruvate (13C) Injection) The average relaxation time determined for the six samples investigated was 68.8 ± 1.3s, with a range of 67.1 to 71.0s. A line derived from the stability results for Hyperpolarized Pyruvate (13C) Injection is shown in Figure 1. In Figure 1, the line represents a sample released at the specification limit (NLT 15.0% at start of dissolution), decaying with the average measured relaxation time (69s)." +3.2.P.8.3,62,Stability Data (Hyperpolarized Pyruvate (13C) Injection),"The average relaxation time determined for the six samples investigated was 68.8 ± 1.3s, with a range of 67.1 to 71.0s. A line derived from the stability results for Hyperpolarized Pyruvate (13C) Injection is shown in Figure 1. In Figure 1, the line represents a sample released at the specification limit (NLT 15.0% at start of dissolution), decaying with the average measured relaxation time (69s). protocol. 3.2.P.8.1.4 Summary of results The stability results are presented in Section 3.2.P.8.3 Stability Data. 3.2.P.8.1.5 Conclusion With a relaxation time of 69s, the polarization decreases by 7% (relative) every 5 seconds. To optimize the imaging signal, administration should therefore take place as quickly as practically possible.
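The decay figures quoted in this conclusion (about 7% relative loss per 5 seconds, and NLT 7.3% after 50s from a 15.0% release) follow from a mono-exponential model, P(t) = P0 * exp(-t / T1). A minimal check, assuming P0 equal to the 15.0% release limit and T1 = 69s; the names below are illustrative only:

```python
import math

T1_S = 69.0               # average 13C T1 from the stability data
RELEASE_LIMIT_PCT = 15.0  # NLT 15.0% polarization at start of dissolution
WINDOW_S = 50.0           # administration within 50s of start of dissolution


def polarization_at(t_s, p0_pct=RELEASE_LIMIT_PCT, t1_s=T1_S):
    # Mono-exponential decay of the 13C nuclear polarization.
    return p0_pct * math.exp(-t_s / t1_s)


loss_per_5s = 1.0 - math.exp(-5.0 / T1_S)
print(f'relative loss per 5 s: {loss_per_5s:.1%}')                           # 7.0%
print(f'polarization at {WINDOW_S:.0f} s: {polarization_at(WINDOW_S):.1f}%')  # 7.3%
```

Because the loss is multiplicative, each additional 5 seconds removes about 7% of the remaining polarization rather than a fixed number of percentage points. Across the observed T1 range (67.1 to 71.0s), the same calculation gives roughly 7.1% to 7.4% at 50s.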
To assure the level of polarization during clinical use, and hence an adequate imaging signal, the drug product will be administered within 50s of the start of dissolution. With a release specification of NLT 15.0% and a relaxation time of 69s, this user window limit will assure a polarization of NLT 7.3% at time of administration. 3.2.P.8.2 Post-approval Stability Protocol and Stability Commitment (Hyperpolarized Pyruvate (13C) Injection) Not applicable. 3.2.P.8.3 Stability Data (Hyperpolarized Pyruvate (13C) Injection) The average relaxation time determined for the six samples investigated was 68.8 ± 1.3s, with a range of 67.1 to 71.0s. A line derived from the stability results for Hyperpolarized Pyruvate (13C) Injection is shown in Figure 1. In Figure 1, the line represents a sample released at the specification limit (NLT 15.0% at start of dissolution), decaying with the average measured relaxation time (69s)." \ No newline at end of file diff --git a/images/agent_log_traces.png b/images/agent_log_traces.png new file mode 100644 index 0000000000..ca0ecb0cfc Binary files /dev/null and b/images/agent_log_traces.png differ diff --git a/images/agent_trace_details.png b/images/agent_trace_details.png new file mode 100644 index 0000000000..bf10e8134d Binary files /dev/null and b/images/agent_trace_details.png differ diff --git a/images/baseline_agent.png b/images/baseline_agent.png new file mode 100644 index 0000000000..6322a8bf7f Binary files /dev/null and b/images/baseline_agent.png differ diff --git a/images/eval_dashboard.png b/images/eval_dashboard.png new file mode 100644 index 0000000000..1ba0a57420 Binary files /dev/null and b/images/eval_dashboard.png differ diff --git a/images/eval_run_results.png b/images/eval_run_results.png new file mode 100644 index 0000000000..3b8086c84b Binary files /dev/null and b/images/eval_run_results.png differ diff --git a/images/eval_set_config.png b/images/eval_set_config.png new file mode 100644 index
0000000000..bf8e07dd77 Binary files /dev/null and b/images/eval_set_config.png differ diff --git a/images/feedback.png b/images/feedback.png new file mode 100644 index 0000000000..8a71d9c365 Binary files /dev/null and b/images/feedback.png differ diff --git a/images/prompt_input.png b/images/prompt_input.png new file mode 100644 index 0000000000..c12b373456 Binary files /dev/null and b/images/prompt_input.png differ diff --git a/images/simplified_reg_agent.png b/images/simplified_reg_agent.png new file mode 100644 index 0000000000..c1e312578d Binary files /dev/null and b/images/simplified_reg_agent.png differ diff --git a/images/updated_prompt.png b/images/updated_prompt.png new file mode 100644 index 0000000000..ed289b5031 Binary files /dev/null and b/images/updated_prompt.png differ diff --git a/images/updated_prompt_feedback.png b/images/updated_prompt_feedback.png new file mode 100644 index 0000000000..a0bb1a2a7b Binary files /dev/null and b/images/updated_prompt_feedback.png differ diff --git a/registry.yaml b/registry.yaml index 46633f7baa..500e56b6c1 100644 --- a/registry.yaml +++ b/registry.yaml @@ -4,6 +4,23 @@ # should build pages for, and indicates metadata such as tags, creation date and # authors for each page. +- title: Self-Evolving Agents - A Cookbook for Autonomous Agent Retraining + path: examples/partners/self_evolving_agents/autonomous_agent_retraining.ipynb + date: 2025-11-04 + authors: + - shikhar-cyber + - Calvin Maguranis + - Valentina Frenkel + - Fanny Perraudeau + - Giorgio Saladino + tags: + - partners + - self-evolving-agents + - evals + - llmops + - prompt-engineering + - agent-retraining + - title: User guide for gpt-oss-safeguard path: articles/gpt-oss-safeguard-guide.md date: 2025-10-29