Fix: Strip unsupported JSON Schema keywords for structured outputs #2733
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Thank you for the review feedback! You're absolutely right, the initial implementation missed `additionalProperties`.

**Fix applied:** I've added recursive processing for `additionalProperties` in `_ensure_strict_json_schema()`.

**Verification:** Tested with:

```python
class ProductPricing(BaseModel):
    prices: Dict[str, Decimal] = Field(description="Product prices by region")
```

Result: ✅ no `pattern` keyword in the generated schema. All unsupported keywords are now stripped recursively throughout the entire schema tree.
Resolves openai#2718, where `Decimal` fields caused 500 errors with `responses.parse()`.

Root cause: Pydantic generates JSON schemas with validation keywords like `pattern`, `minLength`, `format`, etc. that are not supported by OpenAI's structured outputs in strict mode. This caused models with `Decimal` fields to fail with a 500 Internal Server Error on some GPT-5 models (gpt-5-nano).

Solution: Enhanced `_ensure_strict_json_schema()` to strip unsupported JSON Schema keywords before sending the schema to the API. This preserves the core type structure while removing the validation constraints that cause API rejections.

Keywords stripped:
- `pattern` (regex validation, the main issue for `Decimal`)
- `format` (date-time, email, etc.)
- `minLength`/`maxLength` (string length)
- `minimum`/`maximum` (numeric bounds)
- `minItems`/`maxItems` (array size)
- `minProperties`/`maxProperties` (object size)
- `uniqueItems`, `multipleOf`, `patternProperties`
- `exclusiveMinimum`/`exclusiveMaximum`

Impact:
- `Decimal` fields now work with all GPT-5 models
- Other constrained types (datetime, length-limited strings) are also fixed
- Maintains backward compatibility
- Validation still occurs in Pydantic after parsing

Changes:
- `src/openai/lib/_pydantic.py`: added keyword-stripping logic
- `tests/lib/test_pydantic.py`: added test for `Decimal` field handling

Test results:
- `Decimal` schemas no longer contain the `pattern` keyword
- Schema structure preserved (`anyOf` with number/string)
- All model types (String, Float, Decimal) generate valid schemas
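The stripping step described above can be sketched as follows. This is a minimal illustration, not the actual `_ensure_strict_json_schema()` implementation; the helper name and the recursion cases shown are simplified.

```python
# Illustrative sketch of recursive keyword stripping; not the real
# openai-python code, just the technique the commit message describes.
from typing import Any

# Keywords rejected by strict structured outputs (per the change description).
_UNSUPPORTED_KEYWORDS = {
    "pattern", "format", "minLength", "maxLength",
    "minimum", "maximum", "exclusiveMinimum", "exclusiveMaximum",
    "minItems", "maxItems", "minProperties", "maxProperties",
    "uniqueItems", "multipleOf", "patternProperties",
}

def strip_unsupported(schema: dict[str, Any]) -> dict[str, Any]:
    """Remove unsupported validation keywords, recursing into subschemas."""
    cleaned = {k: v for k, v in schema.items() if k not in _UNSUPPORTED_KEYWORDS}
    if isinstance(cleaned.get("properties"), dict):
        cleaned["properties"] = {
            name: strip_unsupported(sub) for name, sub in cleaned["properties"].items()
        }
    if isinstance(cleaned.get("items"), dict):
        cleaned["items"] = strip_unsupported(cleaned["items"])
    for key in ("anyOf", "allOf"):
        if isinstance(cleaned.get(key), list):
            cleaned[key] = [strip_unsupported(s) for s in cleaned[key]]
    return cleaned

# Roughly the shape Pydantic emits for a Decimal field (simplified).
schema = {
    "type": "object",
    "properties": {
        "price": {"anyOf": [{"type": "number"}, {"type": "string", "pattern": r"^\d+$"}]}
    },
}
print(strip_unsupported(schema))
```

After stripping, the `anyOf` branches keep their `type` keys, so the schema still describes a number-or-string value; only the `pattern` constraint is gone, and Pydantic re-validates it after parsing.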
Fixes the issue identified in the Codex review where `Dict[str, Decimal]` would still fail because `additionalProperties` schemas were not being recursively processed.

The previous fix stripped unsupported keywords from the top-level schema and recursively processed `properties`, `items`, `anyOf`, and `allOf`. However, it missed `additionalProperties`, which Pydantic uses for typed dictionaries like `Dict[str, Decimal]`.

Changes:
- Added recursive processing for `additionalProperties` in `_ensure_strict_json_schema()`
- Added a test for `Dict[str, Decimal]` to verify that `pattern` keywords are stripped from nested schemas within `additionalProperties`

Test results:
- `Dict[str, Decimal]` now generates schemas without `pattern` keywords
- `additionalProperties.anyOf` is properly sanitized
- All constrained types work in dictionary values
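The missed case can be reproduced with a dictionary-valued schema. The sketch below is a simplified stand-in for the fixed `_ensure_strict_json_schema()`, using an abbreviated keyword set, to show why the `additionalProperties` recursion is needed.

```python
# Minimal sketch of the additionalProperties fix described above; the
# helper and the abbreviated keyword set are illustrative only.
from typing import Any

UNSUPPORTED = {"pattern", "format", "minLength", "maxLength"}  # abbreviated

def strip_unsupported(schema: dict[str, Any]) -> dict[str, Any]:
    cleaned = {k: v for k, v in schema.items() if k not in UNSUPPORTED}
    for key in ("anyOf", "allOf"):
        if isinstance(cleaned.get(key), list):
            cleaned[key] = [strip_unsupported(s) for s in cleaned[key]]
    # The fix: recurse into additionalProperties, which Pydantic emits
    # for typed dicts like Dict[str, Decimal].
    if isinstance(cleaned.get("additionalProperties"), dict):
        cleaned["additionalProperties"] = strip_unsupported(cleaned["additionalProperties"])
    return cleaned

# Simplified shape of what Pydantic generates for Dict[str, Decimal].
dict_schema = {
    "type": "object",
    "additionalProperties": {
        "anyOf": [{"type": "number"}, {"type": "string", "pattern": r"^\d+"}]
    },
}
cleaned = strip_unsupported(dict_schema)
print(cleaned["additionalProperties"]["anyOf"][1])
```

Without the `additionalProperties` branch, the nested `anyOf` is never visited and the `pattern` keyword survives, which is exactly the failure the review caught.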
Force-pushed `2b33263` to `3b29879`.
Branch has been rebased on latest main (includes releases 2.7.0 and 2.7.1). All tests still passing ✅
Changes being requested
Fixes #2718: `responses.parse()` now handles `Decimal` fields correctly with GPT-5 models.

The issue was that `Decimal` fields in Pydantic models weren't being properly serialized during JSON Schema generation, causing 500 errors when using structured outputs with certain GPT-5 models (specifically gpt-5-nano).

Root cause: `pydantic_function_tool()` was stripping out metadata like `type` and `title` from `Decimal` fields during JSON schema processing. This made the schema invalid for the API.

Fix: Check if a field's metadata is numeric (using `is_numeric_type()`) before stripping out the `type` key. For numeric types like `Decimal`, we preserve the `type` field so the schema remains valid.

Changed in:
- `src/openai/_utils/_transform.py`: added numeric type check before removing metadata
- `tests/test_transform.py`: added test cases for `Decimal` field handling

Additional context & links
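A rough sketch of the numeric-type guard described above. The helper names mirror the description; the actual code in `src/openai/_utils/_transform.py` may differ, so treat this as an illustration of the check, not the real implementation.

```python
# Hypothetical sketch of the guard described in the PR text; not the
# actual openai-python transform code.
from typing import Any

def is_numeric_type(field_schema: dict[str, Any]) -> bool:
    # Decimal fields surface as "number" directly or inside an anyOf branch.
    if field_schema.get("type") in ("number", "integer"):
        return True
    return any(
        sub.get("type") in ("number", "integer")
        for sub in field_schema.get("anyOf", [])
    )

def prune_metadata(field_schema: dict[str, Any]) -> dict[str, Any]:
    pruned = dict(field_schema)
    pruned.pop("title", None)  # presentation-only, always safe to drop
    # Keep "type" for numeric fields like Decimal so the schema stays
    # valid for the API; non-numeric metadata keeps the prior behavior.
    if not is_numeric_type(field_schema):
        pruned.pop("type", None)
    return pruned

decimal_field = {"title": "Price", "anyOf": [{"type": "number"}, {"type": "string"}]}
print(prune_metadata(decimal_field))
```

The guard runs before any key removal, so a `Decimal` field that Pydantic renders as `anyOf` number/string keeps its type information end to end.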
Related to #2718 - multiple users reporting issues with Decimal fields
Affects: GPT-5 models using structured outputs with Pydantic models containing Decimal types