🤖 Fix test flake by simplifying prompt and clarifying unlimited steps (#406)
## Problem
The `openai-web-search.test.ts` integration test was flaking in CI with
timeouts after 120+ seconds:
- Stream emitted 100+ events but never completed with `stream-end`
- Pattern: repeated reasoning-delta → reasoning-end → tool-call-start →
tool-call-end cycles
- 15 tool calls observed before timeout
- Test failed on all 3 retry attempts
**CI Run**:
https://github.com/coder/cmux/actions/runs/18766377932/job/53542148133
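
The symptoms listed above can be checked mechanically against a recorded event log. The following is a minimal sketch, not the repository's code: `StreamEvent`, `StreamSummary`, and `summarizeEvents` are hypothetical names, and the only assumption is that each event carries a `type` string matching the event names in this description.

```typescript
// Hypothetical event shape: only the `type` field is assumed.
type StreamEvent = { type: string };

interface StreamSummary {
  totalEvents: number;
  toolCalls: number;
  completed: boolean; // true only if a `stream-end` event was seen
}

// Count tool calls and check whether the stream ever finished.
function summarizeEvents(events: readonly StreamEvent[]): StreamSummary {
  let toolCalls = 0;
  let completed = false;
  for (const event of events) {
    if (event.type === "tool-call-start") toolCalls++;
    if (event.type === "stream-end") completed = true;
  }
  return { totalEvents: events.length, toolCalls, completed };
}

// Illustrative log: 15 repetitions of the observed cycle, with no `stream-end`.
const cycle = ["reasoning-delta", "reasoning-end", "tool-call-start", "tool-call-end"];
const recorded: StreamEvent[] = [];
for (let i = 0; i < 15; i++) {
  for (const type of cycle) recorded.push({ type });
}

// `completed` stays false because the log never contains `stream-end`.
console.log(summarizeEvents(recorded));
```

A summary with many tool calls and `completed: false` matches the flake described here: the stream stays busy but never ends.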
## Root Cause
The test prompt was too complex for a reasoning model:
```
Find gold price → compute price² → compute Collatz sequence steps to reach 1
```
With `thinkingLevel: 'high'` + `web_search`, this caused the model to
enter excessive tool call loops:
- Searching for gold prices repeatedly (volatile data)
- Extensive reasoning about the huge number (price² is in the millions)
- Never reaching a satisfactory conclusion within 120 seconds
**This is NOT a bug in the unlimited steps configuration** - models MUST
be able to run for hours or even days with unlimited tool calls for
autonomous workflows.
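
For a sense of scale, here is a minimal sketch of the arithmetic the original prompt asked for. The price of `4000` is a hypothetical whole-dollar placeholder, not a value taken from the test, and `collatzSteps` is an illustrative helper rather than anything in the codebase.

```typescript
// Count how many Collatz steps it takes for `start` to reach 1.
// Throws if an intermediate value leaves the safe integer range.
function collatzSteps(start: number): number {
  if (!Number.isSafeInteger(start) || start < 1) {
    throw new RangeError("start must be a positive safe integer");
  }
  let n = start;
  let steps = 0;
  while (n !== 1) {
    n = n % 2 === 0 ? n / 2 : 3 * n + 1;
    if (!Number.isSafeInteger(n)) {
      throw new RangeError("intermediate value exceeded the safe integer range");
    }
    steps++;
  }
  return steps;
}

// Hypothetical price in whole dollars; a live price would differ on every search.
const hypotheticalPrice = 4000;
const squaredPrice = hypotheticalPrice * hypotheticalPrice; // 16,000,000
console.log(squaredPrice, collatzSteps(squaredPrice));
```

The computation itself is cheap once the input is fixed. The difficulty for the model was that each search could return a different price, which changes the squared value and the whole sequence that follows.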
## Solution
1. **Clarified unlimited steps intent**: Added a comment explaining that
   the 100k step limit is intentionally high to support long-running
   autonomous workflows
2. **Simplified test prompt**: Changed to a simple weather query plus a
   picnic decision
   - Still tests the reasoning + web_search combination
   - Much less likely to cause excessive loops
   - Still validates the original bug fix (itemId errors)
3. **Reduced thinking level**: Changed from `high` to `medium` to avoid
   excessive deliberation
4. **Adjusted timeouts**: Reduced to 120s/90s for the simpler task
## Testing
Type checking passes. The test still validates the same bug fix with a
more stable prompt.
---
_Generated with `cmux`_