From d2ae5053715d65a812f3f8f604ccfc84e6c95e44 Mon Sep 17 00:00:00 2001
From: Todd Short
Date: Tue, 4 Nov 2025 15:49:17 -0500
Subject: [PATCH] Add Claude Code integration for e2e profiling
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add `/e2e-profile` slash command to enable interactive profiling workflow through Claude Code interface.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude
Signed-off-by: Todd Short
---
 .claude/commands/e2e-profile.md | 309 ++++++++++++++++++++++++++++++++
 1 file changed, 309 insertions(+)
 create mode 100644 .claude/commands/e2e-profile.md

diff --git a/.claude/commands/e2e-profile.md b/.claude/commands/e2e-profile.md
new file mode 100644
index 000000000..a8e67097f
--- /dev/null
+++ b/.claude/commands/e2e-profile.md
@@ -0,0 +1,309 @@
+---
+description: Profile memory and CPU usage during e2e tests and analyze results
+---
+
+# E2E Profiling Plugin
+
+Analyze memory and CPU usage during e2e tests by collecting pprof heap and CPU profiles and generating comprehensive analysis reports.
+
+## Commands
+
+### /e2e-profile start [test-name]
+
+Start profiling in background mode (recommended workflow):
+
+1. Start port-forwards to operator-controller and catalogd
+2. Begin collecting heap and CPU profiles every 10 seconds
+3. Run in background, allowing you to run any test command
+4. Auto-detect cluster teardown and stop gracefully
+5. Use `/e2e-profile stop` to finish and analyze
+
+**Examples:**
+```
+/e2e-profile start baseline
+# Then run: make test-e2e
+# Then run: /e2e-profile stop
+```
+
+This workflow:
+- Works with ANY test command (make test-e2e, make test-experimental-e2e, custom commands)
+- Handles cluster teardown gracefully (test-e2e tears down cluster)
+- Auto-stops after 3 consecutive collection failures
+- Lets you run tests your way
+
+### /e2e-profile stop
+
+Stop background profiling session and generate analysis:
+
+1. Stop profile collection process
+2. Kill port-forward processes (or detect they're already stopped)
+3. Clean up empty profile files
+4. Generate comprehensive analysis report
+
+**Example:**
+```
+/e2e-profile stop
+```
+
+### /e2e-profile run [test-name] [test-target]
+
+Run an e2e test with continuous memory and CPU profiling (automated workflow):
+
+1. Start the specified e2e test (defaults to `make test-experimental-e2e`)
+2. Wait for the operator-controller pod to be ready
+3. Collect heap and CPU profiles every 10 seconds to `./e2e-profiles/[test-name]/`
+4. Continue until the test completes or is interrupted
+5. Generate a summary report with memory and CPU analysis
+
+**Test Targets:**
+- `test-e2e` - Standard e2e tests
+- `test-experimental-e2e` - Experimental e2e tests (default)
+- `test-extension-developer-e2e` - Extension developer e2e tests
+- `test-upgrade-e2e` - Upgrade e2e tests
+- `test-upgrade-experimental-e2e` - Upgrade experimental e2e tests
+
+**Examples:**
+```
+/e2e-profile run baseline
+/e2e-profile run baseline test-e2e
+/e2e-profile run with-caching test-experimental-e2e
+/e2e-profile run upgrade-test test-upgrade-e2e
+```
+
+### /e2e-profile analyze [test-name]
+
+Analyze collected heap profiles for a specific test run:
+
+1. Load all heap profiles from `./e2e-profiles/[test-name]/`
+2. Analyze memory growth patterns
+3. Identify top allocators
+4. Find OpenAPI, JSON, and other hotspots
+5. Generate detailed markdown report
+
+**Example:**
+```
+/e2e-profile analyze baseline
+```
+
+### /e2e-profile compare [test1] [test2]
+
+Compare two test runs to measure the impact of changes:
+
+1. Load profiles from both test runs
+2. Compare peak memory usage
+3. Compare memory growth rates
+4. Identify differences in allocation patterns
+5. Generate side-by-side comparison report with charts
+
+**Example:**
+```
+/e2e-profile compare baseline with-caching
+```
+
+### /e2e-profile collect
+
+Manually collect a single heap profile from the running operator-controller pod:
+
+1. Find the operator-controller pod
+2. Set up port forwarding to pprof endpoint
+3. Download heap profile
+4. Save to `./e2e-profiles/manual/heap-[timestamp].pprof`
+
+**Example:**
+```
+/e2e-profile collect
+```
+
+## Task Breakdown
+
+When you invoke this command, I will:
+
+1. **Setup Phase**
+   - Create `./e2e-profiles/[test-name]` directory
+   - Verify `make test-experimental-e2e` is available
+   - Check kubectl access to the cluster
+
+2. **Collection Phase**
+   - Start the e2e test in background
+   - Monitor for pod readiness
+   - Set up port forwarding to pprof endpoint (port 6060)
+   - Collect heap profiles every 10 seconds
+   - Save profiles with sequential naming (heap0.pprof, heap1.pprof, ...)
+
+3. **Monitoring Phase**
+   - Track test progress
+   - Monitor profile file sizes for growth patterns
+   - Detect if test crashes or completes
+
+4. **Analysis Phase**
+   - Use `go tool pprof` to analyze profiles
+   - Extract key metrics:
+     - Peak memory usage
+     - Memory growth over time
+     - Top allocators
+     - OpenAPI-related allocations
+     - JSON deserialization overhead
+     - Informer/cache allocations
+
+5. **Reporting Phase**
+   - Generate markdown report with:
+     - Executive summary
+     - Memory timeline chart
+     - Top allocators table
+     - Allocation breakdown
+     - Recommendations for optimization
+
+## Configuration
+
+The plugin uses these defaults (customizable via environment variables):
+
+```bash
+# Namespace where operator-controller runs
+E2E_PROFILE_NAMESPACE=olmv1-system
+
+# Collection interval in seconds
+E2E_PROFILE_INTERVAL=10
+
+# CPU sampling duration in seconds
+E2E_PROFILE_CPU_DURATION=10
+
+# Profile collection mode (both, heap, cpu)
+E2E_PROFILE_MODE=both
+
+# Output directory base
+E2E_PROFILE_DIR=./e2e-profiles
+
+# Default test target
+E2E_PROFILE_TEST_TARGET=test-experimental-e2e
+```
+
+**Profile Modes:**
+- `both` (default): Collect both heap and CPU profiles
+- `heap`: Collect only heap profiles (reduces overhead by ~3%)
+- `cpu`: Collect only CPU profiles
+
+## Output Structure
+
+```
+e2e-profiles/
+├── baseline/
+│   ├── operator-controller/
+│   │   ├── heap0.pprof
+│   │   ├── heap1.pprof
+│   │   ├── cpu0.pprof
+│   │   ├── cpu1.pprof
+│   │   └── ...
+│   ├── catalogd/
+│   │   ├── heap0.pprof
+│   │   ├── cpu0.pprof
+│   │   └── ...
+│   ├── test.log
+│   ├── collection.log
+│   └── analysis.md
+├── with-caching/
+│   └── ...
+└── comparisons/
+    └── baseline-vs-with-caching.md
+```
+
+## Tool Location
+
+The memory profiling scripts are located at:
+```
+hack/tools/e2e-profiling/
+├── e2e-profile.sh          # Main entry point
+├── start-profiling.sh      # Start background profiling
+├── stop-profiling.sh       # Stop profiling and analyze
+├── run-profiled-test.sh    # Run test with profiling (automated)
+├── collect-profiles.sh     # Profile collection loop
+├── analyze-profiles.sh     # Generate analysis reports
+├── compare-profiles.sh     # Compare two runs
+├── common.sh               # Shared utilities
+└── README.md               # Full documentation
+```
+
+You can run them directly:
+```bash
+# Start/Stop workflow
+make start-profiling   # or ./hack/tools/e2e-profiling/start-profiling.sh
+make test-e2e
+make stop-profiling    # or ./hack/tools/e2e-profiling/stop-profiling.sh
+
+# Automated workflow
+./hack/tools/e2e-profiling/e2e-profile.sh run baseline
+./hack/tools/e2e-profiling/e2e-profile.sh analyze baseline
+./hack/tools/e2e-profiling/e2e-profile.sh compare baseline optimized
+```
+
+## Requirements
+
+- kubectl with access to the cluster
+- go tool pprof
+- make (for running tests)
+- curl (for fetching profiles)
+- Port 6060 available for forwarding
+
+## Example Workflows
+
+### Recommended: Start/Stop Workflow
+
+```bash
+# 1. Start profiling in background
+/e2e-profile start baseline
+
+# 2. Run your test (any command!)
+make test-e2e                 # Works! Handles cluster teardown
+make test-experimental-e2e    # Works!
+go test ./test/e2e/...        # Works!
+
+# 3. Stop profiling and get analysis
+/e2e-profile stop
+
+# 4. Make code changes and test again
+# ... edit code ...
+/e2e-profile start optimized
+make test-e2e
+/e2e-profile stop
+
+# 5. Compare results
+/e2e-profile compare baseline optimized
+```
+
+### Alternative: Automated Workflow
+
+```bash
+# 1. Run baseline test with profiling (automated)
+/e2e-profile run baseline
+
+# 2. Make code changes (e.g., add caching)
+# ... edit code ...
+
+# 3. Run new test with profiling
+/e2e-profile run with-caching
+
+# 4. Compare results
+/e2e-profile compare baseline with-caching
+
+# 5. Review the comparison report
+# Opens: e2e-profiles/comparisons/baseline-vs-with-caching.md
+```
+
+## Notes
+
+**Start/Stop Workflow:**
+- Profiler runs in background, letting you run any test command
+- Auto-detects cluster teardown after 3 consecutive collection failures
+- Port-forwards and collection process stop gracefully
+- Works with test-e2e (which tears down cluster), test-experimental-e2e, and custom commands
+
+**Automated Workflow:**
+- Test will run until completion or manual interruption (Ctrl+C)
+- Automatically handles profiling setup and teardown
+
+**General:**
+- Each heap profile is ~11-150KB depending on memory usage
+- Each CPU profile is ~4-40KB depending on activity
+- Analysis requires all profile files to be present
+- Port forwarding uses deployments (survives pod restarts)
+- Reports are generated in markdown format for easy viewing
+- Empty profile files are automatically cleaned up
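
The collection behaviour described in the notes above (sequentially named `heapN.pprof` files, a stop after 3 consecutive collection failures, and removal of empty files) can be sketched roughly as follows. This is an illustrative sketch only, not the contents of `collect-profiles.sh`. The function names `fetch_profile` and `collect_profiles`, the `PPROF_URL` variable, and the `/debug/pprof/heap` path (the Go `net/http/pprof` default) are assumptions; only port 6060, the 10-second interval, and the failure threshold come from the text above.

```bash
# Hypothetical sketch of a heap-profile collection loop.
# The names below are illustrative and are not taken from the real scripts.

# The default interval mirrors the Configuration section.
E2E_PROFILE_INTERVAL="${E2E_PROFILE_INTERVAL:-10}"
# Assumption: pprof is reachable through a local port-forward on 6060.
PPROF_URL="${PPROF_URL:-http://localhost:6060/debug/pprof/heap}"
MAX_FAILURES=3

# fetch_profile OUTPUT_FILE: download one heap profile with curl.
# Returns non-zero when the endpoint is unreachable or answers with an error.
fetch_profile() {
  curl --silent --fail --max-time 30 --output "$1" "$PPROF_URL"
}

# collect_profiles OUTPUT_DIR: write heap0.pprof, heap1.pprof, ... until
# MAX_FAILURES fetches in a row fail, which is read as cluster teardown.
collect_profiles() {
  local dir="$1" n=0 failures=0 file
  mkdir -p "$dir"
  while [ "$failures" -lt "$MAX_FAILURES" ]; do
    file="$dir/heap${n}.pprof"
    # An empty download counts as a failure as well.
    if fetch_profile "$file" && [ -s "$file" ]; then
      failures=0
      n=$((n + 1))
    else
      failures=$((failures + 1))
      rm -f "$file"   # drop empty or partial files
    fi
    sleep "$E2E_PROFILE_INTERVAL"
  done
  echo "stopped after $failures consecutive failures; kept $n profiles"
}
```

Under these assumptions, `collect_profiles ./e2e-profiles/baseline/operator-controller` would run until the port-forward goes away. Resetting the failure counter after every successful fetch is what makes the threshold "consecutive" rather than cumulative, so a single transient error does not end the session.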