Commit 396a77d

tmshort and claude committed
✨ Add memory profiling toolchain for e2e tests
Add comprehensive memory profiling infrastructure to collect, analyze, and compare heap profiles during e2e test execution.

**Features:**
- Automated heap profile collection from operator-controller and catalogd
- Real-time profile capture every 15 seconds during test execution
- Multi-component profiling with separate analysis for each component
- Prometheus alert tracking integrated with profiling reports
- Side-by-side comparison of different test runs
- Claude Code integration via /memory-profile command

**Tooling:**
- `collect-profiles.sh`: Port-forward to pprof endpoints and collect heap dumps
- `analyze-profiles.sh`: Generate detailed analysis with top allocators and growth patterns
- `compare-profiles.sh`: Compare two test runs to identify regressions
- `run-profiled-test.sh`: Orchestrate full profiled test runs
- `memory-profile.sh`: Main entry point with subcommands (run/analyze/compare)

**Usage:**
```bash
# Run a profiled test
./hack/tools/memory-profiling/memory-profile.sh run baseline test-experimental-e2e

# Analyze collected profiles
./hack/tools/memory-profiling/memory-profile.sh analyze baseline

# Compare two test runs
./hack/tools/memory-profiling/memory-profile.sh compare baseline optimized
```

**Integration:**
- Claude Code command: `/memory-profile` for interactive use
- Automatic cleanup of empty profiles from cluster teardown
- Prometheus alert extraction from e2e test summaries
- Detailed markdown reports with growth analysis and recommendations

This tooling was essential for identifying memory optimization opportunities and validating that alert thresholds are correctly calibrated.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent 6d58c4b commit 396a77d

16 files changed: +2588 -3 lines changed

.claude/commands/memory-profile.md

Lines changed: 216 additions & 0 deletions
@@ -0,0 +1,216 @@
+---
+description: Profile memory usage during e2e tests and analyze results
+---
+
+# Memory Profiling Plugin
+
+Analyze memory usage during e2e tests by collecting pprof heap profiles and generating comprehensive analysis reports.
+
+## Commands
+
+### /memory-profile run [test-name] [test-target]
+
+Run an e2e test with continuous memory profiling:
+
+1. Start the specified e2e test (defaults to `make test-experimental-e2e`)
+2. Wait for the operator-controller pod to be ready
+3. Collect heap profiles every 15 seconds to `./memory-profiles/[test-name]/`
+4. Continue until the test completes or is interrupted
+5. Generate a summary report
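The run steps above amount to a polling loop that lasts as long as the test process. The following is a minimal sketch of that idea, not the contents of `collect-profiles.sh`. The function names `profile_path` and `collect_until_done` are hypothetical, and the sketch assumes a port-forward to `localhost:6060` is already running and that the binary serves Go's standard `net/http/pprof` handler at `/debug/pprof/heap`.

```bash
#!/usr/bin/env bash
# Illustrative only: the real collection logic lives in collect-profiles.sh
# and may differ.

# Build the output path for the Nth profile of a run (heap0.pprof, heap1.pprof, ...).
profile_path() {
  local dir="$1" index="$2"
  printf '%s/heap%d.pprof\n' "$dir" "$index"
}

# Poll the heap endpoint until the process with the given PID exits.
# Assumption: localhost:6060 is already forwarded to the pod's pprof port.
collect_until_done() {
  local test_pid="$1" dir="$2" interval="${3:-15}" i=0
  mkdir -p "$dir"
  while kill -0 "$test_pid" 2>/dev/null; do
    # -f makes curl fail on HTTP errors, so a failed fetch is not counted.
    if curl -sf -o "$(profile_path "$dir" "$i")" \
        "http://localhost:6060/debug/pprof/heap"; then
      i=$((i + 1))
    fi
    sleep "$interval"
  done
  echo "collected $i profiles in $dir"
}
```

One way to use it would be to start the test in the background (`make test-experimental-e2e > test.log 2>&1 &`) and then call `collect_until_done $! ./memory-profiles/baseline`.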
+
+**Test Targets:**
+- `test-e2e` - Standard e2e tests
+- `test-experimental-e2e` - Experimental e2e tests (default)
+- `test-extension-developer-e2e` - Extension developer e2e tests
+- `test-upgrade-e2e` - Upgrade e2e tests
+- `test-upgrade-experimental-e2e` - Upgrade experimental e2e tests
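Because the target is passed straight to `make`, a wrapper would likely want to reject typos before starting a long run. The sketch below shows one way to do that with the five targets listed above. The helper names `is_valid_target` and `resolve_target` are hypothetical and are not taken from the commit.

```bash
# Hypothetical validation helper; the target names come from the list above.
is_valid_target() {
  case "$1" in
    test-e2e | test-experimental-e2e | test-extension-developer-e2e) return 0 ;;
    test-upgrade-e2e | test-upgrade-experimental-e2e) return 0 ;;
    *) return 1 ;;
  esac
}

# Resolve the target to run, falling back to the documented default.
resolve_target() {
  local target="${1:-test-experimental-e2e}"
  if ! is_valid_target "$target"; then
    echo "unknown test target: $target" >&2
    return 1
  fi
  echo "$target"
}
```

A caller could then run `make "$(resolve_target "$2")"` and stop early when the function returns non-zero.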
+
+**Examples:**
+```
+/memory-profile run baseline
+/memory-profile run baseline test-e2e
+/memory-profile run with-caching test-experimental-e2e
+/memory-profile run upgrade-test test-upgrade-e2e
+```
+
+### /memory-profile analyze [test-name]
+
+Analyze collected heap profiles for a specific test run:
+
+1. Load all heap profiles from `./memory-profiles/[test-name]/`
+2. Analyze memory growth patterns
+3. Identify top allocators
+4. Find OpenAPI, JSON, and other hotspots
+5. Generate detailed markdown report
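The first and third analysis steps can be approximated with a few lines of shell around `go tool pprof`. This is a sketch under assumptions, not the logic of `analyze-profiles.sh`. The function names are hypothetical, while `-top`, `-sample_index` and `-nodecount` are standard pprof options.

```bash
# List a run's profiles in numeric order (heap2 before heap10), tolerating gaps.
list_profiles() {
  local dir="$1" f n
  for f in "$dir"/heap*.pprof; do
    [ -e "$f" ] || continue   # glob matched nothing
    n="${f##*/heap}"          # strip directory and "heap" prefix
    echo "${n%.pprof}"        # strip extension, leaving the index
  done | sort -n | while read -r n; do
    echo "$dir/heap$n.pprof"
  done
}

# Print the top in-use allocators for one profile.
top_allocators() {
  local profile="$1" count="${2:-20}"
  go tool pprof -top -sample_index=inuse_space -nodecount="$count" "$profile"
}
```

For example, `list_profiles ./memory-profiles/baseline | tail -n 1` yields the last profile of a run, which can then be passed to `top_allocators`.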
+
+**Example:**
+```
+/memory-profile analyze baseline
+```
+
+### /memory-profile compare [test1] [test2]
+
+Compare two test runs to measure the impact of changes:
+
+1. Load profiles from both test runs
+2. Compare peak memory usage
+3. Compare memory growth rates
+4. Identify differences in allocation patterns
+5. Generate side-by-side comparison report with charts
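Two pieces of the comparison can be sketched directly: the relative change in a metric such as peak memory, and a per-function diff of two profiles. The helper names below are hypothetical. `-base` is a standard `go tool pprof` option that subtracts the base profile from the one being reported. How `compare-profiles.sh` actually does this is not shown in this excerpt.

```bash
# Percentage change from a baseline value to a new value, one decimal place.
percent_change() {
  awk -v old="$1" -v new="$2" 'BEGIN {
    if (old == 0) { print "n/a"; exit }
    printf "%.1f\n", (new - old) / old * 100
  }'
}

# Show per-function allocation differences between two profiles.
diff_profiles() {
  local base="$1" candidate="$2"
  go tool pprof -top -sample_index=inuse_space -base "$base" "$candidate"
}
```

With peak values in the same unit, `percent_change 100 150` reports a 50.0 percent increase and a negative result indicates a reduction.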
+
+**Example:**
+```
+/memory-profile compare baseline with-caching
+```
+
+### /memory-profile collect
+
+Manually collect a single heap profile from the running operator-controller pod:
+
+1. Find the operator-controller pod
+2. Set up port forwarding to pprof endpoint
+3. Download heap profile
+4. Save to `./memory-profiles/manual/heap-[timestamp].pprof`
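A one-shot collection along the lines of the steps above could look like the sketch below. It is an illustration under assumptions: the function names are hypothetical, the timestamp format is a guess because the document does not specify one, and the `/debug/pprof/heap` path is Go's default `net/http/pprof` route rather than something this commit confirms. It also forwards to the deployment instead of looking up the pod by label, which is a simplification.

```bash
# Path for a manually collected profile; the timestamp format is an assumption.
manual_profile_path() {
  local dir="${1:-./memory-profiles/manual}"
  local stamp="${2:-$(date +%Y%m%d-%H%M%S)}"
  echo "$dir/heap-$stamp.pprof"
}

# Port-forward to the deployment, fetch one heap profile, then clean up.
collect_once() {
  local ns="${MEMORY_PROFILE_NAMESPACE:-olmv1-system}"
  local deploy="${MEMORY_PROFILE_DEPLOYMENT:-operator-controller-controller-manager}"
  local port="${MEMORY_PROFILE_PPROF_PORT:-6060}"
  local out rc=0
  out="$(manual_profile_path)"
  mkdir -p "$(dirname "$out")"

  kubectl port-forward -n "$ns" "deployment/$deploy" "$port:$port" >/dev/null 2>&1 &
  local pf_pid=$!
  sleep 2   # crude wait for the tunnel; a real script would poll the port

  # Assumed endpoint path: Go's default net/http/pprof heap handler.
  curl -sf -o "$out" "http://localhost:$port/debug/pprof/heap" || rc=$?
  kill "$pf_pid" 2>/dev/null
  [ "$rc" -eq 0 ] && echo "saved $out"
  return "$rc"
}
```

Killing the port-forward before returning keeps port 6060 free for the next run.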
+
+**Example:**
+```
+/memory-profile collect
+```
+
+## Task Breakdown
+
+When you invoke this command, I will:
+
+1. **Setup Phase**
+   - Create `./memory-profiles/[test-name]` directory
+   - Verify `make test-experimental-e2e` is available
+   - Check kubectl access to the cluster
+
+2. **Collection Phase**
+   - Start the e2e test in background
+   - Monitor for pod readiness
+   - Set up port forwarding to pprof endpoint (port 6060)
+   - Collect heap profiles every 15 seconds
+   - Save profiles with sequential naming (heap0.pprof, heap1.pprof, ...)
+
+3. **Monitoring Phase**
+   - Track test progress
+   - Monitor profile file sizes for growth patterns
+   - Detect if test crashes or completes
+
+4. **Analysis Phase**
+   - Use `go tool pprof` to analyze profiles
+   - Extract key metrics:
+     - Peak memory usage
+     - Memory growth over time
+     - Top allocators
+     - OpenAPI-related allocations
+     - JSON deserialization overhead
+     - Informer/cache allocations
+
+5. **Reporting Phase**
+   - Generate markdown report with:
+     - Executive summary
+     - Memory timeline chart
+     - Top allocators table
+     - Allocation breakdown
+     - Recommendations for optimization
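The Monitoring Phase above mentions watching profile file sizes for growth patterns. A rough way to do that is sketched below, with hypothetical helper names. It assumes the sequential `heapN.pprof` naming from the Collection Phase with no gaps, and it treats file size only as a coarse proxy, since pprof files are compressed and do not map directly to heap bytes.

```bash
# Print "<index> <bytes>" for each profile, in collection order.
# Assumes contiguous numbering starting at heap0.pprof.
profile_sizes() {
  local dir="$1" i=0 f
  while f="$dir/heap$i.pprof"; [ -f "$f" ]; do
    echo "$i $(wc -c < "$f" | tr -d ' ')"
    i=$((i + 1))
  done
}

# Difference in bytes between the last and the first profile of a run.
size_growth() {
  profile_sizes "$1" | awk 'NR == 1 { first = $2 } { last = $2 } END { if (NR) print last - first }'
}
```

The output of `profile_sizes` is plain two-column text, so it can be fed to any plotting tool to draw a timeline.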
+
+## Configuration
+
+The plugin uses these defaults (customizable via environment variables):
+
+```bash
+# Namespace where operator-controller runs
+MEMORY_PROFILE_NAMESPACE=olmv1-system
+
+# Deployment name to monitor
+MEMORY_PROFILE_DEPLOYMENT=operator-controller-controller-manager
+
+# Label selector for pod
+MEMORY_PROFILE_POD_LABEL="app.kubernetes.io/name=operator-controller"
+
+# Pprof endpoint port
+MEMORY_PROFILE_PPROF_PORT=6060
+
+# Collection interval in seconds
+MEMORY_PROFILE_INTERVAL=15
+
+# Output directory base
+MEMORY_PROFILE_DIR=./memory-profiles
+```
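For these defaults to be overridable, a script has to assign each value only when the variable is unset. A common shell idiom for that is `: "${VAR:=default}"`, sketched below. The wrapper function `load_config` is a hypothetical name, and whether the commit's scripts use this exact idiom is an assumption.

```bash
# Assign each documented default only if the caller has not already set it.
load_config() {
  : "${MEMORY_PROFILE_NAMESPACE:=olmv1-system}"
  : "${MEMORY_PROFILE_DEPLOYMENT:=operator-controller-controller-manager}"
  : "${MEMORY_PROFILE_POD_LABEL:=app.kubernetes.io/name=operator-controller}"
  : "${MEMORY_PROFILE_PPROF_PORT:=6060}"
  : "${MEMORY_PROFILE_INTERVAL:=15}"
  : "${MEMORY_PROFILE_DIR:=./memory-profiles}"
}
```

With this pattern, `MEMORY_PROFILE_INTERVAL=30 ./hack/tools/memory-profiling/memory-profile.sh run baseline` would sample every 30 seconds while leaving the other settings at their defaults.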
+
+## Output Structure
+
+```
+memory-profiles/
+├── baseline/
+│   ├── heap0.pprof
+│   ├── heap1.pprof
+│   ├── ...
+│   ├── heap23.pprof
+│   ├── test.log
+│   └── analysis.md
+├── with-caching/
+│   ├── heap0.pprof
+│   ├── ...
+│   └── analysis.md
+└── comparisons/
+    └── baseline-vs-with-caching.md
+```
+
+## Tool Location
+
+The memory profiling scripts are located at:
+```
+hack/tools/memory-profiling/
+├── memory-profile.sh        # Main entry point
+├── run-profiled-test.sh     # Run test with profiling
+├── collect-profiles.sh      # Collect heap profiles
+├── analyze-profiles.sh      # Generate analysis
+├── compare-profiles.sh      # Compare two runs
+├── README.md                # Full documentation
+└── USAGE_EXAMPLES.md        # Real-world examples
+```
+
+You can run them directly:
+```bash
+./hack/tools/memory-profiling/memory-profile.sh run baseline
+./hack/tools/memory-profiling/memory-profile.sh analyze baseline
+./hack/tools/memory-profiling/memory-profile.sh compare baseline optimized
+```
+
+## Requirements
+
+- kubectl with access to the cluster
+- go tool pprof
+- make (for running tests)
+- curl (for fetching profiles)
+- Port 6060 available for forwarding
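The command-line requirements above are easy to check before a long test run starts. The sketch below uses `command -v` for that. The function names are hypothetical, and the check covers only the binaries, not cluster access or whether port 6060 is free.

```bash
# Print the names of any required commands that are not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Fail early with one message listing everything that is missing.
check_requirements() {
  local missing
  missing="$(missing_tools kubectl go make curl)"
  if [ -n "$missing" ]; then
    # Unquoted on purpose so the names are joined onto one line.
    echo "missing required tools:" $missing >&2
    return 1
  fi
}
```

Calling `check_requirements || exit 1` at the top of an entry-point script avoids discovering a missing tool halfway through a profiled run.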
+
+## Example Workflow
+
+```bash
+# 1. Run baseline test with profiling
+/memory-profile run baseline
+
+# 2. Make code changes (e.g., add caching)
+# ... edit code ...
+
+# 3. Run new test with profiling
+/memory-profile run with-caching
+
+# 4. Compare results
+/memory-profile compare baseline with-caching
+
+# 5. Review the comparison report
+# Opens: memory-profiles/comparisons/baseline-vs-with-caching.md
+```
+
+## Notes
+
+- The test will run until completion or manual interruption (Ctrl+C)
+- Each heap profile is ~11-150KB depending on memory usage
+- Analysis requires all heap files to be present
+- Port forwarding runs in background and auto-cleans on exit
+- Reports are generated in markdown format for easy viewing

.claude/settings.local.json

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+{
+  "permissions": {
+    "allow": [
+      "Read(//home/tshort/experimental-e2e-testing/**)",
+      "Bash(go tool pprof:*)",
+      "Bash(for i in 0 5 10 15 20 23)",
+      "Bash(do echo \"=== heap$i.pprof ===\")",
+      "Bash(done)",
+      "Bash(awk:*)",
+      "Bash(go doc:*)",
+      "Bash(go list:*)",
+      "Read(//home/tshort/go/pkg/mod/k8s.io/**)",
+      "Bash(go build:*)"
+    ],
+    "deny": [],
+    "ask": []
+  }
+}

.gitignore

Lines changed: 3 additions & 3 deletions
@@ -38,9 +38,6 @@ vendor/
 \#*\#
 .\#*
 
-# AI temp files files
-.claude/
-
 # documentation website asset folder
 site
 
@@ -50,3 +47,6 @@ site
 
 # Temporary files and directories
 /test/regression/convert/testdata/tmp/*
+
+# Memory profiling artifacts
+memory-profiles/
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# Ignore generated analysis files in example directories
+memory-profiles/*/analysis.md
+memory-profiles/comparisons/
