Upload CI generated fuzz corpus coverage to codecov #4153
👋 Thanks for assigning @TheBlueMatt as a reviewer!
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@            Coverage Diff             @@
##             main    #4153      +/-   ##
==========================================
+ Coverage   88.85%   89.28%   +0.42%
==========================================
  Files         180      180
  Lines      137901   137901
  Branches   137901   137901
==========================================
+ Hits       122537   123125     +588
+ Misses      12552    12173     -379
+ Partials     2812     2603     -209
Flags with carried forward coverage won't be shown.
Force-pushed from dc493c2 to fdf6799.
contrib/generate_fuzz_coverage.sh
for target_dir in hfuzz_workspace/*; do
  [ -d "$target_dir" ] || continue
  src_name="$(basename "$target_dir")"
  for dest in "$src_name" "${src_name%_target}"; do
I don't think you need to copy into $src_name.
contrib/generate_fuzz_coverage.sh
mkdir -p "test_cases/$dest"
# Copy corpus files into the test_cases directory
find "$target_dir" -maxdepth 2 -type f \
  \( -path "$target_dir/CORPUS/*" -o -path "$target_dir/INPUT/*" -o -path "$target_dir/NEW/*" -o -path "$target_dir/input/*" \) \
Because we're just looking in hfuzz_workspace, I believe we only need to look in input, not CORPUS, INPUT, or NEW.
cargo clean
- name: Run fuzzers
  run: cd fuzz && ./ci-fuzz.sh && cd ..
- name: Upload honggfuzz corpus
Rather than only uploading, is there a way to make this directory persistent so that we can keep it between fuzz jobs?
I'm not sure if we really need to persist the directory here. My understanding is that the fuzz job runs on the latest code changes on every PR, so the generated corpus is tailored to the code changes on that PR. If we persist the corpus from a previous run and use that on a new run, won't that produce incorrect/misleading coverage data?
I don't think the point of the fuzz job is only to generate coverage data; it's to test the code :). Having a bit more coverage data from fuzzing than we "deserve" is okay, at least now that we split the coverage data out so that codecov shows fuzzing separately, and having a persistent fuzzing corpus means our fuzzing is much more likely to catch issues.
Right, how long do you think we can keep this directory persisted? The upload-artifact action has a retention-days input that can be used to keep the artifact for a while. The default is 90 days, but it can be adjusted (https://github.com/actions/upload-artifact?tab=readme-ov-file#retention-period).
I believe the simple "upload-artifact" task just stores data for this CI run. What I was thinking is some kind of persistent directory that's shared across jobs so that each CI fuzz task picks up the latest directory, does some fuzzing, finds new test cases, then uploads a new copy with more tests in it.
> What I was thinking is some kind of persistent directory that's shared across jobs so that each CI fuzz task picks up the latest directory, does some fuzzing, finds new test cases, then uploads a new copy with more tests in it.
Makes sense. I pushed eea2e4b to handle this using GitHub's cache action (https://github.com/actions/cache?tab=readme-ov-file).
contrib/generate_fuzz_coverage.sh
# Copy corpus files into the test_cases directory
find "$target_dir" -maxdepth 2 -type f \
  \( -path "$target_dir/CORPUS/*" -o -path "$target_dir/INPUT/*" -o -path "$target_dir/NEW/*" -o -path "$target_dir/input/*" \) \
  -print0 | xargs -0 -I{} cp -n {} "test_cases/$dest/" 2>/dev/null || true
Suggested change
- -print0 | xargs -0 -I{} cp -n {} "test_cases/$dest/" 2>/dev/null || true
+ -print0 | xargs -0 -I{} cp -n {} "test_cases/$dest/"
Done. Thank you.
contrib/generate_fuzz_coverage.sh
done
# Check if any files were actually imported
if [ -n "$(find test_cases -type f -print -quit 2>/dev/null)" ]; then
  imported=1
Not sure it's worth the extra effort just to print differently.
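Taken together, the review suggestions on this script (copy only into the name without the `_target` suffix, read only each target's `input` directory, let copy errors surface instead of discarding them, and skip the extra "were any files imported" check) might leave a loop like the one below. This is a hypothetical sketch, not the code that was merged: the function name `import_hfuzz_corpus` and its two arguments are made up for illustration, and it uses a plain glob loop instead of `find | xargs cp -n` because only one flat directory per target is read.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not the merged script): gather each honggfuzz target's
# corpus from <workspace>/<target>/input into <out>/<target without "_target">.
set -euo pipefail

import_hfuzz_corpus() {
	local workspace="$1" out="$2"
	local target_dir src_name dest f
	for target_dir in "$workspace"/*; do
		# Skips plain files, and the literal pattern if the workspace is empty.
		[ -d "$target_dir" ] || continue
		src_name="$(basename "$target_dir")"
		# Review suggestion: one destination, the name without "_target".
		dest="$out/${src_name%_target}"
		mkdir -p "$dest"
		# Review suggestion: in hfuzz_workspace only `input` holds the corpus.
		for f in "$target_dir"/input/*; do
			# No `2>/dev/null || true`: under `set -e` a failed copy stops the run.
			if [ -f "$f" ]; then
				cp "$f" "$dest/"
			fi
		done
	done
}
```

Assuming the script runs from the directory that contains `hfuzz_workspace`, `import_hfuzz_corpus hfuzz_workspace test_cases` would mirror the relative paths used in the original snippet.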
👋 The first review has been submitted! Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer.
Thank you for the review. I've addressed all the feedback and pushed a fixup: 1e4a7c5
TheBlueMatt left a comment
Responded at #4153 (comment)
Force-pushed from 19c1495 to eea2e4b.
I rebased on EDIT: This seems to be blocking the build (and fuzzing as well).
Yea, sorry, CI is kinda a mess for three reasons all at once. Can you rebase on #4179? That should get at least the
Force-pushed from eea2e4b to de9e1fd.
Yes, done! Thank you.
FWIW, remaining CI failures should be resolved shortly by #4180 |
uses: actions/cache@v4
with:
  path: fuzz/hfuzz_workspace
  key: fuzz-corpus-${{ github.ref }}-${{ github.sha }}
Isn't this going to be per-pr? We don't want it to be per-pr we want it to be global.
> Isn't this going to be per-pr? We don't want it to be per-pr we want it to be global.
Addressed this by splitting the workflow into two steps: a read-only per-PR step that seeds the fuzzer for a more effective run on PRs, and a main-branch step that does the same but also writes to the global cache.
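The earlier key, `fuzz-corpus-${{ github.ref }}-${{ github.sha }}`, is per-PR because on `pull_request` events `github.ref` is `refs/pull/<number>/merge`. The read-only/read-write split can be summarized as a small decision function. This is a hypothetical bash illustration (`fuzz_cache_plan` is made up; the workflow itself expresses the policy with actions/cache steps), assuming the `fuzz-corpus-refs/heads/main-<sha>` key format used in the later revision of this PR. It relies on GitHub's cache scoping, under which a run can restore caches created on the default branch, so PR runs can read main's corpus without writing to it.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of the two-step cache policy; not part of the PR.
# Assumes the key format fuzz-corpus-refs/heads/main-<sha>.
set -euo pipefail

CORPUS_PREFIX="fuzz-corpus-refs/heads/main-"

# Usage: fuzz_cache_plan REF SHA
# Prints "<restore-prefix> <save-key>"; save-key is "-" for read-only runs.
fuzz_cache_plan() {
	local ref="$1" sha="$2"
	if [ "$ref" = "refs/heads/main" ]; then
		# main: seed from the latest global corpus, then save a new unique entry.
		echo "$CORPUS_PREFIX $CORPUS_PREFIX$sha"
	else
		# PRs (github.ref is refs/pull/<n>/merge): read the global corpus only.
		echo "$CORPUS_PREFIX -"
	fi
}
```

For example, `fuzz_cache_plan refs/pull/4153/merge abc123` prints `fuzz-corpus-refs/heads/main- -`: the PR run restores from main's corpus but has no key to save under.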
Force-pushed from fcba095 to 6cd3f8f.
uses: actions/cache@v4
with:
  path: fuzz/hfuzz_workspace
  key: fuzz-corpus-refs/heads/main-${{ github.sha }}
Where do we save to fuzz-corpus-refs/heads/main-? This includes the sha.
> Where do we save to fuzz-corpus-refs/heads/main-?
No, the key isn't a save location but an identifier used to save and search for a cache.
> this includes the sha.
Yes, because we can't mutate an already existing cache but still need to make sure the updated corpus is cached. The sha ensures the cache key is unique.
> No, the key isn't a save location but an identifier used to save and search for a cache.
Wait, this statement contradicted itself?
> Yes, because we can't mutate an already existing cache but still need to make sure the updated corpus is cached. The sha ensures the cache key is unique.
But how does the read end know where to look for it?
> Wait, this statement contradicted itself?
Right, generally the key is used to both save and search. But the idea here is to rely on the key to save the updated corpus under a unique name (since we can't mutate an already existing cache) and use restore-keys to search.
> But how does the read end know where to look for it?
restore-keys does a prefix search: when there's a miss on key, it restores (downloads) the most recently created cache whose key matches the prefix.
For example, when this runs the first time and the corpus gets saved as fuzz-corpus-refs/heads/main-sha123, on the second run the key becomes fuzz-corpus-refs/heads/main-sha456 and misses, so it falls back to restore-keys to restore the most recent cache with the prefix fuzz-corpus-refs/heads/main-. That downloads fuzz-corpus-refs/heads/main-sha123; the run uses it, updates it, and saves it as fuzz-corpus-refs/heads/main-sha456, and the loop continues.
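The fallback can be modeled in a few lines. According to the actions/cache documentation, the action first looks for an exact match on `key`, then tries each `restore-keys` entry as a prefix and, if several caches match, restores the most recently created one; on an exact hit it also skips saving, which is why the save key has to include the sha. The bash function below, `resolve_cache`, is a hypothetical simplified model with a single restore prefix and no branch scoping; it is not how the action is implemented.

```bash
#!/usr/bin/env bash
# Simplified, hypothetical model of the actions/cache lookup described above:
#   1. an entry whose key equals KEY exactly wins;
#   2. otherwise the most recently created entry whose key starts with PREFIX.
# Branch scoping and multiple restore-keys are deliberately left out.
set -euo pipefail

# Usage: resolve_cache KEY PREFIX < entries
# Each input line is "<created-unix-time> <cache-key>".
# Prints the key that would be restored; returns 1 on a full miss.
resolve_cache() {
	local key="$1" prefix="$2"
	local created k best="" best_time=-1
	while read -r created k; do
		if [ "$k" = "$key" ]; then
			echo "$k" # exact hit on the primary key
			return 0
		fi
		# Quoting "$prefix" makes it a literal prefix; only the trailing * globs.
		if [[ "$k" == "$prefix"* ]] && [ "$created" -gt "$best_time" ]; then
			best="$k"
			best_time="$created"
		fi
	done
	[ -n "$best" ] || return 1
	echo "$best"
}

# Replaying the example above: only main-sha123 exists when the run for
# main-sha456 starts, so the prefix fallback restores main-sha123.
printf '%s\n' '100 fuzz-corpus-refs/heads/main-sha123' |
	resolve_cache fuzz-corpus-refs/heads/main-sha456 fuzz-corpus-refs/heads/main-
```

The replay at the end prints `fuzz-corpus-refs/heads/main-sha123`. Once a later run saves `main-sha456`, that newer entry becomes the one the next prefix lookup picks.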
Ohhhhhh, okay, I wasn't clear that it does a prefix search. Can you add a comment noting that? Otherwise LGTM!
Sure! I just updated the comment and pushed 5edb8cf.
Let me know when I can squash my second fixup commit into my first commit.
with:
  path: fuzz/hfuzz_workspace
  key: fuzz-corpus-refs/heads/main-${{ github.sha }}
  restore-keys: |
Presumably when running on main we don't need the restore-keys trick?
> Presumably when running on main we don't need the restore-keys trick?
No, we still do. The save key includes the sha, so it will always miss on restore, but restore-keys will restore the most recent matching cache.
Right, but then where do we save in a way that something knows where to find it?
The restore-keys does a prefix search and restores the most recently created cache with the prefix provided.
Force-pushed from 6cd3f8f to 5edb8cf.
Cool! Yea, this LGTM, we'll obv have to land it to fully test it. Feel free to squash the fixup commits down, and given the When you do so, please add some line breaks to the commit message so that no line is longer than ~70 chars.
Each CI job runs on a fresh runner and can't share data with other jobs, so we rely on GitHub Actions upload-artifact and download-artifact to share the CI-generated fuzz corpus, then replay it in the `contrib/generate_fuzz_coverage.sh` script to generate the coverage report.
Implements a persistent, global fuzz corpus cache. PRs perform a "read-only" restore from the `main` cache to seed fuzzer runs. The `main` branch performs a "read-write" restore and save to store new findings and grow the corpus.
Force-pushed from 5edb8cf to 5807852.
Done! I've rebased onto main, squashed the fixup, and formatted both commit messages to the ~70-char limit. Thanks for all the guidance!
TheBlueMatt left a comment
Awesome! Thanks so much.
Following the work in #3718 and #3925, which introduced uploading coverage from no-corpus fuzzing runs to codecov in CI, this PR uploads coverage from the CI-generated fuzz corpus to codecov in CI.
Closes #3926