Add html filtering #1356
Pull Request Overview
This PR adds HTML sanitization capabilities to prevent XSS attacks and other security issues when handling user-provided content. It introduces the bluemonday library to sanitize HTML tags in GitHub issue and pull request titles and bodies.
- Integrates bluemonday library for HTML sanitization with a configurable policy
- Updates sanitization logic to filter both invisible characters and HTML tags
- Adds comprehensive test coverage for the new HTML filtering functionality
Reviewed Changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pkg/sanitize/sanitize.go | Adds FilterHTMLTags and Sanitize functions with bluemonday policy configuration |
| pkg/sanitize/sanitize_test.go | Adds comprehensive test cases for HTML tag filtering |
| pkg/github/issues.go | Updates to use Sanitize instead of FilterInvisibleCharacters |
| pkg/github/pullrequests.go | Updates to use Sanitize instead of FilterInvisibleCharacters |
| go.mod | Adds bluemonday and its dependencies |
| go.sum | Updates checksums for new dependencies |
| third-party-licenses.*.md | Adds license entries for new dependencies |
| third-party/*/LICENSE* | Adds license files for new third-party dependencies |
| script/licenses | Adds GOFLAGS=-mod=mod to go-licenses commands |
| script/licenses-check | Adds GOFLAGS=-mod=mod and fixes echo to printf |
SamMorrowDrums
left a comment
Looks great! 🚀
kerobbi
left a comment
Tiny comment, but other than that lgtm!
    }

    func FilterHTMLTags(input string) string {
        if input == "" {
Maybe we could also check if the string has any HTML in the first place in this early return?
Interesting idea, although an early return that has to parse the content might not be an optimisation. Hard to tell without getting into the weeds.
What I was mainly thinking about is just adding a simple !strings.Contains(input, "<") check, nothing fancy or overly complex. It wouldn't even handle all edge cases (i.e., an input containing a single < for whatever reason) but it would be just a quick scan to avoid running the full sanitiser on plain input. But what you said is definitely a good point!
Bluemonday does html input tokenization and I don't want to reinvent the wheel here. :)
This PR:
Misc updates:
script/licenses and script/licenses-check were updated to use GOFLAGS=-mod=mod, which ignores the vendor directory. The report template uses the LicenseURL field, but it was being populated with links to this repo's vendored paths because the tool runs in vendor mode by default when a vendor/ directory exists. With GOFLAGS=-mod=mod, the vendor directory is ignored and the upstream module URL is used as expected.