Contributor

@JoannaaKL JoannaaKL commented Nov 4, 2025

This PR:

  • adds a dependency on microcosm-cc/bluemonday for HTML sanitization
  • adds a function that sanitises HTML tags
  • uses that function to sanitise pull requests and issues

Misc updates:

  • github.com/microcosm-cc/bluemonday
  • script/licenses and script/licenses-check now set GOFLAGS=-mod=mod, which makes the tool ignore the vendor directory. The report template uses the LicenseURL field, but it was being populated with links to this repo’s vendored paths because the tool runs in vendor mode by default when a vendor/ directory exists. With GOFLAGS=-mod=mod, the vendor directory is ignored and the upstream module URL is used as expected.

@JoannaaKL JoannaaKL marked this pull request as ready for review November 5, 2025 10:23
@JoannaaKL JoannaaKL requested a review from a team as a code owner November 5, 2025 10:23
Copilot AI review requested due to automatic review settings November 5, 2025 10:23
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds HTML sanitization capabilities to prevent XSS attacks and other security issues when handling user-provided content. It introduces the bluemonday library to sanitize HTML tags in GitHub issue and pull request titles and bodies.

  • Integrates bluemonday library for HTML sanitization with a configurable policy
  • Updates sanitization logic to filter both invisible characters and HTML tags
  • Adds comprehensive test coverage for the new HTML filtering functionality

Reviewed Changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
| --- | --- |
| pkg/sanitize/sanitize.go | Adds FilterHTMLTags and Sanitize functions with bluemonday policy configuration |
| pkg/sanitize/sanitize_test.go | Adds comprehensive test cases for HTML tag filtering |
| pkg/github/issues.go | Updates to use Sanitize instead of FilterInvisibleCharacters |
| pkg/github/pullrequests.go | Updates to use Sanitize instead of FilterInvisibleCharacters |
| go.mod | Adds bluemonday and its dependencies |
| go.sum | Updates checksums for new dependencies |
| third-party-licenses.*.md | Adds license entries for new dependencies |
| third-party/*/LICENSE* | Adds license files for new third-party dependencies |
| script/licenses | Adds GOFLAGS=-mod=mod to go-licenses commands |
| script/licenses-check | Adds GOFLAGS=-mod=mod and fixes echo to printf |

Collaborator

@SamMorrowDrums SamMorrowDrums left a comment

Looks great! 🚀

Contributor

@kerobbi kerobbi left a comment

Tiny comment, but other than that lgtm!

}

func FilterHTMLTags(input string) string {
	if input == "" {
Contributor

Maybe we could also check if the string has any HTML in the first place in this early return?

Collaborator

Interesting idea, although an early return that has to parse the content might not be an optimisation. Hard to tell without getting into the weeds.

Contributor

What I was mainly thinking about is just adding a simple !strings.Contains(input, "<") check, nothing fancy or overly complex. It wouldn't even handle all edge cases (e.g., an input containing a single < for whatever reason), but it would be a quick scan to avoid running the full sanitiser on plain input. But what you said is definitely a good point!

Contributor Author

Bluemonday already tokenizes the HTML input, and I don't want to reinvent the wheel here. :)
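
For reference, the quick-scan idea discussed in this thread could look roughly like the sketch below. It is not what the PR merged. The names filterWithFastPath and sanitizeFunc are hypothetical, and the full sanitiser is passed in as a plain function so the sketch needs only the standard library, whereas the PR itself uses a bluemonday policy.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeFunc is a hypothetical stand-in for the full HTML sanitiser
// (in the PR this role is played by a bluemonday policy).
type sanitizeFunc func(string) string

// filterWithFastPath is a hypothetical wrapper illustrating the suggested
// early return: empty input and input without any '<' skip the full
// sanitiser; everything else is delegated to it unchanged.
func filterWithFastPath(input string, full sanitizeFunc) string {
	if input == "" || !strings.Contains(input, "<") {
		return input
	}
	return full(input)
}

func main() {
	calls := 0
	// Stub sanitiser that only records that it ran.
	stub := func(s string) string {
		calls++
		return "[sanitised]"
	}

	fmt.Println(filterWithFastPath("plain title", stub), calls)  // plain title 0
	fmt.Println(filterWithFastPath("<b>bold</b>", stub), calls) // [sanitised] 1
	fmt.Println(filterWithFastPath("a < b", stub), calls)       // [sanitised] 2
}
```

As noted above, a lone < still goes through the full sanitiser, so the check only helps for input with no < at all. One caveat, stated as an assumption rather than something verified against this PR: if the full sanitiser also rewrites tag-free text, for example by escaping characters such as &, the fast path would return different output for those inputs than the sanitiser would.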

@JoannaaKL JoannaaKL merged commit 6a39a39 into main Nov 5, 2025
22 checks passed
@JoannaaKL JoannaaKL deleted the add-html-filtering branch November 5, 2025 11:49