Skip to content

Conversation

@inaki-amatria
Copy link
Member

This PR adds whitespace parsing support to tree-sitter-fortran. This PR supersedes #19.

After testing with several open-source Fortran projects, it looks like these changes do not introduce any syntax regressions in the grammar:

Project Files processed Syntax errors @ codee Syntax errors @ feature/ParseWhitespaces
EAP-Patterns 11 0 0
FMS 241 6 6
GenAsis_Basics 222 0 0
HYCOM 58 17 17
icon-model 2139 391 391
LSMS 185 91 91
MOM6 508 15 15
NPB 211 4 4
OpenBLAS 3845 724 724
PHASTA 276 75 75
PQDTS 1 0 0
Thornado 1068 28 28

@inaki-amatria inaki-amatria requested a review from a team October 24, 2025 15:01
@inaki-amatria inaki-amatria self-assigned this Oct 24, 2025
@inaki-amatria inaki-amatria requested review from alvrogd and jgonzac and removed request for a team October 24, 2025 15:01
extras: $ => [
// This allows escaping newlines everywhere, although this is only valid in
// preprocessor statements
/\s|\\\r?\n/,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this regex can't be removed without breaking preprocessor support!

alias($.preproc_call_expression, $.call_expression),
)),
/\r?\n/,
$._external_end_of_statement,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the addition of $.whitespace as an external rule, this change becomes necessary. Otherwise, tree-sitter-fortran fails to recognize newline characters, since they may be consumed silently by the external scanner. The issue is that the external scanner reads tokens directly from the input buffer without checking valid_symbols, which is a bug that should be reported upstream. This behavior has made proper whitespace parsing extremely difficult.

@inaki-amatria inaki-amatria marked this pull request as ready for review October 24, 2025 15:11
Copy link

@alvrogd alvrogd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After testing with several open-source Fortran projects, it looks like these changes do not introduce any syntax regressions in the grammar:

Awesome! After looking at the tests in this PR, I understand you have observed in general that the rest of the parsing tree always stays the same.

Thanks also for the clarifications on the grammar changes. A lot of hours must have been put into this PR; it's a lot of rationale to unpack for each change, so the comments really help to follow it.

Copy link

@ruifm ruifm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Live reviewed, overal LGTM, great job!

- Patch #3 did not pass the internal testsuite
- Patches #20 and stadelmanma#22 were merged together
@inaki-amatria inaki-amatria force-pushed the feature/ParseWhitespaces branch from 5415a9e to 0e770a5 Compare October 27, 2025 16:27
Copy link

@jgonzac jgonzac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after live review, except the optional($.whitespace) workaround but it looks like it works.
Good job anyway!

Comment on lines +164 to +173
// HACK: This `optional($.whitespace)` is a workaround to prevent
// `tree-sitter` from misinterpreting `#define FOO (bar * baz)` as a
// `preproc_function_def`. For reasons related to how `token.immediate()`
// detects adjacent tokens, explicitly naming the whitespace rule causes
// the parser to think the opening parenthesis after `FOO` is immediately
// following it, which makes it match the function-like form instead of
// the object-like one. Adding an optional `whitespace` node here breaks
// that ambiguity and forces `tree-sitter` to correctly parse it as a
// regular `preproc_def`
optional($.whitespace),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand yet how this can disambiguate the rules 🤷🏼‍♂️

@inaki-amatria inaki-amatria merged commit 0e770a5 into codee Oct 28, 2025
1 check passed
@inaki-amatria inaki-amatria deleted the feature/ParseWhitespaces branch October 28, 2025 10:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants