Skip to content

Conversation

@RalfJung
Copy link
Member

@RalfJung RalfJung commented Sep 25, 2025

Context: #124403

The current behavior of repr(C) enums is as follows:

  • The discriminant values are interpreted as const expressions of type isize
  • We compute the smallest size that can hold all discriminant values
  • The target spec contains the smallest size for repr(C) enums
  • We take the larger of these two sizes

Unfortunately, this doesn't always match what C compilers do. In particular, MSVC seems to always give enums a size of 4 bytes, whereas the algorithm above will give enums a size of up to 8 bytes on 64bit targets. Here's an example enum affected by this:

// We give this size 4 on 32bit targets (with a warning since the discriminant is wrapped to fit an isize)
// and size 8 on 64bit targets.
#[repr(C)]
enum OverflowingEnum {
    A = 9223372036854775807, // i64::MAX
}

// MSVC always gives this size 4 (without any warning).
// GCC always gives it size 8 (without any warning).
// Godbolt: https://godbolt.org/z/P49MaYvMd
enum overflowing_enum {
    OVERFLOWING_ENUM_A = 9223372036854775807,
};

If we look at the C standard, then up until C20, there was no official support enums without an explicit underlying type and with discriminants that do not fit an int. With C23, this has changed: now enums have to grow automatically if there is an integer type that can hold all their discriminants. MSVC does not implement this part of C23.

Furthermore, Rust fundamentally cannot implement this (without major changes)! Enum discriminants work fundamentally different in Rust and C:

  • In Rust, every enum has a discriminant type entirely determined by its repr flags, and then the discriminant values must be const expressions of that type. For repr(C), that type is isize. So from the outset we interpret 9223372036854775807 as an isize literal and never give it a chance to be stored in a bigger type. If the discriminant is given as a literal without type annotation, it gets wrapped implicitly with a warning; otherwise the user has to write as isize explicitly and thus trigger the wrapping. Later, we can then decide to make the tag that stores the discriminant smaller than the discriminant type if all discriminant values fit into a smaller type, but those values have allready all been made to fit an isize so nothing bigger than isize could ever come out of this. That makes the behavior of 32bit GCC impossible for us to match.
  • In C, things flow the other way around: every discriminant value has a type determined entirely by its constant expression, and then the type for the enum is determined based on that. IOW, the expression can have any type a priori, different variants can even use a different type, and then the compiler is supposed to look at the resulting values (presumably as mathematical integers) and find a type that can hold them all. For the example above, 9223372036854775807 is a signed integer, so the compiler looks for the smallest signed type that can hold it, which is long long, and then uses that to compute the size of the enum (at least that's what C23 says should happen and GCC does this correctly).

Realistically I think the best we can do is to not attempt to support C23 enums, and to require repr(C) enums to satisfy the C20 requirements: all discriminants must fit into a c_int. So that's what this PR implements, by adding a FCW for enums with discriminants that do not fit into c_int. As a slight extension, we do not lint enums where all discriminants fit into a c_uint (i.e. unsigned int): while C20 does (in my reading) not allow this, and C23 does not prescribe the size of such an enum, this seems to behave consistently across compilers (giving the enum the size of an unsigned int). IOW, the lint fires whenever our layout algorithm would make the enum larger than an int, irrespective of whether we pick a signed or unsigned discriminant. This extension was added because crater found multiple cases of such enums across the ecosystem.

Note that it is impossible to trigger this FCW on targets where isize and c_int are the same size (i.e., the typical 32bit target): since we interpret discriminant values as isize, by the time we look at them, they have already been wrapped. However, we have an existing lint (overflowing_literals) that should notify people when this kind of wrapping occurs implicitly. Also, 64bit targets are much more common. On the other hand, even on 64bit targets it is possible to fall into the same trap by writing a literal that is so big that it does not fit into isize, gets wrapped (triggering overflowing_literals), and the wrapped value fits into c_int. Furthermore, overflowing_literals is just a lint, so if it occurs in a dependency you won't notice. (Arguably there is also a more general problem here: for literals of type usize/isize, it is fairly easy to write code that only triggers overflowing_literals on 32bit targets, and to never see that lint if one develops on a 64bit target.)

Specifically, the above example triggers the FCW on 64bit targets, but on 32bit targets we get this err-by-default lint instead (which will be hidden if it occurs in a dependency):

error: literal out of range for `isize`
  --> $DIR/repr-c-big-discriminant1.rs:16:9
   |
LL |     A = 9223372036854775807,
   |         ^^^^^^^^^^^^^^^^^^^
   |
   = note: the literal `9223372036854775807` does not fit into the type `isize` whose range is `-2147483648..=2147483647`
   = note: `#[deny(overflowing_literals)]` on by default

Also see the tests added by this PR.

This isn't perfect, but so far I don't think I have seen a better option. In #146504 I tried adjusting our enum logic to make the size of the example enum above actually match what C compilers do, but that's a massive breaking change since we have to change the expected type of the discriminant expression from isize to i64 or even i128 -- so that seems like a no-go. To improve the lint we could analyze things on the HIR level and specifically catch "repr(C) enums with discriminants defined as literals that are too big", but that would have to be on top of the lint in this PR I think since we'd still want to also always check the actually evaluated value (which we can't always determined on the HIR level).

Cc @workingjubilee @CAD97

@rustbot
Copy link
Collaborator

rustbot commented Sep 25, 2025

This PR modifies tests/auxiliary/minicore.rs.

cc @jieyouxu

@rustbot rustbot added A-test-infra-minicore Area: `minicore` test auxiliary and `//@ add-core-stubs` S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Sep 25, 2025
@rustbot
Copy link
Collaborator

rustbot commented Sep 25, 2025

r? @davidtwco

rustbot has assigned @davidtwco.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rust-log-analyzer

This comment has been minimized.

@RalfJung RalfJung force-pushed the repr-c-big-discriminant branch from ea5e355 to 8d8188e Compare September 25, 2025 14:25
@rust-log-analyzer

This comment has been minimized.

@RalfJung RalfJung force-pushed the repr-c-big-discriminant branch from 8d8188e to ca6a04d Compare September 25, 2025 15:58
@workingjubilee
Copy link
Member

I think we should crater it (presumably dialed up to deny-by-default) before we go with any report_in_deps: true right off the bat.

@RalfJung
Copy link
Member Author

presumably dialed up to deny-by-default

That's still ignored in dependencies, we'd want a hard error. And sadly converting between a lint and a hard error is a huge pain in the neck ever since we got that (by now apparently unmaintained) translatable diagnostics infrastructure. :/

@RalfJung RalfJung force-pushed the repr-c-big-discriminant branch from ca6a04d to 491bfd6 Compare September 25, 2025 17:31
@RalfJung
Copy link
Member Author

Ah, I can just not use translatable diagnostics. :)

@bors try

@rust-bors

This comment has been minimized.

rust-bors bot added a commit that referenced this pull request Sep 25, 2025
FCW for repr(C) enums whose discriminant values do not fit into a c_int
@workingjubilee
Copy link
Member

Yes, if new code is added using the translatable diagnostics infrastructure that's fine but also we should just bypass it the moment it is an inconvenience.

@rust-bors
Copy link

rust-bors bot commented Sep 25, 2025

☀️ Try build successful (CI)
Build commit: 0f31ace (0f31acebf540c89e4a4d5d114959fe91973419cb, parent: 6f34f4ee074ce0affc7bbf4e2c835f66cd576f13)

@RalfJung
Copy link
Member Author

@craterbot check

@craterbot
Copy link
Collaborator

👌 Experiment pr-147017 created and queued.
🤖 Automatically detected try build 0f31ace
🔍 You can check out the queue and this experiment's details.

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

@craterbot craterbot added S-waiting-on-crater Status: Waiting on a crater run to be completed. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 25, 2025
Copy link
Member

@davidtwco davidtwco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation looks good to me. I had looked at the previous PR but didn't have any suggestions as to how to resolve the issues with that approach, so this seems like the best we can do.

View changes since this review

@craterbot
Copy link
Collaborator

🚧 Experiment pr-147017 is now running

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

@rust-rfcbot
Copy link
Collaborator

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

@RalfJung
Copy link
Member Author

RalfJung commented Nov 2, 2025

@davidtwco since your last review, I added the logic that silences the lint when all discriminants fit in a c_uint, so think needs another review.

@RalfJung RalfJung changed the title FCW for repr(C) enums whose discriminant values do not fit into a c_int FCW for repr(C) enums whose discriminant values do not fit into a c_int or c_uint Nov 2, 2025
@davidtwco
Copy link
Member

@bors r+

@bors
Copy link
Collaborator

bors commented Nov 3, 2025

📌 Commit 13a3ac7 has been approved by davidtwco

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 3, 2025
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Nov 3, 2025
…r=davidtwco

FCW for repr(C) enums whose discriminant values do not fit into a c_int or c_uint

Context: rust-lang#124403

The current behavior of repr(C) enums is as follows:
- The discriminant values are interpreted as const expressions of type `isize`
- We compute the smallest size that can hold all discriminant values
- The target spec contains the smallest size for repr(C) enums
- We take the larger of these two sizes

Unfortunately, this doesn't always match what C compilers do. In particular, MSVC seems to *always* give enums a size of 4 bytes, whereas the algorithm above will give enums a size of up to 8 bytes on 64bit targets. Here's an example enum affected by this:
```
// We give this size 4 on 32bit targets (with a warning since the discriminant is wrapped to fit an isize)
// and size 8 on 64bit targets.
#[repr(C)]
enum OverflowingEnum {
    A = 9223372036854775807, // i64::MAX
}

// MSVC always gives this size 4 (without any warning).
// GCC always gives it size 8 (without any warning).
// Godbolt: https://godbolt.org/z/P49MaYvMd
enum overflowing_enum {
    OVERFLOWING_ENUM_A = 9223372036854775807,
};
```

If we look at the C standard, then up until C20, there was no official support enums without an explicit underlying type and with discriminants that do not fit an `int`. With C23, this has changed: now enums have to grow automatically if there is an integer type that can hold all their discriminants. MSVC does not implement this part of C23.

Furthermore, Rust fundamentally cannot implement this (without major changes)! Enum discriminants work fundamentally different in Rust and C:
- In Rust, every enum has a discriminant type entirely determined by its repr flags, and then the discriminant values must be const expressions of that type. For repr(C), that type is `isize`. So from the outset we interpret 9223372036854775807 as an isize literal and never give it a chance to be stored in a bigger type. If the discriminant is given as a literal without type annotation, it gets wrapped implicitly with a warning; otherwise the user has to write `as isize` explicitly and thus trigger the wrapping. Later, we can then decide to make the *tag* that stores the discriminant smaller than the discriminant type if all discriminant values fit into a smaller type, but those values have allready all been made to fit an `isize` so nothing bigger than `isize` could ever come out of this. That makes the behavior of 32bit GCC impossible for us to match.
-  In C, things flow the other way around: every discriminant value has a type determined entirely by its constant expression, and then the type for the enum is determined based on that. IOW, the expression can have *any type* a priori, different variants can even use a different type, and then the compiler is supposed to look at the resulting *values* (presumably as mathematical integers) and find a type that can hold them all. For the example above, 9223372036854775807 is a signed integer, so the compiler looks for the smallest signed type that can hold it, which is `long long`, and then uses that to compute the size of the enum (at least that's what C23 says should happen and GCC does this correctly).

Realistically I think the best we can do is to not attempt to support C23 enums, and to require repr(C) enums to satisfy the C20 requirements: all discriminants must fit into a c_int. So that's what this PR implements, by adding a FCW for enums with discriminants that do not fit into `c_int`. As a slight extension, we do *not* lint enums where all discriminants fit into a `c_uint` (i.e. `unsigned int`): while C20 does (in my reading) not allow this, and C23 does not prescribe the size of such an enum, this seems to behave consistently across compilers (giving the enum the size of an `unsigned int`). IOW, the lint fires whenever our layout algorithm would make the enum larger than an `int`, irrespective of whether we pick a signed or unsigned discriminant. This extension was added because [crater found](rust-lang#147017 (comment)) multiple cases of such enums across the ecosystem.

Note that it is impossible to trigger this FCW on targets where isize and c_int are the same size (i.e., the typical 32bit target): since we interpret discriminant values as isize, by the time we look at them, they have already been wrapped. However, we have an existing lint (overflowing_literals) that should notify people when this kind of wrapping occurs implicitly. Also, 64bit targets are much more common. On the other hand, even on 64bit targets it is possible to fall into the same trap by writing a literal that is so big that it does not fit into isize, gets wrapped (triggering overflowing_literals), and the wrapped value fits into c_int. Furthermore, overflowing_literals is just a lint, so if it occurs in a dependency you won't notice. (Arguably there is also a more general problem here: for literals of type `usize`/`isize`, it is fairly easy to write code that only triggers `overflowing_literals` on 32bit targets, and to never see that lint if one develops on a 64bit target.)

Specifically, the above example triggers the FCW on 64bit targets, but on 32bit targets we get this err-by-default lint instead (which will be hidden if it occurs in a dependency):
```
error: literal out of range for `isize`
  --> $DIR/repr-c-big-discriminant1.rs:16:9
   |
LL |     A = 9223372036854775807,
   |         ^^^^^^^^^^^^^^^^^^^
   |
   = note: the literal `9223372036854775807` does not fit into the type `isize` whose range is `-2147483648..=2147483647`
   = note: `#[deny(overflowing_literals)]` on by default
```
Also see the tests added by this PR.

This isn't perfect, but so far I don't think I have seen a better option. In rust-lang#146504 I tried adjusting our enum logic to make the size of the example enum above actually match what C compilers do, but that's a massive breaking change since we have to change the expected type of the discriminant expression from `isize` to `i64` or even `i128` -- so that seems like a no-go. To improve the lint we could analyze things on the HIR level and specifically catch "repr(C) enums with discriminants defined as literals that are too big", but that would have to be on top of the lint in this PR I think since we'd still want to also always check the actually evaluated value (which we can't always determined on the HIR level).

Cc `@workingjubilee` `@CAD97`
@GuillaumeGomez
Copy link
Member

Failed in #148454 (comment).

@bors r-

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Nov 3, 2025
@RalfJung RalfJung force-pushed the repr-c-big-discriminant branch from 13a3ac7 to 2de69d2 Compare November 4, 2025 10:33
@rustbot
Copy link
Collaborator

rustbot commented Nov 4, 2025

This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@RalfJung RalfJung force-pushed the repr-c-big-discriminant branch from 2de69d2 to a92bae0 Compare November 4, 2025 10:44
@RalfJung
Copy link
Member Author

RalfJung commented Nov 4, 2025

@bors r=davidtwco

@bors
Copy link
Collaborator

bors commented Nov 4, 2025

📌 Commit a92bae0 has been approved by davidtwco

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 4, 2025
bors added a commit that referenced this pull request Nov 4, 2025
Rollup of 4 pull requests

Successful merges:

 - #144529 (Add `#[rustc_pass_indirectly_in_non_rustic_abis]`)
 - #147017 (FCW for repr(C) enums whose discriminant values do not fit into a c_int or c_uint)
 - #148459 (bootstrap: Split out a separate `./x test bootstrap-py` step)
 - #148468 (add logging to `fudge_inference_if_ok`)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit b40a20f into rust-lang:master Nov 4, 2025
11 checks passed
@rustbot rustbot added this to the 1.93.0 milestone Nov 4, 2025
rust-timer added a commit that referenced this pull request Nov 4, 2025
Rollup merge of #147017 - RalfJung:repr-c-big-discriminant, r=davidtwco

FCW for repr(C) enums whose discriminant values do not fit into a c_int or c_uint

Context: #124403

The current behavior of repr(C) enums is as follows:
- The discriminant values are interpreted as const expressions of type `isize`
- We compute the smallest size that can hold all discriminant values
- The target spec contains the smallest size for repr(C) enums
- We take the larger of these two sizes

Unfortunately, this doesn't always match what C compilers do. In particular, MSVC seems to *always* give enums a size of 4 bytes, whereas the algorithm above will give enums a size of up to 8 bytes on 64bit targets. Here's an example enum affected by this:
```
// We give this size 4 on 32bit targets (with a warning since the discriminant is wrapped to fit an isize)
// and size 8 on 64bit targets.
#[repr(C)]
enum OverflowingEnum {
    A = 9223372036854775807, // i64::MAX
}

// MSVC always gives this size 4 (without any warning).
// GCC always gives it size 8 (without any warning).
// Godbolt: https://godbolt.org/z/P49MaYvMd
enum overflowing_enum {
    OVERFLOWING_ENUM_A = 9223372036854775807,
};
```

If we look at the C standard, then up until C20, there was no official support enums without an explicit underlying type and with discriminants that do not fit an `int`. With C23, this has changed: now enums have to grow automatically if there is an integer type that can hold all their discriminants. MSVC does not implement this part of C23.

Furthermore, Rust fundamentally cannot implement this (without major changes)! Enum discriminants work fundamentally different in Rust and C:
- In Rust, every enum has a discriminant type entirely determined by its repr flags, and then the discriminant values must be const expressions of that type. For repr(C), that type is `isize`. So from the outset we interpret 9223372036854775807 as an isize literal and never give it a chance to be stored in a bigger type. If the discriminant is given as a literal without type annotation, it gets wrapped implicitly with a warning; otherwise the user has to write `as isize` explicitly and thus trigger the wrapping. Later, we can then decide to make the *tag* that stores the discriminant smaller than the discriminant type if all discriminant values fit into a smaller type, but those values have allready all been made to fit an `isize` so nothing bigger than `isize` could ever come out of this. That makes the behavior of 32bit GCC impossible for us to match.
-  In C, things flow the other way around: every discriminant value has a type determined entirely by its constant expression, and then the type for the enum is determined based on that. IOW, the expression can have *any type* a priori, different variants can even use a different type, and then the compiler is supposed to look at the resulting *values* (presumably as mathematical integers) and find a type that can hold them all. For the example above, 9223372036854775807 is a signed integer, so the compiler looks for the smallest signed type that can hold it, which is `long long`, and then uses that to compute the size of the enum (at least that's what C23 says should happen and GCC does this correctly).

Realistically I think the best we can do is to not attempt to support C23 enums, and to require repr(C) enums to satisfy the C20 requirements: all discriminants must fit into a c_int. So that's what this PR implements, by adding a FCW for enums with discriminants that do not fit into `c_int`. As a slight extension, we do *not* lint enums where all discriminants fit into a `c_uint` (i.e. `unsigned int`): while C20 does (in my reading) not allow this, and C23 does not prescribe the size of such an enum, this seems to behave consistently across compilers (giving the enum the size of an `unsigned int`). IOW, the lint fires whenever our layout algorithm would make the enum larger than an `int`, irrespective of whether we pick a signed or unsigned discriminant. This extension was added because [crater found](#147017 (comment)) multiple cases of such enums across the ecosystem.

Note that it is impossible to trigger this FCW on targets where isize and c_int are the same size (i.e., the typical 32bit target): since we interpret discriminant values as isize, by the time we look at them, they have already been wrapped. However, we have an existing lint (overflowing_literals) that should notify people when this kind of wrapping occurs implicitly. Also, 64bit targets are much more common. On the other hand, even on 64bit targets it is possible to fall into the same trap by writing a literal that is so big that it does not fit into isize, gets wrapped (triggering overflowing_literals), and the wrapped value fits into c_int. Furthermore, overflowing_literals is just a lint, so if it occurs in a dependency you won't notice. (Arguably there is also a more general problem here: for literals of type `usize`/`isize`, it is fairly easy to write code that only triggers `overflowing_literals` on 32bit targets, and to never see that lint if one develops on a 64bit target.)

Specifically, the above example triggers the FCW on 64bit targets, but on 32bit targets we get this err-by-default lint instead (which will be hidden if it occurs in a dependency):
```
error: literal out of range for `isize`
  --> $DIR/repr-c-big-discriminant1.rs:16:9
   |
LL |     A = 9223372036854775807,
   |         ^^^^^^^^^^^^^^^^^^^
   |
   = note: the literal `9223372036854775807` does not fit into the type `isize` whose range is `-2147483648..=2147483647`
   = note: `#[deny(overflowing_literals)]` on by default
```
Also see the tests added by this PR.

This isn't perfect, but so far I don't think I have seen a better option. In #146504 I tried adjusting our enum logic to make the size of the example enum above actually match what C compilers do, but that's a massive breaking change since we have to change the expected type of the discriminant expression from `isize` to `i64` or even `i128` -- so that seems like a no-go. To improve the lint we could analyze things on the HIR level and specifically catch "repr(C) enums with discriminants defined as literals that are too big", but that would have to be on top of the lint in this PR I think since we'd still want to also always check the actually evaluated value (which we can't always determined on the HIR level).

Cc `@workingjubilee` `@CAD97`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-test-infra-minicore Area: `minicore` test auxiliary and `//@ add-core-stubs` disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. I-lang-radar Items that are on lang's radar and will need eventual work or consideration. needs-fcp This change is insta-stable, or significant enough to need a team FCP to proceed. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-lang Relevant to the language team to-announce Announce this issue on triage meeting

Projects

None yet

Development

Successfully merging this pull request may close these issues.