Skip to content

Conversation

@RalfJung
Copy link
Member

@RalfJung RalfJung commented Sep 13, 2025

This fixes #124403: the layout of some repr(C) enums did not always match the C compiler.

Specifically, what rustc did is to always begin with isize as the discriminant type for a repr(C) enum, and ensure that all discriminant values fit that type (i.e., that's the type for which the given discriminant value is const-eval'd; the value gets wrapped if needed). Then if the layout code notices that the enum can fit an i32 or u32, its size is reduced accordingly (but not more than that, since for most targets, 32bit is the minimum size for unannotated C enums).

However, for 64bit MSVC, that's just not correct: as https://godbolt.org/z/1Pdb3hP9E shows, this enum has size 4, despite its discriminant not fitting a 32bit integer:

enum overflowing_enum {
    OVERFLOWING_ENUM_A = 9223372036854775807, // i64::MAX
};

For 32bit targets, it's not correct wrt GCC/clang either: the above enum has size 8 there, but Rust gives it size 4.

So this PR adds a notion of "maximum enum size" to overwrite the isize default, and sets it to i32 for MSVC targets and to i64 everywhere else.

This changes the size of some existing enums. I guess that's a kind of breaking change? But it changes their size to match what MSVC does, so... this is probably fine? The bigger issue is that it changes the type of the constant that denotes the discriminant value: if that is a literal, that's fine since its type can be inferred, but if the discriminant is computed by calling a function, then that function would no have to return i32 on MSVC and i64 everywhere else. That seems like it could be quite problematic.

A migration lint for the 64bit MSVC case should be possible but is not obvious -- we have to know the actual discriminant values to figure out if the size of the enum is different with the old vs new logic. For the 32bit GCC case however I don't know how to do one since the discriminant value is already wrapped to isize so how could we know whether it would have been different with i64? We'd have to analyze the actual expression that defines the discriminant, not just its final value, and that gets arbitrarily complicated very quickly.

TODO:

  • I have not yet hooked this up in the target spec JSON parser; I'll wait for feedback on the PR first.
  • Make sure t-types is fine with what this does on the type system level
  • Inquire with the lang team about migration plans

Cc @workingjubilee @CAD97 @RossSmyth @ChrisDenton @wesleywiser @lcnr

@rustbot
Copy link
Collaborator

rustbot commented Sep 13, 2025

Some changes occurred in match lowering

cc @Nadrieril

These commits modify compiler targets.
(See the Target Tier Policy.)

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

This PR modifies tests/auxiliary/minicore.rs.

cc @jieyouxu

@rustbot rustbot added A-test-infra-minicore Area: `minicore` test auxiliary and `//@ add-core-stubs` S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Sep 13, 2025
@rustbot
Copy link
Collaborator

rustbot commented Sep 13, 2025

r? @davidtwco

rustbot has assigned @davidtwco.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

};

// Each value fits in i32 or u32, but not all values fit into the same type.
// FIXME: This seems to do the wrong thing for 32bit Linux?
Copy link
Member Author

@RalfJung RalfJung Sep 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will ask about this in #124403 (comment).

With the infrastructure in this PR we could fix this by setting the maximum enum size to 64bit on affected 32bit targets... but I have no idea which targets may be affected.^^

@rust-log-analyzer

This comment has been minimized.

@RalfJung RalfJung force-pushed the msvc-c-enums branch 3 times, most recently from f28ab61 to be18677 Compare September 13, 2025 12:31
@rustbot
Copy link
Collaborator

rustbot commented Sep 13, 2025

Some changes occurred in src/tools/clippy

cc @rust-lang/clippy

@rustbot rustbot added the T-clippy Relevant to the Clippy team. label Sep 13, 2025
@rust-log-analyzer

This comment has been minimized.

@RalfJung
Copy link
Member Author

Hm, @rust-lang/rust-analyzer is there some way to get a HasDataLayout value in RA's chalk logic?

@Veykril
Copy link
Member

Veykril commented Sep 13, 2025

Looking at the code there I think you can fetch it via self.db.target_data_layout(self.krate) (self being ChalkContext) if I see this right. Though that code path seems to have been moved on rust-analyzer master as we are currently in the middle of removing chalk (sync is stalled though #146238)

@RossSmyth
Copy link
Contributor

Very nice, I'll take a look later today at the code, but I am in favor of such a change.

Comment on lines 291 to 292
/// Maximum size of #[repr(C)] enums (defaults to pointer size).
pub c_enum_max_size: Option<Integer>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kinda feels like we should just have a Range here, but on the other hand ranges usually have an unwieldy API so I'm not sure that's actually a good idea.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also how would the range look like in JSON?

Also, the top end of the range is optional so it can be pointer-sized (Integer only has fixed-sized variants). Though maybe that's not actually correct and the pointer size has no influence on the enum size in C.

Copy link
Member

@workingjubilee workingjubilee Sep 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does indeed seem to be an incorrect assumption. ( Apologies as I am probably going to reiterate things you have already read a bit, but there are a few comments I have to give along the way. )

C11

C11 says only that the enum's representation should fit an int:

The expression that defines the value of an enumeration constant shall be an integer constant expression that has a value representable as an int.

This would seem to disallow using something that only fits in a long, but then the following is specified:

Each enumerated type shall be compatible with char, a signed integer type, or an unsigned integer type. The choice of type is implementation-defined but shall be capable of representing the values of all the members of the enumeration.

This is what suggests that you can just do whatever in terms of sizing the C enum. Some use this to shrink enums to char, thus our usage of c_enum_min_size to begin with, but the previous clause seems to ban using long int and thus seems to mandate a maximum.

Now, as we know, most C compilers except MSVC disregard this and allow long-fitted values in practice. This is partially because C++ allows larger enums, which also have specified underlying integer types (like Rust does).

C23

So, they updated to allow the C++ form, and then updated the logic to make clearer that the real requirement is that there be a unifying underlying type:

All enumerations have an underlying type. The underlying type can be explicitly specified using an enum type specifier and is its fixed underlying type. If it is not explicitly specified, the underlying type is the enumeration’s compatible type, which is either char or a standard or extended signed or unsigned integer type.

An "extended... integer type" can include a "vendor type", as with __int128 on some platforms (so, i128). By saying "standard or extended... integer type", they have opened the door to a compiler even using essentially any size of integer type. They then reiterate that there must be a unifying type for the enum (emphasis mine):

For all the integer constant expressions which make up the values of the enumeration constants, there shall be a type capable of representing all the values that is a standard or extended signed or unsigned integer type, or char.

Note however that the reality that the enumeration members, as syntactically interacted with by programmers, are really constants, remains in effect. That means that if there is no fixed underlying type named by the programmer, then they have their own types when named as constants, which may differ from the type of the enum. This becomes visible when you do a cast, as given by 6.7.7.3.20 (EXAMPLE 4):

enum no_underlying {
    a0
};

int main (void) {
    int a = _Generic(a0,
        int: 2,
        unsigned char: 1,
        default: 0
    );
    int b = _Generic((enum no_underlying)a0,
        int: 2,
        unsigned char: 1,
        default: 0
    );
    return a + b;
}

The value returned is implementation-defined, based on whether the underlying type picked by the compiler for the enum is an int or an unsigned char. If an underlying type has been specified using the equivalent to our repr(u8):

enum underlying: unsigned char {
    a0
};

Then we have a guaranteed value, because we have a guaranteed type for every one of the enum's constants.

Copy link
Member

@workingjubilee workingjubilee Sep 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...Mind, the enumeration constant detail is pure nonsense from Rust's POV, so it mostly serves to explain why C compilers have such a rough time figuring out how to handle this, since they didn't have the good sense to not extend the language with implementation-defined "sure, why not?"s. On some level they have to simultaneously pretend their enums are both ints and then some other type.

Here, it does mean that "pointer-size" for a C enum would seem to be meaningless.

Copy link
Member

@workingjubilee workingjubilee Sep 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Thinking about it, I guess they can hypothetically pick intptr_t as the type they're using, if that's a type that is held as distinct from any other integer type instead of a typedef? Hmm. I kinda think that's not going to be the case, though, and the limits will be arbitrary instead.

@rustbot
Copy link
Collaborator

rustbot commented Sep 13, 2025

rust-analyzer is developed in its own repository. If possible, consider making this change to rust-lang/rust-analyzer instead.

cc @rust-lang/rust-analyzer

@rustbot rustbot added the T-rust-analyzer Relevant to the rust-analyzer team, which will review and decide on the PR/issue. label Sep 13, 2025
@RalfJung
Copy link
Member Author

RalfJung commented Sep 13, 2025

Looking at the code there I think you can fetch it via self.db.target_data_layout(self.krate) (self being ChalkContext) if I see this right. Though that code path seems to have been moved on rust-analyzer master as we are currently in the middle of removing chalk (sync is stalled though #146238)

Thanks! That works, except that that function returns a Result. I presume unwrap is not the correct error handling strategy? ;)

EDIT: Though there are some unwraps of this elsewhere in the code, so maybe this is okay?

.map(|layout| Layout(layout, db.target_data_layout(self.krate(db).into()).unwrap()))

db.target_data_layout(parent_enum.krate(db).into()).unwrap(),

@rust-log-analyzer

This comment has been minimized.

// MSVC does not seem to ever automatically increase enums beyond their default size (see
// <https://github.com/rust-lang/rust/issues/124403>, <https://godbolt.org/z/1Pdb3hP9E>).
c_enum_min_bits: Some(32),
c_enum_max_bits: Some(32),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also affects the UEFI targets. Is that intended?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, they use the MSVC ABI for things, yes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the psABI document the enum size rules? I would be surprised if it did.^^

@workingjubilee
Copy link
Member

workingjubilee commented Sep 13, 2025

I almost think we should seriously consider issuing a deprecation warning on repr(C) enums and say that you should use a C compiler that can handle C23's fixed underlying types for enums, then update to match that after picking your preferred integer size from stdint.h or such.

@RalfJung
Copy link
Member Author

A less disruptive option would be to make repr(C) only support values that fit in int (but still have the automatic shrinking a la "short enum"). Maybe with some allowance for the case where all values fit in unsigned int (but not all fit in int), if that is sufficiently coherent among compilers.

Once you get into the realm of integers that do not fit into long long int, things start to get dicey as technically even just writing those literals (without a u suffix) is already UB in C. But in C you at least have the option of putting a trailing u everywhere and then you get an unsigned enum; in Rust we'd need repr(C_unsigned) or so for that...

@RalfJung RalfJung changed the title repr(C) enums: fix enum size mismatch on MSVC targets repr(C) enums: fix enum size computation Sep 14, 2025
@RalfJung
Copy link
Member Author

RalfJung commented Sep 14, 2025

I have extended the PR to also fix the enum size on 32bit targets, by defaulting the discriminant type to i64.

I also realized that this is a much bigger breaking change than I thought: any time the enum discriminant is given by calling a function, or by reusing some existing constants, it'll now have the wrong type... it used to be isize everywhere, now it's i32 on MSVC and i64 everywhere else. That seems pretty bad, I honestly don't know if we can actually pull this through. Let's see what crater says.

@bors try

@rust-bors

This comment has been minimized.

rust-bors bot added a commit that referenced this pull request Sep 14, 2025
repr(C) enums: fix enum size computation
@rust-log-analyzer
Copy link
Collaborator

The job aarch64-gnu-llvm-19-2 failed! Check out the build log: (web) (plain enhanced) (plain)

Click to see the possible cause of the failure (guessed by this bot)

@rust-bors
Copy link

rust-bors bot commented Sep 14, 2025

☀️ Try build successful (CI)
Build commit: 969026c (969026c535f4240ff7866ef11207dfda18a26af6, parent: ddaf12390d3ffb7d5ba74491a48f3cd528e5d777)

@RalfJung
Copy link
Member Author

@craterbot check

@craterbot
Copy link
Collaborator

👌 Experiment pr-146504 created and queued.
🤖 Automatically detected try build 969026c
🔍 You can check out the queue and this experiment's details.

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

@craterbot craterbot added S-waiting-on-crater Status: Waiting on a crater run to be completed. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 14, 2025
@craterbot
Copy link
Collaborator

🚧 Experiment pr-146504 is now running

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

@craterbot
Copy link
Collaborator

🎉 Experiment pr-146504 is completed!
📊 1783 regressed and 0 fixed (699011 total)
📊 1492 spurious results on the retry-regessed-list.txt, consider a retry1 if this is a significant amount.
📰 Open the summary report.

⚠️ If you notice any spurious failure please add them to the denylist!
ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

Footnotes

  1. re-run the experiment with crates=https://crater-reports.s3.amazonaws.com/pr-146504/retry-regressed-list.txt

@craterbot craterbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-crater Status: Waiting on a crater run to be completed. labels Sep 16, 2025
@RalfJung
Copy link
Member Author

Yeah, as expected this approach is entirely dead in the water. There's tons of

[INFO] [stdout] error[E0308]: mismatched types
[INFO] [stdout]   --> /opt/rustwide/cargo-home/registry/src/index.crates.io-1949cf8c6b5b557f/dbus-0.5.4/src/watch.rs:48:13
[INFO] [stdout]    |
[INFO] [stdout] 48 |     Error = ffi::DBUS_WATCH_ERROR as isize,
[INFO] [stdout]    |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `i64`, found `isize`
[INFO] [stdout] error[E0308]: mismatched types
[INFO] [stdout]   --> /opt/rustwide/cargo-home/registry/src/index.crates.io-1949cf8c6b5b557f/libsqlite3-sys-0.11.1/src/lib.rs:24:31
[INFO] [stdout]    |
[INFO] [stdout] 24 |     SQLITE_LIMIT_SQL_LENGTH = SQLITE_LIMIT_SQL_LENGTH as isize,
[INFO] [stdout]    |                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `i64`, found `isize`
[INFO] [stdout] error[E0308]: mismatched types
[INFO] [stdout]   --> /opt/rustwide/cargo-home/registry/src/index.crates.io-1949cf8c6b5b557f/sp-core-38.0.0/src/offchain/mod.rs:69:15
[INFO] [stdout]    |
[INFO] [stdout] 69 |     PERSISTENT = 0_isize,
[INFO] [stdout]    |                  ^^^^^^^ expected `i64`, found `isize`
[INFO] [stdout]    |

@bors
Copy link
Collaborator

bors commented Sep 20, 2025

☔ The latest upstream changes (presumably #146805) made this pull request unmergeable. Please resolve the merge conflicts.

@CAD97
Copy link
Contributor

CAD97 commented Sep 24, 2025

Of note, C23 defines the behavior of enum declarations w.r.t. enumeration-constant values that don't fit in c_int. Specifically, it specifies the type to be that of the integer expression if the value does not fit in c_int.

Thus, in my opinion, the best option is to:

  • Calculate the discriminant value using isize;
  • If the value exceeds that of c_int::MAX, lint the enum as being questionably portable and recommend the use of #[repr(int)]; and
  • Use the compatible type selected by the target C compiler as the underlying type, wrapping any discriminant values if necessary, and erroring if this creates duplicate discriminators.

It's unfortunate that this doesn't handle long enums on 32-bit targets, especially since C23 standardizes their support, but at least that is hinted at by the required as isize.

@RalfJung
Copy link
Member Author

Calculate the discriminant value using isize;

Yeah, our hands are entirely tied for this one.

If the value exceeds that of c_int::MAX, lint the enum as being questionably portable and recommend the use of #[repr(int)]

👍 , I was thinking the same.

Basically we only support the pre-C23 fragment of enums in C, where nothing beyond c_int was standardized. Due to how we fix the type for the discriminant to be isize, we can't support C23 enums.

I wonder if this should be an FCW and eventually hard error, given that repr(C) doesn't really work as advertised for such enums.

Use the compatible type selected by the target C compiler as the underlying type, wrapping any discriminant values if necessary, and erroring if this creates duplicate discriminators.

We already do that, right?

@CAD97
Copy link
Contributor

CAD97 commented Sep 24, 2025

We already do that, right?

#124403 is exactly that we don't do so correctly (note the "wrapping as necessary") — on the x64 MSVC target, we translate repr(C) enums with a variant value > c_uint::MAX as i64 enums with the value intact, but the MSVC compiler wraps the value into the c_int range.

But looking again at that sentence it was an unclear mix of terms to basically say “match how the C compiler would behave if each specified variant value is specified with literal suffix z (ssize_t).”

@RalfJung
Copy link
Member Author

Wrapping the value into the c_int range does the wrong thing on GCC, so that does not seem like a clear improvement.

Or are you saying we should keep the part of this PR where we have per-target "max enum size" information? TBH I am not convinced that's worth it. We should IMO declare everything exceeding c_int to be unsupported. Having enum discriminants wrap just on some targets (potentially causing target-specified duplicates) doesn't sound great.

@RalfJung
Copy link
Member Author

If the value exceeds that of c_int::MAX, lint the enum as being questionably portable and recommend the use of #[repr(int)]

Uh, okay, so... how do I do this?

  • Do I make this a lint pass? But then all I get is HIR stuff and I have no idea how to get the discriminant values out of that.
  • Do I put this into the layout computation code? That doesn't sound right, it would make this a post-mono error.
  • Do I put this wherever HIR types are converted to "middle" types? This sounds like the best plan to me, assuming we actually do this for all types in a crate. In particular this avoids having to duplicate the logic that fills in missing discriminants by incrementing the previous one. Where does that happen?

Lucky enough, "generic parameters may not be used in enum discriminant values", so it should be possible to do this all pre-monomorphization.

@oli-obk @compiler-errors any ideas?

@RalfJung
Copy link
Member Author

I found the EnumDiscriminantOverflowed error and where it is emitted. That looks like it may be a good place also for this new check.

@RalfJung
Copy link
Member Author

Closing in favor of #147017 which implements the lint mentioned above.

@RalfJung RalfJung closed this Sep 25, 2025
@rustbot rustbot removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 25, 2025
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Nov 3, 2025
…r=davidtwco

FCW for repr(C) enums whose discriminant values do not fit into a c_int or c_uint

Context: rust-lang#124403

The current behavior of repr(C) enums is as follows:
- The discriminant values are interpreted as const expressions of type `isize`
- We compute the smallest size that can hold all discriminant values
- The target spec contains the smallest size for repr(C) enums
- We take the larger of these two sizes

Unfortunately, this doesn't always match what C compilers do. In particular, MSVC seems to *always* give enums a size of 4 bytes, whereas the algorithm above will give enums a size of up to 8 bytes on 64bit targets. Here's an example enum affected by this:
```
// We give this size 4 on 32bit targets (with a warning since the discriminant is wrapped to fit an isize)
// and size 8 on 64bit targets.
#[repr(C)]
enum OverflowingEnum {
    A = 9223372036854775807, // i64::MAX
}

// MSVC always gives this size 4 (without any warning).
// GCC always gives it size 8 (without any warning).
// Godbolt: https://godbolt.org/z/P49MaYvMd
enum overflowing_enum {
    OVERFLOWING_ENUM_A = 9223372036854775807,
};
```

If we look at the C standard, then up until C20, there was no official support enums without an explicit underlying type and with discriminants that do not fit an `int`. With C23, this has changed: now enums have to grow automatically if there is an integer type that can hold all their discriminants. MSVC does not implement this part of C23.

Furthermore, Rust fundamentally cannot implement this (without major changes)! Enum discriminants work fundamentally different in Rust and C:
- In Rust, every enum has a discriminant type entirely determined by its repr flags, and then the discriminant values must be const expressions of that type. For repr(C), that type is `isize`. So from the outset we interpret 9223372036854775807 as an isize literal and never give it a chance to be stored in a bigger type. If the discriminant is given as a literal without type annotation, it gets wrapped implicitly with a warning; otherwise the user has to write `as isize` explicitly and thus trigger the wrapping. Later, we can then decide to make the *tag* that stores the discriminant smaller than the discriminant type if all discriminant values fit into a smaller type, but those values have allready all been made to fit an `isize` so nothing bigger than `isize` could ever come out of this. That makes the behavior of 32bit GCC impossible for us to match.
-  In C, things flow the other way around: every discriminant value has a type determined entirely by its constant expression, and then the type for the enum is determined based on that. IOW, the expression can have *any type* a priori, different variants can even use a different type, and then the compiler is supposed to look at the resulting *values* (presumably as mathematical integers) and find a type that can hold them all. For the example above, 9223372036854775807 is a signed integer, so the compiler looks for the smallest signed type that can hold it, which is `long long`, and then uses that to compute the size of the enum (at least that's what C23 says should happen and GCC does this correctly).

Realistically I think the best we can do is to not attempt to support C23 enums, and to require repr(C) enums to satisfy the C20 requirements: all discriminants must fit into a c_int. So that's what this PR implements, by adding a FCW for enums with discriminants that do not fit into `c_int`. As a slight extension, we do *not* lint enums where all discriminants fit into a `c_uint` (i.e. `unsigned int`): while C20 does (in my reading) not allow this, and C23 does not prescribe the size of such an enum, this seems to behave consistently across compilers (giving the enum the size of an `unsigned int`). IOW, the lint fires whenever our layout algorithm would make the enum larger than an `int`, irrespective of whether we pick a signed or unsigned discriminant. This extension was added because [crater found](rust-lang#147017 (comment)) multiple cases of such enums across the ecosystem.

Note that it is impossible to trigger this FCW on targets where isize and c_int are the same size (i.e., the typical 32bit target): since we interpret discriminant values as isize, by the time we look at them, they have already been wrapped. However, we have an existing lint (overflowing_literals) that should notify people when this kind of wrapping occurs implicitly. Also, 64bit targets are much more common. On the other hand, even on 64bit targets it is possible to fall into the same trap by writing a literal that is so big that it does not fit into isize, gets wrapped (triggering overflowing_literals), and the wrapped value fits into c_int. Furthermore, overflowing_literals is just a lint, so if it occurs in a dependency you won't notice. (Arguably there is also a more general problem here: for literals of type `usize`/`isize`, it is fairly easy to write code that only triggers `overflowing_literals` on 32bit targets, and to never see that lint if one develops on a 64bit target.)

Specifically, the above example triggers the FCW on 64bit targets, but on 32bit targets we get this err-by-default lint instead (which will be hidden if it occurs in a dependency):
```
error: literal out of range for `isize`
  --> $DIR/repr-c-big-discriminant1.rs:16:9
   |
LL |     A = 9223372036854775807,
   |         ^^^^^^^^^^^^^^^^^^^
   |
   = note: the literal `9223372036854775807` does not fit into the type `isize` whose range is `-2147483648..=2147483647`
   = note: `#[deny(overflowing_literals)]` on by default
```
Also see the tests added by this PR.

This isn't perfect, but so far I don't think I have seen a better option. In rust-lang#146504 I tried adjusting our enum logic to make the size of the example enum above actually match what C compilers do, but that's a massive breaking change since we have to change the expected type of the discriminant expression from `isize` to `i64` or even `i128` -- so that seems like a no-go. To improve the lint we could analyze things on the HIR level and specifically catch "repr(C) enums with discriminants defined as literals that are too big", but that would have to be on top of the lint in this PR I think since we'd still want to also always check the actually evaluated value (which we can't always determined on the HIR level).

Cc `@workingjubilee` `@CAD97`
Zalathar added a commit to Zalathar/rust that referenced this pull request Nov 4, 2025
…r=davidtwco

FCW for repr(C) enums whose discriminant values do not fit into a c_int or c_uint

Context: rust-lang#124403

The current behavior of repr(C) enums is as follows:
- The discriminant values are interpreted as const expressions of type `isize`
- We compute the smallest size that can hold all discriminant values
- The target spec contains the smallest size for repr(C) enums
- We take the larger of these two sizes

Unfortunately, this doesn't always match what C compilers do. In particular, MSVC seems to *always* give enums a size of 4 bytes, whereas the algorithm above will give enums a size of up to 8 bytes on 64bit targets. Here's an example enum affected by this:
```
// We give this size 4 on 32bit targets (with a warning since the discriminant is wrapped to fit an isize)
// and size 8 on 64bit targets.
#[repr(C)]
enum OverflowingEnum {
    A = 9223372036854775807, // i64::MAX
}

// MSVC always gives this size 4 (without any warning).
// GCC always gives it size 8 (without any warning).
// Godbolt: https://godbolt.org/z/P49MaYvMd
enum overflowing_enum {
    OVERFLOWING_ENUM_A = 9223372036854775807,
};
```

If we look at the C standard, then up until C20, there was no official support enums without an explicit underlying type and with discriminants that do not fit an `int`. With C23, this has changed: now enums have to grow automatically if there is an integer type that can hold all their discriminants. MSVC does not implement this part of C23.

Furthermore, Rust fundamentally cannot implement this (without major changes)! Enum discriminants work fundamentally different in Rust and C:
- In Rust, every enum has a discriminant type entirely determined by its repr flags, and then the discriminant values must be const expressions of that type. For repr(C), that type is `isize`. So from the outset we interpret 9223372036854775807 as an isize literal and never give it a chance to be stored in a bigger type. If the discriminant is given as a literal without type annotation, it gets wrapped implicitly with a warning; otherwise the user has to write `as isize` explicitly and thus trigger the wrapping. Later, we can then decide to make the *tag* that stores the discriminant smaller than the discriminant type if all discriminant values fit into a smaller type, but those values have allready all been made to fit an `isize` so nothing bigger than `isize` could ever come out of this. That makes the behavior of 32bit GCC impossible for us to match.
-  In C, things flow the other way around: every discriminant value has a type determined entirely by its constant expression, and then the type for the enum is determined based on that. IOW, the expression can have *any type* a priori, different variants can even use a different type, and then the compiler is supposed to look at the resulting *values* (presumably as mathematical integers) and find a type that can hold them all. For the example above, 9223372036854775807 is a signed integer, so the compiler looks for the smallest signed type that can hold it, which is `long long`, and then uses that to compute the size of the enum (at least that's what C23 says should happen and GCC does this correctly).

Realistically I think the best we can do is to not attempt to support C23 enums, and to require repr(C) enums to satisfy the C20 requirements: all discriminants must fit into a c_int. So that's what this PR implements, by adding a FCW for enums with discriminants that do not fit into `c_int`. As a slight extension, we do *not* lint enums where all discriminants fit into a `c_uint` (i.e. `unsigned int`): while C20 does (in my reading) not allow this, and C23 does not prescribe the size of such an enum, this seems to behave consistently across compilers (giving the enum the size of an `unsigned int`). IOW, the lint fires whenever our layout algorithm would make the enum larger than an `int`, irrespective of whether we pick a signed or unsigned discriminant. This extension was added because [crater found](rust-lang#147017 (comment)) multiple cases of such enums across the ecosystem.

Note that it is impossible to trigger this FCW on targets where isize and c_int are the same size (i.e., the typical 32bit target): since we interpret discriminant values as isize, by the time we look at them, they have already been wrapped. However, we have an existing lint (overflowing_literals) that should notify people when this kind of wrapping occurs implicitly. Also, 64bit targets are much more common. On the other hand, even on 64bit targets it is possible to fall into the same trap by writing a literal that is so big that it does not fit into isize, gets wrapped (triggering overflowing_literals), and the wrapped value fits into c_int. Furthermore, overflowing_literals is just a lint, so if it occurs in a dependency you won't notice. (Arguably there is also a more general problem here: for literals of type `usize`/`isize`, it is fairly easy to write code that only triggers `overflowing_literals` on 32bit targets, and to never see that lint if one develops on a 64bit target.)

Specifically, the above example triggers the FCW on 64bit targets, but on 32bit targets we get this err-by-default lint instead (which will be hidden if it occurs in a dependency):
```
error: literal out of range for `isize`
  --> $DIR/repr-c-big-discriminant1.rs:16:9
   |
LL |     A = 9223372036854775807,
   |         ^^^^^^^^^^^^^^^^^^^
   |
   = note: the literal `9223372036854775807` does not fit into the type `isize` whose range is `-2147483648..=2147483647`
   = note: `#[deny(overflowing_literals)]` on by default
```
Also see the tests added by this PR.

This isn't perfect, but so far I don't think I have seen a better option. In rust-lang#146504 I tried adjusting our enum logic to make the size of the example enum above actually match what C compilers do, but that's a massive breaking change since we have to change the expected type of the discriminant expression from `isize` to `i64` or even `i128` -- so that seems like a no-go. To improve the lint we could analyze things on the HIR level and specifically catch "repr(C) enums with discriminants defined as literals that are too big", but that would have to be on top of the lint in this PR I think since we'd still want to also always check the actually evaluated value (which we can't always determined on the HIR level).

Cc `@workingjubilee` `@CAD97`
rust-timer added a commit that referenced this pull request Nov 4, 2025
Rollup merge of #147017 - RalfJung:repr-c-big-discriminant, r=davidtwco

FCW for repr(C) enums whose discriminant values do not fit into a c_int or c_uint

Context: #124403

The current behavior of repr(C) enums is as follows:
- The discriminant values are interpreted as const expressions of type `isize`
- We compute the smallest size that can hold all discriminant values
- The target spec contains the smallest size for repr(C) enums
- We take the larger of these two sizes

Unfortunately, this doesn't always match what C compilers do. In particular, MSVC seems to *always* give enums a size of 4 bytes, whereas the algorithm above will give enums a size of up to 8 bytes on 64bit targets. Here's an example enum affected by this:
```
// We give this size 4 on 32bit targets (with a warning since the discriminant is wrapped to fit an isize)
// and size 8 on 64bit targets.
#[repr(C)]
enum OverflowingEnum {
    A = 9223372036854775807, // i64::MAX
}

// MSVC always gives this size 4 (without any warning).
// GCC always gives it size 8 (without any warning).
// Godbolt: https://godbolt.org/z/P49MaYvMd
enum overflowing_enum {
    OVERFLOWING_ENUM_A = 9223372036854775807,
};
```

If we look at the C standard, then up until C20, there was no official support enums without an explicit underlying type and with discriminants that do not fit an `int`. With C23, this has changed: now enums have to grow automatically if there is an integer type that can hold all their discriminants. MSVC does not implement this part of C23.

Furthermore, Rust fundamentally cannot implement this (without major changes)! Enum discriminants work fundamentally different in Rust and C:
- In Rust, every enum has a discriminant type entirely determined by its repr flags, and then the discriminant values must be const expressions of that type. For repr(C), that type is `isize`. So from the outset we interpret 9223372036854775807 as an isize literal and never give it a chance to be stored in a bigger type. If the discriminant is given as a literal without type annotation, it gets wrapped implicitly with a warning; otherwise the user has to write `as isize` explicitly and thus trigger the wrapping. Later, we can then decide to make the *tag* that stores the discriminant smaller than the discriminant type if all discriminant values fit into a smaller type, but those values have allready all been made to fit an `isize` so nothing bigger than `isize` could ever come out of this. That makes the behavior of 32bit GCC impossible for us to match.
-  In C, things flow the other way around: every discriminant value has a type determined entirely by its constant expression, and then the type for the enum is determined based on that. IOW, the expression can have *any type* a priori, different variants can even use a different type, and then the compiler is supposed to look at the resulting *values* (presumably as mathematical integers) and find a type that can hold them all. For the example above, 9223372036854775807 is a signed integer, so the compiler looks for the smallest signed type that can hold it, which is `long long`, and then uses that to compute the size of the enum (at least that's what C23 says should happen and GCC does this correctly).

Realistically I think the best we can do is to not attempt to support C23 enums, and to require repr(C) enums to satisfy the C20 requirements: all discriminants must fit into a c_int. So that's what this PR implements, by adding a FCW for enums with discriminants that do not fit into `c_int`. As a slight extension, we do *not* lint enums where all discriminants fit into a `c_uint` (i.e. `unsigned int`): while C20 does (in my reading) not allow this, and C23 does not prescribe the size of such an enum, this seems to behave consistently across compilers (giving the enum the size of an `unsigned int`). IOW, the lint fires whenever our layout algorithm would make the enum larger than an `int`, irrespective of whether we pick a signed or unsigned discriminant. This extension was added because [crater found](#147017 (comment)) multiple cases of such enums across the ecosystem.

Note that it is impossible to trigger this FCW on targets where isize and c_int are the same size (i.e., the typical 32bit target): since we interpret discriminant values as isize, by the time we look at them, they have already been wrapped. However, we have an existing lint (overflowing_literals) that should notify people when this kind of wrapping occurs implicitly. Also, 64bit targets are much more common. On the other hand, even on 64bit targets it is possible to fall into the same trap by writing a literal that is so big that it does not fit into isize, gets wrapped (triggering overflowing_literals), and the wrapped value fits into c_int. Furthermore, overflowing_literals is just a lint, so if it occurs in a dependency you won't notice. (Arguably there is also a more general problem here: for literals of type `usize`/`isize`, it is fairly easy to write code that only triggers `overflowing_literals` on 32bit targets, and to never see that lint if one develops on a 64bit target.)

Specifically, the above example triggers the FCW on 64bit targets, but on 32bit targets we get this err-by-default lint instead (which will be hidden if it occurs in a dependency):
```
error: literal out of range for `isize`
  --> $DIR/repr-c-big-discriminant1.rs:16:9
   |
LL |     A = 9223372036854775807,
   |         ^^^^^^^^^^^^^^^^^^^
   |
   = note: the literal `9223372036854775807` does not fit into the type `isize` whose range is `-2147483648..=2147483647`
   = note: `#[deny(overflowing_literals)]` on by default
```
Also see the tests added by this PR.

This isn't perfect, but so far I don't think I have seen a better option. In #146504 I tried adjusting our enum logic to make the size of the example enum above actually match what C compilers do, but that's a massive breaking change since we have to change the expected type of the discriminant expression from `isize` to `i64` or even `i128` -- so that seems like a no-go. To improve the lint we could analyze things on the HIR level and specifically catch "repr(C) enums with discriminants defined as literals that are too big", but that would have to be on top of the lint in this PR I think since we'd still want to also always check the actually evaluated value (which we can't always determined on the HIR level).

Cc `@workingjubilee` `@CAD97`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-test-infra-minicore Area: `minicore` test auxiliary and `//@ add-core-stubs` T-clippy Relevant to the Clippy team. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rust-analyzer Relevant to the rust-analyzer team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

repr(C) on an enum accepts discriminants that do not fit into the default C enum size (repr_c_enums_larger_than_int lint)

10 participants