-
Notifications
You must be signed in to change notification settings - Fork 13.9k
repr(C) enums: fix enum size computation #146504
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Some changes occurred in match lowering cc @Nadrieril These commits modify compiler targets. Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt This PR modifies cc @jieyouxu |
|
r? @davidtwco rustbot has assigned @davidtwco. Use |
| }; | ||
|
|
||
| // Each value fits in i32 or u32, but not all values fit into the same type. | ||
| // FIXME: This seems to do the wrong thing for 32bit Linux? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will ask about this in #124403 (comment).
With the infrastructure in this PR we could fix this by setting the maximum enum size to 64bit on affected 32bit targets... but I have no idea which targets may be affected.^^
This comment has been minimized.
This comment has been minimized.
f28ab61 to
be18677
Compare
|
Some changes occurred in src/tools/clippy cc @rust-lang/clippy |
This comment has been minimized.
This comment has been minimized.
|
Hm, @rust-lang/rust-analyzer is there some way to get a |
|
Looking at the code there I think you can fetch it via |
|
Very nice, I'll take a look later today at the code, but I am in favor of such a change. |
compiler/rustc_abi/src/lib.rs
Outdated
| /// Maximum size of #[repr(C)] enums (defaults to pointer size). | ||
| pub c_enum_max_size: Option<Integer>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kinda feels like we should just have a Range here, but on the other hand ranges usually have an unwieldy API so I'm not sure that's actually a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also how would the range look like in JSON?
Also, the top end of the range is optional so it can be pointer-sized (Integer only has fixed-sized variants). Though maybe that's not actually correct and the pointer size has no influence on the enum size in C.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That does indeed seem to be an incorrect assumption. ( Apologies as I am probably going to reiterate things you have already read a bit, but there are a few comments I have to give along the way. )
C11
C11 says only that the enum's representation should fit an int:
The expression that defines the value of an enumeration constant shall be an integer constant expression that has a value representable as an
int.
This would seem to disallow using something that only fits in a long, but then the following is specified:
Each enumerated type shall be compatible with
char, a signed integer type, or an unsigned integer type. The choice of type is implementation-defined but shall be capable of representing the values of all the members of the enumeration.
This is what suggests that you can just do whatever in terms of sizing the C enum. Some use this to shrink enums to char, thus our usage of c_enum_min_size to begin with, but the previous clause seems to ban using long int and thus seems to mandate a maximum.
Now, as we know, most C compilers except MSVC disregard this and allow long-fitted values in practice. This is partially because C++ allows larger enums, which also have specified underlying integer types (like Rust does).
C23
So, they updated to allow the C++ form, and then updated the logic to make clearer that the real requirement is that there be a unifying underlying type:
All enumerations have an underlying type. The underlying type can be explicitly specified using an enum type specifier and is its fixed underlying type. If it is not explicitly specified, the underlying type is the enumeration’s compatible type, which is either
charor a standard or extended signed or unsigned integer type.
An "extended... integer type" can include a "vendor type", as with __int128 on some platforms (so, i128). By saying "standard or extended... integer type", they have opened the door to a compiler even using essentially any size of integer type. They then reiterate that there must be a unifying type for the enum (emphasis mine):
For all the integer constant expressions which make up the values of the enumeration constants, there shall be a type capable of representing all the values that is a standard or extended signed or unsigned integer type, or
char.
Note however that the reality that the enumeration members, as syntactically interacted with by programmers, are really constants, remains in effect. That means that if there is no fixed underlying type named by the programmer, then they have their own types when named as constants, which may differ from the type of the enum. This becomes visible when you do a cast, as given by 6.7.7.3.20 (EXAMPLE 4):
enum no_underlying {
a0
};
int main (void) {
int a = _Generic(a0,
int: 2,
unsigned char: 1,
default: 0
);
int b = _Generic((enum no_underlying)a0,
int: 2,
unsigned char: 1,
default: 0
);
return a + b;
}The value returned is implementation-defined, based on whether the underlying type picked by the compiler for the enum is an int or an unsigned char. If an underlying type has been specified using the equivalent to our repr(u8):
enum underlying: unsigned char {
a0
};Then we have a guaranteed value, because we have a guaranteed type for every one of the enum's constants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...Mind, the enumeration constant detail is pure nonsense from Rust's POV, so it mostly serves to explain why C compilers have such a rough time figuring out how to handle this, since they didn't have the good sense to not extend the language with implementation-defined "sure, why not?"s. On some level they have to simultaneously pretend their enums are both ints and then some other type.
Here, it does mean that "pointer-size" for a C enum would seem to be meaningless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Thinking about it, I guess they can hypothetically pick intptr_t as the type they're using, if that's a type that is held as distinct from any other integer type instead of a typedef? Hmm. I kinda think that's not going to be the case, though, and the limits will be arbitrary instead.
be18677 to
63d04de
Compare
|
rust-analyzer is developed in its own repository. If possible, consider making this change to rust-lang/rust-analyzer instead. cc @rust-lang/rust-analyzer |
Thanks! That works, except that that function returns a EDIT: Though there are some unwraps of this elsewhere in the code, so maybe this is okay?
|
This comment has been minimized.
This comment has been minimized.
| // MSVC does not seem to ever automatically increase enums beyond their default size (see | ||
| // <https://github.com/rust-lang/rust/issues/124403>, <https://godbolt.org/z/1Pdb3hP9E>). | ||
| c_enum_min_bits: Some(32), | ||
| c_enum_max_bits: Some(32), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also affects the UEFI targets. Is that intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, they use the MSVC ABI for things, yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the psABI document the enum size rules? I would be surprised if it did.^^
|
I almost think we should seriously consider issuing a deprecation warning on |
|
A less disruptive option would be to make Once you get into the realm of integers that do not fit into |
|
I have extended the PR to also fix the enum size on 32bit targets, by defaulting the discriminant type to I also realized that this is a much bigger breaking change than I thought: any time the enum discriminant is given by calling a function, or by reusing some existing constants, it'll now have the wrong type... it used to be @bors try |
This comment has been minimized.
This comment has been minimized.
repr(C) enums: fix enum size computation
|
The job Click to see the possible cause of the failure (guessed by this bot) |
|
@craterbot check |
|
👌 Experiment ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more |
|
🚧 Experiment ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more |
|
🎉 Experiment
Footnotes
|
|
Yeah, as expected this approach is entirely dead in the water. There's tons of |
|
☔ The latest upstream changes (presumably #146805) made this pull request unmergeable. Please resolve the merge conflicts. |
|
Of note, C23 defines the behavior of Thus, in my opinion, the best option is to:
It's unfortunate that this doesn't handle long enums on 32-bit targets, especially since C23 standardizes their support, but at least that is hinted at by the required |
Yeah, our hands are entirely tied for this one.
👍 , I was thinking the same. Basically we only support the pre-C23 fragment of enums in C, where nothing beyond c_int was standardized. Due to how we fix the type for the discriminant to be I wonder if this should be an FCW and eventually hard error, given that repr(C) doesn't really work as advertised for such enums.
We already do that, right? |
#124403 is exactly that we don't do so correctly (note the "wrapping as necessary") — on the x64 MSVC target, we translate But looking again at that sentence it was an unclear mix of terms to basically say “match how the C compiler would behave if each specified variant value is specified with literal suffix |
|
Wrapping the value into the c_int range does the wrong thing on GCC, so that does not seem like a clear improvement. Or are you saying we should keep the part of this PR where we have per-target "max enum size" information? TBH I am not convinced that's worth it. We should IMO declare everything exceeding c_int to be unsupported. Having enum discriminants wrap just on some targets (potentially causing target-specified duplicates) doesn't sound great. |
Uh, okay, so... how do I do this?
Lucky enough, "generic parameters may not be used in enum discriminant values", so it should be possible to do this all pre-monomorphization. @oli-obk @compiler-errors any ideas? |
|
I found the |
|
Closing in favor of #147017 which implements the lint mentioned above. |
…r=davidtwco FCW for repr(C) enums whose discriminant values do not fit into a c_int or c_uint Context: rust-lang#124403 The current behavior of repr(C) enums is as follows: - The discriminant values are interpreted as const expressions of type `isize` - We compute the smallest size that can hold all discriminant values - The target spec contains the smallest size for repr(C) enums - We take the larger of these two sizes Unfortunately, this doesn't always match what C compilers do. In particular, MSVC seems to *always* give enums a size of 4 bytes, whereas the algorithm above will give enums a size of up to 8 bytes on 64bit targets. Here's an example enum affected by this: ``` // We give this size 4 on 32bit targets (with a warning since the discriminant is wrapped to fit an isize) // and size 8 on 64bit targets. #[repr(C)] enum OverflowingEnum { A = 9223372036854775807, // i64::MAX } // MSVC always gives this size 4 (without any warning). // GCC always gives it size 8 (without any warning). // Godbolt: https://godbolt.org/z/P49MaYvMd enum overflowing_enum { OVERFLOWING_ENUM_A = 9223372036854775807, }; ``` If we look at the C standard, then up until C20, there was no official support enums without an explicit underlying type and with discriminants that do not fit an `int`. With C23, this has changed: now enums have to grow automatically if there is an integer type that can hold all their discriminants. MSVC does not implement this part of C23. Furthermore, Rust fundamentally cannot implement this (without major changes)! Enum discriminants work fundamentally different in Rust and C: - In Rust, every enum has a discriminant type entirely determined by its repr flags, and then the discriminant values must be const expressions of that type. For repr(C), that type is `isize`. So from the outset we interpret 9223372036854775807 as an isize literal and never give it a chance to be stored in a bigger type. If the discriminant is given as a literal without type annotation, it gets wrapped implicitly with a warning; otherwise the user has to write `as isize` explicitly and thus trigger the wrapping. Later, we can then decide to make the *tag* that stores the discriminant smaller than the discriminant type if all discriminant values fit into a smaller type, but those values have allready all been made to fit an `isize` so nothing bigger than `isize` could ever come out of this. That makes the behavior of 32bit GCC impossible for us to match. - In C, things flow the other way around: every discriminant value has a type determined entirely by its constant expression, and then the type for the enum is determined based on that. IOW, the expression can have *any type* a priori, different variants can even use a different type, and then the compiler is supposed to look at the resulting *values* (presumably as mathematical integers) and find a type that can hold them all. For the example above, 9223372036854775807 is a signed integer, so the compiler looks for the smallest signed type that can hold it, which is `long long`, and then uses that to compute the size of the enum (at least that's what C23 says should happen and GCC does this correctly). Realistically I think the best we can do is to not attempt to support C23 enums, and to require repr(C) enums to satisfy the C20 requirements: all discriminants must fit into a c_int. So that's what this PR implements, by adding a FCW for enums with discriminants that do not fit into `c_int`. As a slight extension, we do *not* lint enums where all discriminants fit into a `c_uint` (i.e. `unsigned int`): while C20 does (in my reading) not allow this, and C23 does not prescribe the size of such an enum, this seems to behave consistently across compilers (giving the enum the size of an `unsigned int`). IOW, the lint fires whenever our layout algorithm would make the enum larger than an `int`, irrespective of whether we pick a signed or unsigned discriminant. This extension was added because [crater found](rust-lang#147017 (comment)) multiple cases of such enums across the ecosystem. Note that it is impossible to trigger this FCW on targets where isize and c_int are the same size (i.e., the typical 32bit target): since we interpret discriminant values as isize, by the time we look at them, they have already been wrapped. However, we have an existing lint (overflowing_literals) that should notify people when this kind of wrapping occurs implicitly. Also, 64bit targets are much more common. On the other hand, even on 64bit targets it is possible to fall into the same trap by writing a literal that is so big that it does not fit into isize, gets wrapped (triggering overflowing_literals), and the wrapped value fits into c_int. Furthermore, overflowing_literals is just a lint, so if it occurs in a dependency you won't notice. (Arguably there is also a more general problem here: for literals of type `usize`/`isize`, it is fairly easy to write code that only triggers `overflowing_literals` on 32bit targets, and to never see that lint if one develops on a 64bit target.) Specifically, the above example triggers the FCW on 64bit targets, but on 32bit targets we get this err-by-default lint instead (which will be hidden if it occurs in a dependency): ``` error: literal out of range for `isize` --> $DIR/repr-c-big-discriminant1.rs:16:9 | LL | A = 9223372036854775807, | ^^^^^^^^^^^^^^^^^^^ | = note: the literal `9223372036854775807` does not fit into the type `isize` whose range is `-2147483648..=2147483647` = note: `#[deny(overflowing_literals)]` on by default ``` Also see the tests added by this PR. This isn't perfect, but so far I don't think I have seen a better option. In rust-lang#146504 I tried adjusting our enum logic to make the size of the example enum above actually match what C compilers do, but that's a massive breaking change since we have to change the expected type of the discriminant expression from `isize` to `i64` or even `i128` -- so that seems like a no-go. To improve the lint we could analyze things on the HIR level and specifically catch "repr(C) enums with discriminants defined as literals that are too big", but that would have to be on top of the lint in this PR I think since we'd still want to also always check the actually evaluated value (which we can't always determined on the HIR level). Cc `@workingjubilee` `@CAD97`
…r=davidtwco FCW for repr(C) enums whose discriminant values do not fit into a c_int or c_uint Context: rust-lang#124403 The current behavior of repr(C) enums is as follows: - The discriminant values are interpreted as const expressions of type `isize` - We compute the smallest size that can hold all discriminant values - The target spec contains the smallest size for repr(C) enums - We take the larger of these two sizes Unfortunately, this doesn't always match what C compilers do. In particular, MSVC seems to *always* give enums a size of 4 bytes, whereas the algorithm above will give enums a size of up to 8 bytes on 64bit targets. Here's an example enum affected by this: ``` // We give this size 4 on 32bit targets (with a warning since the discriminant is wrapped to fit an isize) // and size 8 on 64bit targets. #[repr(C)] enum OverflowingEnum { A = 9223372036854775807, // i64::MAX } // MSVC always gives this size 4 (without any warning). // GCC always gives it size 8 (without any warning). // Godbolt: https://godbolt.org/z/P49MaYvMd enum overflowing_enum { OVERFLOWING_ENUM_A = 9223372036854775807, }; ``` If we look at the C standard, then up until C20, there was no official support enums without an explicit underlying type and with discriminants that do not fit an `int`. With C23, this has changed: now enums have to grow automatically if there is an integer type that can hold all their discriminants. MSVC does not implement this part of C23. Furthermore, Rust fundamentally cannot implement this (without major changes)! Enum discriminants work fundamentally different in Rust and C: - In Rust, every enum has a discriminant type entirely determined by its repr flags, and then the discriminant values must be const expressions of that type. For repr(C), that type is `isize`. So from the outset we interpret 9223372036854775807 as an isize literal and never give it a chance to be stored in a bigger type. If the discriminant is given as a literal without type annotation, it gets wrapped implicitly with a warning; otherwise the user has to write `as isize` explicitly and thus trigger the wrapping. Later, we can then decide to make the *tag* that stores the discriminant smaller than the discriminant type if all discriminant values fit into a smaller type, but those values have allready all been made to fit an `isize` so nothing bigger than `isize` could ever come out of this. That makes the behavior of 32bit GCC impossible for us to match. - In C, things flow the other way around: every discriminant value has a type determined entirely by its constant expression, and then the type for the enum is determined based on that. IOW, the expression can have *any type* a priori, different variants can even use a different type, and then the compiler is supposed to look at the resulting *values* (presumably as mathematical integers) and find a type that can hold them all. For the example above, 9223372036854775807 is a signed integer, so the compiler looks for the smallest signed type that can hold it, which is `long long`, and then uses that to compute the size of the enum (at least that's what C23 says should happen and GCC does this correctly). Realistically I think the best we can do is to not attempt to support C23 enums, and to require repr(C) enums to satisfy the C20 requirements: all discriminants must fit into a c_int. So that's what this PR implements, by adding a FCW for enums with discriminants that do not fit into `c_int`. As a slight extension, we do *not* lint enums where all discriminants fit into a `c_uint` (i.e. `unsigned int`): while C20 does (in my reading) not allow this, and C23 does not prescribe the size of such an enum, this seems to behave consistently across compilers (giving the enum the size of an `unsigned int`). IOW, the lint fires whenever our layout algorithm would make the enum larger than an `int`, irrespective of whether we pick a signed or unsigned discriminant. This extension was added because [crater found](rust-lang#147017 (comment)) multiple cases of such enums across the ecosystem. Note that it is impossible to trigger this FCW on targets where isize and c_int are the same size (i.e., the typical 32bit target): since we interpret discriminant values as isize, by the time we look at them, they have already been wrapped. However, we have an existing lint (overflowing_literals) that should notify people when this kind of wrapping occurs implicitly. Also, 64bit targets are much more common. On the other hand, even on 64bit targets it is possible to fall into the same trap by writing a literal that is so big that it does not fit into isize, gets wrapped (triggering overflowing_literals), and the wrapped value fits into c_int. Furthermore, overflowing_literals is just a lint, so if it occurs in a dependency you won't notice. (Arguably there is also a more general problem here: for literals of type `usize`/`isize`, it is fairly easy to write code that only triggers `overflowing_literals` on 32bit targets, and to never see that lint if one develops on a 64bit target.) Specifically, the above example triggers the FCW on 64bit targets, but on 32bit targets we get this err-by-default lint instead (which will be hidden if it occurs in a dependency): ``` error: literal out of range for `isize` --> $DIR/repr-c-big-discriminant1.rs:16:9 | LL | A = 9223372036854775807, | ^^^^^^^^^^^^^^^^^^^ | = note: the literal `9223372036854775807` does not fit into the type `isize` whose range is `-2147483648..=2147483647` = note: `#[deny(overflowing_literals)]` on by default ``` Also see the tests added by this PR. This isn't perfect, but so far I don't think I have seen a better option. In rust-lang#146504 I tried adjusting our enum logic to make the size of the example enum above actually match what C compilers do, but that's a massive breaking change since we have to change the expected type of the discriminant expression from `isize` to `i64` or even `i128` -- so that seems like a no-go. To improve the lint we could analyze things on the HIR level and specifically catch "repr(C) enums with discriminants defined as literals that are too big", but that would have to be on top of the lint in this PR I think since we'd still want to also always check the actually evaluated value (which we can't always determined on the HIR level). Cc `@workingjubilee` `@CAD97`
Rollup merge of #147017 - RalfJung:repr-c-big-discriminant, r=davidtwco FCW for repr(C) enums whose discriminant values do not fit into a c_int or c_uint Context: #124403 The current behavior of repr(C) enums is as follows: - The discriminant values are interpreted as const expressions of type `isize` - We compute the smallest size that can hold all discriminant values - The target spec contains the smallest size for repr(C) enums - We take the larger of these two sizes Unfortunately, this doesn't always match what C compilers do. In particular, MSVC seems to *always* give enums a size of 4 bytes, whereas the algorithm above will give enums a size of up to 8 bytes on 64bit targets. Here's an example enum affected by this: ``` // We give this size 4 on 32bit targets (with a warning since the discriminant is wrapped to fit an isize) // and size 8 on 64bit targets. #[repr(C)] enum OverflowingEnum { A = 9223372036854775807, // i64::MAX } // MSVC always gives this size 4 (without any warning). // GCC always gives it size 8 (without any warning). // Godbolt: https://godbolt.org/z/P49MaYvMd enum overflowing_enum { OVERFLOWING_ENUM_A = 9223372036854775807, }; ``` If we look at the C standard, then up until C20, there was no official support enums without an explicit underlying type and with discriminants that do not fit an `int`. With C23, this has changed: now enums have to grow automatically if there is an integer type that can hold all their discriminants. MSVC does not implement this part of C23. Furthermore, Rust fundamentally cannot implement this (without major changes)! Enum discriminants work fundamentally different in Rust and C: - In Rust, every enum has a discriminant type entirely determined by its repr flags, and then the discriminant values must be const expressions of that type. For repr(C), that type is `isize`. So from the outset we interpret 9223372036854775807 as an isize literal and never give it a chance to be stored in a bigger type. If the discriminant is given as a literal without type annotation, it gets wrapped implicitly with a warning; otherwise the user has to write `as isize` explicitly and thus trigger the wrapping. Later, we can then decide to make the *tag* that stores the discriminant smaller than the discriminant type if all discriminant values fit into a smaller type, but those values have allready all been made to fit an `isize` so nothing bigger than `isize` could ever come out of this. That makes the behavior of 32bit GCC impossible for us to match. - In C, things flow the other way around: every discriminant value has a type determined entirely by its constant expression, and then the type for the enum is determined based on that. IOW, the expression can have *any type* a priori, different variants can even use a different type, and then the compiler is supposed to look at the resulting *values* (presumably as mathematical integers) and find a type that can hold them all. For the example above, 9223372036854775807 is a signed integer, so the compiler looks for the smallest signed type that can hold it, which is `long long`, and then uses that to compute the size of the enum (at least that's what C23 says should happen and GCC does this correctly). Realistically I think the best we can do is to not attempt to support C23 enums, and to require repr(C) enums to satisfy the C20 requirements: all discriminants must fit into a c_int. So that's what this PR implements, by adding a FCW for enums with discriminants that do not fit into `c_int`. As a slight extension, we do *not* lint enums where all discriminants fit into a `c_uint` (i.e. `unsigned int`): while C20 does (in my reading) not allow this, and C23 does not prescribe the size of such an enum, this seems to behave consistently across compilers (giving the enum the size of an `unsigned int`). IOW, the lint fires whenever our layout algorithm would make the enum larger than an `int`, irrespective of whether we pick a signed or unsigned discriminant. This extension was added because [crater found](#147017 (comment)) multiple cases of such enums across the ecosystem. Note that it is impossible to trigger this FCW on targets where isize and c_int are the same size (i.e., the typical 32bit target): since we interpret discriminant values as isize, by the time we look at them, they have already been wrapped. However, we have an existing lint (overflowing_literals) that should notify people when this kind of wrapping occurs implicitly. Also, 64bit targets are much more common. On the other hand, even on 64bit targets it is possible to fall into the same trap by writing a literal that is so big that it does not fit into isize, gets wrapped (triggering overflowing_literals), and the wrapped value fits into c_int. Furthermore, overflowing_literals is just a lint, so if it occurs in a dependency you won't notice. (Arguably there is also a more general problem here: for literals of type `usize`/`isize`, it is fairly easy to write code that only triggers `overflowing_literals` on 32bit targets, and to never see that lint if one develops on a 64bit target.) Specifically, the above example triggers the FCW on 64bit targets, but on 32bit targets we get this err-by-default lint instead (which will be hidden if it occurs in a dependency): ``` error: literal out of range for `isize` --> $DIR/repr-c-big-discriminant1.rs:16:9 | LL | A = 9223372036854775807, | ^^^^^^^^^^^^^^^^^^^ | = note: the literal `9223372036854775807` does not fit into the type `isize` whose range is `-2147483648..=2147483647` = note: `#[deny(overflowing_literals)]` on by default ``` Also see the tests added by this PR. This isn't perfect, but so far I don't think I have seen a better option. In #146504 I tried adjusting our enum logic to make the size of the example enum above actually match what C compilers do, but that's a massive breaking change since we have to change the expected type of the discriminant expression from `isize` to `i64` or even `i128` -- so that seems like a no-go. To improve the lint we could analyze things on the HIR level and specifically catch "repr(C) enums with discriminants defined as literals that are too big", but that would have to be on top of the lint in this PR I think since we'd still want to also always check the actually evaluated value (which we can't always determined on the HIR level). Cc `@workingjubilee` `@CAD97`
This fixes #124403: the layout of some repr(C) enums did not always match the C compiler.
Specifically, what rustc did is to always begin with
isizeas the discriminant type for a repr(C) enum, and ensure that all discriminant values fit that type (i.e., that's the type for which the given discriminant value is const-eval'd; the value gets wrapped if needed). Then if the layout code notices that the enum can fit an i32 or u32, its size is reduced accordingly (but not more than that, since for most targets, 32bit is the minimum size for unannotated C enums).However, for 64bit MSVC, that's just not correct: as https://godbolt.org/z/1Pdb3hP9E shows, this enum has size 4, despite its discriminant not fitting a 32bit integer:
For 32bit targets, it's not correct wrt GCC/clang either: the above enum has size 8 there, but Rust gives it size 4.
So this PR adds a notion of "maximum enum size" to overwrite the
isizedefault, and sets it to i32 for MSVC targets and to i64 everywhere else.This changes the size of some existing enums. I guess that's a kind of breaking change? But it changes their size to match what MSVC does, so... this is probably fine? The bigger issue is that it changes the type of the constant that denotes the discriminant value: if that is a literal, that's fine since its type can be inferred, but if the discriminant is computed by calling a function, then that function would no have to return
i32on MSVC andi64everywhere else. That seems like it could be quite problematic.A migration lint for the 64bit MSVC case should be possible but is not obvious -- we have to know the actual discriminant values to figure out if the size of the enum is different with the old vs new logic. For the 32bit GCC case however I don't know how to do one since the discriminant value is already wrapped to
isizeso how could we know whether it would have been different withi64? We'd have to analyze the actual expression that defines the discriminant, not just its final value, and that gets arbitrarily complicated very quickly.TODO:
Cc @workingjubilee @CAD97 @RossSmyth @ChrisDenton @wesleywiser @lcnr