Rollup merge of #147017 - RalfJung:repr-c-big-discriminant, r=davidtwco
FCW for repr(C) enums whose discriminant values do not fit into a c_int or c_uint
Context: #124403
The current behavior of repr(C) enums is as follows:
- The discriminant values are interpreted as const expressions of type `isize`
- We compute the smallest size that can hold all discriminant values
- The target spec contains the smallest size for repr(C) enums
- We take the larger of these two sizes
Unfortunately, this doesn't always match what C compilers do. In particular, MSVC seems to *always* give enums a size of 4 bytes, whereas the algorithm above will give enums a size of up to 8 bytes on 64bit targets. Here's an example enum affected by this:
```
// We give this size 4 on 32bit targets (with a warning since the discriminant is wrapped to fit an isize)
// and size 8 on 64bit targets.
#[repr(C)]
enum OverflowingEnum {
    A = 9223372036854775807, // i64::MAX
}
// MSVC always gives this size 4 (without any warning).
// GCC always gives it size 8 (without any warning).
// Godbolt: https://godbolt.org/z/P49MaYvMd
enum overflowing_enum {
    OVERFLOWING_ENUM_A = 9223372036854775807,
};
```
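The size selection described in the list above can be sketched as follows. This is a simplified, hypothetical model and not rustc's actual code: the function name `repr_c_enum_size` and its parameters are made up for illustration, discriminants are modelled as `i64` (that is, `isize` on a 64bit target), and `c_enum_min_size` stands in for the minimum size from the target spec.

```rust
/// Hypothetical model of the repr(C) enum size selection (not rustc's code).
/// `discriminants`: values after evaluation as `isize` on a 64bit target.
/// `c_enum_min_size`: assumed minimum enum size from the target spec, in bytes.
fn repr_c_enum_size(discriminants: &[i64], c_enum_min_size: u64) -> u64 {
    let min = discriminants.iter().copied().min().unwrap_or(0);
    let max = discriminants.iter().copied().max().unwrap_or(0);

    // Smallest integer width (in bytes) that can hold every discriminant.
    let fit = [1u64, 2, 4, 8]
        .iter()
        .copied()
        .find(|&bytes| {
            let bits = bytes * 8;
            if bits == 64 {
                // Every i64 value fits into 64 bits.
                true
            } else if min >= 0 {
                // All values are non-negative: check the unsigned range.
                (max as u64) < (1u64 << bits)
            } else {
                // Some value is negative: check the signed range.
                let limit = 1i64 << (bits - 1);
                min >= -limit && max < limit
            }
        })
        .unwrap_or(8);

    // Take the larger of the fitted size and the target's minimum.
    fit.max(c_enum_min_size)
}
```

Under these assumptions, `repr_c_enum_size(&[i64::MAX], 4)` yields 8, which is the size that disagrees with MSVC for the example enum.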
If we look at the C standard, then up until C20, there was no official support for enums without an explicit underlying type and with discriminants that do not fit an `int`. With C23, this has changed: now enums have to grow automatically if there is an integer type that can hold all their discriminants. MSVC does not implement this part of C23.
Furthermore, Rust fundamentally cannot implement this (without major changes)! Enum discriminants work fundamentally differently in Rust and C:
- In Rust, every enum has a discriminant type entirely determined by its repr flags, and then the discriminant values must be const expressions of that type. For repr(C), that type is `isize`. So from the outset we interpret 9223372036854775807 as an isize literal and never give it a chance to be stored in a bigger type. If the discriminant is given as a literal without type annotation, it gets wrapped implicitly with a warning; otherwise the user has to write `as isize` explicitly and thus trigger the wrapping. Later, we can then decide to make the *tag* that stores the discriminant smaller than the discriminant type if all discriminant values fit into a smaller type, but those values have already all been made to fit an `isize` so nothing bigger than `isize` could ever come out of this. That makes the behavior of 32bit GCC impossible for us to match.
- In C, things flow the other way around: every discriminant value has a type determined entirely by its constant expression, and then the type for the enum is determined based on that. IOW, the expression can have *any type* a priori, different variants can even use a different type, and then the compiler is supposed to look at the resulting *values* (presumably as mathematical integers) and find a type that can hold them all. For the example above, 9223372036854775807 is a signed integer, so the compiler looks for the smallest signed type that can hold it, which is `long long`, and then uses that to compute the size of the enum (at least that's what C23 says should happen and GCC does this correctly).
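To make the Rust side of this comparison concrete, here is a small sketch. The enum `Small`, the constant `SEVEN`, and the helper `wrap_like_32bit_isize` are hypothetical names introduced only for illustration. The helper models a 32bit target by using `i32` in place of `isize`, so the wrapping can be observed on any host.

```rust
// A constant whose type is not `isize`.
const SEVEN: i64 = 7;

#[repr(C)]
#[derive(Clone, Copy)]
enum Small {
    One = 1,
    // Writing `Seven = SEVEN` would be a type error, because the discriminant
    // expression of a repr(C) enum must have type `isize`.
    Seven = SEVEN as isize,
}

/// Models what `as isize` does on a 32bit target, where `isize` has 32 bits:
/// the value is truncated to its low 32 bits.
fn wrap_like_32bit_isize(value: i64) -> i32 {
    value as i32
}
```

With this model, `wrap_like_32bit_isize(i64::MAX)` evaluates to `-1`: by the time the layout code sees the discriminant on a 32bit target, the original value is gone, so no type wider than `isize` can be chosen for it.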
Realistically I think the best we can do is to not attempt to support C23 enums, and to require repr(C) enums to satisfy the C20 requirements: all discriminants must fit into a c_int. So that's what this PR implements, by adding a FCW for enums with discriminants that do not fit into `c_int`. As a slight extension, we do *not* lint enums where all discriminants fit into a `c_uint` (i.e. `unsigned int`): while C20 does (in my reading) not allow this, and C23 does not prescribe the size of such an enum, this seems to behave consistently across compilers (giving the enum the size of an `unsigned int`). IOW, the lint fires whenever our layout algorithm would make the enum larger than an `int`, irrespective of whether we pick a signed or unsigned discriminant. This extension was added because crater found multiple cases of such enums across the ecosystem (see the linked comment on #147017).
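The rule stated above can be written down as a small predicate. This is only a sketch of the condition as described in this PR text, not the lint's actual implementation; the function name `needs_fcw` is hypothetical, and discriminants are passed as `i128` so that the comparison itself cannot overflow.

```rust
use std::ffi::{c_int, c_uint};

/// Hypothetical helper: returns true when the FCW described above would apply,
/// i.e. when the discriminants neither all fit into `c_int`
/// nor all fit into `c_uint`.
fn needs_fcw(discriminants: &[i128]) -> bool {
    let all_fit_c_int = discriminants
        .iter()
        .all(|&d| d >= c_int::MIN as i128 && d <= c_int::MAX as i128);
    let all_fit_c_uint = discriminants
        .iter()
        .all(|&d| d >= 0 && d <= c_uint::MAX as i128);
    !(all_fit_c_int || all_fit_c_uint)
}
```

Note that mixing a negative value with one above `c_int::MAX` is flagged even though each value on its own fits one of the two types, because no single one of the two types holds both.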
Note that it is impossible to trigger this FCW on targets where isize and c_int are the same size (i.e., the typical 32bit target): since we interpret discriminant values as isize, by the time we look at them, they have already been wrapped. However, we have an existing lint (overflowing_literals) that should notify people when this kind of wrapping occurs implicitly. Also, 64bit targets are much more common. On the other hand, even on 64bit targets it is possible to fall into the same trap by writing a literal that is so big that it does not fit into isize, gets wrapped (triggering overflowing_literals), and the wrapped value fits into c_int. Furthermore, overflowing_literals is just a lint, so if it occurs in a dependency you won't notice. (Arguably there is also a more general problem here: for literals of type `usize`/`isize`, it is fairly easy to write code that only triggers `overflowing_literals` on 32bit targets, and to never see that lint if one develops on a 64bit target.)
Specifically, the above example triggers the FCW on 64bit targets, but on 32bit targets we get this err-by-default lint instead (which will be hidden if it occurs in a dependency):
```
error: literal out of range for `isize`
  --> $DIR/repr-c-big-discriminant1.rs:16:9
   |
LL |     A = 9223372036854775807,
   |         ^^^^^^^^^^^^^^^^^^^
   |
   = note: the literal `9223372036854775807` does not fit into the type `isize` whose range is `-2147483648..=2147483647`
   = note: `#[deny(overflowing_literals)]` on by default
```
Also see the tests added by this PR.
This isn't perfect, but so far I don't think I have seen a better option. In #146504 I tried adjusting our enum logic to make the size of the example enum above actually match what C compilers do, but that's a massive breaking change since we have to change the expected type of the discriminant expression from `isize` to `i64` or even `i128` -- so that seems like a no-go. To improve the lint we could analyze things on the HIR level and specifically catch "repr(C) enums with discriminants defined as literals that are too big", but that would have to be on top of the lint in this PR I think since we'd still want to also always check the actually evaluated value (which we can't always determine on the HIR level).
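For code that hits the new lint, the lint's help text suggests using `repr($int_ty)` to set the size explicitly. A minimal sketch of that migration for the example enum, assuming `i64` is the intended type (the helper `overflowing_enum_size` is a hypothetical name used only to expose the size):

```rust
// With an explicit integer repr, the discriminant expression is typed as
// `i64` on every target, so nothing is wrapped to fit `isize`.
#[repr(i64)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum OverflowingEnum {
    A = 9223372036854775807, // i64::MAX
}

/// Size of the enum in bytes; 8 on all targets because the tag is an `i64`.
fn overflowing_enum_size() -> usize {
    std::mem::size_of::<OverflowingEnum>()
}
```

The C side would then presumably need a matching fixed-width type as well, since a plain `enum` of this size is exactly the case where compilers disagree.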
Cc `@workingjubilee` `@CAD97`
compiler/rustc_lint_defs/src/builtin.rs (50 additions & 0 deletions)
@@ -86,6 +86,7 @@ declare_lint_pass! {
         REFINING_IMPL_TRAIT_INTERNAL,
         REFINING_IMPL_TRAIT_REACHABLE,
         RENAMED_AND_REMOVED_LINTS,
+        REPR_C_ENUMS_LARGER_THAN_INT,
         REPR_TRANSPARENT_NON_ZST_FIELDS,
         RUST_2021_INCOMPATIBLE_CLOSURE_CAPTURES,
         RUST_2021_INCOMPATIBLE_OR_PATTERNS,
@@ -5213,3 +5214,52 @@ declare_lint! {
     Warn,
     r#"detects when a function annotated with `#[inline(always)]` and `#[target_feature(enable = "..")]` is inlined into a caller without the required target feature"#,
 }
+
+declare_lint! {
+    /// The `repr_c_enums_larger_than_int` lint detects `repr(C)` enums with discriminant
+    /// values that do not fit into a C `int` or `unsigned int`.
+    ///
+    /// ### Example
+    ///
+    /// ```rust,ignore (only errors on 64bit)
+    /// #[repr(C)]
+    /// enum E {
+    ///     V = 9223372036854775807, // i64::MAX
+    /// }
+    /// ```
+    ///
+    /// This will produce:
+    ///
+    /// ```text
+    /// error: `repr(C)` enum discriminant does not fit into C `int` nor into C `unsigned int`
+    ///   --> $DIR/repr-c-big-discriminant1.rs:16:5
+    ///    |
+    /// LL |     A = 9223372036854775807, // i64::MAX
+    ///    |     ^
+    ///    |
+    ///    = note: `repr(C)` enums with big discriminants are non-portable, and their size in Rust might not match their size in C
+    ///    = help: use `repr($int_ty)` instead to explicitly set the size of this enum
+    /// ```
+    ///
+    /// ### Explanation
+    ///
+    /// In C, enums with discriminants that do not all fit into an `int` or all fit into an
+    /// `unsigned int` are a portability hazard: such enums are only permitted since C23, and not
+    /// supported e.g. by MSVC.
+    ///
+    /// Furthermore, Rust interprets the discriminant values of `repr(C)` enums as expressions of
+    /// type `isize`. This makes it impossible to implement the C23 behavior of enums where the enum
+    /// discriminants have no predefined type and instead the enum uses a type large enough to hold
+    /// all discriminants.
+    ///
+    /// Therefore, `repr(C)` enums in Rust require that either all discriminants to fit into a C
+    /// `int` or they all fit into an `unsigned int`.
+    pub REPR_C_ENUMS_LARGER_THAN_INT,
+    Warn,
+    "repr(C) enums with discriminant values that do not fit into a C int",