Commit b40a20f

Rollup merge of #147017 - RalfJung:repr-c-big-discriminant, r=davidtwco
FCW for repr(C) enums whose discriminant values do not fit into a c_int or c_uint

Context: #124403

The current behavior of repr(C) enums is as follows:
- The discriminant values are interpreted as const expressions of type `isize`
- We compute the smallest size that can hold all discriminant values
- The target spec contains the smallest size for repr(C) enums
- We take the larger of these two sizes

Unfortunately, this doesn't always match what C compilers do. In particular, MSVC seems to *always* give enums a size of 4 bytes, whereas the algorithm above will give enums a size of up to 8 bytes on 64bit targets. Here's an example enum affected by this:

```
// We give this size 4 on 32bit targets (with a warning since the discriminant is wrapped to fit an isize)
// and size 8 on 64bit targets.
#[repr(C)]
enum OverflowingEnum {
    A = 9223372036854775807, // i64::MAX
}

// MSVC always gives this size 4 (without any warning).
// GCC always gives it size 8 (without any warning).
// Godbolt: https://godbolt.org/z/P49MaYvMd
enum overflowing_enum {
    OVERFLOWING_ENUM_A = 9223372036854775807,
};
```

If we look at the C standard, then up until C17 there was no official support for enums without an explicit underlying type and with discriminants that do not fit an `int`. With C23, this has changed: now enums have to grow automatically if there is an integer type that can hold all their discriminants. MSVC does not implement this part of C23.

Furthermore, Rust fundamentally cannot implement this (without major changes)! Enum discriminants work fundamentally differently in Rust and C:
- In Rust, every enum has a discriminant type entirely determined by its repr flags, and then the discriminant values must be const expressions of that type. For repr(C), that type is `isize`. So from the outset we interpret 9223372036854775807 as an isize literal and never give it a chance to be stored in a bigger type. If the discriminant is given as a literal without type annotation, it gets wrapped implicitly with a warning; otherwise the user has to write `as isize` explicitly and thus trigger the wrapping. Later, we can then decide to make the *tag* that stores the discriminant smaller than the discriminant type if all discriminant values fit into a smaller type, but those values have already all been made to fit an `isize`, so nothing bigger than `isize` could ever come out of this. That makes the behavior of 32bit GCC impossible for us to match.
- In C, things flow the other way around: every discriminant value has a type determined entirely by its constant expression, and then the type for the enum is determined based on that. IOW, the expression can have *any type* a priori, different variants can even use a different type, and then the compiler is supposed to look at the resulting *values* (presumably as mathematical integers) and find a type that can hold them all. For the example above, 9223372036854775807 is a signed integer, so the compiler looks for the smallest signed type that can hold it, which is `long long`, and then uses that to compute the size of the enum (at least that's what C23 says should happen, and GCC does this correctly).

Realistically I think the best we can do is to not attempt to support C23 enums, and to require repr(C) enums to satisfy the C17 requirements: all discriminants must fit into a c_int. So that's what this PR implements, by adding a FCW for enums with discriminants that do not fit into `c_int`.

As a slight extension, we do *not* lint enums where all discriminants fit into a `c_uint` (i.e. `unsigned int`): while C17 does (in my reading) not allow this, and C23 does not prescribe the size of such an enum, this seems to behave consistently across compilers (giving the enum the size of an `unsigned int`). IOW, the lint fires whenever our layout algorithm would make the enum larger than an `int`, irrespective of whether we pick a signed or unsigned discriminant. This extension was added because crater found (see the comment in #147017) multiple cases of such enums across the ecosystem.

Note that it is impossible to trigger this FCW on targets where isize and c_int are the same size (i.e., the typical 32bit target): since we interpret discriminant values as isize, by the time we look at them, they have already been wrapped. However, we have an existing lint (overflowing_literals) that should notify people when this kind of wrapping occurs implicitly. Also, 64bit targets are much more common. On the other hand, even on 64bit targets it is possible to fall into the same trap by writing a literal that is so big that it does not fit into isize, gets wrapped (triggering overflowing_literals), and the wrapped value fits into c_int. Furthermore, overflowing_literals is just a lint, so if it occurs in a dependency you won't notice. (Arguably there is also a more general problem here: for literals of type `usize`/`isize`, it is fairly easy to write code that only triggers `overflowing_literals` on 32bit targets, and to never see that lint if one develops on a 64bit target.)

Specifically, the above example triggers the FCW on 64bit targets, but on 32bit targets we get this err-by-default lint instead (which will be hidden if it occurs in a dependency):

```
error: literal out of range for `isize`
  --> $DIR/repr-c-big-discriminant1.rs:16:9
   |
LL |     A = 9223372036854775807,
   |         ^^^^^^^^^^^^^^^^^^^
   |
   = note: the literal `9223372036854775807` does not fit into the type `isize` whose range is `-2147483648..=2147483647`
   = note: `#[deny(overflowing_literals)]` on by default
```

Also see the tests added by this PR.

This isn't perfect, but so far I don't think I have seen a better option. In #146504 I tried adjusting our enum logic to make the size of the example enum above actually match what C compilers do, but that's a massive breaking change since we have to change the expected type of the discriminant expression from `isize` to `i64` or even `i128`, so that seems like a no-go. To improve the lint we could analyze things on the HIR level and specifically catch "repr(C) enums with discriminants defined as literals that are too big", but that would have to be on top of the lint in this PR, I think, since we'd still want to also always check the actually evaluated value (which we can't always determine on the HIR level).

Cc `@workingjubilee` `@CAD97`
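
The wrapping trap described above can be illustrated with a small standalone model. This is only a sketch: `wrap_to_isize` is a hypothetical helper that mimics what happens when a too-large literal is interpreted as an `isize` of a given width; it is not code from the compiler.

```rust
/// Hypothetical model (not compiler code): wrap `value` to a signed integer
/// of `bits` bits, the way an out-of-range literal is wrapped to `isize`.
/// `bits` is assumed to be in `1..=128`.
fn wrap_to_isize(value: i128, bits: u32) -> i128 {
    let shift = 128 - bits;
    // Shift the low `bits` bits to the top, then shift back arithmetically
    // so that the result is sign-extended.
    (value << shift) >> shift
}

fn main() {
    // On a 32-bit target, i64::MAX wraps to -1, which fits into a C `int`,
    // so the discriminant check never sees the original value.
    println!("{}", wrap_to_isize(i64::MAX as i128, 32));
    // On a 64-bit target, i64::MAX is kept as-is and is too big for `int`.
    println!("{}", wrap_to_isize(i64::MAX as i128, 64));
    // Even on 64-bit, a literal larger than isize can wrap to a small value.
    println!("{}", wrap_to_isize((1i128 << 64) + 5, 64));
}
```

In each wrapped case only `overflowing_literals` would report the problem, which is why the FCW cannot fire there.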
2 parents fa9ea6d + a92bae0 commit b40a20f

17 files changed: +392 -26 lines changed

compiler/rustc_abi/src/layout.rs

Lines changed: 1 addition & 1 deletion
@@ -812,7 +812,7 @@ impl<Cx: HasDataLayout> LayoutCalculator<Cx> {
         let (max, min) = largest_niche
             // We might have no inhabited variants, so pretend there's at least one.
             .unwrap_or((0, 0));
-        let (min_ity, signed) = discr_range_of_repr(min, max); //Integer::repr_discr(tcx, ty, &repr, min, max);
+        let (min_ity, signed) = discr_range_of_repr(min, max); //Integer::discr_range_of_repr(tcx, ty, &repr, min, max);

         let mut align = dl.aggregate_align;
         let mut max_repr_align = repr.align;

compiler/rustc_abi/src/lib.rs

Lines changed: 5 additions & 0 deletions
@@ -186,6 +186,11 @@ impl ReprOptions {

     /// Returns the discriminant type, given these `repr` options.
     /// This must only be called on enums!
+    ///
+    /// This is the "typeck type" of the discriminant, which is effectively the maximum size:
+    /// discriminant values will be wrapped to fit (with a lint). Layout can later decide to use a
+    /// smaller type for the tag that stores the discriminant at runtime and that will work just
+    /// fine, it just induces casts when getting/setting the discriminant.
     pub fn discr_type(&self) -> IntegerType {
         self.int.unwrap_or(IntegerType::Pointer(true))
     }
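
The distinction between the discriminant type and the tag can be observed from user code. The following is a minimal sketch; the enum names are made up for illustration, and the exact size of a default-repr enum is not guaranteed by the language, so only an upper bound is checked for it.

```rust
use std::mem::size_of;

// Default repr: discriminant expressions are typed as `isize`, but layout
// may store the tag in a smaller integer.
#[allow(dead_code)]
enum Small {
    A = 0,
    B = 255,
}

// Explicit repr: discriminant type and tag are both `u8`.
#[allow(dead_code)]
#[repr(u8)]
enum Byte {
    A = 0,
    B = 255,
}

fn main() {
    // The discriminant value is available at the `isize` type...
    println!("discriminant of Small::B: {}", Small::B as isize);
    // ...while the stored tag is typically much smaller than `isize`.
    println!("size_of::<Small>() = {}", size_of::<Small>());
    println!("size_of::<Byte>()  = {}", size_of::<Byte>());
}
```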

compiler/rustc_hir_analysis/src/check/check.rs

Lines changed: 1 addition & 1 deletion
@@ -782,7 +782,7 @@ pub(crate) fn check_item_type(tcx: TyCtxt<'_>, def_id: LocalDefId) -> Result<(),
             tcx.ensure_ok().generics_of(def_id);
             tcx.ensure_ok().type_of(def_id);
             tcx.ensure_ok().predicates_of(def_id);
-            crate::collect::lower_enum_variant_types(tcx, def_id.to_def_id());
+            crate::collect::lower_enum_variant_types(tcx, def_id);
             check_enum(tcx, def_id);
             check_variances_for_type_defn(tcx, def_id);
         }

compiler/rustc_hir_analysis/src/collect.rs

Lines changed: 54 additions & 16 deletions
@@ -19,7 +19,7 @@ use std::cell::Cell;
 use std::iter;
 use std::ops::Bound;

-use rustc_abi::ExternAbi;
+use rustc_abi::{ExternAbi, Size};
 use rustc_ast::Recovered;
 use rustc_data_structures::fx::{FxHashSet, FxIndexMap};
 use rustc_data_structures::unord::UnordMap;
@@ -605,32 +605,70 @@ pub(super) fn lower_variant_ctor(tcx: TyCtxt<'_>, def_id: LocalDefId) {
     tcx.ensure_ok().predicates_of(def_id);
 }

-pub(super) fn lower_enum_variant_types(tcx: TyCtxt<'_>, def_id: DefId) {
+pub(super) fn lower_enum_variant_types(tcx: TyCtxt<'_>, def_id: LocalDefId) {
     let def = tcx.adt_def(def_id);
     let repr_type = def.repr().discr_type();
     let initial = repr_type.initial_discriminant(tcx);
     let mut prev_discr = None::<Discr<'_>>;
+    // Some of the logic below relies on `i128` being able to hold all c_int and c_uint values.
+    assert!(tcx.sess.target.c_int_width < 128);
+    let mut min_discr = i128::MAX;
+    let mut max_discr = i128::MIN;

     // fill the discriminant values and field types
     for variant in def.variants() {
         let wrapped_discr = prev_discr.map_or(initial, |d| d.wrap_incr(tcx));
-        prev_discr = Some(
-            if let ty::VariantDiscr::Explicit(const_def_id) = variant.discr {
-                def.eval_explicit_discr(tcx, const_def_id).ok()
-            } else if let Some(discr) = repr_type.disr_incr(tcx, prev_discr) {
-                Some(discr)
-            } else {
+        let cur_discr = if let ty::VariantDiscr::Explicit(const_def_id) = variant.discr {
+            def.eval_explicit_discr(tcx, const_def_id).ok()
+        } else if let Some(discr) = repr_type.disr_incr(tcx, prev_discr) {
+            Some(discr)
+        } else {
+            let span = tcx.def_span(variant.def_id);
+            tcx.dcx().emit_err(errors::EnumDiscriminantOverflowed {
+                span,
+                discr: prev_discr.unwrap().to_string(),
+                item_name: tcx.item_ident(variant.def_id),
+                wrapped_discr: wrapped_discr.to_string(),
+            });
+            None
+        }
+        .unwrap_or(wrapped_discr);
+
+        if def.repr().c() {
+            let c_int = Size::from_bits(tcx.sess.target.c_int_width);
+            let c_uint_max = i128::try_from(c_int.unsigned_int_max()).unwrap();
+            // c_int is a signed type, so get a proper signed version of the discriminant
+            let discr_size = cur_discr.ty.int_size_and_signed(tcx).0;
+            let discr_val = discr_size.sign_extend(cur_discr.val);
+            min_discr = min_discr.min(discr_val);
+            max_discr = max_discr.max(discr_val);
+
+            // The discriminant range must either fit into c_int or c_uint.
+            if !(min_discr >= c_int.signed_int_min() && max_discr <= c_int.signed_int_max())
+                && !(min_discr >= 0 && max_discr <= c_uint_max)
+            {
                 let span = tcx.def_span(variant.def_id);
-                tcx.dcx().emit_err(errors::EnumDiscriminantOverflowed {
+                let msg = if discr_val < c_int.signed_int_min() || discr_val > c_uint_max {
+                    "`repr(C)` enum discriminant does not fit into C `int` nor into C `unsigned int`"
+                } else if discr_val < 0 {
+                    "`repr(C)` enum discriminant does not fit into C `unsigned int`, and a previous discriminant does not fit into C `int`"
+                } else {
+                    "`repr(C)` enum discriminant does not fit into C `int`, and a previous discriminant does not fit into C `unsigned int`"
+                };
+                tcx.node_span_lint(
+                    rustc_session::lint::builtin::REPR_C_ENUMS_LARGER_THAN_INT,
+                    tcx.local_def_id_to_hir_id(def_id),
                     span,
-                    discr: prev_discr.unwrap().to_string(),
-                    item_name: tcx.item_ident(variant.def_id),
-                    wrapped_discr: wrapped_discr.to_string(),
-                });
-                None
+                    |d| {
+                        d.primary_message(msg)
+                            .note("`repr(C)` enums with big discriminants are non-portable, and their size in Rust might not match their size in C")
+                            .help("use `repr($int_ty)` instead to explicitly set the size of this enum");
+                    }
+                );
             }
-            .unwrap_or(wrapped_discr),
-        );
+        }
+
+        prev_discr = Some(cur_discr);

         for f in &variant.fields {
             tcx.ensure_ok().generics_of(f.did);
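
The range check in the hunk above can be modelled outside the compiler. This is a hedged sketch only: `sign_extend` and `fits_c_int_or_c_uint` are hypothetical free functions that stand in for `Size::sign_extend` and the `c_int`/`c_uint` comparison, with the C `int` width passed in as a plain bit count (assumed to be below 128, as the diff asserts).

```rust
/// Hypothetical stand-in for sign-extending a raw discriminant that is
/// stored in the low `bits` bits of a `u128` (assumes `1 <= bits <= 128`).
fn sign_extend(raw: u128, bits: u32) -> i128 {
    let shift = 128 - bits;
    ((raw << shift) as i128) >> shift
}

/// Hypothetical model of the check: the whole discriminant range must fit
/// into a C `int`, or the whole range must fit into a C `unsigned int`.
/// Assumes `1 <= c_int_bits < 128`.
fn fits_c_int_or_c_uint(min: i128, max: i128, c_int_bits: u32) -> bool {
    let int_min = -(1i128 << (c_int_bits - 1));
    let int_max = (1i128 << (c_int_bits - 1)) - 1;
    let uint_max = (1i128 << c_int_bits) - 1;
    let fits_int = min >= int_min && max <= int_max;
    let fits_uint = min >= 0 && max <= uint_max;
    fits_int || fits_uint
}

fn main() {
    // A raw 64-bit pattern of all ones is the discriminant -1.
    println!("{}", sign_extend(u64::MAX as u128, 64));
    // 0..=u32::MAX fits into `unsigned int`, so no lint.
    println!("{}", fits_c_int_or_c_uint(0, u32::MAX as i128, 32));
    // -1 and u32::MAX together fit neither type, so the lint would fire.
    println!("{}", fits_c_int_or_c_uint(-1, u32::MAX as i128, 32));
}
```

Tracking a running minimum and maximum, as the diff does, is what allows the message to distinguish a discriminant that is out of range on its own from one that only conflicts with an earlier variant.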

compiler/rustc_lint/src/types.rs

Lines changed: 1 addition & 2 deletions
@@ -10,7 +10,7 @@ use rustc_span::{Span, Symbol, sym};
 use tracing::debug;
 use {rustc_ast as ast, rustc_hir as hir};

-mod improper_ctypes; // these filed do the implementation for ImproperCTypesDefinitions,ImproperCTypesDeclarations
+mod improper_ctypes; // these files do the implementation for ImproperCTypesDefinitions,ImproperCTypesDeclarations
 pub(crate) use improper_ctypes::ImproperCTypesLint;

 use crate::lints::{
@@ -25,7 +25,6 @@ use crate::lints::{
 use crate::{LateContext, LateLintPass, LintContext};

 mod literal;
-
 use literal::{int_ty_range, lint_literal, uint_ty_range};

 declare_lint! {

compiler/rustc_lint_defs/src/builtin.rs

Lines changed: 50 additions & 0 deletions
@@ -86,6 +86,7 @@ declare_lint_pass! {
         REFINING_IMPL_TRAIT_INTERNAL,
         REFINING_IMPL_TRAIT_REACHABLE,
         RENAMED_AND_REMOVED_LINTS,
+        REPR_C_ENUMS_LARGER_THAN_INT,
         REPR_TRANSPARENT_NON_ZST_FIELDS,
         RUST_2021_INCOMPATIBLE_CLOSURE_CAPTURES,
         RUST_2021_INCOMPATIBLE_OR_PATTERNS,
@@ -5213,3 +5214,52 @@ declare_lint! {
     Warn,
     r#"detects when a function annotated with `#[inline(always)]` and `#[target_feature(enable = "..")]` is inlined into a caller without the required target feature"#,
 }
+
+declare_lint! {
+    /// The `repr_c_enums_larger_than_int` lint detects `repr(C)` enums with discriminant
+    /// values that do not fit into a C `int` or `unsigned int`.
+    ///
+    /// ### Example
+    ///
+    /// ```rust,ignore (only errors on 64bit)
+    /// #[repr(C)]
+    /// enum E {
+    ///     V = 9223372036854775807, // i64::MAX
+    /// }
+    /// ```
+    ///
+    /// This will produce:
+    ///
+    /// ```text
+    /// error: `repr(C)` enum discriminant does not fit into C `int` nor into C `unsigned int`
+    ///   --> $DIR/repr-c-big-discriminant1.rs:16:5
+    ///    |
+    /// LL |     A = 9223372036854775807, // i64::MAX
+    ///    |     ^
+    ///    |
+    ///    = note: `repr(C)` enums with big discriminants are non-portable, and their size in Rust might not match their size in C
+    ///    = help: use `repr($int_ty)` instead to explicitly set the size of this enum
+    /// ```
+    ///
+    /// ### Explanation
+    ///
+    /// In C, enums with discriminants that do not all fit into an `int` or all fit into an
+    /// `unsigned int` are a portability hazard: such enums are only permitted since C23, and not
+    /// supported e.g. by MSVC.
+    ///
+    /// Furthermore, Rust interprets the discriminant values of `repr(C)` enums as expressions of
+    /// type `isize`. This makes it impossible to implement the C23 behavior of enums where the enum
+    /// discriminants have no predefined type and instead the enum uses a type large enough to hold
+    /// all discriminants.
+    ///
+    /// Therefore, `repr(C)` enums in Rust require that either all discriminants fit into a C
+    /// `int` or they all fit into an `unsigned int`.
+    pub REPR_C_ENUMS_LARGER_THAN_INT,
+    Warn,
+    "repr(C) enums with discriminant values that do not fit into a C int",
+    @future_incompatible = FutureIncompatibleInfo {
+        reason: FutureIncompatibilityReason::FutureReleaseError,
+        reference: "issue #124403 <https://github.com/rust-lang/rust/issues/124403>",
+        report_in_deps: false,
+    };
+}
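
The lint's help text suggests `repr($int_ty)`. As a sketch of what that fix might look like for the example enum (the name `BigEnum` is made up here, and `i64` is just one possible choice of integer type), an explicit integer repr makes the discriminant type and the size independent of the target's `isize` and `int`:

```rust
use std::mem::size_of;

// With an explicit integer repr, the discriminant expression is typed as
// `i64`, so the literal is never wrapped and the enum is always 8 bytes.
#[allow(dead_code)]
#[repr(i64)]
enum BigEnum {
    V = 9223372036854775807, // i64::MAX
}

fn main() {
    println!("size_of::<BigEnum>() = {}", size_of::<BigEnum>());
    println!("BigEnum::V as i64 = {}", BigEnum::V as i64);
}
```

The C side then has to use a matching 64-bit integer type for this value rather than a plain `enum`, since the size of such a C enum differs between compilers.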

compiler/rustc_middle/src/ty/layout.rs

Lines changed: 6 additions & 2 deletions
@@ -72,7 +72,10 @@ impl abi::Integer {
     /// signed discriminant range and `#[repr]` attribute.
     /// N.B.: `u128` values above `i128::MAX` will be treated as signed, but
     /// that shouldn't affect anything, other than maybe debuginfo.
-    fn repr_discr<'tcx>(
+    ///
+    /// This is the basis for computing the type of the *tag* of an enum (which can be smaller than
+    /// the type of the *discriminant*, which is determined by [`ReprOptions::discr_type`]).
+    fn discr_range_of_repr<'tcx>(
         tcx: TyCtxt<'tcx>,
         ty: Ty<'tcx>,
         repr: &ReprOptions,
@@ -108,7 +111,8 @@ impl abi::Integer {
             abi::Integer::I8
         };

-        // Pick the smallest fit.
+        // Pick the smallest fit. Prefer unsigned; that matches clang in cases where this makes a
+        // difference (https://godbolt.org/z/h4xEasW1d) so it is crucial for repr(C).
         if unsigned_fit <= signed_fit {
             (cmp::max(unsigned_fit, at_least), false)
         } else {
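
The "smallest fit, prefer unsigned" rule can be sketched as a standalone function. This is a simplified model under stated assumptions: integer types are represented by their bit width, `unsigned_fit`, `signed_fit` and `tag_type` are hypothetical names, and explicit `#[repr(int)]` handling, which the real function also performs, is left out.

```rust
/// Bit width of the smallest unsigned integer that can hold `max`.
fn unsigned_fit(max: u128) -> u32 {
    match max {
        0..=0xFF => 8,
        0x100..=0xFFFF => 16,
        0x1_0000..=0xFFFF_FFFF => 32,
        0x1_0000_0000..=0xFFFF_FFFF_FFFF_FFFF => 64,
        _ => 128,
    }
}

/// Bit width of the smallest signed integer that can hold `min..=max`.
fn signed_fit(min: i128, max: i128) -> u32 {
    for bits in [8u32, 16, 32, 64] {
        let lo = -(1i128 << (bits - 1));
        let hi = (1i128 << (bits - 1)) - 1;
        if min >= lo && max <= hi {
            return bits;
        }
    }
    128
}

/// Returns `(bit width, is_signed)` of the tag for the range `min..=max`,
/// never going below `at_least` bits (for repr(C), the C enum minimum size).
fn tag_type(min: i128, max: i128, at_least: u32) -> (u32, bool) {
    // A negative `min` becomes a huge `u128`, so it can only fit "unsigned"
    // at 128 bits and the signed candidate wins instead.
    let unsigned = unsigned_fit((min as u128).max(max as u128));
    let signed = signed_fit(min, max);
    // On a tie, prefer unsigned.
    if unsigned <= signed {
        (unsigned.max(at_least), false)
    } else {
        (signed.max(at_least), true)
    }
}

fn main() {
    println!("{:?}", tag_type(0, 127, 8)); // tie between u8 and i8
    println!("{:?}", tag_type(-1, 255, 8)); // needs a signed 16-bit tag
    println!("{:?}", tag_type(0, 200, 32)); // small range, 32-bit minimum
}
```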

compiler/rustc_ty_utils/src/layout.rs

Lines changed: 4 additions & 4 deletions
@@ -639,8 +639,8 @@ fn layout_of_uncached<'tcx>(
             // UnsafeCell and UnsafePinned both disable niche optimizations
             let is_special_no_niche = def.is_unsafe_cell() || def.is_unsafe_pinned();

-            let get_discriminant_type =
-                |min, max| abi::Integer::repr_discr(tcx, ty, &def.repr(), min, max);
+            let discr_range_of_repr =
+                |min, max| abi::Integer::discr_range_of_repr(tcx, ty, &def.repr(), min, max);

             let discriminants_iter = || {
                 def.is_enum()
@@ -663,7 +663,7 @@ fn layout_of_uncached<'tcx>(
                 def.is_enum(),
                 is_special_no_niche,
                 tcx.layout_scalar_valid_range(def.did()),
-                get_discriminant_type,
+                discr_range_of_repr,
                 discriminants_iter(),
                 !maybe_unsized,
             )
@@ -688,7 +688,7 @@ fn layout_of_uncached<'tcx>(
                 def.is_enum(),
                 is_special_no_niche,
                 tcx.layout_scalar_valid_range(def.did()),
-                get_discriminant_type,
+                discr_range_of_repr,
                 discriminants_iter(),
                 !maybe_unsized,
             ) else {

compiler/rustc_ty_utils/src/layout/invariant.rs

Lines changed: 2 additions & 0 deletions
@@ -14,6 +14,8 @@ pub(super) fn layout_sanity_check<'tcx>(cx: &LayoutCx<'tcx>, layout: &TyAndLayou
     if layout.size.bytes() >= tcx.data_layout.obj_size_bound() {
         bug!("size is too large, in the following layout:\n{layout:#?}");
     }
+    // FIXME(#124403): Once `repr_c_enums_larger_than_int` is a hard error, we could assert
+    // here that a repr(C) enum discriminant is never larger than a c_int.

     if !cfg!(debug_assertions) {
         // Stop here, the rest is kind of expensive.

tests/auxiliary/minicore.rs

Lines changed: 22 additions & 0 deletions
@@ -177,6 +177,21 @@ impl Add<isize> for isize {
     }
 }

+#[lang = "neg"]
+pub trait Neg {
+    type Output;
+
+    fn neg(self) -> Self::Output;
+}
+
+impl Neg for isize {
+    type Output = isize;
+
+    fn neg(self) -> isize {
+        loop {} // Dummy impl, not actually used
+    }
+}
+
 #[lang = "sync"]
 trait Sync {}
 impl_marker_trait!(
@@ -231,6 +246,13 @@ pub mod mem {
     #[rustc_nounwind]
     #[rustc_intrinsic]
     pub unsafe fn transmute<Src, Dst>(src: Src) -> Dst;
+
+    #[rustc_nounwind]
+    #[rustc_intrinsic]
+    pub const fn size_of<T>() -> usize;
+    #[rustc_nounwind]
+    #[rustc_intrinsic]
+    pub const fn align_of<T>() -> usize;
 }

 #[lang = "c_void"]
