@DrXiao
Collaborator

@DrXiao DrXiao commented Aug 13, 2025

The proposed changes enhance shecc to generate dynamically linked executables. When the --dynlink flag is specified, shecc produces sections such as .plt and .got for the compiled programs, allowing the executables to leverage the ELF interpreter and the GNU C library to run.

This pull request is still a work in progress due to the following incomplete tasks:

  • Fix the potential issues. (The bootstrapping process still fails for dynamically linked shecc.)
  • Improve code quality and commit messages.
  • Make the dynamically linked shecc run the test suites.
  • Improve README.md to describe dynamic linking.
  • Enhance GitHub workflows to verify the dynamically linked shecc.
  • Validate the proposed changes on an Arm machine such as a BeagleBone Black or Raspberry Pi.
  • Refine c.c and c.h to avoid duplications
  • Update and separate the snapshots for static linking and dynamic linking.
  • Add required documentation

Updated usage: (10/11 11:43 updated)

# Perform bootstrapping process for the dynamically linked shecc.
$ make DYNLINK=1

# Add '--dynlink' to generate dynamically linked executable.
$ shecc [-o output] [+m] [--dump-ir] [--no-libc] [--dynlink] <input.c>

# Run the generated executable, passing the ELF interpreter prefix with -L.
$ qemu-arm -L /usr/arm-linux-gnueabihf/ <executable>

@DrXiao
Collaborator Author

DrXiao commented Aug 13, 2025

Currently, only the stage 0 compiler and stage 1 compiler can be generated, and the stage 1 compiler will encounter a Segmentation fault when running.

However, the stage 0 compiler can still compile a simple program and run the executable via QEMU:

/* test.c */
int main(void)
{
    printf("%x %x %x\n", 1, 2, 3);
    printf("%x %x %x %x\n", 1, 2, 3, 4);
    printf("%x %x %x %x %x\n", 1, 2, 3, 4, 5);
    return 0;
}
$ out/shecc --dynlink -o test test.c

Then, we can use arm-linux-gnueabi-readelf or arm-linux-gnueabi-objdump to check the executable. For example, check the relocation information:

$ arm-linux-gnueabi-readelf --relocs test

Relocation section '.rel.plt' at offset 0x260 contains 2 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
000102a8  00000116 R_ARM_JUMP_SLOT   00000000   __libc_start_main
000102ac  00000216 R_ARM_JUMP_SLOT   00000000   printf

However, when I ran the test via qemu-arm, I found that although the program reaches the main function, the results of certain printf() calls are incorrect.

$ qemu-arm -L /usr/arm-linux-gnueabi/ test
1 2 3
1 2 3 40830000
1 2 3 40830000 10224

Notice that the second and third printf() calls have more than four arguments, so the extra arguments must be passed on the stack according to the Arm calling convention.

I suspect that shecc pushes the wrong values to the stack, which causes glibc's printf() to produce incorrect results.

@DrXiao
Collaborator Author

DrXiao commented Aug 13, 2025

FWIW, I disassembled the test executable:

test.asm
$ arm-linux-gnueabi-objdump -d test

test:     file format elf32-littlearm


Disassembly of section .text:

000100b4 <.text>:
   100b4:	e3a0b000 	mov	fp, #0
   100b8:	e3a0e000 	mov	lr, #0
   100bc:	e49d1004 	pop	{r1}		@ (ldr r1, [sp], #4)
   100c0:	e1a0200d 	mov	r2, sp
   100c4:	e52d2004 	push	{r2}		@ (str r2, [sp, #-4]!)
   100c8:	e52d0004 	push	{r0}		@ (str r0, [sp, #-4]!)
   100cc:	e3a0c000 	mov	ip, #0
   100d0:	e52dc004 	push	{ip}		@ (str ip, [sp, #-4]!)
   100d4:	e30000ec 	movw	r0, #236	@ 0xec
   100d8:	e3400001 	movt	r0, #1
   100dc:	e3a03000 	mov	r3, #0
   100e0:	eb000067 	bl	0x10284
   100e4:	e3a0007f 	mov	r0, #127	@ 0x7f
   100e8:	eb000005 	bl	0x10104
   100ec:	e1a09000 	mov	r9, r0
   100f0:	e1a0a001 	mov	sl, r1
   100f4:	e3008004 	movw	r8, #4
   100f8:	e3408000 	movt	r8, #0
   100fc:	e04dd008 	sub	sp, sp, r8
   10100:	e1a0c00d 	mov	ip, sp
   10104:	eb000005 	bl	0x10120
   10108:	e3008004 	movw	r8, #4
   1010c:	e3408000 	movt	r8, #0
   10110:	e08dd008 	add	sp, sp, r8
   10114:	e1a00000 	nop			@ (mov r0, r0)
   10118:	e3a07001 	mov	r7, #1
   1011c:	ef000000 	svc	0x00000000
   10120:	e1a00009 	mov	r0, r9
   10124:	e1a0100a 	mov	r1, sl
   10128:	eaffffff 	b	0x1012c
   1012c:	e50de004 	str	lr, [sp, #-4]
   10130:	e3008044 	movw	r8, #68	@ 0x44
   10134:	e3408000 	movt	r8, #0
   10138:	e04dd008 	sub	sp, sp, r8
   1013c:	e3000224 	movw	r0, #548	@ 0x224
   10140:	e3400001 	movt	r0, #1
   10144:	e3a01001 	mov	r1, #1
   10148:	e3a02002 	mov	r2, #2
   1014c:	e3a03003 	mov	r3, #3
   10150:	e58d0004 	str	r0, [sp, #4]
   10154:	e58d1008 	str	r1, [sp, #8]
   10158:	e58d200c 	str	r2, [sp, #12]
   1015c:	e58d3010 	str	r3, [sp, #16]
   10160:	e59d0004 	ldr	r0, [sp, #4]
   10164:	e59d1008 	ldr	r1, [sp, #8]
   10168:	e59d200c 	ldr	r2, [sp, #12]
   1016c:	e59d3010 	ldr	r3, [sp, #16]
+  10170:	eb000046 	bl	0x10290
   10174:	e300022e 	movw	r0, #558	@ 0x22e
   10178:	e3400001 	movt	r0, #1
   1017c:	e3a01001 	mov	r1, #1
   10180:	e3a02002 	mov	r2, #2
   10184:	e3a03003 	mov	r3, #3
   10188:	e3a04004 	mov	r4, #4
   1018c:	e58d0014 	str	r0, [sp, #20]
   10190:	e58d1018 	str	r1, [sp, #24]
   10194:	e58d201c 	str	r2, [sp, #28]
   10198:	e58d3020 	str	r3, [sp, #32]
   1019c:	e58d4024 	str	r4, [sp, #36]	@ 0x24
   101a0:	e59d0014 	ldr	r0, [sp, #20]
   101a4:	e59d1018 	ldr	r1, [sp, #24]
   101a8:	e59d201c 	ldr	r2, [sp, #28]
   101ac:	e59d3020 	ldr	r3, [sp, #32]
   101b0:	e59d4024 	ldr	r4, [sp, #36]	@ 0x24
+  101b4:	eb000035 	bl	0x10290
   101b8:	e300023b 	movw	r0, #571	@ 0x23b
   101bc:	e3400001 	movt	r0, #1
   101c0:	e3a01001 	mov	r1, #1
   101c4:	e3a02002 	mov	r2, #2
   101c8:	e3a03003 	mov	r3, #3
   101cc:	e3a04004 	mov	r4, #4
   101d0:	e3a05005 	mov	r5, #5
   101d4:	e58d0028 	str	r0, [sp, #40]	@ 0x28
   101d8:	e58d102c 	str	r1, [sp, #44]	@ 0x2c
   101dc:	e58d2030 	str	r2, [sp, #48]	@ 0x30
   101e0:	e58d3034 	str	r3, [sp, #52]	@ 0x34
   101e4:	e58d4038 	str	r4, [sp, #56]	@ 0x38
   101e8:	e58d503c 	str	r5, [sp, #60]	@ 0x3c
   101ec:	e59d0028 	ldr	r0, [sp, #40]	@ 0x28
   101f0:	e59d102c 	ldr	r1, [sp, #44]	@ 0x2c
   101f4:	e59d2030 	ldr	r2, [sp, #48]	@ 0x30
   101f8:	e59d3034 	ldr	r3, [sp, #52]	@ 0x34
   101fc:	e59d4038 	ldr	r4, [sp, #56]	@ 0x38
   10200:	e59d503c 	ldr	r5, [sp, #60]	@ 0x3c
+  10204:	eb000021 	bl	0x10290
   10208:	e3a00000 	mov	r0, #0
   1020c:	e1a00000 	nop			@ (mov r0, r0)
   10210:	e3008044 	movw	r8, #68	@ 0x44
   10214:	e3408000 	movt	r8, #0
   10218:	e08dd008 	add	sp, sp, r8
   1021c:	e51de004 	ldr	lr, [sp, #-4]
   10220:	e12fff3e 	blx	lr

Disassembly of section .plt:

00010270 <.plt>:
   10270:	e52de004 	push	{lr}		@ (str lr, [sp, #-4]!)
   10274:	e300a2a4 	movw	sl, #676	@ 0x2a4
   10278:	e340a001 	movt	sl, #1
   1027c:	e1a0e00a 	mov	lr, sl
   10280:	e59ef000 	ldr	pc, [lr]
   10284:	e300c2a8 	movw	ip, #680	@ 0x2a8
   10288:	e340c001 	movt	ip, #1
   1028c:	e59cf000 	ldr	pc, [ip]
+  10290:	e300c2ac 	movw	ip, #684	@ 0x2ac
   10294:	e340c001 	movt	ip, #1
   10298:	e59cf000 	ldr	pc, [ip]

0x10290 is the starting address of printf@plt. In the text section, three bl instructions call printf(), and each call is preceded by several str and ldr instructions that manipulate the stack.

@jserv jserv requested review from ChAoSUnItY and fennecJ August 14, 2025 03:11
@sysprog21 sysprog21 deleted a comment from bito-code-review bot Aug 14, 2025
@DrXiao DrXiao force-pushed the feat/support-dynamic-linking branch from b698f38 to 13dfa35 Compare August 16, 2025 12:59
@sysprog21 sysprog21 deleted a comment from bito-code-review bot Aug 17, 2025
@jserv jserv requested review from nosba0957 and vacantron August 19, 2025 08:22
@jserv
Collaborator

jserv commented Aug 19, 2025

Consider the minimal change below:

--- a/src/main.c
+++ b/src/main.c
@@ -85,7 +85,7 @@ int main(int argc, char *argv[])
     global_init();

     /* include libc */
-    if (libc)
+    if (libc && !dynlink)
         libc_generate();

     /* load and parse source code into IR */

It disables the built-in libc when --dynlink is enabled, since dynamic linking should use the system libc.

@jserv
Collaborator

jserv commented Aug 19, 2025

Notice that the second and third printf() calls have more than four arguments, so the extra arguments must be passed on the stack according to the Arm calling convention.

OP_assign just does a register-to-register move (__mov_r(__AL, rd, rn)). The key issue seems to be in how the register mapping works. The problem is with ARM calling convention when passing more than 4 arguments to variadic functions like printf(). In ARM AAPCS (Arm Architecture Procedure Call Standard):

  • First 4 arguments (r0-r3) are passed in registers
  • Arguments beyond 4 must be pushed to the stack
  • The stack must be properly aligned and arguments placed correctly

This suggests that parameter passing is handled differently. Currently, the virtual registers (0-7) are mapped to ARM physical registers (r0-r7). Looking at the code, rd = ph2_ir->dest directly uses the virtual register number as the ARM register number. This means:

  • Virtual register 0 → ARM r0
  • Virtual register 1 → ARM r1
  • Virtual register 2 → ARM r2
  • Virtual register 3 → ARM r3
  • Virtual registers 4-7 → ARM r4-r7

In the ARM calling convention, arguments beyond r3 should go to the stack, not to r4-r7. This is definitely a bug: ir->dest = args++ assigns argument 4 to virtual register 4 (r4), argument 5 to virtual register 5 (r5), and so on, whereas the ARM calling convention requires the fifth and later arguments to be placed on the stack.

@jserv
Copy link
Collaborator

jserv commented Aug 19, 2025

The original code had a bug where function calls with more than 4 arguments violated the AAPCS:

  • Arguments 0-3 should go in registers r0-r3
  • Arguments 4+ should be placed on the stack

However, shecc was incorrectly placing all arguments (0-7) in registers r0-r7, causing stack-based arguments to be passed incorrectly.

Consider the changes below:

diff --git a/src/reg-alloc.c b/src/reg-alloc.c
index c66a061..51cd2ea 100644
--- a/src/reg-alloc.c
+++ b/src/reg-alloc.c
@@ -520,12 +520,42 @@ void reg_alloc(void)
                         is_pushing_args = 1;
                     }
 
-                    src0 = prepare_operand(bb, insn->rs1, -1);
-                    ir = bb_add_ph2_ir(bb, OP_assign);
-                    ir->src0 = src0;
-                    ir->dest = args++;
-                    REGS[ir->dest].var = insn->rs1;
-                    REGS[ir->dest].polluted = 0;
+                    /* Check if next call is to external function (for ARM
+                     * calling convention)
+                     */
+                    insn_t *next_insn = insn->next;
+                    func_t *target_func = NULL;
+                    bool is_external_call = false;
+
+                    /* Look ahead for the OP_call to determine if it's external
+                     */
+                    while (next_insn && next_insn->opcode == OP_push)
+                        next_insn = next_insn->next;
+                    if (next_insn && next_insn->opcode == OP_call) {
+                        target_func = find_func(next_insn->str);
+                        is_external_call = target_func && !target_func->bbs;
+                    }
+
+                    /* ARM calling convention for external functions: first 4
+                     * args in r0-r3, rest on stack
+                     */
+                    if (is_external_call && args >= 4) {
+                        /* Arguments 4+: keep on stack, don't load into
+                         * registers. The variable is already on stack from
+                         * earlier spill_alive().
+                         */
+                    } else {
+                        /* Normal behavior for internal functions or first 4
+                         * args
+                         */
+                        src0 = prepare_operand(bb, insn->rs1, -1);
+                        ir = bb_add_ph2_ir(bb, OP_assign);
+                        ir->src0 = src0;
+                        ir->dest = args;
+                        REGS[ir->dest].var = insn->rs1;
+                        REGS[ir->dest].polluted = 0;
+                    }
+                    args++;
                     break;
                 case OP_call:
                     callee_func = find_func(insn->str);
@@ -535,8 +565,8 @@ void reg_alloc(void)
                     ir = bb_add_ph2_ir(bb, OP_call);
                     strcpy(ir->func_name, insn->str);
                     if (dynlink) {
-                        func_t *target_func = find_func(ir->func_name);
-                        target_func->is_used = true;
+                        func_t *target_fn = find_func(ir->func_name);
+                        target_fn->is_used = true;
                     }
 
                     is_pushing_args = 0;

Before the fix: All arguments were always loaded into sequential registers (r0, r1, r2, r3, r4, r5, r6, r7).
After the fix:

  • Arguments 0-3: Still loaded into registers r0-r3 (normal behavior)
  • Arguments 4+ for external calls: Skip register assignment entirely, keeping them on the stack where spill_alive() already placed them

Before Fix (Incorrect)
For a call like printf("Format %d %d %d %d %d", 1, 2, 3, 4, 5):

  load %x0, stack  # arg 0 → r0 ✓
  load %x1, stack  # arg 1 → r1 ✓
  load %x2, stack  # arg 2 → r2 ✓
  load %x3, stack  # arg 3 → r3 ✓
  load %x4, stack  # arg 4 → r4 ❌ (violates ARM calling convention)
  load %x5, stack  # arg 5 → r5 ❌ (violates ARM calling convention)
  call @printf

After Fix (Correct)
For the same call:

  load %x0, stack  # arg 0 → r0 ✓
  load %x1, stack  # arg 1 → r1 ✓
  load %x2, stack  # arg 2 → r2 ✓
  load %x3, stack  # arg 3 → r3 ✓
                   # args 4,5 stay on stack ✓ (ARM compliant)
  call @printf

@jserv

This comment was marked as resolved.

@jserv
Collaborator

jserv commented Aug 24, 2025

I would like to ask @lecopzer for reviewing.

@DrXiao
Collaborator Author

DrXiao commented Aug 25, 2025

I tried to fix the Arm calling convention issue, but I found that the main problem does not seem to be register allocation. The actual problem is stack manipulation.

Consider a call such as printf("%x %x %x %x %x\n", 1, 2, 3, 4, 5). When compiled with arm-linux-gnueabi-gcc, it produces the following machine code:

   1042c:	e3a03005 	mov	r3, #5
+  10430:	e58d3004 	str	r3, [sp, #4]
   10434:	e3a03004 	mov	r3, #4
+  10438:	e58d3000 	str	r3, [sp]
   1043c:	e3a03003 	mov	r3, #3
   10440:	e3a02002 	mov	r2, #2
   10444:	e3a01001 	mov	r1, #1
   10448:	e59f0010 	ldr	r0, [pc, #16]	@ 10460 <main+0x40>
   1044c:	ebffffac 	bl	10304 <printf@plt>

Notice that only 4 and 5 are stored on the stack. The first four arguments are passed in r0-r3.

However, shecc produces the following:

   10148:	e30001b4 	movw	r0, #436	@ 0x1b4
   1014c:	e3400001 	movt	r0, #1
   10150:	e3a01001 	mov	r1, #1
   10154:	e3a02002 	mov	r2, #2
   10158:	e3a03003 	mov	r3, #3
   1015c:	e3a04004 	mov	r4, #4
   10160:	e3a05005 	mov	r5, #5
+  10164:	e58d0004 	str	r0, [sp, #4]
+  10168:	e58d1008 	str	r1, [sp, #8]
+  1016c:	e58d200c 	str	r2, [sp, #12]
+  10170:	e58d3010 	str	r3, [sp, #16]
+  10174:	e58d4014 	str	r4, [sp, #20]
+  10178:	e58d5018 	str	r5, [sp, #24]
   1017c:	e59d0004 	ldr	r0, [sp, #4]
   10180:	e59d1008 	ldr	r1, [sp, #8]
   10184:	e59d200c 	ldr	r2, [sp, #12]
   10188:	e59d3010 	ldr	r3, [sp, #16]
   1018c:	e59d4014 	ldr	r4, [sp, #20]
   10190:	e59d5018 	ldr	r5, [sp, #24]
   10194:	eb00001b 	bl	0x10208

The machine code uses r0-r5 to hold the arguments, stores all of them to the stack, and then loads the values back from the stack. As a result, glibc's printf() receives incorrect values for the last two arguments (4 and 5).

I think shecc generates machine code that pushes all arguments onto stack because spill_alive() may eventually call spill_var() to generate the OP_global_store / OP_store opcodes in the phase 2 IR.

shecc/src/reg-alloc.c

Lines 581 to 584 in 6a97bd7

if (!is_pushing_args) {
    spill_alive(bb, insn);
    is_pushing_args = 1;
}

I'm not sure why shecc behaves as described above, but I will try to review and fix it for AAPCS compliance.

@jserv

This comment was marked as resolved.

@DrXiao DrXiao force-pushed the feat/support-dynamic-linking branch from 13dfa35 to 1aceea3 Compare August 26, 2025 14:55
@DrXiao
Collaborator Author

DrXiao commented Aug 26, 2025

I have rebased onto the master branch and updated the commits, so that we can review the updated implementation.

With the changes below, I can proceed with stage 1 compilation via make DYNLINK=1:

I have also temporarily created a new commit to apply part of the changes, and I will review everything to resolve any potential issues.

@sysprog21 sysprog21 deleted a comment from cubic-dev-ai bot Sep 4, 2025
cubic-dev-ai[bot]

This comment was marked as outdated.

@DrXiao DrXiao force-pushed the feat/support-dynamic-linking branch 2 times, most recently from f81adb8 to a96619d Compare September 7, 2025 07:59
@sysprog21 sysprog21 deleted a comment from cubic-dev-ai bot Sep 7, 2025
cubic-dev-ai[bot]

This comment was marked as resolved.

@DrXiao DrXiao force-pushed the feat/support-dynamic-linking branch 2 times, most recently from 2a23bfc to e76c3af Compare September 8, 2025 14:47
@sysprog21 sysprog21 deleted a comment from cubic-dev-ai bot Sep 8, 2025
@sysprog21 sysprog21 deleted a comment from cubic-dev-ai bot Sep 8, 2025
cubic-dev-ai[bot]

This comment was marked as outdated.

@jserv
Collaborator

jserv commented Oct 16, 2025

Consider adding abi.sh as an additional test suite for ABI compliance.

@DrXiao
Collaborator Author

DrXiao commented Oct 18, 2025

The docs/dynamic-linking.md states:

  • First four arguments are put into r0 - r3
  • Other additional arguments are passed to stack
  • Align the stack pointer to 8 bytes

However, in the implementation, internal functions still use r0-r7 for parameters; only external functions follow AAPCS. This is a compiler optimization, but it carries risks. Consider the code below:

// Internal function
void internal_func(int a, int b, int c, int d, int e) {
    // shecc will read e from r4
}

// If this function is later changed to be exported via dynamic linking...
// External callers will put e on stack, internal code reads from r4 → Error!

Actions:

  1. In dynamic linking mode, all functions should follow AAPCS (including internal ones)
  2. Or explicitly check and error when exporting functions

  • External functions: C standard functions (e.g.: printf, strcpy) provided by glibc.
  • Internal functions: Functions compiled by shecc.

I will try to improve shecc so that all internal and external functions follow AAPCS. However, if we consider that external functions may call internal functions, I think there is another issue related to accessing global variables.

Since internal functions rely on r12 to access global variables on the global stack, such accesses may become incorrect because r12 could be modified by external functions.

Therefore, how about the following conceptual approach:

  • Add a global object (4 bytes) to .data section.
  • After the global stack is prepared and r12 is set, the main wrapper also stores the global stack pointer in this global object.
  • When an internal function is called, it first retrieves the global stack pointer from the global object, loads it into r12, and then continues execution.

Since the starting address of .data (elf_data_start) is determined at compile time, accessing the aforementioned global object is straightforward.

@jserv
Collaborator

jserv commented Oct 24, 2025

Therefore, how about the following conceptual approach:

  • Add a global object (4 bytes) to .data section.
  • After the global stack is prepared and r12 is set, the main wrapper also stores the global stack pointer in this global object.
  • When an internal function is called, it first retrieves the global stack pointer from the global object, loads it into r12, and then continues execution.

Our goal here is to ensure AAPCS compliance with minimal effort. The above proposal appears sound, provided that the ABI compliance test suite passes.

This commit implements dynamic linking for the Arm target so that the
compiler can generate dynamically linked programs that use an external
C library such as glibc.

It includes the following changes:
- Modify arm.mk to make the compiled executable use hard-float ABI.
- Add a new header named 'c.h' that only includes declaration of C
  standard functions and certain macros. If using dynamic linking mode,
  the compiler will use this file to generate C functions for the
  compiled program.
- Make c.c include c.h and reuse the necessary definitions.
- Improve the inliner tool to generate two functions, 'libc_decl()' and
  'libc_impl()'. The compiler will use these functions to automatically
  generate the source of c.h and/or c.c for compiled programs when
  needed.
- Generate dynamically linked programs by adding the following data:
  - Dynamic sections such as '.interp', '.plt', '.got' and so on.
  - Additional program headers and section headers.
- Make the entire compilation process able to compile the functions
  without implementation, which are considered to be external functions.
- Add a new input argument "--dynlink" to enable the dynamic linking
  mode.
- Improve the build system to build the compiler under the dynamic
  linking mode by the command "make DYNLINK=1".
- Stop the build process if the target architecture and the mode are
  RISC-V and dynamic linking, respectively, because this commit only
  implements for the Arm architecture.
@DrXiao DrXiao force-pushed the feat/support-dynamic-linking branch from cd9ceba to 1532aa5 Compare November 1, 2025 14:44
@DrXiao
Collaborator Author

DrXiao commented Nov 1, 2025

This update focuses on enhancing ABI handling, so certain suggestions and documentation have not yet been addressed and refined.

@DrXiao
Collaborator Author

DrXiao commented Nov 1, 2025

Consider the following code:

/* test.c */
void test(int a, int b, int c, int d, int e, int f)
{
    printf("%d %d %d %d %d %d\n", a, b, c, d, e, f);
}

int main()
{
    test(10, 20, 30, 40, 50, 60);
    return 0;
}

Compiling the program with the statically linked shecc and running it still produces the correct output.

$ qemu-arm out/shecc-stage2.elf -o test test.c
$ qemu-arm test
10 20 30 40 50 60

Then, disassemble the executable test to observe the generated instructions:

...
    13670:	e30487dc 	movw	r8, #18396	@ 0x47dc
    13674:	e3408001 	movt	r8, #1
    13678:	e598c000 	ldr	ip, [r8]
+   1367c:	e92d4ff0 	push	{r4, r5, r6, r7, r8, r9, sl, fp, lr}  # preserve r4-r11 and lr
    13680:	e300802c 	movw	r8, #44	@ 0x2c
    13684:	e3408000 	movt	r8, #0
    13688:	e04dd008 	sub	sp, sp, r8
    1368c:	e30347c6 	movw	r4, #14278	@ 0x37c6       # Prepare the address of the format string
    13690:	e3404001 	movt	r4, #1
    13694:	e58d0010 	str	r0, [sp, #16]                  # test() knows that r0-r3 are 10, 20, 30
    13698:	e58d1014 	str	r1, [sp, #20]                  # and 40, respectively.
    1369c:	e58d2018 	str	r2, [sp, #24]                  # Then, 50 and 60 are located at the caller's
    136a0:	e58d301c 	str	r3, [sp, #28]                  # stack.
    136a4:	e58d4020 	str	r4, [sp, #32]
+   136a8:	e59d3054 	ldr	r3, [sp, #84]	@ 0x54    # load arg6 (60) from the caller's stack
+   136ac:	e58d3008 	str	r3, [sp, #8]              # pass 60 to [sp + 8]
+   136b0:	e59d3050 	ldr	r3, [sp, #80]	@ 0x50    # load arg5 (50) from the caller's stack
+   136b4:	e58d3004 	str	r3, [sp, #4]              # pass 50 to [sp + 4]
+   136b8:	e59d301c 	ldr	r3, [sp, #28]             # load arg4 (40) from the local stack
+   136bc:	e58d3000 	str	r3, [sp]                  # pass 40 to [sp]
+   136c0:	e59d0020 	ldr	r0, [sp, #32]             # pass the address of format string to r0
+   136c4:	e59d1010 	ldr	r1, [sp, #16]             # pass 10 to r1
+   136c8:	e59d2014 	ldr	r2, [sp, #20]             # pass 20 to r2
+   136cc:	e59d3018 	ldr	r3, [sp, #24]             # pass 30 to r3
+   136d0:	ebfff658 	bl	0x11038                   # call printf()
    136d4:	e1a00000 	nop			@ (mov r0, r0)
    136d8:	e300802c 	movw	r8, #44	@ 0x2c
    136dc:	e3408000 	movt	r8, #0
    136e0:	e08dd008 	add	sp, sp, r8
+   136e4:	e8bd4ff0 	pop	{r4, r5, r6, r7, r8, r9, sl, fp, lr}      # restore r4-r11 and lr
    136e8:	e12fff1e 	bx	lr
    136ec:	e30487dc 	movw	r8, #18396	@ 0x47dc
    136f0:	e3408001 	movt	r8, #1
    136f4:	e598c000 	ldr	ip, [r8]
+   136f8:	e92d4ff0 	push	{r4, r5, r6, r7, r8, r9, sl, fp, lr}   # preserve r4-r11 and lr
    136fc:	e300802c 	movw	r8, #44	@ 0x2c
    13700:	e3408000 	movt	r8, #0
    13704:	e04dd008 	sub	sp, sp, r8
    13708:	e3a0000a 	mov	r0, #10
    1370c:	e3a01014 	mov	r1, #20
    13710:	e3a0201e 	mov	r2, #30
    13714:	e3a03028 	mov	r3, #40	@ 0x28
    13718:	e3a04032 	mov	r4, #50	@ 0x32
    1371c:	e3a0503c 	mov	r5, #60	@ 0x3c
    13720:	e58d0010 	str	r0, [sp, #16]
    13724:	e58d1014 	str	r1, [sp, #20]
    13728:	e58d2018 	str	r2, [sp, #24]
    1372c:	e58d301c 	str	r3, [sp, #28]
    13730:	e58d4020 	str	r4, [sp, #32]
    13734:	e58d5024 	str	r5, [sp, #36]	@ 0x24
+   13738:	e3a0303c 	mov	r3, #60	@ 0x3c
+   1373c:	e58d3004 	str	r3, [sp, #4]    # pass 60 to [sp + 4]  
+   13740:	e3a03032 	mov	r3, #50	@ 0x32
+   13744:	e58d3000 	str	r3, [sp]        # pass 50 to [sp]
+   13748:	e3a0000a 	mov	r0, #10         # pass 10 to r0
+   1374c:	e3a01014 	mov	r1, #20         # pass 20 to r1
+   13750:	e3a0201e 	mov	r2, #30         # pass 30 to r2
+   13754:	e3a03028 	mov	r3, #40	@ 0x28  # pass 40 to r3
+   13758:	ebffffc4 	bl	0x13670         # call test()
    1375c:	e3a00000 	mov	r0, #0
    13760:	e1a00000 	nop			@ (mov r0, r0)
    13764:	e300802c 	movw	r8, #44	@ 0x2c
    13768:	e3408000 	movt	r8, #0
    1376c:	e08dd008 	add	sp, sp, r8
+   13770:	e8bd4ff0 	pop	{r4, r5, r6, r7, r8, r9, sl, fp, lr}    # restore r4-r11 and lr
...

We can see that internal calls also do the following:

  • First four arguments are put into r0 - r3
  • Other additional arguments are passed to stack
  • Align the stack pointer to 8 bytes

@jserv
Collaborator

jserv commented Nov 1, 2025

We can see that internal calls also do the following:

  • First four arguments are put into r0 - r3
  • Other additional arguments are passed to stack
  • Align the stack pointer to 8 bytes

Great !

Next, integrate ABI conformance test suite as #244 (comment) suggested.

DrXiao added 12 commits November 2, 2025 11:34
In the previous implementation, shecc lacked the consideration of
calling convention, so the function arguments were loaded into registers
directly when encountering a function call, regardless of the target
architecture. For the ARM architecture, if the number of arguments is
greater than 4, the additional arguments were still loaded into
registers instead of being passed to stack, causing dynamically linked
programs to execute incorrectly.

Therefore, this commit enhances the compiler to ensure that the compiled
programs comply with the target architecture's calling convention.

The changes include:
- Each function reserves additional space to pass extra arguments to the
  callee.
- Introduce a new macro, 'MAX_ARGS_IN_REG', which defines the maximum
  number of arguments that can be passed to registers.
- Improve register allocation:
  - When setting available arguments, if the number of function
    arguments exceeds 'MAX_ARGS_IN_REG', only 'MAX_ARGS_IN_REG'
    registers are used.
  - If the number of function arguments is fixed, the extra arguments
    are directly placed in the caller's stack space.
  - When encountering a variadic function, place all arguments on the
    function's local stack. For extra arguments, generate extra IRs
    to load them from the caller's stack and store them to the callee's
    stack.
  - Add a new flag 'space_is_located' to the 'var_t' structure, which
    indicates whether space should be allocated for a variable on the
    stack.
  - Add a new flag 'ofs_based_on_stack_top' to both 'var_t' and
    'ph2_ir_t' to tell the compiler that an operand's offset should
    be based on the top of the local stack.
  - When handling a function call, generate the appropriate IRs to pass
    extra arguments to the stack while passing the remaining arguments
    via registers.
- Add a new 4-byte object to the .data section to store the global stack
  pointer.
  - Currently, only the ARM architecture requires this global object to
    preserve a duplicate of the global stack pointer.
- Improve ARM code generator:
  - During cfg_flatten(), recalculate operand offsets for instructions
    with the 'ofs_based_on_stack_top' flag set.
  - After the global stack is prepared and r12 is set, store the global
    stack pointer to the global object within the .data section.
  - Ensure that the stack is always 8-byte aligned. (AAPCS)
  - At each function's entry point, add an instruction to push the
    contents of registers r4-r11 and lr onto the stack. (AAPCS)
  - Upon function return,
    - restore r4-r11 and lr from the stack. (AAPCS)
    - restore r12 by the global object located at 'elf_data_start'.
- Modify RISC-V code generator:
  - Place the content of register ra at the top of the callee's local
    stack.
In dynamic linking mode, the bootstrapping process will fail, and the
root cause is that certain global variables are uninitialized, causing
the compiler to retrieve invalid values and trigger a segmentation
fault.

After further experiments, this commit initializes minimal required
variables so that the bootstrapping can complete in both static and
dynamic modes.
Because the dynamic linking mode was supported in the previous commit,
these changes modify the build system and the test suite to validate the
dynamically linked compiler, including the stage 0 and stage 2
compilers.
In the test suite, because certain test cases are used to validate the
built-in C library, these tests are unnecessary for the dynamic linking
mode. Therefore, this commit makes the test suite skip specific cases
when validating the dynamically linked compiler.
After observing the implementation of cfront, it appears that the local
array initializer may not generate correct instructions to initialize
all elements.

For example, consider the following code:

    int main()
    {
        int a[5] = {5, 10};
        return a[0] + a[1] + a[2] + a[3] + a[4];
    }

If the number of elements in the initializer-list is less than the size
of array, the generated instructions only initialize the first elements.
The remaining elements are left uninitialized, which may cause the
return value in the above example to differ from the expected 15.

When the initializer-list is empty, all elements remain uninitialized.

Therefore, this commit improves the cfront to generate more instructions
to initialize all elements of an array exactly.
Enable the GitHub Actions to validate the compiler under the dynamic
linking mode. However, if the target architecture is RISC-V, the
workflow will skip validation because this mode has not been
implemented for it.
Originally, the default path of ld-linux.so was set to a specific path
for Arm architecture. However, considering that developers may manually
install the ARM GNU toolchain, the sysroot path may not match the
default path.

This commit improves arm.mk to use a more convenient method to
automatically detect the correct sysroot path.
This commit refines the program headers to generate a better ELF
executable file, with the following changes:

- Always create two load segments, regardless of static or dynamic
  linking.
  - The first load segment is readable and executable, and includes
    the read-only sections such as .text, .rodata, etc.
  - The second load segment is readable and writable, and contains the
    remaining sections, including .data and .bss.
  - In dynamic linking mode, the .rel.plt and .plt sections reside in
    the first segment, while the other dynamic sections are placed in
    the second segment.
- Set the alignment value of both load segments to 0x1000 (the default
  page size) to meet the alignment requirement when the ELF interpreter
  loads the executable.
- Adjust the offset and virtual address of the first load segment to
  ensure that a dynamically linked executable can be correctly loaded by
  the kernel program loader.
  - Now, the ELF header and program headers are also loaded into the
    first segment, but they are never used.
- Update other segments as necessary to reflect the above changes.
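
The layout described above can be sketched roughly as follows. This is an illustration under stated assumptions, not shecc's actual code: the `load_seg_t` type, the `SEG_*` constants, and `layout_load_segments` are hypothetical names, and `base` is assumed to be a multiple of the 0x1000 alignment. The one rule taken from the ELF specification is that a loadable segment's virtual address and file offset must be congruent modulo its alignment.

```c
#include <stdint.h>

#define SEG_ALIGN 0x1000u /* alignment value named in the commit message */

/* Permission bits, mirroring the ELF PF_X / PF_W / PF_R values. */
enum { SEG_X = 1, SEG_W = 2, SEG_R = 4 };

/* Hypothetical, simplified stand-in for a PT_LOAD program header. */
typedef struct {
    uint32_t offset; /* position in the file */
    uint32_t vaddr;  /* address in memory */
    uint32_t size;   /* bytes covered by the segment */
    uint32_t flags;  /* combination of SEG_R, SEG_W, SEG_X */
    uint32_t align;  /* required alignment */
} load_seg_t;

/* Lay out a read/execute segment followed by a read/write segment.
 * 'base' is assumed to be a multiple of SEG_ALIGN. */
void layout_load_segments(uint32_t base, uint32_t rx_size, uint32_t rw_size,
                          load_seg_t *rx, load_seg_t *rw)
{
    /* The first segment starts at file offset 0, so it also covers the
     * ELF header and the program headers. */
    rx->offset = 0;
    rx->vaddr = base;
    rx->size = rx_size;
    rx->flags = SEG_R | SEG_X;
    rx->align = SEG_ALIGN;

    /* The second segment follows directly in the file but begins on a
     * fresh page in memory, keeping vaddr and offset congruent modulo
     * SEG_ALIGN. */
    uint32_t next_page = (base + rx_size + SEG_ALIGN - 1) & ~(SEG_ALIGN - 1);

    rw->offset = rx_size;
    rw->vaddr = next_page + (rx_size & (SEG_ALIGN - 1));
    rw->size = rw_size;
    rw->flags = SEG_R | SEG_W;
    rw->align = SEG_ALIGN;
}
```

Starting the writable segment on its own page is what allows the two segments to carry different permissions once the loader maps them.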
Since the .rodata section now resides in the read-only segment of the
compiled program, this commit modifies a test case that originally
attempted to write to .rodata so that it no longer does so.
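
One common way to avoid such a write is to copy the string literal into writable storage first. The sketch below only illustrates that general pattern; the helper `capitalize_copy` is hypothetical, and the actual change made to the test case may differ.

```c
#include <string.h>

/* Hypothetical helper: produces "Hello" without writing to the string
 * literal, which may be placed in a read-only segment. */
void capitalize_copy(char out[6])
{
    const char *literal = "hello"; /* may live in .rodata */

    memcpy(out, literal, 6); /* copy 5 characters plus the terminator */
    out[0] = 'H';            /* modify the writable copy only */
}
```

Writing through `literal` directly would target read-only memory and would be expected to fault once .rodata is mapped without write permission.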
Since dynamic linking was introduced in the previous commit, this
commit updates the README to describe how to use dynamic linking mode
and provides basic usage examples.
Because the current compiler supports static linking and dynamic linking
modes, the snapshots differ between these modes.

This commit updates the related shell scripts and the build system to
adjust the snapshot generation process according to the target
architecture and the linking mode.
Since the compiler supports both static linking and dynamic linking,
this commit adds two new documents that cover the following:

- How to build the static/dynamic linking version of shecc.
- Stack frame layout in static/dynamic linking modes.
- Function argument handling and calling convention.
- Runtime execution flow.
- The dynamic sections used in dynamic linking mode.
@DrXiao DrXiao force-pushed the feat/support-dynamic-linking branch from 1532aa5 to 8a06350 Compare November 2, 2025 03:45
@DrXiao
Collaborator Author

DrXiao commented Nov 2, 2025

Fix the instruction order at the function entry point:

...
# Before
-   13670:	e30487dc 	movw	r8, #18396	@ 0x47dc            # Error! r8 is modified before pushing
-   13674:	e3408001 	movt	r8, #1                          # r4-r11 and lr.
-   13678:	e598c000 	ldr	ip, [r8]
-   1367c:	e92d4ff0 	push	{r4, r5, r6, r7, r8, r9, sl, fp, lr}  # preserve r4-r11 and lr
# After
+   13670:	e92d4ff0 	push	{r4, r5, r6, r7, r8, r9, sl, fp, lr}  # preserve r4-r11 and lr first.
+   13674:	e30487dc 	movw	r8, #18396	@ 0x47dc        # load the global stack pointer from the 4-byte
+   13678:	e3408001 	movt	r8, #1                      # global object into r12.
+   1367c:	e598c000 	ldr	ip, [r8]

@sysprog21 sysprog21 deleted a comment from cubic-dev-ai bot Nov 2, 2025
@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 34 files

Prompt for AI agents (all 1 issues)

Understand the root cause of the following 1 issues and fix them.


<file name="src/reg-alloc.c">

<violation number="1" location="src/reg-alloc.c:411">
Extra stack arguments are stored relative to the callee frame because the store never sets ofs_based_on_stack_top, so multi-argument calls spill wrong values onto the stack.</violation>
</file>

@sysprog21 sysprog21 deleted a comment from cubic-dev-ai bot Nov 2, 2025
@cubic-dev-ai cubic-dev-ai bot left a comment

5 issues found across 34 files

Prompt for AI agents (all 5 issues)

Understand the root cause of the following 5 issues and fix them.


<file name="Makefile">

<violation number="1" location="Makefile:114">
Without forcing DYNLINK=0 here, `make check-snapshots DYNLINK=1` leaves the tree configured for dynamic linking even though the target says it reset to static mode. Please pass DYNLINK=0 to the reset sub-make.</violation>
</file>

<file name="mk/arm.mk">

<violation number="1" location="mk/arm.mk:44">
The sed pattern here fails to drop the trailing hyphen from CROSS_COMPILE, so the derived path still includes `...-gnueabihf-`, breaking the toolchain lookup. Please switch to a pattern that removes the hyphen (e.g., `sed 's/-$//'`).</violation>

<violation number="2" location="mk/arm.mk:47">
This sed expression also leaves the trailing hyphen on CROSS_COMPILE, so `/usr/$(...)` expands to `/usr/arm-linux-gnueabihf-/`, which is not the real toolchain directory. Please use a pattern that strips the hyphen (e.g., `sed 's/-$//'`).</violation>
</file>

<file name="src/riscv-codegen.c">

<violation number="1" location="src/riscv-codegen.c:196">
Saving `ra` at `sp-4` happens before this function allocates its own stack frame, so it overwrites the caller's stack. Please store `ra` only after decrementing `sp` into the callee-owned frame.</violation>
</file>

<file name="src/arm-codegen.c">

<violation number="1" location="src/arm-codegen.c:666">
The global initializer (GLOBAL_FUNC) is no longer invoked when `dynlink` is enabled, so dynamically linked executables skip all global variable initialization. Please ensure GLOBAL_FUNC still runs before main in dynlink mode.</violation>
</file>

$(Q)$(foreach SNAPSHOT_ARCH, $(ARCHS), $(MAKE) distclean config check-snapshot ARCH=$(SNAPSHOT_ARCH) DYNLINK=0 --silent;)
$(Q)$(MAKE) distclean config check-snapshot ARCH=arm DYNLINK=1 --silent
$(VECHO) "Switching backend back to %s (DYNLINK=0)\n" arm
$(Q)$(MAKE) distclean config ARCH=arm --silent
@cubic-dev-ai cubic-dev-ai bot Nov 2, 2025

Without forcing DYNLINK=0 here, make check-snapshots DYNLINK=1 leaves the tree configured for dynamic linking even though the target says it reset to static mode. Please pass DYNLINK=0 to the reset sub-make.

LD_LINUX_PATH := $(LD_LINUX_PATH)/$(shell echo $(CROSS_COMPILE) | sed s'/.$$//')/libc
LD_LINUX_PATH := $(shell cd $(LD_LINUX_PATH) 2>/dev/null && pwd)
ifndef LD_LINUX_PATH
LD_LINUX_PATH = /usr/$(shell echo $(CROSS_COMPILE) | sed s'/.$$//')
@cubic-dev-ai cubic-dev-ai bot Nov 2, 2025

This sed expression also leaves the trailing hyphen on CROSS_COMPILE, so /usr/$(...) expands to /usr/arm-linux-gnueabihf-/, which is not the real toolchain directory. Please use a pattern that strips the hyphen (e.g., sed 's/-$//').

Suggested change
- LD_LINUX_PATH = /usr/$(shell echo $(CROSS_COMPILE) | sed s'/.$$//')
+ LD_LINUX_PATH = /usr/$(shell echo $(CROSS_COMPILE) | sed 's/-$//')

ifeq ("$(LD_LINUX_PATH)","/")
LD_LINUX_PATH := $(shell dirname "$(shell which $(ARM_CC))")/..
LD_LINUX_PATH := $(shell cd $(LD_LINUX_PATH) 2>/dev/null && pwd)
LD_LINUX_PATH := $(LD_LINUX_PATH)/$(shell echo $(CROSS_COMPILE) | sed s'/.$$//')/libc
@cubic-dev-ai cubic-dev-ai bot Nov 2, 2025

The sed pattern here fails to drop the trailing hyphen from CROSS_COMPILE, so the derived path still includes ...-gnueabihf-, breaking the toolchain lookup. Please switch to a pattern that removes the hyphen (e.g., sed 's/-$//').

Suggested change
- LD_LINUX_PATH := $(LD_LINUX_PATH)/$(shell echo $(CROSS_COMPILE) | sed s'/.$$//')/libc
+ LD_LINUX_PATH := $(LD_LINUX_PATH)/$(shell echo $(CROSS_COMPILE) | sed 's/-$//')/libc

switch (ph2_ir->op) {
case OP_define:
emit(__sw(__ra, __sp, -4));
@cubic-dev-ai cubic-dev-ai bot Nov 2, 2025

Saving ra at sp-4 happens before this function allocates its own stack frame, so it overwrites the caller's stack. Please store ra only after decrementing sp into the callee-owned frame.

Prompt for AI agents
Address the following comment on src/riscv-codegen.c at line 196:

<comment>Saving `ra` at `sp-4` happens before this function allocates its own stack frame, so it overwrites the caller's stack. Please store `ra` only after decrementing `sp` into the callee-owned frame.</comment>

<file context>
@@ -190,10 +193,10 @@ void emit_ph2_ir(ph2_ir_t *ph2_ir)
 
     switch (ph2_ir->op) {
     case OP_define:
+        emit(__sw(__ra, __sp, -4));
         emit(__lui(__t0, rv_hi(ph2_ir->src0 + 4)));
         emit(__addi(__t0, __t0, rv_lo(ph2_ir->src0 + 4)));
</file context>

emit(__movt(__AL, __r8, elf_data_start));
emit(__sw(__AL, __r12, __r8, 0));

if (!dynlink) {
@cubic-dev-ai cubic-dev-ai bot Nov 2, 2025

The global initializer (GLOBAL_FUNC) is no longer invoked when dynlink is enabled, so dynamically linked executables skip all global variable initialization. Please ensure GLOBAL_FUNC still runs before main in dynlink mode.

Prompt for AI agents
Address the following comment on src/arm-codegen.c at line 666:

<comment>The global initializer (GLOBAL_FUNC) is no longer invoked when `dynlink` is enabled, so dynamically linked executables skip all global variable initialization. Please ensure GLOBAL_FUNC still runs before main in dynlink mode.</comment>

<file context>
@@ -470,39 +604,100 @@ void emit_ph2_ir(ph2_ir_t *ph2_ir)
+    emit(__movt(__AL, __r8, elf_data_start));
+    emit(__sw(__AL, __r12, __r8, 0));
+
+    if (!dynlink) {
+        emit(__bl(__AL, GLOBAL_FUNC->bbs->elf_offset - elf_code->size));
+        /* After global init, jump to main preparation */
</file context>

@DrXiao
Collaborator Author

DrXiao commented Nov 6, 2025

> Thus, the overhead is 5-10 CPU cycles (compared to direct call).

@jserv, I'm improving the documentation, but I'm confused because you mentioned the overhead of an indirect call is 5-10 CPU cycles. I don't understand how this range is determined?

@jserv
Collaborator

jserv commented Nov 6, 2025

> I'm improving the documentation, but I'm confused because you mentioned the overhead of an indirect call is 5-10 CPU cycles. I don't understand how this range is determined?

Check "A Study of Calling Convention Overhead on ARM Thumb-2 Platforms" if you want to measure it.

@DrXiao
Collaborator Author

DrXiao commented Nov 7, 2025

Add interaction diagram with glibc dynamic linker.

          |                                                                     +---------------------------+              
          |                                                                     |  program                  |              
          | +-------------+                             +----------------+      |                           |              
          | | shell       |                             | Dynamic linker |      |  +--------+ +----------+  |              
userspace | |             |                             |                +------+->| entry  | | main     |  |              
          | | $ ./program |                             | (ld.so)        |      |  | point  | | function |  |              
program   | +-----+-------+                             +----------------+      |  +-+------+ +-----+----+  |              
          |       |                                             ^               |    |         ^    |       |              
          |       |                                             |               +----+---------+----+-------+              
          |       |                                             |                    |         |    |                      
          |       |                                             |                    |         |    |                      
----------+-------+---------------------------------------------+--------------------+---------+----+----------------------
          |       |                                             |                    |         |    |                      
          |       v                                             |                    v         |    v                      
          |   +-------+ (It may be another                      |                +-------------+-----+    +------+         
glibc     |   | execl |                                         |                | __libc_start_main +--->| exit |         
          |   +---+---+  equivalent call)                       |                +-------------------+    +---+--+         
          |       |                                             |                                             |            
----------+-------+---------------------------------------------+---------------------------------------------+------------
system    |       |                                             |                                             |            
          |       v                                             |                                             v            
call      |   +------+  (It may be another                      |                                         +-------+        
          |   | exec |                                          |                                         | _exit |        
interface |   +---+--+   equivalent syscall)                    |                                         +---+---+        
          |       |                                             |                                             |            
----------+-------+---------------------------------------------+---------------------------------------------+------------
          |       |                                             |                                             |            
          |       v                                             |                                             v            
          |   +--------------+    +---------------+    +--------+-----------+                          +---------------+   
          |   | Validate the |    | Create a new  |    | Startup the kernel |                          | Delete the    |   
kernel    |   |              +--->|               +--->|                    |                          |               |   
          |   | executable   |    | process image |    | program loader     |                          | process image |   
          |   +--------------+    +---------------+    +--------------------+                          +---------------+   

@jserv, the above diagram is a draft created using ASCIIFlow. Does this diagram meet the requirement?

If not, please provide some examples to help me understand what kind of diagram is required.

@jserv
Collaborator

jserv commented Nov 7, 2025

> the above diagram is a draft created using ASCIIFlow. Does this diagram meet the requirement?

It looks good. Please also address the GOT/PLT implementation for Arm32 in more detail here.
