Skip to content

Conversation

@jserv
Copy link
Collaborator

@jserv jserv commented Oct 31, 2025

This adds full GDB remote debugging support using mini-gdbstub with event-driven I/O (kqueue/epoll) to enable non-blocking debugging without interfering with the existing WFI coroutine system. This provides breakpoints, single-step, register/memory inspection, and multi-hart debugging for SMP configurations.

Architecture:

  • Single-hart mode (n_hart=1): Direct execution with debug state tracking
  • SMP mode (n_hart>1): Coroutine-based with per-hart debug suspend/resume
  • Event-driven GDB I/O prevents blocking during debug operations
  • Preserves existing WFI coroutine semantics unchanged

Summary by cubic

Adds non-blocking GDB remote debugging using an event-driven mini-gdbstub. Supports breakpoints, single-step, and register/memory inspect with per-hart control for SMP, without changing WFI coroutine behavior.

  • New Features

    • Event-driven GDB I/O (kqueue/epoll) runs alongside coroutines; no blocking.
    • Breakpoints and single-step with per-hart suspend/resume in SMP.
    • Exec loop integrates breakpoint checks and single-step; continue/step and breakpoint set/delete wired to gdbstub.
  • Migration

    • No change for normal runs; debugging is off unless you pass -g.
    • Start with: semu -g ... and connect GDB to 127.0.0.1:1234.
    • Optional: scripts/test-gdb.sh verifies connection, stepping, and breakpoints.

Written for commit 9236b58. Summary will update automatically on new commits.

cubic-dev-ai[bot]

This comment was marked as resolved.

@jserv jserv force-pushed the coro-gdb branch 3 times, most recently from 8188ee6 to 59ca149 Compare November 2, 2025 15:47
@sysprog21 sysprog21 deleted a comment from cubic-dev-ai bot Nov 2, 2025
Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 9 files

Prompt for AI agents (all 3 issues)

Understand the root cause of the following 3 issues and fix them.


<file name="coro.h">

<violation number="1" location="coro.h:47">
Debug suspend should not reuse the same CORO_STATE_SUSPENDED flag that WFI uses; otherwise coro_is_suspended() cannot distinguish WFI from debugger freezes and the scheduler will treat breakpoints as WFI sleeps.</violation>

<violation number="2" location="coro.h:59">
coro_step_hart() should verify that the hart really yielded back; currently it always returns true even if the coroutine keeps running, so a failed step request is silently reported as success.</violation>
</file>

<file name="scripts/test-gdb.sh">

<violation number="1" location="scripts/test-gdb.sh:11">
`set -e` causes the script to abort on the first failing GDB command, so tests cannot report failures as intended. Allow the commands to fail and let the logging logic handle the result.</violation>
</file>

React with 👍 or 👎 to teach cubic. Mention @cubic-dev-ai to give feedback, ask questions, or re-run the review.

* The hart will not be scheduled until explicitly resumed.
* This is different from WFI suspension.
*/
void coro_suspend_hart_debug(uint32_t hart_id);
Copy link

@cubic-dev-ai cubic-dev-ai bot Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug suspend should not reuse the same CORO_STATE_SUSPENDED flag that WFI uses; otherwise coro_is_suspended() cannot distinguish WFI from debugger freezes and the scheduler will treat breakpoints as WFI sleeps.

Prompt for AI agents
Address the following comment on coro.h at line 47:

<comment>Debug suspend should not reuse the same CORO_STATE_SUSPENDED flag that WFI uses; otherwise coro_is_suspended() cannot distinguish WFI from debugger freezes and the scheduler will treat breakpoints as WFI sleeps.</comment>

<file context>
@@ -37,3 +37,23 @@ bool coro_is_suspended(uint32_t slot_id);
+ * The hart will not be scheduled until explicitly resumed.
+ * This is different from WFI suspension.
+ */
+void coro_suspend_hart_debug(uint32_t hart_id);
+
+/* Resume a hart from debug suspension */
</file context>
Fix with Cubic

Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 9 files

Prompt for AI agents (all 1 issues)

Understand the root cause of the following 1 issues and fix them.


<file name="scripts/test-gdb.sh">

<violation number="1" location="scripts/test-gdb.sh:134">
Because the script runs with `set -e`, a failing `timeout` here will abort the entire suite instead of recording a failed test. Please guard the `timeout` invocation so the test can report the failure instead of exiting immediately.</violation>
</file>

React with 👍 or 👎 to teach cubic. Mention @cubic-dev-ai to give feedback, ask questions, or re-run the review.


/* Set a breakpoint at the specified address
* @vm: VM instance
* @addr: Virtual address for breakpoint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be confusing for the term "Virtual address"?


/* Delete a breakpoint at the specified address
* @vm: VM instance
* @addr: Virtual address of breakpoint to remove
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

*/
uint32_t gdb_get_breakpoint_count(vm_t *vm);

/* Suspend a hart for debugging (breakpoint or user interrupt)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't "user interrupt" resume a hart instead of suspend?

for (uint32_t i = 0; i < vm->debug_ctx.bp_count; i++) {
if (vm->debug_ctx.breakpoints[i].addr == addr) {
/* Already exists, just ensure it's enabled */
vm->debug_ctx.breakpoints[i].enabled = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just came up with the idea that we may combine enabled and addr together by using the bit0.

But such optimization can be minor, and it could be fine in debug mode.

riscv.h Outdated
hart_state_t state; /**< Current debug state */
bool single_step_mode; /**< True if single-stepping */
bool breakpoint_pending; /**< Breakpoint check needed */
uint32_t last_stopped_pc; /**< PC where hart last stopped */
Copy link
Contributor

@RinHizakura RinHizakura Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see when this field will be used. In the PR, it is only assigned with value but seems to never be checked anywhere.

*/
coro_t *co = coro_state.coroutines[hart_id];
if (co && co->state == CORO_STATE_SUSPENDED)
coro_resume_hart(hart_id);
Copy link
Contributor

@RinHizakura RinHizakura Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we can remove the if condition here, as the same thing will be checked again in coro_resume_hart

coro.c Outdated

void coro_suspend_hart_debug(uint32_t hart_id)
{
if (hart_id >= coro_state.n_hart || !coro_state.initialized)
Copy link
Contributor

@RinHizakura RinHizakura Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see coro_state has the field n_hart...? Can you check the compilation? It should fail.

@sysprog21 sysprog21 deleted a comment from cubic-dev-ai bot Nov 2, 2025
return (co && co->state == CORO_STATE_SUSPENDED);
}

bool coro_step_hart(uint32_t hart_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logically, this is the same as calling coro_resume_hart. Isn't it?

If so, I suggest we just let coro_resume_hart return the error value, and call coro_resume_hart under coro_step_hart directly, instead of doing so many duplicate checks.

@jserv jserv force-pushed the coro-gdb branch 3 times, most recently from e6f4075 to 20d3cf9 Compare November 2, 2025 16:37
Add full GDB remote debugging support using mini-gdbstub with event-driven
I/O (kqueue/epoll) to enable non-blocking debugging without interfering with
the existing WFI coroutine system. This provides breakpoints, single-step,
register/memory inspection, and multi-hart debugging for SMP configurations.

Key components:
- gdbctrl.c/h: GDB control layer for breakpoint and hart suspension management
- coro.c/h: Debug-specific suspend/resume functions for SMP debugging
- main.c: Integrate GDB callbacks with hart execution loop and single-step logic
- riscv.h: Debug state structures (hart_state_t, hart_debug_info_t, vm_debug_ctx_t)
- scripts/test-gdb.sh: Automated test suite validating GDB functionality

Architecture:
- Single-hart mode (n_hart=1): Direct execution with debug state tracking
- SMP mode (n_hart>1): Coroutine-based with per-hart debug suspend/resume
- Event-driven GDB I/O prevents blocking during debug operations
- Preserves existing WFI coroutine semantics unchanged
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants